Staff Machine Learning Engineer, AI Serving
About this role
Accountabilities:
• Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
• Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
• Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
• Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
• Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
• Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
• Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.
Requirements
• 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems development.
• Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
• Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
• Strong programming skills in Python and/or Go, with experience building scalable backend or ML systems.
• Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
• Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
• Strong understanding of observability, monitoring, and performance tuning for real-time systems.
• Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
• Strong ownership mindset with a focus on scalability, reliability, and developer experience.
Benefits
• Competitive compensation package with base salary, equity (RSUs), and potential performance-based incentives.
• Comprehensive healthcare coverage including medical, dental, and vision insurance.
• Retirement plan with employer matching contributions.
• Flexible remote-first work environment.
• Generous paid time off, including vacation, holidays, and volunteer days.
• Paid parental leave and family support programs.
• Mental health support, coaching, and wellness resources.
• Learning and development support for professional growth.
• Additional benefits covering workspace support, caregiving, and family planning.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the best-fitting candidates and shares this shortlist directly with the hiring company. The hiring company's internal team then makes the final decision and manages the next steps (interviews, assessments).
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.