Principal ML Engineer, Machine Learning Platform and Systems Architecture
About this role
Accountabilities
In this role, you will lead the design, development, and evolution of the large-scale ML platform and systems architecture that supports end-to-end machine learning workflows.
• Lead architecture and delivery of core ML platform capabilities including training, deployment, evaluation, and observability systems
• Design scalable distributed systems for data processing, feature engineering, model lifecycle management, and production inference
• Own end-to-end technical outcomes for platform initiatives, from architecture design through deployment and operational support
• Develop and scale large data pipelines for structured and semi-structured datasets across distributed environments
• Define and implement frameworks for model deployment, monitoring, observability, and system reliability
• Establish data governance, lineage, and responsible data usage practices across ML infrastructure
• Drive architecture for distributed processing systems using tools such as Ray, Spark, Airflow, or equivalent technologies
• Lead incident response for critical platform issues and implement long-term system improvements
• Mentor engineers, provide technical leadership, and establish best practices for ML system design and operations
• Communicate technical strategies, tradeoffs, and architecture decisions to both technical and non-technical stakeholders
Requirements
The ideal candidate brings deep expertise in distributed systems, ML infrastructure, and large-scale platform engineering, along with strong technical leadership skills.
• 6–8+ years of experience in software engineering, ML infrastructure, platform engineering, or distributed systems
• Bachelor’s or Master’s degree in Computer Science or Engineering, or equivalent practical experience
• Strong expertise in designing and operating large-scale distributed systems and data platforms
• Advanced proficiency in Python and strong production software engineering practices
• Experience leading complex, cross-functional technical initiatives across multiple engineering teams
• Strong background in ML infrastructure including model deployment, inference systems, and observability frameworks
• Experience with large-scale data pipelines, cloud-native architectures, and distributed processing frameworks
• Ability to make architectural decisions balancing scalability, performance, reliability, and cost
• Strong communication and stakeholder management skills across technical and leadership audiences
• Preferred: experience with Kubernetes, ML orchestration tools, data lineage systems, and ML-ready data representations (graph, geometry, multimodal)
Benefits
• Competitive base salary ranging from $152,000 to $272,250 depending on experience and location
• Annual cash bonus eligibility, plus stock grants and additional incentive compensation (role-dependent)
• Comprehensive health, dental, and vision insurance coverage
• Retirement and financial wellness programs
• Flexible remote work options across the United States and Canada
• Paid time off and wellness-focused benefits supporting work-life balance
• Strong learning and development support for continuous technical growth
• Inclusive, innovation-driven culture focused on collaboration and belonging
• Opportunity to build foundational ML systems powering advanced real-world applications
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.