Software Engineer, Compute Infrastructure
About this role
Accountabilities:
• Design, build, and optimize large-scale compute infrastructure systems supporting high-performance AI workloads across distributed environments.
• Develop and operate infrastructure spanning compute, networking, storage, orchestration, and cluster scheduling systems.
• Improve performance and reliability through profiling, benchmarking, and optimization of workloads across compute, memory, and network layers.
• Build automation and tooling for provisioning, monitoring, incident response, and lifecycle management of compute resources.
• Contribute to the design of developer platforms, observability tools, CaaS systems, and agent infrastructure to improve usability and efficiency.
• Collaborate with research, hardware, networking, and operations teams to ensure efficient and scalable compute capacity.
• Identify system bottlenecks and translate operational insights into durable infrastructure improvements and abstractions.
• Help evolve the platform architecture to better support heterogeneous, large-scale compute environments.
Requirements:
• Strong software engineering background with experience in production-grade infrastructure systems.
• Experience in one or more areas such as distributed systems, high-performance computing, networking, storage systems, Kubernetes, observability, or infrastructure tooling.
• Solid understanding of system-level performance optimization, debugging, and large-scale system behavior.
• Familiarity with GPU infrastructure, RDMA, NCCL, or other high-performance communication frameworks is a plus.
• Ability to work across hardware, software, and networking layers to diagnose and resolve complex issues.
• Strong ownership mindset with the ability to operate effectively in ambiguous and fast-changing environments.
• Excellent collaboration and communication skills across multidisciplinary engineering teams.
• Motivation to build scalable infrastructure that enables advanced AI research and production systems.
Benefits:
• Competitive compensation aligned with experience and market standards.
• Comprehensive health, dental, and vision insurance coverage.
• Flexible work arrangements supporting collaboration across distributed teams.
• Opportunity to work on cutting-edge AI infrastructure at massive scale.
• High-impact role with direct contribution to frontier AI research and systems.
• Professional growth in a highly technical and research-driven engineering environment.
• Inclusive and mission-driven workplace culture focused on safety, collaboration, and innovation.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the best-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by the hiring company's internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.