Senior Software Engineer - AI Inference
About this role
Accountabilities:
• Contribute features, optimizations, and fixes to open-source inference frameworks such as vLLM and SGLang
• Design and improve inference runtime components including scheduling, batching, request handling, and KV-cache optimization
• Profile and optimize performance-critical paths across Python, C++, and CUDA layers
• Enhance multi-GPU inference performance through improved parallelism, communication strategies, and resource utilization
• Develop benchmarking systems and regression tests to ensure performance stability and correctness across deployments
• Investigate and resolve bottlenecks using profiling tools, GPU analysis, and data-driven performance evaluation
• Collaborate with cross-functional teams to translate production needs into scalable, upstream-ready solutions
• Participate in code reviews, architectural discussions, and open-source community contributions
Requirements:
• 5+ years of experience in production software engineering with strong systems-level expertise
• Hands-on experience with LLM inference or serving frameworks such as vLLM, SGLang, or similar systems
• Strong programming skills in Python and in C++ and/or CUDA, with the ability to debug and optimize performance-critical code
• Experience with performance profiling tools, benchmarking, and latency/throughput optimization techniques
• Solid understanding of distributed systems, concurrency, and multi-GPU or multi-node architectures
• Strong communication skills and experience working in or contributing to open-source projects
• Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or equivalent experience
• Strong advantage: contributions to open-source AI, ML, or systems projects such as PyTorch, Triton, NCCL, or similar ecosystems
• Strong advantage: experience with GPU memory optimization, kernel fusion, or advanced inference techniques such as quantization or speculative decoding
• Strong analytical mindset with a focus on measurement-driven engineering
Benefits:
• Competitive base salary ranging from $152,000 to $287,500 depending on level and experience
• Equity participation in addition to base compensation
• Comprehensive health, dental, and vision insurance coverage
• Flexible work arrangements supporting work-life balance
• Paid time off, holidays, and parental leave benefits
• Professional development opportunities in advanced AI and systems engineering
• Exposure to cutting-edge AI infrastructure and large-scale GPU computing systems
• Inclusive and innovation-driven engineering culture
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.