Senior LLM Agents Architect
About this role
We don't just build the hardware and software that powers the AI revolution — we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization.
We are looking for a senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.
What you'll be doing:
• Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels — targeting speed-of-light performance on NVIDIA hardware.
• Collaborate with GPU architects and performance engineers to encode domain expertise — memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning — into agent workflows that rival hand-tuned optimization.
• Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.
• Partner with HW architects to develop agentic flows for GPU architectural studies — enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.
• Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.
• Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
• Set up evaluation backbone using offline golden sets and online telemetry for confident iterations, cost control, and safe improvements.
• Mentor and improve teams through insights in agent orchestration, prompting, RAG, observability, crafting documentation and playbooks for NVIDIA's teams.
What we need to see:
• 7+ years in applied ML/AI or large-scale systems, with 2+ years crafting agentic or LLM-powered applications in production environments.
• B.Sc in Computer Science / Electrical Engineering.
• Solid grounding in computer architecture: memory hierarchies, parallelism models, pipelining, and cache behavior. Specific familiarity with NVIDIA GPU architecture — streaming multiprocessors, warp scheduling, shared/global memory model, and occupancy reasoning — is essential.
• Hands-on CUDA programming experience: writing, profiling, and optimizing GPU kernels — not just calling into CUDA-accelerated libraries. Comfortable with tools such as Nsight Compute, Nsight Systems, or equivalent profiling workflows.
• Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production — not just experience with off-the-shelf frameworks.
• Strong software engineering skills in Python and one systems language (C++ preferred).
• Proficient in tool use, RAG pipelines, and model adaptation techniques for building agentic systems.
• Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
• Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
• Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.
Ways to stand out from the crowd:
• Familiarity with the PyTorch compilation and lowering stack (torch.compile, TorchDynamo, TorchInductor, Triton, down to PTX), and with GPU graph compilers, kernel fusion strategies, or auto-tuning frameworks.
• Background in performance engineering for HPC or GPU-accelerated workloads, including experience with performance modeling or hardware simulators.
• Familiarity with distributed processing, multi-GPU workloads, and networking (e.g., NVLink, InfiniBand).
• Familiarity with frontier agentic coding tools (e.g., Claude Code, Codex, Cursor) — understanding their underlying architecture: tool orchestration, context management, and autonomous task execution patterns.
• Hands-on experience building a domain-specific coding agent — whether on top of frontier agentic harnesses (e.g., Claude Code, Codex SDK) or lower-level agent frameworks (e.g., LangChain/LangGraph deep agents, CrewAI). Comfortable with the design choices that make a coding agent useful in practice: task scoping, tool and context curation, evaluation, and failure recovery.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family www.nvidiabenefits.com.
#LI-Hybrid