Staff Software Engineer, Agentic AI — Nexus

Arlo

📍 2 Locations 📅 Posted May 22, 2026

About this role

About Arlo:

At Arlo, we're passionate about creating innovative and reliable solutions that help people protect what matters most to them. Our team is dedicated to delivering products that exceed our customers' expectations, while always pushing the boundaries of what's possible in the world of protection technology. We believe that everyone deserves to feel safe and secure, whether they're at home or away, and we're committed to providing our customers with the peace of mind they need to live their lives without worry. Arlo’s deep expertise in AI- and CV-powered analytics, cloud services, user experience, product design, and innovative wireless and RF connectivity enables the delivery of a seamless, smart security experience for Arlo users that is easy to set up and interact with every day.

Arlo is building Nexus, our next-generation agentic chat experience embedded in the Arlo app. Nexus helps customers interact with their devices, troubleshoot issues, and get more out of their Arlo ecosystem through natural conversation — backed by a growing system of LLM-powered agents, tools, and integrations.
Nexus isn't a generic chatbot — it's an agent operating over real cameras, doorbells, and sensors, with real telemetry and real customer outcomes. Engineers who've built systems that reason about physical devices will find a lot of interesting problems here.
We're hiring a Staff Software Engineer to join the team building Nexus and take ownership of expanding what our agents can do. You'll design and ship new agent capabilities, integrate Nexus with internal services and third-party systems, and raise the bar on how we test, evaluate, and observe agent behavior in production. This is a deeply technical Staff-level role with real autonomy: you'll work across the agent stack from prompt and tool design through orchestration, evals, and production hardening, and you'll set technical direction that other engineers follow.

What you'll do

• Design and ship new agent capabilities for Nexus — new tools, skills, integrations, and conversational flows that meaningfully expand what users can accomplish through chat.
• Build and own production-grade Python services (FastAPI, async patterns) that power Nexus's agent runtime, tool execution, and orchestration logic.
• Extend our orchestration layer (LangGraph / LangChain or equivalent) with new agent topologies, routing logic, and tool-use patterns.
• Design tool-use and function-calling interfaces — including MCP servers — that let Nexus safely interact with Arlo platform APIs, device telemetry, and partner systems.
• Build the evals and observability that make agent behavior measurable: offline test suites, online quality metrics, trace tooling, regression detection, and dashboards engineers and PMs actually use.
• Own the testing strategy for AI experiences — design and build the test harnesses, golden datasets, scenario suites, adversarial/red-team tests, and CI gates that catch agent regressions before they reach users. Define what "good" looks like for conversational quality, tool-use correctness, and task completion.
• Partner closely with product, design, and platform teams to turn user needs into shipped agent features — and bring engineering judgment to scoping, sequencing, and tradeoffs.
• Set technical direction for agent development practices at Arlo: patterns, frameworks, code review standards, and the playbook other engineers follow when they build on Nexus.
• Mentor mid-level and senior engineers on LLM systems, prompt design, and production AI engineering.

What we're looking for

Must-haves

• 8+ years of software engineering experience, with at least 1-2 years building production LLM-powered systems — ideally agentic chat, copilots, or multi-step agent workflows.
• Strong production Python — FastAPI, asyncio, type hints, testing discipline. You've built and operated Python services at meaningful scale.
• Hands-on experience with LLM orchestration frameworks like LangGraph, LangChain, LlamaIndex, or equivalent — and an opinion on when to use them vs. build your own.
• Deep familiarity with tool-use / function-calling patterns. Bonus if you've built or integrated MCP (Model Context Protocol) servers, but strong tool-use experience in any framework translates.
• Experience designing multi-agent or multi-step workflows: planner/executor patterns, agent handoff, state management, error recovery, human-in-the-loop.
• A real point of view on evals and observability for LLM systems — you've built (or fought to build) the feedback loops that keep agents from regressing in production.
• Hands-on experience testing AI/LLM experiences in production — building eval datasets, scoring rubrics (LLM-as-judge, human-in-the-loop, deterministic checks), regression suites, and the discipline to know which one applies when. You understand why traditional unit tests aren't enough for non-deterministic systems and have built the testing patterns that fill the gap.
• Track record of shipping at the Staff level — you've operated as a technical leader across teams, not just an individual contributor with a senior title. The bar is delivery and influence, not slide decks.

Nice-to-haves

• Experience with RAG, vector databases, embedding pipelines, and retrieval quality tuning.
• Familiarity with Anthropic's Claude API, OpenAI's Responses API, or comparable provider SDKs at the level of tool use, structured outputs, and streaming.
• Experience instrumenting LLM systems with tools like LangSmith, Langfuse, Arize, Braintrust, or homegrown tracing.
• Experience with AI testing tooling (Braintrust, Langfuse, Patronus, DeepEval, Promptfoo, or equivalent), or having built homegrown versions of these.
• Familiarity with red-teaming, prompt injection testing, or adversarial evaluation of agent systems.
• Experience building backend systems for IoT or connected devices — reasoning about device state, telemetry streams, intermittent connectivity, command/response patterns, and the kind of real-world messiness that doesn't show up in pure SaaS backends. Bonus if you've designed APIs or agents that operate over a fleet of devices.
• Experience working with mobile clients (iOS / Android) as API consumers of an agent backend.
• Prior work on prompt engineering at scale, including prompt versioning, A/B testing, and prompt regression frameworks.

The pay range for this position reflects the minimum and maximum target for new hire salaries at commencement of employment and is expected to be between USD$175,000 - $225,000/year. However, base pay offered may vary depending on multiple factors, including role, job-related knowledge, skills, relevant education and experience.

We’re committed to inclusivity and selecting the strongest candidate, no matter their background. Even if you don’t meet every listed qualification, we encourage you to apply. We’re happy to support growth in areas essential to the role. Interested in learning more about our workplace? Visit and follow our LinkedIn and Glassdoor pages to read employee insights and get updates on what it’s like to be part of Arlo.

Arlo is proud to be an Equal Opportunity Employer. We value inclusion and are committed to an inclusive and harassment-free workplace. We prohibit discrimination and harassment based on all legally protected statuses in all hiring and employment.

We provide reasonable accommodations to applicants and employees with disabilities, who are pregnant or have a related medical condition, or who have sincerely held religious beliefs, observances, and practices. Pursuant to applicable state and municipal Fair Chance Laws and Ordinances, the Company will consider for employment qualified applicants with arrest and conviction records.

This listing was aggregated by Perik.ai from Arlo’s public job board.
