Senior Principal AI Infrastructure Architect

Nttlimited

📍 6 Locations 📅 Posted May 21, 2026

About this role

Continue to make an impact with a company that is pushing the boundaries of what is possible. At NTT DATA, we are renowned for our technical excellence, leading innovations, and making a difference for our clients and society. Our workplace embraces diversity and inclusion – it’s a place where you can continue to grow, belong, and thrive.

Your career here is about believing in yourself and seizing new opportunities and challenges. It’s about expanding your skills and expertise in your current role and preparing yourself for future advancements. That’s why we encourage you to take every opportunity to further your career within our great global team.

Your day at NTT DATA

The Senior Principal AI Infrastructure Architect is a highly skilled and advanced subject matter expert, responsible for leading the design of complex AI platform and managed-service solutions and driving the strategic vision and direction for the company's largest enterprise clients. The role sits at the centre of NTT DATA's AI Factories practice and is focused on the hardware foundations — GPU and accelerator compute, host CPU platforms, high-performance storage and AI fabric — that underpin enterprise-scale training, fine-tuning and inference workloads.

Key Responsibilities:

• Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients.

• Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency.

• Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale).

• Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints.

• Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence.

• Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions.

• Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies.

• Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments.

• Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance.

• Develop As-Is, Vision, FMO (future mode of operation) and To-Be AI platform architectures, identify gaps and build transition roadmaps.

• Synthesise current and future trends in AI silicon, memory hierarchies (HBM3e, CXL), interconnects and AI software stacks with client strategic imperatives to create compelling, evidence-based solutions.

• Contribute to NTT DATA's AI Factories knowledge base by sharing reference architectures, sizing tools and lessons learned with internal teams and clients.

Knowledge and Attributes

• Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads.

• Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale).

• Solid command of AI networking — InfiniBand NDR/XDR, RoCEv2, NVIDIA Spectrum-X, Ultra Ethernet, NVLink/NVSwitch fabrics, congestion control and fabric design for rail-optimised and fat-tree topologies.

• Working knowledge of the AI software and orchestration stack: CUDA, cuDNN, NCCL, ROCm, Triton Inference Server, NIM, vLLM, TensorRT-LLM, Slurm, Kubernetes (with GPU Operator), Kubeflow, Run:ai, MLflow and NVIDIA AI Enterprise.

• Familiarity with datacenter facilities engineering for AI workloads: high-density power, liquid cooling (DLC, rear-door, immersion), PUE/WUE optimisation and the practical constraints of retrofitting existing colo space for accelerated compute.

• Excellent written and oral communication skills, with the ability to translate complex technical concepts for technical and non-technical executive audiences.

• Strong systems-thinking and strategic-thinking skills — able to capture the key elements of a system into a simple abstraction that empowers good decisions.

• Strong business financial skills, with the demonstrable ability to perform a cost-benefit analysis, build CAPEX vs OPEX comparisons and manage budgets.

• Knowledge of cloud, hybrid and sovereign AI deployment patterns, plus architectural governance for Agile, DevSecOps and MLOps.

• Significant knowledge of core Managed Service portfolio artefacts, techniques, demos, tools and deliverables, applied to AI platform operations.

Academic Qualifications and Certifications:

• Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field. Master's or PhD advantageous.

• Vendor and technology certifications in AI infrastructure highly desirable — for example NVIDIA-Certified Associate / Professional (AI Infrastructure, AI Operations), Dell Technologies AI Factory, Cisco / Nutanix / HPE accelerated compute, Red Hat OpenShift AI, Run:ai — plus relevant storage and networking certifications.

• Scaled Agile certification advantageous.

Required experience:

• Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits.

• Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories.

• Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively.

• Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications.

• Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work.

• Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes.

• Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture.

Workplace type:

Remote Working

Equal Opportunity Employer
NTT DATA is proud to be an Equal Opportunity Employer with a global culture that embraces diversity. We are committed to providing an environment free of unfair discrimination and harassment. We do not discriminate based on age, race, colour, gender, sexual orientation, religion, nationality, disability, pregnancy, marital status, veteran status, or any other protected category. Accelerate your career with us. Apply today.

This listing was aggregated by Perik.ai from Nttlimited’s public job board.