Senior Software Engineer - SRE
Job Description
Job Title: Senior Site Reliability Engineer
Location: Kuala Lumpur
About AirAsia MOVE
AirAsia MOVE is a leading ASEAN-focused budget online travel agency (OTA), part of the Capital A Group. We deliver customer-centric travel solutions by combining innovation with operational excellence. Our goal is to create seamless, reliable, and delightful journeys for travelers across the region.
About the Role
We’re looking for a Senior Site Reliability Engineer to help scale and stabilize our cloud infrastructure and reliability practices as we grow across multiple lines of business.
You’ll lead key initiatives around:
• Cloud architecture modernization
• Multi-region reliability
• Observability and incident response
• Reducing toil through automation and self-service
This is a hands-on technical role in which you’ll work across platform, SRE, and application teams to build scalable systems that are resilient, cost-aware, and developer-friendly.
What You’ll Do
• Design and implement secure, scalable infrastructure on Google Cloud Platform (GCP)
• Lead efforts to build and evolve MOVE’s GCP Landing Zone, including Shared VPC, org structure, IAM, and policy guardrails
• Build and improve multi-region architectures for high availability and disaster recovery
• Drive infrastructure automation using Terraform, CI/CD, and GitOps practices
• Improve observability across teams by standardizing monitoring, tracing, and alerting
• Collaborate on incident response and postmortems to reduce mean time to recovery (MTTR) and build resilience
• Enforce tagging, FinOps controls, and security policies across GCP projects
• Contribute to platform engineering initiatives and developer self-service tools
What We’re Looking For
• 5+ years in SRE, DevOps, or cloud infrastructure roles
• Solid experience with GCP (or a similar cloud provider), Terraform, and Kubernetes (GKE)
• Strong hands-on experience in automation and multi-region architecture design
• Experience in networking (VPCs, NAT, Private Service Connect), IAM, and cloud-native security
• Proven ability to debug and support production systems under pressure
• Familiarity with monitoring and tracing tools such as Cloud Monitoring, OpenTelemetry, and SigNoz
• Exposure to AI-based or anomaly-detection techniques for alert tuning or reliability insights
• Clear communicator who works well with developers, product, and other infra teams