Role Overview
You will help build the foundation that keeps our platform reliable and secure and lets us ship quickly. This role focuses on infrastructure, observability, automation, and operational resilience in production environments.
Responsibilities
- Design, build, and maintain secure cloud infrastructure across environments
- Automate deployment workflows, release validation, and operational tasks
- Implement monitoring, alerting, and incident response tooling
- Harden Kubernetes clusters and improve identity and access boundaries
- Improve system reliability through SLOs, operational reviews, and post-incident follow-up
Must-Have Qualifications
- Strong hands-on experience with Kubernetes and container orchestration
- Experience using infrastructure as code in production environments
- Solid Linux, networking, and systems fundamentals
- Ability to troubleshoot complex distributed systems
Nice-to-Have
- Experience with service mesh technologies and zero-trust networking
- Experience with Azure landing zones or enterprise cloud governance
- Background working in regulated or compliance-heavy environments