Own your future:
Our culture isn't something people join; it's something they build and shape. We believe that every person deserves to be heard and empowered. If you're on the fence about whether you're a fit, we say go for it. Let's build something great together.
We are looking for a Site Reliability Engineer to ensure the reliability, scalability, and operational excellence of our cloud-native systems.
This role combines deep software engineering skills with strong infrastructure expertise. You will design and operate highly available Kubernetes-based environments on AWS, implement automation, improve CI/CD pipelines, and ensure systems meet defined reliability targets.
The ideal candidate is comfortable writing production-grade code in Go and Python while also owning infrastructure, deployment, monitoring, and incident response processes.
Must-Haves:
- Hands-on experience with Kubernetes, including managed clusters on Amazon EKS
- Solid experience with Amazon Web Services (AWS), including core services for compute, storage, networking, and security
- Strong programming skills in Go and Python
- Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation, and with best practices for managing cloud infrastructure
Nice to Have:
- Experience with AWS CloudFormation
- Experience with GitHub Actions
- Experience in Platform Engineering
Key Responsibilities:
- Write automation and platform tooling in Go and Python, and develop the software engineering platform to ensure high reliability, scalability, and performance
- Make technical decisions and trade-offs to reduce operational overhead, improve system stability, and enhance platform efficiency
- Design and implement observability solutions (monitoring, logging, tracing) to increase visibility into system behavior and improve user experience
- Collaborate with engineering teams to improve developer experience and platform usability