The Observability team at Cruise is looking for a Site Reliability Engineer to play a critical role in building out and improving observability systems, tools and the related codebase.
Site Reliability Engineers at Cruise bring specialized knowledge and experience to ensure the reliability, scalability, performance, efficiency, and security of our systems.
What you'll be doing:
- Using your software and systems engineering skills to contribute code, perform code reviews, and create technical designs that improve performance and reliability of observability systems.
- Partnering with Software Engineering teams to better understand their use cases and guide engineers in using the existing tools effectively.
- Building tools to enable engineers to collect and act on observability signals.
What you must have:
- Previous experience as an SRE, Production Engineer, Systems Engineer, or Software Engineer with a focus on distributed systems reliability.
- Considerable experience working with container orchestration systems (e.g. Kubernetes).
- Proficient in designing and developing complex distributed systems, with expertise in one or more high-level programming languages such as Go, Python, Rust, C/C++, or JavaScript (Node.js).
- Considerable Linux experience.
- Effective collaboration skills to work closely with team members and other engineering teams.
Bonus Points!
- Experience with Cloud Platforms such as Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform (GCP).
- Experience with OpenTelemetry instrumentation.
- Familiarity with Kubernetes, Docker, Istio and Terraform.
- Skilled in defining and instrumenting SLIs and SLOs.
- Previous experience working with Prometheus, Grafana, TSDBs and observability pipelines (e.g. for logging, metrics, or tracing).
The salary range for this position is $142,800 - $210,000. Compensation will vary depending on location, job-related knowledge, skills, and experience. You may also be offered a bonus, long-term incentives, and benefits. These ranges are subject to change.