The Machine Learning Orchestration team owns and develops Cruise’s workflow management platform. The platform provides a semantic orchestration framework for machine learning workflows and data processing at Cruise. We provide the orchestration, big data, and compute layers that accelerate the development cycle of AV engineers, letting them focus on improving the car’s safety and performance.
We are seeking an experienced Senior Software Engineer to lead key initiatives within our ML Orchestration team, focused on scaling our platform, creating automation and self-service tools for our users, and running ML pipelines efficiently at scale. The successful candidate will have experience building and running scalable distributed systems and an understanding of open-source orchestration platforms such as Airflow, Kubeflow, Metaflow, or Flyte, and will bring innovative ideas, intellectual curiosity, and strong problem-solving skills.
Note: This role is for an AI/ML infrastructure engineering team, not an applied machine learning team. This team does not build ML models for specific applications. The team is focused on the infrastructure products that help our customers do machine learning and data science at scale. Experience with ML or data platforms is helpful to understand our customer use cases, but your team will not do any applied ML.
What you’ll be doing:
Use the latest cloud (GCP/Azure) technologies to own, design, implement, and test scalable distributed compute and data processing systems. Champion engineering excellence by continuously improving systems and processes.
Own technical projects from start to finish, and be responsible for major technical design decisions and tradeoffs.
Participate effectively in the team’s planning, code reviews, and design discussions.
Consider the effects of projects across multiple teams and proactively manage conflicts. Work together with partner teams to achieve cross-departmental goals and satisfy broad requirements.
Conduct technical interviews with well-calibrated standards and play an essential role in recruiting activities. Effectively onboard and mentor junior engineers and/or interns.
What you must have:
7+ years of experience, with work on large-scale distributed systems preferred
3+ years of experience leading and driving complex projects
Experience building scalable infrastructure on the cloud with Python, C++, or Golang (or similar)
Experience working with relational and NoSQL databases
Experience developing and maintaining systems at scale
BS, MS, or Ph.D. in Computer Science, Electrical Engineering, Mathematics, Physics, or another relevant field; or equivalent real-world experience
Passionate about self-driving technology and its potential impact on the world
Attention to detail and a passion for truth
A track record of efficiently solving complex problems
Startup mentality - openness to dealing with unknown unknowns and wearing many hats
Bonus Points!
Experience with Google Cloud Platform, Microsoft Azure, or Amazon Web Services
Experience with open-source orchestration platforms such as Kubeflow, Flyte, Airflow, Metaflow, Prefect, Cadence, etc.
Experience with Kubernetes
Understanding of Machine Learning (ML) models/pipelines
Python/C++/Golang proficiency
Relevant publications
The salary range for this position is $152,000 - $223,500. Compensation will vary depending on location, job-related knowledge, skills, and experience. You may also be offered a bonus, long-term incentives, and benefits. These ranges are subject to change.