About this position
We are looking for a proactive, collaborative, clear-minded Principal Engineer for our ML training platform team. In this role, the successful candidate will lead a driven, high-visibility team to deliver on key engineering objectives. As the Principal Engineer, you will be the accountable owner of the ML training platform. You will be responsible for developing new cross-functional processes that guide the development of the platform and speed up ML training and evaluation. You will work closely with infra engineering, the ML team, and the ML eval/metrics systems team. This role gives you high visibility within the company and is critical to successfully launching our autonomous driving software. The ideal candidate leverages a deep technical foundation to focus on Autonomy-critical initiatives, understands business priorities, and finds a way forward by resolving ambiguity and building consensus to deliver models that improve on-road AV performance.
What You'll Be Doing
- Lead and continuously develop a high-performing team that demonstrates Motional’s core values.
- Inspire and motivate the team to work together as a cohesive and productive unit.
- Design, build, and maintain scalable ML data processing and model training solutions on cloud infrastructure.
- Optimize training and model performance across various GPU types to improve model training speed and efficiency.
- Execute the program with a focus on technical execution to achieve program milestones
- Proactively manage and mitigate technical risks
What We're Looking For
- 8+ years of software development experience.
- Demonstrated problem-solving skills and the ability to think logically and remove bias from the evaluation of problems
- Excellent programming and software design skills, including debugging, performance analysis, and test design
- Experience working on and leading large-scale cross-functional projects
- Hands-on experience with large cloud infrastructure. Strong cloud experience with AWS/GCP. Strong hands-on experience with Kubernetes.
- Comfortable working in a fast-paced, continuous delivery environment
- Strong understanding of machine learning approaches and algorithms
- Strong verbal and written communication skills to effectively work with various stakeholders, report insights, and rally support for new initiatives and processes
- Proven track record of operating highly available systems at scale
- Ability to proactively learn new concepts and technologies and apply them at work
- Skilled at solving ambiguous problems
- Strong collaboration and mentorship skills
- Hands-on experience with popular ML frameworks (PyTorch or TensorFlow).
- Hands-on experience with scaling ML systems.
Bonus Points
- Hands-on experience using Ray in large-scale environments.
- Hands-on experience with ML data processing for large-scale deep learning training.
- Hands-on experience with refactoring ML code written by ML engineers.