Staff Software Engineer, ML Serving Platform

The ML Platform team provides foundational tools and infrastructure used by hundreds of ML engineers across Pinterest, including recommendations, ads, visual search, growth/notifications, and trust and safety. We aim to ensure that ML systems are healthy (production-grade quality) and fast (for modelers to iterate upon).

We are seeking a highly skilled and experienced Staff Software Engineer to join our ML Serving team and lead the technical strategy. The ML Serving team builds large-scale online systems and tools for model inference, deployment, monitoring and feature fetching/logging. ML workloads are increasingly large, complex and interdependent, and the efficient use of ML accelerators is critical to our success. We work on various efforts related to adoption, efficiency, performance, algorithms, UX and core infrastructure to enable the scheduling of ML workloads.

You’ll be part of the ML Platform team in Data Engineering, which aims to ensure healthy and fast ML in all of the 40+ ML use cases across Pinterest.

What you’ll do:

Design and build large-scale, reliable and efficient ML serving systems for model inference, deployment, monitoring and feature logging.
Improve the productivity and iteration speed of ML engineers and data scientists.
Projects may include a high-performance inference engine with GPUs and hardware accelerators, and ML monitoring and observability solutions.
Work extensively with ML engineers across Pinterest to understand their requirements and pain points, and build generalized solutions. Also work with partner teams to drive projects requiring cross-team coordination.
Provide technical guidance and coaching to more junior engineers on the team.

What we’re looking for:

Hands-on experience building large-scale ML use cases and systems in production, preferably with expertise in state-of-the-art (SoTA) ML inference technologies and optimizations.
Strong understanding of ML systems, especially around scalability and efficiency.
Flexibility to work across different areas: online systems, model optimization, infrastructure optimization, data processing pipelines, etc.
Fluency in Python and C++, and familiarity with at least one common ML framework.
Experience with GPU programming, containerization and orchestration technologies is a plus.

Relocation Statement:

This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.