About the role:
The Samsara ML Experience team builds end-to-end ML applications to power different product pillars at Samsara. As a Senior Machine Learning Engineer II, you will be responsible for developing ML solutions to increase the safety, efficiency, and sustainability of physical operations. You will work closely with engineering teams across ML, full-stack, and firmware, as well as cross-functional partners, to deliver core infrastructure, services, and optimizations.
This role is open to candidates residing in Canada and the US, except the San Francisco Bay Area (125 mi. radius from 1 De Haro St, San Francisco) and the NYC Metro Area (50 mi. radius from 131 W 55th St, New York).
You should apply if:
- You want to impact the industries that run our world: The software, firmware, and hardware you build will result in real-world impact—helping to keep the lights on, get food into grocery stores, and most importantly, ensure workers return home safely.
- You want to build for scale: With over 2.3 million IoT devices deployed to our global customers, you will work on a range of new and mature technologies that drive scalable innovation for customers across the industries that run the world's physical operations.
- You are a life-long learner: We have ambitious goals. Every Samsarian has a growth mindset as we work with a wide range of technologies, challenges, and customers that push us to learn on the go.
- You believe customers are more than a number: Samsara engineers enjoy a rare closeness to the end user and you will have the opportunity to participate in customer interviews, collaborate with customer success and product managers, and use metrics to ensure our work is translating into better customer outcomes.
- You are a team player: Working on our Samsara Engineering teams requires a mix of independent effort and collaboration. Motivated by our mission, we’re all racing toward our connected operations vision, and we intend to win—together.
In this role, you will:
- Design and implement scalable machine learning infrastructure using Ray to support model training, deployment, and inference at scale
- Leverage Kubernetes for orchestration of containerized applications, ensuring seamless deployment, scaling, and management of ML models and associated services
- Develop and maintain CI/CD pipelines for automated testing, deployment, and management of ML applications and infrastructure
- Implement robust monitoring, logging, and alerting systems to ensure high availability, performance, and security of the ML platform
- Collaborate with data scientists and ML engineers to optimize data pipelines and model performance
- Stay abreast of the latest advancements in machine learning technologies and infrastructure, and advocate for the adoption of best practices and new technologies within the team
- Provide DevOps/SRE support for the ML platform, including incident response, performance tuning, and disaster recovery planning
- Champion, role model, and embed Samsara’s cultural principles (Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team) as we scale globally and across new offices
Minimum requirements for the role:
- BS or MS in Computer Science or another relevant field
- 6+ years of experience as a Machine Learning Engineer or in a similar role
- Strong proficiency in one or more common languages (e.g., C++, Golang, Java, Python, Scala)
- Proficiency with common ML tools (e.g., Spark, TensorFlow, PyTorch)
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) and security best practices for ML platforms
An ideal candidate also has:
- Ph.D. in Computer Science or a quantitative discipline (e.g., Applied Math, Physics, Statistics)
- Strong experience with Ray for distributed machine learning, Kubernetes for container orchestration, and Docker for containerization
- Solid understanding of DevOps and SRE principles, including experience with CI/CD tools (e.g., Jenkins, GitLab CI), infrastructure as code (e.g., Terraform, Ansible), and cloud services (e.g., AWS, GCP, Azure)