The Pinterest Site Reliability Engineering team is responsible for ensuring the availability of Pinterest and for improving the ability of engineering teams to design, build, and operate stable systems at scale.
The Infrastructure SRE team uses its deep understanding of Pinterest's infrastructure and engineering systems to keep the foundations of our technology stack robust and reliable. As the manager of the Infrastructure SRE team, you will be accountable for the team's health and impact, and you will provide crucial leadership to the broader SRE organization.
What you’ll do:
- Manage a team of 4-10 SREs.
- Foster an environment of open and honest communication, allowing team members to take risks and share their ideas.
- Create an inclusive and welcoming workplace where every team member feels valued and supported.
- Invest in each individual team member to ensure they remain engaged, motivated, and fulfilled.
- Develop an inspiring team charter and direction that align with the goals of the broader SRE organization.
- Encourage team members to embody company values and Engineering/SRE principles, leading by example.
- Collaborate with leadership and partners to ensure alignment on what success looks like and how to achieve it.
- Translate goals into actionable plans that result in meaningful outcomes.
- Recruit, retain, and develop high-performing talent within your team, the SRE org, and the Engineering org at large.
- Foster a culture of technical excellence from design to production, emphasizing the importance of quality.
- Develop strong partnerships with internal teams across infrastructure by communicating a clear, impactful vision and set of priorities.
What we’re looking for:
- Engineering managers with a proven ability to grow, build, and retain diverse, productive teams.
- 5+ years of experience in software development or site reliability engineering in a dynamic tech environment.
- Demonstrated ability to work cohesively within a team and across engineering departments.
- Leadership experience in the infrastructure or developer productivity space.
- Fluency in at least one programming language, such as Python, Java, Go, or C++.
- Familiarity with public cloud platforms such as AWS, GCP, or Azure.
- Experience or expertise in some or all of the following, with particular focus on reliability, automation, operability, performance, and risk management:
  - Infrastructure technologies such as Docker, Kubernetes, TensorFlow, Elasticsearch, ZooKeeper, etc.
  - Infrastructure as code (e.g. Terraform, Puppet, Chef, Ansible, Salt, Fabric, etc.)
  - Developer toolchains, including SCM tools, build platforms, test frameworks, and CI/CD products
  - Linux/Unix/BSD internals
- Experience with open sourcing and familiarity with major open-source software (e.g. MySQL, Hadoop, Envoy, HAProxy) is a plus.
This position is not eligible for relocation assistance.
#LI-HYBRID