As a Senior Site Reliability Engineer (SRE), you'll be instrumental in ensuring
the reliability, scalability, and performance of our systems and services for both internal and external stakeholders. We're seeking a seasoned professional who has mastered computer science fundamentals and has a track record of independently implementing and delivering end-to-end, cloud-native solutions. The role requires strong backend expertise, including a deep understanding of our application infrastructure, together with proficiency in Site Reliability Engineering principles.
What you will do:
● Work directly with an Engineering Lead and other members of the Platform and
Engineering teams to ensure our systems run reliably and scale with demand.
● Lead efforts in designing, building, and maintaining highly scalable, reliable, and secure
infrastructure solutions.
● Drive initiatives to improve system reliability, performance, and scalability.
● Act as a subject matter expert in incident response, participating in on-call rotations and
resolving production issues promptly.
● Design and implement robust monitoring, alerting, and incident response mechanisms to
ensure system uptime and availability.
● Conduct post-incident reviews and implement preventive measures to reduce the
likelihood and impact of future incidents.
● Mentor junior team members and contribute to their professional development.
● Stay abreast of industry best practices and emerging technologies, advocating for their
adoption where applicable.
We are looking for people who have:
● Extensive experience in backend development and automation, with proficiency in
Bash, Go, SQL, and TypeScript.
● Strong understanding of Site Reliability Engineering principles and practices.
● Demonstrated experience in designing and implementing scalable and reliable
infrastructure solutions.
● Expertise with public cloud providers (GCP, AWS, or Azure).
● Expertise with distributed systems managed with Kubernetes.
● Minimum of 7 years of professional software development experience, with a focus on
site reliability engineering or infrastructure operations.
● Experience with publish/subscribe (Pub/Sub) and event-driven patterns is a plus.
● Bachelor's degree in Computer Science or a related field, or equivalent practical
experience.