02 Nov
Sr Site Reliability Engineer
Vacancy expired!
job summary:
Essential Duties and Responsibilities:
- Manage the operational aspects (performance, uptime, reliability, and resilience) of business-critical data assets.
- Own the provisioning of cloud infrastructure, working with the data team to make it self-service.
- Work with data and development teams to handle operational requirements.
- Identify improvements to infrastructure and operational procedures.
- Help guide the data and SRE teams in automating manual processes.
- Help lead operational runbook creation and maintenance.
- Participate in incident response (some after-hours on-call will be required).
Required Skills and Experience:
- 4+ years of operational experience.
- Application and infrastructure monitoring.
- Cloud hosting providers such as Google Cloud Platform, AWS, or Azure (Google Cloud Platform preferred).
- Cloud orchestration and/or configuration management with Terraform.
- Creating and implementing containerization strategies using Docker.
- Microservice orchestration using Kubernetes.
- Continuous integration and delivery (CI/CD).
- Supporting databases such as MySQL and MongoDB, including backup and recovery procedures.
- At least one high-level programming language, such as Python (preferred), Ruby, or Perl.
- Linux.
- Working and influencing across teams.
- Guiding the direction of one's own teams.
Preferred Skills and Experience:
- 7+ years of operational experience.
- Experience with infrastructure as code (IaC): Terraform (preferred) or CloudFormation.
- Supporting databases such as MySQL and MongoDB, including backup and recovery procedures.
- Experience with messaging and data-streaming technologies: Pub/Sub (preferred), Kafka, or Spark.
- Experience level: Experienced
- Minimum 5 years of experience
- Education: Bachelor's
- DevOps (5 years of experience is required)
- AWS (5 years of experience is required)
- Google Cloud Platform (5 years of experience is preferred)
- Kubernetes (3 years of experience is required)
- Terraform (3 years of experience is preferred)
- Linux Engineer (5 years of experience is required)