15 Jan
DevOps Site Reliability Engineer (SRE)
Chicago, Illinois 60290, USA

Vacancy expired!

Job Description:
The candidate will be part of the SRE team in a lead technical role, determining the reliability and Chaos Engineering needs of mission-critical systems and business processes. The candidate will assess high-level architecture and design issues relating to platform and enterprise software interactions with other systems, and will work with the application development, infrastructure, database, and middleware teams to ensure the stability and reliability of the system. Chaos Engineering proactively detects issues within the applications, platform, network, and databases in a controlled way, using chaos tools such as Chaos Monkey, Gremlin, and Simian Army. The candidate should be familiar with Internet protocols such as HTTP, DNS, TCP, and UDP and with the Linux development environment, and should be well versed in DevOps. The candidate will identify anti-patterns and optimizations and support the development of self-healing capabilities.

Responsibilities:
Create operational tooling for monitoring, self-healing infrastructures, and chaos testing.
Design and create controlled chaos in production systems.
Work across teams to identify and fix issues that affect system reliability and performance.
Guide and design architectural decisions and direct solutions that will enhance our client's product reliability.
Dive into system and latent reliability issues, service performance, and capacity modeling of distributed systems at scale.
Partner with the development team to identify anti-patterns and optimization strategies, create fallback options, and help develop self-healing capabilities across the enterprise in a sustainable manner.

Requirements:
A passion for creating reliable applications and a systematic problem-solving approach, coupled with a strong sense of ownership and drive.
7 years of hands-on experience with cloud-based technologies and tools in configuration management, deployment, monitoring, and operations.
Experience with Chaos Engineering tools such as Chaos Monkey, Gremlin, and Simian Army, and familiarity with Internet protocols such as HTTP, DNS, TCP, and UDP and with the Linux development environment.
Demonstrable experience with distributed tracing using tools like Zipkin and Jaeger.
Experience with application performance management, real user monitoring, infrastructure monitoring, and log analysis tools such as Dynatrace, ELK, Nagios, Sensu, and Splunk.
5 years of experience with DevOps and Continuous Delivery with configuration automation.


