Site Reliability Engineer - Onsite Phoenix, AZ : 2022-05-22

Phoenix, Arizona 85001

Vacancy expired!

Skills you have:

3+ years of hands-on experience in a production/operational role
Experience with the operation and management of cloud-based services, including operational processes such as the incident management life cycle and event management approaches
A technical background in any of digital, coding, networking, systems, and/or applications; you’ll be exposed to all of these, and your future career path could involve a deep focus on any one of them
Familiarity with modern operations concepts such as Agile and DevOps
Familiarity with security as it applies to operational platforms and processes
Proven understanding of Internet technologies (especially streaming and cloud technologies and services), basic network components and how they interoperate, and Internet protocols and configurations
Understanding of orchestration and monitoring of cloud-native services
Firm grasp of a scripting or high-level programming language such as Python, Ruby, or Perl, with demonstrated proficiency in using it for automation
Ability to make quick decisions and keep a cool head under pressure in a fast-paced environment
Credibility with partner stakeholders (business, engineering, product/program, etc.) to support strong teamwork and results-oriented delivery
Excellent oral and written communication skills, with a demonstrated ability to influence technical and non-technical audiences, including senior leadership
Schedule flexibility is key to supporting the operational needs of the business; this role may operate on an adjusted schedule, including weekends, overnights, early mornings, etc.

Responsibilities:

Overall responsibility for Production Operations of content services, applications, and events, including event monitoring, incident management, and infrastructure, network, and cloud services management
Create and maintain response plays across a variety of incident management and monitoring tools, such as Blameless, PagerDuty, ServiceNow, Dataminr, Datadog, and New Relic
Drive system enhancements and overall reliability in partnership with Dev, SRE, and other functions to prevent future incidents and improve system resiliency and quality; own the post-mortem/incident analysis process with the involved teams to identify root causes and remediation tasks, flag areas of concern, and drive them through to resolution
Develop and report key operational metrics to drive improvement over time
Lead the creation and ongoing maintenance of documentation, including operational runbooks; support monitoring and remediation activities; and provide guidance, acting as a support and enabling function for Tier 1 Operations teams
Support marquee events and day-to-day offerings by preparing documentation to guide event readiness and by participating in day-of operational coverage
Participate in the development and implementation of Digital Operations processes and SOPs, and identify opportunities to automate operational tasks, incident remediation, and scaling activities
Execute production changes to Digital systems, infrastructure, products, and platforms in support of event and release activities
Act as a point of contact for incident escalations from the Tier 1 Operations team, triaging and mitigating as needed
Work closely with the information security team to ensure security requirements are effectively met

ID	#41367170
State	Arizona
City	Phoenix
Source	Prosum
Job type	Permanent
Salary	Depends on Experience
Showed	2022-05-22
Date	2022-05-18
Deadline	2022-07-17
Category	Architect/engineer/CAD