28 Nov
Site Reliability Engineer
Minneapolis / St. Paul, Minnesota 55401, USA

Vacancy expired!

Site Reliability Engineer (SRE)
Minneapolis, MN
Contract to hire

The Information Technology (IT) Department is currently seeking a Senior IT Site Reliability Engineer (SRE) to join its Production Support Unit. The Production Support Unit is responsible for a variety of operational services, with a focus on providing enterprise services that result in a highly effective technology experience for our staff and business partners (enterprise infrastructure monitoring). As a lead member of our SRE service approach, and in support of our world-class hybrid cloud-based services, the consultant will participate in technical advocacy for product optimization, deploy scalable automation, monitor capacity and performance, and take part in incident coordination, root cause analysis, and incident post-mortems.

Scope of services/description of work to be performed

  • Take responsibility for designing solutions that correspond to non-functional requirements such as availability, performance, security, and maintainability.
  • Leverage your expertise in coding, algorithms, complex analysis, enterprise incident coordination, and large-scale system design.
  • Model SRE culture of intellectual curiosity, problem solving, openness, collaboration, reasonable risk taking, and big thinking in a self-directed environment.
  • Build highly scalable platforms and fault-tolerant systems across a range of technologies
  • Define and drive adoption and enforcement of service level objectives at both service and experience levels
  • Analyze the root causes of complex problems involving multiple integrated systems and services, networks, hardware and software that relate to scaling and performance
  • Set standards for deployments at scale, infrastructure reliability, and scalability
  • Influence engineering teams with customer focus, world-class quality, effective communication, decisive and fast-moving solutions, and quick, constructive resolution of conflicts
  • Manage service availability and scalability through process, tools, and automation
  • Perform post-mortems and optimize incident response processes
  • Lead incident response for production incidents; drive investigation, analysis and troubleshooting to resolve production incidents and systematically drive down detection and mitigation times
  • Bring a strong engineering focus to operations, putting your energy into preventing incidents, automation frameworks, self-service infrastructure, logging and metrics, and operational scorecards
  • Develop CI/CD processes to improve cadence
  • Identify or utilize existing tools for logging, monitoring, event management, notification, runbook automation, and root cause analysis
  • Partner with security engineers to develop plans and automation that aggressively and safely respond to new risks and vulnerabilities.
  • Develop, communicate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
Specific skills/experience required: 2+ years of experience related to IT Site Reliability Engineering such as configuration, monitoring, information management, AIOps, DevOps, technical architecture, cloud management systems, ITOM/HDIM, incident coordination, or other components of experience-centric operations.

Experience:
  • Experience building and maintaining application stacks in a hybrid cloud environment; expertise with Microsoft Azure is a plus.
  • Thought leader and mentor for internal and external technical talent
  • 3-5 years or more building and scaling distributed systems leveraging web-scale technologies like Linux, Apache, MongoDB, Python, Oracle RDBMS, Redis, Postgres and Hadoop
  • Experience with Linux/Unix internals and system services like DNS, DHCP, TFTP, iptables, and SMTP, as well as networking protocols such as TCP, UDP and HTTP.
  • Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell, PowerShell; familiarity with JSON, YAML, REST, CLI, and CI/CD tools such as Travis, Drone, Jenkins, Azure DevOps.
  • Hands-on experience using source control (Git, GitHub) and feature branching strategies

Preferred Technical and Professional Expertise:
  • Experience with containers, such as with Docker, Kubernetes and OpenShift
  • Experience with monitoring and observability, such as with New Relic, Nagios, Icinga, or Sysdig
  • Experience automating infrastructure, configuration management, testing, and deployments using tools like Ansible or Chef, and the ability to explain the Infrastructure as Code paradigm
  • Participation in security compliance efforts; experience drafting and/or reviewing IT policies.
Excellent interpersonal, written, and verbal communication skills. Ability to:
  o Adapt to changing priorities, demands, and timelines.
  o Champion change throughout the organization.
  o Establish and maintain effective working relationships with all levels of the organization and contribute in a team environment.
Work as a leader in a team environment, ensuring customer satisfaction and technical excellence.

Regards,
Poornima
4048014505
