02 Jan
Cloud Reliability Engineer IV
Scarborough, Maine 04070, USA

Address: USA-ME-Scarborough-145 Pleasant Hill Rd
Store Code: IT Executive & Administration (2760797)

Retail Business Services is the services company of leading grocery retail group Ahold Delhaize USA, providing services to five East Coast grocery brands: Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Retail Business Services leverages the size and scale of the local brands and provides industry-leading expertise, insights and analytics to local brands to support their strategies. We are committed to diversity, equity and inclusion and we foster a community of belonging where everyone is valued. For more information, visit https://www.retailbusinessservices.com.

Primary Purpose:
The Platform Reliability Engineer will help ensure service availability, identify and automate manual processes, and bridge the gaps between product development teams and operations. Operational improvements in availability, latency, performance, efficiency, change management, monitoring, incident response, patch management and capacity planning are all within scope for this role.
Whether it's done through code, the introduction of modern tools, and/or better processes, continuous improvement and efficiency is the goal. You'll provide operational excellence through troubleshooting skills and ownership in supporting various Azure services.

Duties and Responsibilities:
- Build, manage, and operate Azure Core Services with automation and infrastructure as code.
- Manage and operate the continuous delivery framework and tools; manage and automate the lifecycle of the different platform components and help support product teams.
- Leverage cloud architecture, applying site reliability principles and full-stack troubleshooting skills across network, application, security, identity, OS, containers, on-prem, and distributed services layers.
- Lead and set the strategy and roadmap for cloud reliability, and recommend best practices for operations.
- Mentor team members to follow the frameworks and guide them to accelerate delivery of projects.
- Provide reasoning about system and application architecture, and be comfortable looking at code and offering feedback on how it can be improved to increase reliability.
- Identify opportunities and drive the implementation of automation to improve patch management, service health, manageability, reliability, and telemetry.
- Own, triage, investigate and resolve service issues with an emphasis on broad communications, learning and teaching throughout the process.
- Design process or technology solutions that monitor, identify, and resolve platform, system, deployment, and environmental issues both before and after production releases, and ensure measurable improvements against service KPIs.
- Drive security and compliance aspects for services in accordance with Azure compliance requirements.
- Engage in service capacity planning and demand forecasting, and work towards Azure cost optimizations.
- Create and document runbooks, operational procedures, and standards on Confluence.
- Communicate on a deeply technical level with product engineering, project management and product teams to improve and optimize products, improve infrastructure, and evolve services.
- Work within project management/agile scrum teams in a support role as part of a wider team.
- Remain current on new technologies, methods and procedures including, but not limited to, coding practices such as Test Driven Development, Continuous Integration, Continuous Deployment and operational excellence.

Qualifications:
- Bachelor's Degree in Computer Science, Information Technology, Engineering, or related field.
- 8+ years of IT experience focused on infrastructure, which includes server, storage, network, security, and identity.
- 4+ years of experience supporting, maintaining, and automating Azure environments.
- 3+ years of experience using IaC tools (ARM, Terraform, JSON, YAML, PowerShell, GitHub, etc.).
- Production experience in cloud technologies: Azure IaaS, PaaS, networking, Azure Functions, Azure Automation and runbooks, workbooks, Insights, Security Center, Azure Monitor, Log Analytics.
- Ability to read, write, configure, design, and script end-to-end service telemetry, alerting and self-healing capabilities for platform services, and to lead the execution and ongoing management of services.
- Ability to work in an Extreme Programming environment and in a pair programming/operating model; able to lead the team and help remove roadblocks.
- Able to facilitate diverse teams, multi-task, and work under pressure to meet aggressive schedule targets.
- Hands-on experience with IaC tools like ADO, ARM, Terraform, Ansible, PowerShell, Python, azcli, GitHub.
- Experience in service capacity planning, demand forecasting, software performance analysis and system tuning.
- Technical and operational expertise in Windows/Linux/VMware/Hyper-V/AKS, SQL and NoSQL DBs, IaaS, PaaS, FaaS, Data, BCDR, Security, Management, Storage, Networking, Monitoring, Identity and Connectivity.
- Experience managing and maintaining code repos, build systems, and CI/CD pipelines.
- Experience in infrastructure and configuration as code, as well as service auto-scale capabilities.
- Experience working in DevOps and Agile environments, with a blend of both development and SRE mindsets.
- Systematic problem-solving and troubleshooting skills coupled with a strong sense of ownership and drive.
- Participate in on-call rotation.
- Participate, collaborate, and provide guidance in retrospectives.
- At least 4 years of hands-on operational experience supporting the following, or related experience:
  - Azure Virtual Network, VWAN, ExpressRoute, Load Balancer (L4/L7), Traffic Manager, CDN, Azure DNS, routing and routing protocols like BGP, firewall concepts.
  - Azure Identity, including any of the following: Azure AD, PIM, Conditional Access, MFA, Azure AD Connect, passwordless sign-ins, Microsoft Defender, Key Vault.
  - Azure Governance, Security, Monitoring, Workbooks, Compliance, and cost awareness.
  - Azure Virtual Machines, Containers and/or Kubernetes and/or OpenShift (infrastructure perspective).
  - Azure Storage Account, Disk, Snapshot, Backup, Site Recovery, File Sync, Data Lake.

Preferred Qualifications:
- Certification in Azure Administrator (required), Azure DevOps (preferred), Azure Solutions Architect (preferred).

#LI-RV1 #LI-remote #DICEJobs

Retail Business Services is an equal opportunity employer. We comply with all applicable federal, state and local laws. Qualified applicants are considered without regard to sex, race, color, ancestry, national origin, citizenship status, religion, age, marital status (including civil unions), military service, veteran status, pregnancy (including childbirth and related medical conditions), genetic information, sexual orientation, gender identity, legally recognized disability, domestic violence victim status or any other characteristic protected by law.
We provide reasonable accommodations to applicants and employees with disabilities. If you have a disability and require assistance in the application process, please contact our Talent Acquisition Department at tad@retailbusinessservices.com.

Job Requisition: 251860externalUSA-ME-Scarborough10312022
