04 Nov
DevOps SRE at Austin, TX
Westlake, Texas 76262, USA

Role: SRE Lead
Location: Austin, TX

Who are we looking for?
An Application SRE with 8+ years of overall experience supporting complex, critical, large-scale distributed systems, and extensive hands-on experience handling production failures and driving root cause analysis and remediation.

Primary Responsibilities:
- Effectively handle production outages and performance issues through quality analysis and quick resolution.
- Manage incidents and communicate effectively with users, application owners, and senior stakeholders across all areas.
- Work with development teams to improve applications' operational features for faster MTTD and MTTR, and for auto-recovery.
- Identify and analyze patterns of incidents/problems, conduct flawless post-mortems, develop permanent remediation plans, and implement automation to prevent incidents from recurring.
- Challenge the existing application setup and processing, and suggest different ways to solve problems or improve stability.
- Actively participate in the change management process with a view to managing risk in the production environment.
- Identify alerts/processes that can be automated, then work with the engineering team to automate them.
- Build and improve runbooks for generalists to minimize operational errors and gain fungibility/efficiency.
- Build end-to-end (E2E) monitoring of the system (hardware, availability, logging, distributed tracing, business transactions) as well as end-user experience monitoring, using APM tools such as Splunk, AppDynamics, and ThousandEyes, working as a developer/configurator for performance diagnostics, monitoring, alerting, and dashboarding.
- Develop self-healing solutions for repeated infrastructure and service failures.
- Minimize manual involvement by driving solutions, automation, and continuous improvements to the operating environment, including development and configuration of dynamic monitoring, alerting, and recovery.
- Develop reports that provide trending statistics to track and manage application health and support service performance.

Technical Skills:
- 5+ years of hands-on experience in Java/.NET application support.
- Solid hands-on experience troubleshooting and fixing application failures, application performance degradation, batch failures, and hardware and network failures, including code debugging.
- Comfortable with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management.
- Solid understanding of ITIL, DevOps, SRE, CI/CD, and cloud computing.
- Experience identifying application/infrastructure risks and mitigation strategies, and the ability to work with a team to ensure those risks are mitigated.
- Experience with debugging techniques for root cause analysis of issues.
- Working knowledge of ITIL: event, incident, release, problem, and knowledge management.
- Experience developing monitoring solutions across the front end, Web API, services, and database layers.
- Experience with instrumentation, monitoring, alerting, and responding with respect to application performance and availability, using tools such as AppDynamics, Splunk, ThousandEyes, ITRS, etc.
- Experience administering Windows and Linux servers, Oracle databases (DBA), networking, and load balancing.
- Experience with scripting languages (e.g., Bash, Python, Shell, PowerShell) to automate tasks.
- Experience with cloud orchestration/workflow automation and high availability.
- Experience with automation and configuration tools such as Ansible, Puppet, and Chef.
- Clear understanding of one or more cloud platforms (PCF, Google Cloud Platform, AWS, Azure, or others).
- Fair understanding of CI/CD and DevOps tools.

Qualifications:
- At least 8 years of work experience in application production support.
- Education: B.Tech, BE, BCA, MCA, M.Tech, or an equivalent technical degree from a reputed college.

What's in it for you?
With this opportunity, you will work with a team that has consistently set benchmarks for other deliveries in terms of high CSAT scores, on-time project completion, and being one of the best teams to work for in the organization. You get an open and transparent culture, along with the freedom to experiment and innovate.

About the Practice/Project:
A leading North American retail brokerage firm running technology projects in the digital area. The customer is a leader in technology adoption.

Skills:
- Primary competency: DevOps; primary skill: SRE (51%)
- Secondary competency: Microsoft Technologies; secondary skill: .NET (25%)
- Tertiary competency: IT Service Management; tertiary skill: ITIL - Incident Manager (24%)