Site Reliability Engineer
Your Opportunity
As a Site Reliability Engineer for Schwab's Core Trading Technology, you will be responsible for a sustainable approach to reliability using SRE principles. Our team is essential in supporting the operational reliability of real-time trading applications for the firm. You will partner with multiple support teams to provide guidance and drive adoption of key reliability engineering practices in support of large-scale, mission-critical services. We are looking for skilled candidates who are enthusiastic about learning new and existing technologies to deliver exceptional solutions for the production resiliency of our systems. The role requires a high level of responsibility and accountability, yet comes with the support structure necessary for professional development.
What you are good at
The role encompasses multiple aspects, ranging from preventing and resolving production incidents to supporting application releases in our software deployment pipeline. You will have opportunities to recommend improvements to monitoring and other processes in multiple environments, and to work with the respective teams to design and implement those recommendations. Shift coverage, on-call rotation, and proactive monitoring are key aspects of this role.
- Practice a Site Reliability Engineering mindset and solve problems through automation and instrumentation.
- Identify opportunities to build innovative tools and solve unique operations problems on large enterprise and mission-critical applications.
- Develop tools, frameworks, and instrumentation to validate and increase rollout success for applications.
- Partner within the Support organizations to build and roll out plans that enhance telemetry and reduce defects in software delivery to multiple lower environments.
- Perform real-time troubleshooting of mission-critical application workflows and incorporate feedback into product development.
- Monitor the current-state solution portfolio to identify deficiencies through aging of the technologies used by the application, or misalignment with business requirements.
- Understand, advocate for, and augment the Schwab Reliability Engineering principles, guidelines, and standards
- Analyze the business-IT environment (run, grow and transform the business) to detect critical deficiencies, and recommend solutions for improvement
- Assist with the evaluation and selection of software product standards and services, as well as the design of standard and custom software configurations
- Proven track record supporting production application development and support efforts adhering to a mix of DevOps & SRE frameworks
- Ability to grasp difficult concepts, large architectures, and sophisticated designs quickly
- Progressive experience supporting highly available, mission-critical environments, including experience leveraging tools to instrument and automate proactive and eventually predictive availability solutions
- Ability to understand multiple technologies and how they inter-relate and integrate
- Proven capability to provide operational visibility on environment health to technology and business partners
- Strong automation, innovation, and problem-solving skills
- Receptive, approachable teammate, with the ability to positively interact with business partners, technology teams, recruiting personnel, offshore, and professional services
- Strong customer advocate with good written and verbal communication skills
- Flexibility to participate in on-call support and shift rotation
- 4-5 years of experience in enterprise-level infrastructure orchestration with Ansible, Chef, Salt, Puppet and/or Harness
- 4-5 years of experience with C#, .Net, and scripting
- 4-5 years of experience in High Availability and distributed systems, Linux and Windows administration, troubleshooting and support
- 4-5 years of experience with Atlassian tools: Jira, Confluence, Bamboo, Bitbucket
- Working knowledge of monitoring tools: Splunk, AppDynamics, ITRS Geneos
- Knowledge of networking including DNS, DHCP, firewalls, load balancers and IP routing
- Familiarity with one or more databases: Oracle, SQL Server, MongoDB
- Excellent debugging skills across a variety of integrated platforms
- BS in computer science or related technical field with at least 5 years of experience with listed technical skills.