21 Jan
Site Reliability Engineer
Vacancy expired!
QUALIFICATIONS
- Bachelor's degree in computer science or equivalent experience
- 5+ years of public-facing production application support experience in a high-uptime, high-transaction-volume environment
- 5+ years of UNIX administration experience, including diagnosis of performance issues, package management, load estimation, kernel tuning, networking configuration, etc.
- 4+ years of software engineering experience (Java, C, C++, Python, Go)
- MUST HAVE strong scripting skills
- MUST HAVE worked with automation tools such as Terraform and Puppet
- Understanding of, or experience working with, CI/CD environments and tools such as Jenkins
- Understanding of networking principles, esp. TCP/IP
- Excellent troubleshooting and analytical skills
- Ability to work independently on large, complex projects with minimal guidance
- MUST HAVE experience deploying and operating container technologies with Kubernetes in production
- MUST HAVE experience creating and managing Kubernetes cluster deployment, configuration and operations using Helm/YAML
- Experience with Project Contour as a Kubernetes ingress controller will be considered a big plus
- MUST HAVE experience using Infrastructure-as-code (IaC) to automate various aspects of site operations
- Experience deploying and running production workloads in a cloud environment will be a plus
- Experience working with Splunk to identify and troubleshoot issues is necessary (Splunk query experience is critical)
RESPONSIBILITIES
- Engineer extensive scripting and automation to install and operate applications with minimal manual intervention
- Evaluate, test, deploy and maintain both custom-developed and third-party software upgrades
- Maintain SDLC systems such as test environments, source control and automated build/test/deploy systems
- Provide ongoing developer support, frequently embedded in development teams to facilitate collaboration
- Create & maintain application architecture and troubleshooting documentation
- Manage configuration and operations of Kubernetes clusters using Infrastructure-as-code approaches
- Provide 24x7 production support as part of a team rotation, resolving or escalating issues as appropriate
- Maintain production services to highly demanding SLAs
- Take ownership of production issues, working closely with infrastructure and development teams on issue resolution
- Support releases on a regularly scheduled basis, as well as emergency releases as needed
- Deploy application and data changes to all environments as needed
- Design and implement new environments, services and application architecture modifications
- Research, evaluate and implement operational improvements, application packages and architectural modifications
- Participate in change control, release planning, and other operational planning
- Remain current on industry-leading solutions in both private and public cloud hosting (VMware, Xen, KVM, Amazon Web Services (AWS), Azure, Google App Engine, Kubernetes, etc.)
- Remain current on modern open-source persistence technologies (Hazelcast, BDB, Project Voldemort, Memcached, etc.)
- Remain current on modern containerization technologies (Docker, vSphere Integrated Containers, Kubernetes)