- ID: #45983373
- Job type: Permanent
- Salary: TBD
- Source: Elevance Health
- Date: 2022-09-23
- Deadline: 2022-11-21
Cloud Site Reliability Engineer
Atlanta, Georgia 30303, USA
Permanent
Vacancy expired!
- Identify areas in the infrastructure that require improvement. Create solutions for improvement and drive those solutions to completion.
- Deploys and maintains Kubernetes clusters including autoscaling.
- Provide operational support for serverless stack on AWS/GCP.
- Identifies opportunities to improve and secure Kubernetes clusters.
- Participates as a member of the incident response team for Kubernetes cluster issues.
- Designs and builds systems which improve scalability, usability, and performance.
- Develops and maintains monitoring and alerting of all production services.
- Builds metric-based dashboards for production services as needed.
- Works closely with the Security team to maintain security in production environments.
- Works closely with the Network team to maintain networks in production environments.
- Serves as a member of North America's incident response team.
- Performs technology needs analysis by conferring with departmental personnel to analyze current operational procedures, identify problems, and learn specific input and output requirements.
- Reviews computer system capabilities, workflow, and scheduling limitations to determine if a requested program or program change is possible within the existing system.
- Requires a BA/BS degree in Information Technology, Computer Science, or a related field of study.
- Minimum of 10 years in one or more of the following: information systems, computer networking, telecommunications, systems development and management required;
- Significant experience with multiple technical and business disciplines required;
- Broad-based experience to plan and design highly complex systems;
- Or any combination of education and experience, which would provide an equivalent background.
- 5+ years of experience in programming languages like Python, Golang, Java.
- 5+ years working in Linux environments, and scripting in shell.
- 5+ years in operational experience troubleshooting complex and large-scale issues.
- Knowledge of best practices and solutions for implementing CI/CD solutions as well as release pipelines.
- Knowledge of creating robust and resilient infrastructure.
- Working knowledge of building and managing Day 2 operations of Kubernetes systems.
- DevOps or Kubernetes certifications are a plus.
- Solid understanding of cloud computing (AWS, GCE, Azure).
- Experience in incident response and root cause analysis of service disruptions.
- Demonstrated expertise in a range of automation tools and approaches in Windows and Linux environments (e.g. Git, Ansible, Terraform, Python, shell scripts, as well as other scripting languages).