Salary (USD): Depends on Experience
Source: K Source Inc
Hi, this is Sagar from Ksource Inc. We have an excellent job opportunity for you. The job description is below; please have a look.

Job Title: Site Reliability Engineer Architect
Location: Atlanta, GA
Duration: Full time

The SRE architect will play the mission-critical role of ensuring that critical systems are healthy, monitored, automated, and designed to scale. This role requires a thoughtful problem solver with excellent organizational skills. The Site Reliability Engineering team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. This role will be responsible for responding to production problems, investigating their causes, and engineering and advising on permanent solutions.

Responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Design, develop, ship, and motivate the creation of software and systems to increase product reliability and organizational efficiency.
- Guide reliability practices throughout the entire software development lifecycle via activities like architecture reviews, code reviews, creating platforms and frameworks, and capacity planning.
- Work with senior engineering and testing team members to build tools and testing strategies for problem prevention, detection, and chaos testing.
- Design and create centralized logging and monitoring systems.
- Design and create robust logging, monitoring, and alerting systems.
- Troubleshoot production incidents in real time.
- Lead root cause investigations.
- Improve service reliability through blameless post-incident reviews and using code to prevent or respond to problem recurrence.
- Proactively identify system anomalies.
- Recommend and execute testing strategies.
- Recognize automation opportunities.
- Participate in the on-call rotation and be able to work weekends when scheduled on call.
- Perform code-level debugging of issues escalated to the team.
- Develop tools to automate routine jobs through knowledge learned on the job.
- Plug into the software release cycle. Work closely with developers to ensure software releases are well designed, planned, implemented, released, and monitored.
- Automate time-consuming and manual processes.

Qualifications
- Experience with cloud technologies and solutioning (AWS or Google Cloud Platform preferred).
- Experience with IaC tools (Terraform, CloudFormation).
- Experience with configuration management tools like Ansible.
- Experience with container technology and orchestration (Kubernetes, Docker).
- Proficiency with tools like Git and Bitbucket.
- Experience with the Linux operating system, testing tools, and database management with MySQL.
- Experience in one or more of the following: Java, JavaScript, Duck Creek, Python, microservices.
- Experience with monitoring tools like AppDynamics.
- Experience with log management and the ELK Stack (Elasticsearch, Logstash, Kibana).
- Experience with Apica ZebraTester for synthetic monitoring.
- Experience with PagerDuty for alerting.
- Understanding of application servers, networks, and databases.
- Excellent understanding of scalability processes and techniques.
- Understanding of Jenkins or other build tools.
- Hands-on experience in administering high-availability and high-performance environments, as well as managing large-scale deployments of traffic-heavy applications.
- Ability to handle multiple complex systems without shying away from the challenge of improving them.
- Willingness to try new technologies and make them harmonize with existing systems to achieve better operations overall.
- Experience working in large financial services or retail chain organizations.
- Excellent communication and organizational skills.
- Ability to thrive as a member of a team and excel under pressure.
- Ability to think fast; a natural problem-solver.