• Find preferred job with Jobstinger
  • ID
    #46081736
  • Job type
    Contract
  • Salary
    $0 - $0
  • Source
    Techbytes Solutions Inc.
  • Date
    2022-09-28
  • Deadline
    2022-11-26
 

Vacancy expired!

Onsite: Site Reliability Engineer

Location: Atlanta, GA (Onsite) & St. Louis, MI

Anticipated Duration: 6-12 months

Skillset:
  • Jenkins
  • Terraform
  • Kubernetes
  • Docker
  • Google Cloud Platform or similar
  • HashiCorp tools such as Vault
  • Ubuntu
  • Java basics
  • Bash/shell

Objectives of this Role
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for multiple large, distributed software applications
  • Influence and design infrastructure, architecture, standards, and methods to build large-scale systems.
  • Architect strategies and implementation plans to support end-to-end monitoring, alerting, troubleshooting, and dashboard development to ensure SLAs are met and notifications are proactive.
  • Identify business problems through deep-dive discussions with business owners and devise suitable solutions using SRE principles, CI/CD, virtual services, and SDLC methodologies.
  • Explore and work on the latest DevOps, Infrastructure as Code, Site Reliability, and Cloud technologies to manage continuous integration and delivery, configuration, and infrastructure orchestration.
  • Engage with Product Managers and Scrum Masters to identify dependencies and risks, and suggest remediation with a corrective action plan via sustainable, preventive, and automated reconciliation.
  • Work with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
  • Guide the team on reliability practices through activities like architecture reviews, code reviews, creating platforms and frameworks, capacity planning, and chaos testing.
  • Automate system scalability and recovery, and continually work to improve system resiliency, performance, and efficiency.
  • Facilitate blameless postmortems and proactively identify potential outages, feeding both into iterative improvement.
  • Contribute to developing Service Level Objectives (SLOs) and identifying Service Level Indicators (SLIs) and error budgets based on organizational-level SLAs.
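
As a rough illustration of how SLOs, SLIs, and error budgets relate, here is a minimal Python sketch. The function name, the request-based availability SLI, and the 99.9% target are assumptions chosen for illustration; the posting does not specify any of them.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLI against an SLO target.

    slo_target is a fraction such as 0.999 (hypothetical 99.9% availability SLO).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: fraction of requests that succeeded in the measurement window.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: number of failures the SLO tolerates in the same window.
    allowed_failures = (1 - slo_target) * total_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: one million requests, 400 of which failed.
    report = error_budget(0.999, 1_000_000, 400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this sketch a negative remaining_failures value means the budget for the window is exhausted, which is the usual signal for a team to prioritize reliability work over new releases.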

Required Skills and Qualifications
  • Bachelor’s degree in computer science or other highly technical, scientific discipline
  • Ability to program (structured and OO) with one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • Experience with distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

Preferred Qualifications
  • Previous success in technical engineering
  • Coding experience beyond simple scripts
