Job Detail

Site Reliability Engineer, Gitaly - Remote

  • Level: Medium
  • Type: Fixed
  • Duration: More than 6 months

Project Detail

Gitaly is the Git data storage tier of GitLab, providing a reliable, secure and fast distributed Git data store over gRPC. For more information about Gitaly, see the team’s Direction page. 

Gitaly’s high-availability storage requires developers who understand distributed storage systems and their management, observability and availability. The Cluster team contributes features, fixes bugs and improves the performance of this software stack.

Currently, we're building a new distributed cluster solution and improving our Disaster Recovery readiness.

What you’ll do  

  • Work with peer SREs to maintain Gitaly’s environments within GitLab’s SaaS offerings, including cost and performance optimization, capacity planning, migrations and debugging production issues.
  • Participate in architectural discussions and decisions surrounding Gitaly, within the greater GitLab ecosystem.
  • Design RPC interfaces for the Gitaly service.
  • Scope, estimate and describe tasks to reach the team’s goals.
  • Develop production automation and tooling for Gitaly, for use both in SaaS and self-managed installations.
  • Help ensure that Gitaly development tooling, releases and other processes serve the team and the product’s goals.
  • Develop Gitaly in accordance with the product’s goals, with a focus on reliability and maintainability.
  • Instrument, monitor and profile Gitaly in the production environment.
  • Build dashboards and alerts to monitor the health of your services.
  • Conduct acceptance testing of the features you’ve built.
  • Educate all team members on best practices relating to high availability.
  • Write performant, maintainable, and elegant code and peer review others’ code.
  • Be positive and solution-oriented.
  • Constantly improve the quality and security of the product.
  • Take initiative in improving the software in small or large ways to address pain points in your own experience as a developer.
  • Qualify developers for hiring.
  • Respond to user emergencies, platform alerts and support requests, including regular on-call duties.

What you’ll bring

  • Mandatory: experience running highly-available systems in production environments at scale.
  • Mandatory: hands-on experience with cloud technologies, including Kubernetes.
  • Mandatory: proven professional experience building, debugging and optimizing software in large-scale, high-volume environments.
  • Mandatory: proven professional experience writing and testing high-quality code.
  • Mandatory: a good understanding of building instrumented, observable software systems.
  • Highly desirable: experience with Terraform for infrastructure as code.
  • Highly desirable: proven professional experience writing and testing quality code in Go.
  • Highly desirable: a good understanding of Git’s internal data structures or experience running Git servers.
  • Highly desirable: experience with gRPC.
  • Highly desirable: willingness to learn Ruby.

