Site Reliability Engineer, Gitaly - Remote

Job Detail

Project Detail

Gitaly is the Git data storage tier of GitLab, providing a reliable, secure and fast distributed Git data store over gRPC. For more information about Gitaly, see the team’s Direction page.

Gitaly’s high-availability storage requires developers who understand distributed storage systems and their management, observability and availability. The Cluster team contributes features, fixes bugs and improves the performance of this software stack.

Currently, we're building a new distributed cluster solution and improving our Disaster Recovery readiness.

What you’ll do

Work with peer SREs to maintain Gitaly’s environments within GitLab’s SaaS offerings, including cost and performance optimization, capacity planning, migrations and debugging production issues.
Participate in architectural discussions and decisions surrounding Gitaly, within the greater GitLab ecosystem.
Design RPC interfaces for the Gitaly service.
Scope, estimate and describe tasks to reach the team’s goals.
Develop production automation and tooling for Gitaly, for use both in SaaS and self-managed installations.
Help ensure that Gitaly development tooling, releases and other processes serve the team and the product’s goals.
Develop Gitaly in accordance with the product’s goals, with a focus on reliability and maintainability.
Instrument, monitor and profile Gitaly in the production environment.
Build dashboards and alerts to monitor the health of your services.
Conduct acceptance testing of the features you’ve built.
Educate all team members on best practices relating to high availability.
Write performant, maintainable, and elegant code and peer review others’ code.
Be positive and solution-oriented.
Constantly improve the quality & security of the product.
Take initiative in improving the software in small or large ways to address pain points in your own experience as a developer.
Qualify developers for hiring.
Respond to user emergencies, platform alerts and support requests, including regular on-call duties.

What you’ll bring

Mandatory: experience running highly-available systems in production environments at scale.
Mandatory: hands-on experience with Cloud technologies including Kubernetes.
Mandatory: proven professional experience building, debugging and optimizing software in large-scale, high-volume environments.
Mandatory: proven professional experience writing and testing high-quality code.
Mandatory: a good understanding of building instrumented, observable software systems.
Highly desirable: experience with Terraform infrastructure as code.
Highly desirable: proven professional experience writing and testing quality code in Go.
Highly desirable: a good understanding of Git’s internal data structures or experience running Git servers.
Highly desirable: experience with gRPC.
Highly desirable: willingness to learn Ruby.

Skills Required

Kubernetes, Golang

Proposals Received Till Jun 30 2025

Project ID: 3WYKDTNC
