Senior Site Reliability Engineer

Job Detail

Helping to create a repeatable OpenStack cloud deployment
Implementing a network topology using Open vSwitch, providing per-tenant networking, load balancing, and IPv6
Performing day-to-day operational tasks on Wikimedia’s Cloud Services infrastructure (deployment, maintenance, configuration, troubleshooting), and developing and supporting automation tools and processes for these tasks
Participating in on-call rotation and support in a 24×7 environment

Comfortable working and thriving within a Linux ecosystem
Understand networking in the physical domain of switches and servers
Software development skills in at least one of the following languages: Python, Go, JavaScript, or Ruby
B.S. or M.S. in Computer Science or a related field, or equivalent work experience

Share our values, appreciate our code of conduct, support our team norms, and work by all three
Strong English language skills and the ability to work independently as an effective part of a globally distributed team
Support of our users (volunteer and staff developers) using our service offerings
Passionate about the value of learning and growing together

Utilized configuration management tools such as Puppet, Ansible, Chef, and SaltStack
Used Kubernetes, Docker Swarm, Mesos, or similar container orchestration platforms
Operated an elastic computing environment such as OpenStack or CloudStack
Operated a multi-tenant capable software-defined network (SDN)
Experience in serverless computing environments
Linux systems troubleshooting and debugging skills
Interest in open-source software projects and communities

Client Budget

Proposals Received Till Jun 28 2025

Project ID: I4PK5X2S