Site Reliability Engineer (M/F) – Lisboa

Introduction

Claire Joster is currently recruiting for a leading client in the car rental sector that aims to strengthen its internal structure by hiring a Site Reliability Engineer (M/F).

Role

Define Reliability: Design, implement, and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for our production services;
Automation: Write code and scripts (e.g., Python, Go, Bash) to automate operational tasks, system provisioning, and incident remediation;
Incident Response: Act as a key responder for production incidents. Participate in a 24/7 on-call rotation, lead troubleshooting efforts, and drive incidents to resolution;
Blameless Post-mortems: Lead and participate in blameless post-incident reviews to identify root causes and implement lasting corrective actions;
System Architecture: Partner with development teams to design, build, and deploy scalable, highly available, and fault-tolerant systems;
Monitoring & Observability: Build and maintain comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK Stack, Datadog) to proactively detect and diagnose issues;
Capacity Planning: Monitor system performance and usage, forecast demand, and plan for future capacity needs;
Reduce Toil: Identify and eliminate manual, repetitive operational work by building durable, automated solutions.

Requirements

Minimum 5 years of experience in Site Reliability Engineering, software engineering, or large-scale systems administration;
Strong experience with cloud platforms (AWS, Azure);
Proficiency with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible, CloudFormation);
Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions);
Solid understanding of containerization technologies (Docker) and orchestration systems (Kubernetes);
Experience with version control systems, particularly Git;
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack);
A systematic, data-driven approach to problem-solving and troubleshooting;
Experience with on-call rotations and incident management.

05/1/2026