Talent.com
Senior Site Reliability Engineer (SRE)

Salla • Al Jubayl, Saudi Arabia
4 days ago
Job description

We are looking for a Senior Site Reliability Engineer (SRE) to help design, scale, and secure our rapidly growing platform infrastructure. You will work across all critical systems — from customer-facing applications and APIs to internal platforms and data services — ensuring availability, performance, and cost efficiency at scale. You'll be hands‑on with Kubernetes, observability, GitOps, automation, and cloud infrastructure, while partnering closely with application, platform, and data teams to deliver a highly reliable and self‑healing environment. This role is ideal for an engineer who thrives on complex distributed systems, loves to automate everything, and can balance speed, stability, and cost‑efficiency in production.

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field — or equivalent work experience.

Design, deploy, monitor, and maintain production workloads across Kubernetes (EKS / AKS / GKE) clusters.

Build self‑healing, auto‑scaling systems that minimize manual intervention and ensure uptime.

Design and operate reliable database and storage platforms (SQL, NoSQL, and object stores) within Kubernetes environments.

Implement backup, disaster recovery, replication, and failover strategies to meet RPO / RTO targets.

Troubleshoot and recover Kubernetes Persistent Volumes (StorageClasses, CSI drivers, PVC issues).

Optimize storage performance and cost through multi‑tier strategies, hot / cold data separation, and S3 / offloading lifecycle policies.

Secure and scale object storage platforms (e.g., MinIO / S3‑compatible) for high‑throughput data pipelines.

Manage block storage (EBS / io2 / gp3) and shared file systems (EFS, NFS) for resilience and cost balance.

Collaborate with teams to optimize networking, ingress / egress traffic, and service mesh for secure communication.

Platform & Infrastructure Reliability

Design, deploy, monitor, and maintain production workloads across Kubernetes (EKS / AKS / GKE) clusters

Build self‑healing, auto‑scaling systems that minimize toil and manual intervention

Optimize networking, ingress / egress traffic control, and service mesh for secure & performant communication

Design and operate reliable database and storage platforms (SQL, NoSQL, and object stores) in Kubernetes environments

Own backup, disaster recovery, replication, and failover strategies to meet RPO / RTO targets for critical data services

Optimize storage performance and cost through multi‑tier strategies, hot / cold data separation, and S3 / offloading lifecycle policies

Troubleshoot and recover Kubernetes Persistent Volumes confidently during incidents (StorageClasses, CSI drivers, PVC issues)

Secure and scale object storage platforms (e.g., MinIO / S3‑compatible) and integrate with workloads for high‑throughput data pipelines

Work with block storage (EBS / io2 / gp3) and shared file systems (EFS, NFS) to balance performance, resiliency, and cost

Automation & Delivery

Champion GitOps and CI / CD best practices (ArgoCD, Flux, GitHub Actions)

Build automation for infrastructure provisioning and upgrades using Terraform, Helm, and Kubernetes Operators

Reduce release risk through progressive delivery strategies (blue / green, canary, spot instance rolling updates)

Observability & Incident Response

Own the monitoring and alerting stack (Prometheus, Grafana, Loki, VictoriaMetrics, OpenSearch)

Lead incident management and postmortems to prevent recurrence

Provide real-time visibility into system health, performance, and cost metrics

Security & Compliance

Implement least‑privilege IAM policies, secure service‑to‑service communication, and network ACLs / firewalls

Enforce Kubernetes RBAC, secret management, and secure image supply chain

Participate in audit readiness and compliance efforts

Performance & Cost Optimization

Analyze and tune system performance under scale (CPU / memory / IO)

Partner with product and platform teams to right‑size clusters, databases, and storage tiers

Introduce cost visibility dashboards for engineering leadership.

Preferred Qualifications

Experience managing mission‑critical systems at scale (high traffic, multi‑region)

Proven cost optimization in cloud / K8s environments

Familiarity with service mesh (Istio, Linkerd) or advanced networking / egress control

Experience with data platform components (Airflow, Debezium, ClickHouse, etc.) is a plus but not required

Strong communication and teamwork skills, with the ability to collaborate across engineering, DevOps, security, and product teams.

Requirements

8+ years in SRE / DevOps / Infrastructure Engineering roles

Deep Kubernetes expertise (multi‑cluster, Helm chart development, advanced networking)

Strong experience with GitOps workflows using ArgoCD / Flux

Expertise with AWS (preferred) or Azure / GCP, plus Infrastructure‑as‑Code (Terraform, Pulumi, CloudFormation)

Advanced knowledge of SQL & NoSQL databases (MySQL / Aurora, PostgreSQL, MongoDB, Redis)

Scripting / automation skills in Python, Bash, or Go

Solid background in monitoring / observability (Prometheus, Grafana, Loki, ELK / OpenSearch, VictoriaMetrics)

Experience with CI / CD at scale and managing production incidents

Experience with streaming / messaging (Kafka, RabbitMQ, or similar)

Benefits

Comprehensive Training & Development programs

Performance‑based Bonus incentives

Flexible Work From Home options
