Talent.com
Senior Site Reliability Engineer (SRE)

Salla
Abha, Saudi Arabia
20 days ago
Job description

We are looking for a Senior Site Reliability Engineer (SRE) to help design, scale, and secure our rapidly growing platform infrastructure. You will work across all critical systems — from customer-facing applications and APIs to internal platforms and data services — ensuring availability, performance, and cost efficiency at scale. You'll be hands‑on with Kubernetes, observability, GitOps, automation, and cloud infrastructure, while partnering closely with application, platform, and data teams to deliver a highly reliable and self‑healing environment. This role is ideal for an engineer who thrives on complex distributed systems, loves to automate everything, and can balance speed, stability, and cost‑efficiency in production.

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field — or equivalent work experience.

Design, deploy, monitor, and maintain production workloads across Kubernetes (EKS / AKS / GKE) clusters.

Build self‑healing, auto‑scaling systems that minimize manual intervention and ensure uptime.

Design and operate reliable database and storage platforms (SQL, NoSQL, and object stores) within Kubernetes environments.

Implement backup, disaster recovery, replication, and failover strategies to meet RPO / RTO targets.

Troubleshoot and recover Kubernetes Persistent Volumes (StorageClasses, CSI drivers, PVC issues).

Optimize storage performance and cost through multi‑tier strategies, hot / cold data separation, and S3 / offloading lifecycle policies.

Secure and scale object storage platforms (e.g., MinIO / S3‑compatible) for high‑throughput data pipelines.

Manage block storage (EBS / io2 / gp3) and shared file systems (EFS, NFS) for resilience and cost balance.

Collaborate with teams to optimize networking, ingress / egress traffic, and service mesh for secure communication.

Platform & Infrastructure Reliability

Design, deploy, monitor, and maintain production workloads across Kubernetes (EKS / AKS / GKE) clusters

Build self‑healing, auto‑scaling systems that minimize toil and manual intervention

Optimize networking, ingress / egress traffic control, and service mesh for secure & performant communication

Design and operate reliable database and storage platforms (SQL, NoSQL, and object stores) in Kubernetes environments

Own backup, disaster recovery, replication, and failover strategies to meet RPO / RTO targets for critical data services

Optimize storage performance and cost through multi‑tier strategies, hot / cold data separation, and S3 / offloading lifecycle policies

Troubleshoot and recover Kubernetes Persistent Volumes confidently during incidents (StorageClasses, CSI drivers, PVC issues)

Secure and scale object storage platforms (e.g., MinIO / S3‑compatible) and integrate with workloads for high‑throughput data pipelines

Work with block storage (EBS / io2 / gp3) and shared file systems (EFS, NFS) to balance performance, resiliency, and cost

Automation & Delivery

Champion GitOps and CI / CD best practices (ArgoCD, Flux, GitHub Actions)

Build automation for infrastructure provisioning and upgrades using Terraform, Helm, and Kubernetes Operators

Reduce release risk through progressive delivery strategies (blue / green, canary, spot instance rolling updates)

Observability & Incident Response

Own the monitoring and alerting stack (Prometheus, Grafana, Loki, VictoriaMetrics, OpenSearch)

Lead incident management and postmortems to prevent recurrence

Provide real-time visibility into system health, performance, and cost metrics

Security & Compliance

Implement least‑privilege IAM policies, secure service‑to‑service communication, and network ACLs / firewalls

Enforce Kubernetes RBAC, secret management, and secure image supply chain

Participate in audit readiness and compliance efforts

Performance & Cost Optimization

Analyze and tune system performance under scale (CPU / memory / IO)

Partner with product and platform teams to right‑size clusters, databases, and storage tiers

Introduce cost visibility dashboards for engineering leadership

Preferred Qualifications

Experience managing mission‑critical systems at scale (high traffic, multi‑region)

Proven cost optimization in cloud / K8s environments

Familiarity with service mesh (Istio, Linkerd) or advanced networking / egress control

Experience with data platform components (Airflow, Debezium, ClickHouse, etc.) is a plus but not required

Strong communication skills and a collaborative working style, with the ability to work across engineering, DevOps, security, and product teams

Requirements

8+ years in SRE / DevOps / Infrastructure Engineering roles

Deep Kubernetes expertise (multi‑cluster, Helm chart development, advanced networking)

Strong GitOps workflows using ArgoCD / Flux

Expertise with AWS (preferred) or Azure / GCP, plus Infrastructure‑as‑Code (Terraform, Pulumi, CloudFormation)

Advanced knowledge of SQL & NoSQL databases (MySQL / Aurora, PostgreSQL, MongoDB, Redis)

Scripting / automation skills in Python, Bash, or Go

Solid background in monitoring / observability (Prometheus, Grafana, Loki, ELK / OpenSearch, VictoriaMetrics)

Experience with CI / CD at scale and managing production incidents

Experience with streaming / messaging (Kafka, RabbitMQ, or similar)

Benefits

Comprehensive Training & Development programs

Performance‑based Bonus incentives

Flexible Work From Home options

