The Job Description
Ensure System Uptime and Reliability:
Monitor and maintain cloud-based applications and infrastructure, ensuring minimal downtime and efficient incident response.
Build and Optimize Monitoring and Alerting Systems:
Set up and continuously improve comprehensive monitoring and alerting frameworks to detect and address issues proactively.
Cloud Infrastructure Management:
Manage, optimize, and scale systems on Azure cloud platforms, ensuring high performance and cost-effectiveness.
Incident Management and Response:
Act as the first line of defense in identifying, diagnosing, and resolving technical issues in real time, or escalating them to the appropriate teams.
Tooling and Observability:
Leverage technologies such as Grafana for observability and Argo for CI/CD automation, enhancing our ability to respond swiftly and effectively to infrastructure needs.
Collaboration:
Work closely with cross-functional teams to align on SRE best practices, share insights, and support development and operational goals.
Language Requirements:
Fluent spoken and written Arabic/English.
Tagged as:
AWS, Azure, GCP
Engineer • Riyadh, Saudi Arabia