Overview
Sr. Site Reliability Engineer (SRE) at SiFi. This is a remote position.
About SiFi
SiFi is a rapidly growing B2B Fin-Tech company transforming expense management for businesses in Saudi Arabia. As an EMI licensed by the Saudi Central Bank, we empower companies with innovative tools to simplify finance management.
Position Overview
As a Senior Site Reliability Engineer (SRE), you will provide scalable, reliable, durable, and secure global database services for our clients' cloud infrastructure hosted on AWS or GCP. You will help build highly reliable cloud services using a customer-first approach while innovating technically. You will understand our customers' needs and how we can meet them.
Primary Responsibilities
Identify significant projects that result in substantial improvements in reliability, cost savings, and/or revenue.
Identify changes in the product architecture from the reliability, performance, and availability perspectives with a data-driven approach.
Influence the product roadmap and work with engineering and product counterparts to improve the resiliency and reliability of the GitLab product.
Proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage, making GitLab cheaper to run for all our customers.
Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these incidents.
Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
Collaboration and Communication:
Lead initiatives, including problem definition and scoping, design, and planning, through epics and blueprints.
Build deep domain knowledge and share it through recorded demos, technical presentations, discussions, and Incident Reviews.
Run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again.
For stable counterpart assignments, maintain awareness of and actively influence stage group plans and priorities through participation in stage group meetings and async discussions.
Act as a champion for reliability.
Influence and Maturity:
Set an example for a team of SREs with positive and inclusive leadership and discussion of work.
Show ownership of a significant part of the infrastructure.
Be trusted to de-escalate conflicts inside the team.
Requirements
5+ years of related experience.
Perform application-specific production support, incident management, problem management, RCAs, and service restoration as needed to quickly respond to and resolve production issues.
Collaborate with engineering and development teams to evaluate and identify optimal cloud solutions.
Plan and achieve high availability and performance of the product service.
Development/coding experience and skills for writing custom automation solutions.
Strong understanding of web hosting infrastructure and high availability architecture.
Demonstrated knowledge of fundamental cloud security (e.g., Identity and Access Management, ACLs, firewalls).
Deep understanding of AWS cloud services and how to leverage them for computing, storage, and managed services including, but not limited to, databases, managed Kubernetes, ECS, and Python/Django application services.
Strong experience in Infrastructure as Code (IaC) technologies like Terraform.
Familiarity with Kubernetes-specific platform components, such as ingress controllers, cluster DNS, autoscalers, and others.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology