Overview
We are seeking a skilled Platform Engineer – AI/MLOps to join our Platform Team. The successful candidate will focus on deploying, scaling, and supporting AI/MLOps workloads in production environments. You will collaborate closely with AI engineers and data scientists to ensure secure, reliable, and cost-effective AI deployments on Kubernetes clusters (GKE, OpenShift).
Responsibilities
- Deploy, manage, and optimize AI/MLOps workloads (training, inference, pipelines) on Kubernetes (GKE, OpenShift).
- Build and maintain Helm charts, Kubernetes manifests, and GitOps pipelines (ArgoCD) for AI services.
- Optimize GPU-enabled workloads for performance, scalability, and cost-efficiency.
- Implement and support CI/CD and MLOps workflows to automate model deployment, versioning, and monitoring.
- Ensure observability, security, and compliance of deployed AI workloads.
- Collaborate with AI engineers and data scientists to translate research and business requirements into production-ready deployments.
- Troubleshoot and resolve production issues to maintain high availability and reliability.
Qualifications
- 5 years of experience overall, including 2+ years of solid hands-on experience with Kubernetes (GKE, OpenShift) and containerized workloads.
- Strong knowledge of MLOps practices (model serving, pipelines, monitoring, model lifecycle management).
- Experience with Helm, GitOps (ArgoCD), and CI/CD pipelines.
- Familiarity with GPU workloads and cloud platforms (Google Cloud preferred, AWS/Azure a plus).
- Solid understanding of cloud-native observability and security practices.
- Strong problem-solving skills and ability to work in a cross-functional team environment.
Seniority level
Associate
Employment type
Full-time
Job function
Information Technology