Senior Solutions Architect, Cloud Infrastructure and DevOps - NVIS
Join to apply for the Senior Solutions Architect, Cloud Infrastructure and DevOps - NVIS role at NVIDIA.
What You'll Be Doing
Maintain large-scale HPC/AI clusters with monitoring, logging, and alerting.
Manage Linux job/workload schedulers and orchestration tools.
Develop and maintain continuous integration and delivery pipelines.
Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
Deploy monitoring solutions for servers, network, and storage.
Troubleshoot from the bottom up, across bare metal, operating system, software stack, and application levels.
As a technical resource, develop, redefine, and document standard methodologies to share with internal teams.
Support Research & Development activities and engage in POCs/POVs for future improvements.
What We Need To See
BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
At least 8 years of professional experience in networking fundamentals, the TCP/IP stack, and data center architecture.
Knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
Extensive knowledge of and hands-on experience with Kubernetes, including container orchestration for AI/ML workloads, resource scheduling, scaling, and integration with HPC environments.
Experience in installing and managing HPC clusters, including deployment, optimization, and troubleshooting.
Excellent knowledge of Linux systems (Red Hat/CentOS and Ubuntu), including internals, ACLs, OS-level security protections, and common protocols such as TCP, DHCP, and DNS.
Experience with multiple storage solutions, including Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies is a plus.
Proficiency in Python programming and bash scripting.
Comfortable with automation and configuration management tools, including Jenkins, Ansible, and Puppet/Chef.
Ways To Stand Out From The Crowd
Knowledge of CI/CD pipelines for software deployment and automation.
Knowledge of Kubernetes and container-related microservice technologies.
Experience with GPU-focused hardware/software (DGX, CUDA).
Background with RDMA (InfiniBand or RoCE) fabrics.
NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.
JR
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing
Solution Architect • Riyadh, Saudi Arabia