Posted 2 days ago
Staff DevOps Engineer
Humanoid
📍 London
Engineering · Hybrid
Job description
<p>Humanoid is the first AI and robotics company in the UK, creating the world’s most advanced, reliable, commercially scalable, and safe humanoid robots. Our first humanoid robot, HMND 01, is a next-gen labour automation unit, providing highly efficient services across various use cases, starting with industrial applications.</p><p>Is this the role you are looking for? If so, read on for more details, and make sure to apply today.</p><p><strong>Our Mission</strong></p><p>At Humanoid, we strive to create the world’s leading, commercially scalable, safe, and advanced humanoid robots that seamlessly integrate into daily life and amplify human capacity.</p><p><br></p><p>We are building large-scale compute infrastructure to train next-generation robotics models, including transformer-based systems such as vision-language-action (VLA) models.</p><p>As a Staff Engineer, you will lead the design and evolution of our multi-GPU, cross-cloud platforms, driving architecture, reliability, and performance at scale. This role sits at the intersection of DevOps, MLOps, and distributed systems, enabling cutting-edge AI in real-world environments.</p><p><br></p><p><strong>What You’ll Do:</strong></p><ul><li>Lead the design and evolution of scalable multi-GPU infrastructure across cloud environments (AWS, GCP, etc.)</li><li>Own architecture and long-term technical direction of model training platforms</li><li>Drive reliability, performance, and cost-efficiency at scale</li><li>Define and implement best practices for infrastructure, DevOps, and MLOps across the organization</li><li>Build and evolve infrastructure-as-code and automation for provisioning, orchestration, and lifecycle management</li><li>Architect and improve CI/CD systems for both infrastructure and ML training workflows</li><li>Optimize distributed training workloads (scheduling, resource utilization, observability)</li><li>Partner with ML engineers and researchers to enable efficient experimentation and productionization</li><li>Lead 
troubleshooting and resolution of complex system issues across distributed, GPU-heavy environments</li><li>Mentor engineers and raise the bar for engineering quality and operational excellence</li><li>Document architecture, systems, and key technical decisions</li></ul><p><br></p><p><br></p><p><strong>We’re Looking For:</strong></p><ul><li>7+ years of experience in DevOps, MLOps, or infrastructure engineering (Staff level)</li><li>Proven experience designing and operating <strong>multi-GPU / distributed compute infrastructure</strong></li><li><strong>Experience with GPU scheduling/orchestration (e.g., Kubernetes schedulers, Volcano, Ray, etc.)</strong></li><li>Strong experience with Kubernetes and containerized workloads at scale</li><li>Deep expertise in Infrastructure-as-Code (Terraform, Helm, or similar)</li><li>Deep familiarity with at least one major cloud provider (AWS preferred)</li><li>Strong experience building and scaling CI/CD systems (e.g., GitHub Actions, GitLab CI, ArgoCD)</li><li>Proficiency in Python for automation and tooling</li><li>Strong understanding of distributed systems, networking, and system reliability</li><li>Demonstrated ability to lead large technical initiatives and influence system design</li><li>Experience supporting ML workloads or training pipelines (PyTorch, TensorFlow, etc.)</li></ul><p>Nice to have: </p><ul><li>Experience with multi-cloud or hybrid cloud environments</li><li>Background in performance optimization for large-scale training workloads</li><li>Experience in robotics, simulation, or embodied AI systems</li></ul><p><br></p><p><br></p><p><strong>What we offer: </strong></p><ul><li>Competitive salary plus participation in our Stock Option Plan</li><li>Paid vacation with adjustments based on your location to comply with local labor laws</li><li>Travel opportunities to our Vancouver and Boston offices</li><li>Office perks: free breakfasts, lunches, snacks, and regular team events</li><li>Freedom to influence the 
product and own key initiatives</li><li>Collaboration with top‑tier engineers, researchers, and product experts in AI and robotics</li><li>Startup culture prioritising speed, transparency, and minimal bureaucracy</li></ul>