Posted 4 days ago
Senior Robot Learning Engineer
Wave Recruitment
📍 Bristol
Engineering
Job description
<p>This robot learning role is with a seriously exciting scale-up. The platform is mature, the data is flowing, and the team is ready to scale its most promising research directions into production-grade manipulation policies.</p><p><br></p><p>They need someone to lead the development and deployment of large behaviour models, taking diffusion transformers, VLAs, and language-conditioned policies from the literature onto a real bi-manual humanoid.</p><p><br></p><p>This is not a research-only role. You'll inherit a mature policy training codebase, a VR teleoperation pipeline producing high-frequency multi-modal data, and a Gymnasium environment wrapping a real robot. The work you ship runs on hardware.</p><p><br></p><p><strong>The Role</strong></p><p><br></p><p>You will architect, train, and deploy end-to-end large behaviour models for bi-manual and mobile manipulation, and lead the maturation of the early-stage RL pipeline.</p><p><br></p><p><strong>Key responsibilities</strong></p><p><br></p><ul><li>Architect, train, and evaluate end-to-end large behaviour models for bi-manual and mobile manipulation</li><li>Advance diffusion transformer policies, mature VLA integration, and develop language conditioning for true multi-task generalisation</li><li>Apply RL to refine pre-trained policies: RL token fine-tuning, residual RL, off-policy RL with reference-action regularisation, and RL-based fine-tuning of diffusion policies</li><li>Build a systematic sim-to-real transfer pipeline, connecting existing simulation infrastructure to training</li><li>Deploy and iterate learned policies on physical robot hardware</li><li>Mentor junior researchers and engineers, and publish at top-tier venues</li></ul><p><br></p><p><strong>What We're Looking For</strong></p><p><br></p><p><strong>Essential:</strong></p><p><br></p><ul><li>PhD/MSc in ML, Robotics, CS, or a related field, or 4+ years of equivalent industry research experience</li><li>Demonstrated expertise in training and deploying learned manipulation policies on real robots</li><li>Strong background in at least two of: behaviour cloning, diffusion policies, VLA/VLM architectures, RL for manipulation</li><li>Proficiency with PyTorch and large-scale (multi-GPU, distributed) training</li><li>Track record of publications at top-tier venues (CoRL, RSS, ICRA, NeurIPS, ICML, ICLR), or equivalent demonstrated research impact through deployed systems, patents, or significant open-source contributions</li><li>Strong Python; production-quality research code with proper testing, type hints, and documentation</li></ul><p><br></p><p><strong>Useful:</strong></p><p><br></p><ul><li>Hands-on experience with humanoid or bi-manual manipulation platforms</li><li>Diffusion transformer, ACT, or VLA architectures specifically</li><li>Pre-trained vision/language models for robot control (CLIP, DINOv2, PaliGemma)</li><li>MuJoCo, Isaac Sim, or ManiSkill for sim-to-real policy training</li><li>RL fine-tuning of pre-trained policies (residual RL, DPPO, or similar)</li><li>3D perception for policy conditioning (point clouds, keypoints, NeRFs)</li></ul><p><br></p><p><strong>Key contribution areas</strong></p><p><br></p><p><strong>Policy Architecture & Training</strong></p><p><br></p><ul><li>End-to-end large behaviour models for bi-manual and mobile manipulation</li><li>Scale and evolve diffusion transformer policies, VLA integration, and language conditioning</li><li>Extend the imitation learning pipeline to leverage growing teleoperation datasets</li><li>Apply RL to push beyond what imitation alone can reach</li><li>Target sub-millimetre precision and contact-rich manipulation</li></ul><p><br></p><p><strong>Generalisation & Scaling</strong></p><p><br></p><ul><li>Develop policies that generalise across tasks, object categories, and environments</li><li>Move from single-task to multi-task and task-conditioned architectures</li><li>Design 
hierarchical behaviour systems for long-horizon manipulation</li><li>Investigate data-efficient learning: few-shot adaptation, transfer learning, multi-dataset training</li><li>Drive systematic ablations across architectures</li></ul><p><br></p><p><strong>Sim-to-Real & Deployment</strong></p><p><br></p><ul><li>Build the sim-to-real transfer pipeline: domain randomisation, rendering augmentation, sim-to-real benchmarking</li><li>Deploy and iterate learned policies on physical robot hardware</li><li>Extend the Gymnasium environment wrapper and integrate with the robot's control stack</li><li>Leverage perception team outputs (keypoints, learned features, 3D point clouds) for policy conditioning</li></ul><p><br></p><p><strong>Research Leadership</strong></p><p><br></p><ul><li>Track the literature and bring relevant advances back to the team</li><li>Identify and propose new research directions aligned with the manipulation roadmap</li><li>Mentor junior researchers and engineers</li><li>Publish at top-tier venues — conference attendance and open-source contributions are actively supported</li></ul><p><br></p><p><strong>What's On Offer</strong></p><p><br></p><ul><li>Join a team of world-class applied research scientists, ML engineers, and robotics software engineers</li><li>A mature platform that ships to physical hardware, not slides</li><li>Active support for conference attendance and open-source contributions</li><li>Competitive compensation</li></ul><p><br></p><p>Apply or send your CV to — </p>