Research Hub

Key academic papers shaping the development of humanoid robots — locomotion, manipulation, sim-to-real transfer, VLA models, and tactile sensing.

VLA Models · Mar 12, 2026

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Songlin Wei, Hongyi Jing, Boqian Li et al. · University of Southern California / Shanghai AI Lab

A staged training approach that sidesteps the pitfalls of directly mixing human and robot data. Ψ₀ first pre-trains on 800 hours of egocentric human manipulation video, then post-trains a flow-based action expert on just 30 hours of humanoid robot data. The complete ecosystem — training pipelines, model weights, and inference engines — is fully open-sourced.

Key Finding: Outperforms baselines trained on 10× more data by >40% in task success rate. Staged human-to-robot transfer is dramatically more data-efficient than joint training.
Read paper on arXiv →
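
To make the flow-based action expert concrete: conditional flow matching trains a network to predict the velocity that carries a noise sample to an expert action chunk along a straight-line path. The sketch below is a generic illustration of that objective, not Ψ₀'s implementation; the network shape, observation features, 29-DoF/16-step chunking, and batch handling are all assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions; the paper's actual architecture differs.
OBS_DIM, ACT_DIM, CHUNK = 512, 29, 16  # 29-DoF humanoid, 16-step action chunk

class ActionExpert(nn.Module):
    """Predicts the flow velocity field v(a_t, t | obs)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM * CHUNK + 1, 1024), nn.GELU(),
            nn.Linear(1024, ACT_DIM * CHUNK),
        )

    def forward(self, obs, a_t, t):
        x = torch.cat([obs, a_t.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view_as(a_t)

def flow_matching_loss(model, obs, actions):
    """One conditional flow-matching step: regress the straight-line
    velocity from noise a0 to the expert action chunk a1 at a random time t."""
    a1 = actions
    a0 = torch.randn_like(a1)                      # noise sample
    t = torch.rand(a1.shape[0], device=a1.device)  # per-sample time
    a_t = (1 - t[:, None, None]) * a0 + t[:, None, None] * a1
    v_target = a1 - a0                             # constant velocity of the linear path
    v_pred = model(obs, a_t, t)
    return ((v_pred - v_target) ** 2).mean()

model = ActionExpert()
obs = torch.randn(8, OBS_DIM)             # stand-in for pretrained visual features
actions = torch.randn(8, CHUNK, ACT_DIM)  # stand-in for teleop action chunks
flow_matching_loss(model, obs, actions).backward()
```

At inference, actions come from integrating the learned velocity field starting from noise, which is part of why flow-based experts are attractive for real-time control compared with many-step diffusion.
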
Manipulation · Mar 12, 2026

HumDex: Humanoid Dexterous Manipulation Made Easy

Liang Heng, Yihe Tang, Jiajun Xu et al. · University of Southern California

A portable teleoperation framework for dexterous humanoid manipulation using IMU-based motion tracking. Introduces a learning-based hand-control retargeting method and a two-phase training approach: pre-training on human motion data, then fine-tuning on robot data to bridge the embodiment gap. The full system is open-sourced.

Key Finding: Achieves improved generalization to new object configurations with minimal data collection, reducing hardware cost and complexity for dexterous manipulation data pipelines.
Read paper on arXiv →
VLA Models · Mar 10, 2026

ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

Haoran Yang, Jiacheng Bao, Yucheng Xin et al. · Shanghai AI Lab / Northwestern Polytechnical University

ZeroWBC eliminates the need for robot teleoperation data by fine-tuning a Vision-Language Model to predict human motions from egocentric video and text instructions. A tracking policy adapts predicted motions to the robot's joints for whole-body control. Tested on the Unitree G1 humanoid across diverse motion categories including sitting and kicking.

Key Finding: Achieves natural whole-body visuomotor control with broader behavioral versatility than prior methods — without any robot-collected demonstrations.
Read paper on arXiv →
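
The decoupling ZeroWBC describes, a motion predictor trained on human video feeding a motion-tracking policy, can be pictured as a two-stage inference pipeline. In this minimal sketch both stages are stubbed with linear layers; the real system fine-tunes a VLM for stage one and trains the tracker separately, and the dimensions here are invented for illustration.

```python
import torch
import torch.nn as nn

MOTION_DIM, JOINTS = 69, 29  # e.g. SMPL-style pose vs. robot joints (illustrative)

# Stage 1 stand-in: a fine-tuned VLM would map (egocentric frames, instruction)
# to a predicted human motion; here a linear layer plays that role.
motion_predictor = nn.Linear(512, MOTION_DIM)

# Stage 2 stand-in: a tracking policy maps (predicted motion, proprioception)
# to robot joint targets, absorbing the embodiment gap.
tracking_policy = nn.Linear(MOTION_DIM + JOINTS, JOINTS)

def step(video_text_features, proprio):
    human_motion = motion_predictor(video_text_features)  # "what a human would do"
    joint_targets = tracking_policy(torch.cat([human_motion, proprio], dim=-1))
    return joint_targets

feats = torch.randn(1, 512)     # stand-in egocentric video + text features
proprio = torch.randn(1, JOINTS)
print(step(feats, proprio).shape)  # torch.Size([1, 29])
```

Because stage one is supervised only by human video and text, no robot teleoperation data enters the pipeline; only the tracker ever touches the robot's embodiment.
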
VLA Models · Mar 5, 2026

PhysiFlow: Physics-Aware Humanoid Whole-Body VLA via Multi-Brain Latent Flow Matching

Weikai Qin, Sichen Wu, Ci Chen et al. · Shanghai Jiao Tong University

PhysiFlow proposes a "multi-brain" VLA framework that combines semantic understanding with physics-aware whole-body coordination. It uses latent flow matching to bridge high-level vision-language intent with low-level motor execution, improving inference efficiency while maintaining physical plausibility for full-body humanoid coordination.

Key Finding: Multi-brain latent flow matching outperforms single-branch VLA baselines on whole-body manipulation benchmarks with improved physical consistency.
Read paper on arXiv →
Locomotion · Mar 3, 2026

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation

Xialin He, Sirui Xu, Xinyao Li et al. · University of Illinois Urbana-Champaign

ULTRA presents a unified multimodal controller for humanoid whole-body loco-manipulation that handles varied inputs — from motion-capture data to imperfect egocentric vision. A physics-driven neural retargeting algorithm compresses skills into latent representations, enabling autonomous goal-directed execution without reference motions at test time. Evaluated on the Unitree G1.

Key Finding: Outperforms tracking-only baselines in autonomous whole-body manipulation and demonstrates robust execution across input modality variations.
Read paper on arXiv →
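
Skill compression into a latent space, as in ULTRA's retargeting stage, can be illustrated with a toy autoencoder over motion trajectories: once skills live in a compact latent, a goal-conditioned policy can emit latents directly instead of tracking reference motions. The paper's physics-driven retargeting is far richer than this; the shapes and plain MSE autoencoding below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

FRAMES, JOINTS, LATENT = 50, 29, 32  # illustrative motion length / DoF / code size

encoder = nn.Linear(FRAMES * JOINTS, LATENT)  # compress a retargeted skill
decoder = nn.Linear(LATENT, FRAMES * JOINTS)  # reconstruct the motion

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(motions):
    """Autoencode retargeted skill trajectories into a compact latent code."""
    flat = motions.flatten(1)
    recon = decoder(encoder(flat))
    loss = ((recon - flat) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# At test time a goal-conditioned policy can output the latent directly,
# so execution no longer requires a reference motion.
skills = torch.randn(64, FRAMES, JOINTS)  # stand-in retargeted trajectories
for _ in range(3):
    train_step(skills)
```
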
Sim-to-Real · Mar 11, 2026

SPARK: Skeleton-Parameter Aligned Retargeting on Humanoid Robots with Kinodynamic Trajectory Optimization

Hanwen Wang, Qiayuan Liao, Bike Zhang et al. · UC Berkeley

A two-stage pipeline (accepted ICRA 2026) for converting human motion-capture data into physically feasible humanoid reference trajectories. Human motion is first aligned to the target robot's skeletal parameters, then three-stage kinodynamic trajectory optimization produces dynamically consistent motion references that generalize across different humanoid platforms.

Key Finding: Generalizes across multiple humanoid platforms and enables RL policies to train on physically consistent MoCap-derived references without per-robot manual tuning.
Read paper on arXiv →
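
Stage one, aligning human motion to the robot's skeletal parameters, reduces at its core to rescaling mocap segments to the robot's limb lengths while preserving motion directions, before the kinodynamic optimization enforces dynamics. A deliberately simplified single-chain sketch follows; the limb lengths and the hip-knee-ankle chain are hypothetical, and the paper's alignment covers the full skeleton.

```python
import numpy as np

# Hypothetical robot segment lengths (m) for one leg chain: thigh, shank.
ROBOT_SEG = np.array([0.40, 0.38])

def retarget_chain(human_joints):
    """Rescale a mocap joint chain so segment lengths match the robot.

    human_joints: (T, 3, 3) positions of hip, knee, ankle over T frames.
    Keeps each human segment's direction but imposes the robot's length,
    rebuilding the chain outward from the root joint.
    """
    out = human_joints.copy()
    for t in range(human_joints.shape[0]):
        for j in range(1, human_joints.shape[1]):
            seg = human_joints[t, j] - human_joints[t, j - 1]
            direction = seg / (np.linalg.norm(seg) + 1e-9)
            out[t, j] = out[t, j - 1] + direction * ROBOT_SEG[j - 1]
    return out

T = 100
demo = np.cumsum(np.random.randn(T, 3, 3) * 0.01, axis=0)  # fake mocap stream
robot_ref = retarget_chain(demo)
```

The aligned trajectories are only kinematically plausible; making them dynamically consistent is the job of the subsequent trajectory optimization.
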
Manipulation · Feb 6, 2026

HuMI: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations

Ruiqian Nai, Boyuan Zheng, Junming Zhao et al. · Peking University / Beijing Institute of General Artificial Intelligence

HuMI enables learning diverse humanoid whole-body manipulation tasks without any physical robot during data collection. A portable wearable captures full-body human motion, feeding a hierarchical learning pipeline that translates human motions into dexterous humanoid skills. Tested across five tasks: kneeling, squatting, tossing, walking, and bimanual manipulation.

Key Finding: 3× data collection efficiency vs. teleoperation and 70% success in unseen environments, demonstrating strong sim-to-real generalization.
Read paper on arXiv →
Locomotion · Feb 3, 2026

RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging Terrains

Yuanhang Zhang, Younggyo Seo, Juyue Chen et al. · UC Berkeley

A two-stage training framework for multi-directional humanoid locomotion on complex terrain. Stage one trains terrain-specific expert policies using privileged height map observations; stage two distills these into a single transformer policy driven by multiple depth cameras. A custom simulation tool achieves 5× faster depth rendering than prior alternatives.

Key Finding: Deployed on real hardware carrying 2 kg payloads, successfully traversing 20° slopes, variable-step staircases, and stepping stones separated by 60 cm gaps.
Read paper on arXiv →
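
The expert-to-student step is a standard privileged-distillation pattern: the teacher acts from ground-truth height maps available only in simulation, and the student learns to reproduce those actions from deployable depth input. A minimal sketch with invented shapes and a single expert; the paper distills multiple terrain experts into a transformer over several depth cameras.

```python
import torch
import torch.nn as nn

H_DIM, DEPTH_SHAPE, ACT_DIM = 187, (64, 64), 12  # illustrative sizes

# Privileged expert (a trained terrain specialist in practice).
teacher = nn.Sequential(nn.Linear(H_DIM, 256), nn.ELU(), nn.Linear(256, ACT_DIM))

# Deployable student: sees only depth images (the paper uses a transformer).
student = nn.Sequential(
    nn.Flatten(),
    nn.Linear(DEPTH_SHAPE[0] * DEPTH_SHAPE[1], 256), nn.ELU(),
    nn.Linear(256, ACT_DIM),
)

def distill_step(height_map, depth_img, opt):
    """DAgger-style step: student imitates the privileged expert's action."""
    with torch.no_grad():
        a_teacher = teacher(height_map)  # privileged observation
    a_student = student(depth_img)       # deployable observation
    loss = ((a_student - a_teacher) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

opt = torch.optim.Adam(student.parameters(), lr=3e-4)
distill_step(torch.randn(32, H_DIM), torch.randn(32, *DEPTH_SHAPE), opt)
```
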
Tactile · Feb 1, 2026

UniForce: A Unified Latent Force Model for Robot Manipulation with Diverse Tactile Sensors

Zhuo Chen, Fei Ni, Kaiyao Luo et al. · King's College London / University of Bristol

UniForce addresses the tactile sensor heterogeneity problem by learning a shared latent force representation across diverse sensor types (GelSight, TacTip, uSkin). It jointly models inverse and forward dynamics, constrained by force equilibrium and image reconstruction losses. The universal encoder enables zero-shot cross-sensor transfer for force-aware manipulation.

Key Finding: Consistent improvements in force estimation over prior methods across all three sensor types, with zero-shot transfer of manipulation policies between heterogeneous tactile sensors.
Read paper on arXiv →
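
The description above implies a composite objective: supervise force from the shared latent, reconstruct the tactile image through a forward model, and constrain predictions with force equilibrium. The sketch below wires such a loss together; the module shapes, loss weights, and the toy two-finger equilibrium term are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

LATENT = 64

class TactileForceModel(nn.Module):
    """Sensor-specific encoder -> shared latent -> force + reconstruction heads."""
    def __init__(self, img_dim=32 * 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, LATENT))
        self.force_head = nn.Linear(LATENT, 3)  # inverse model: image -> force
        self.dec = nn.Linear(LATENT, img_dim)   # forward model: latent -> image

    def forward(self, img):
        z = self.enc(img)
        return self.force_head(z), self.dec(z)

def composite_loss(model, img, force_gt, w_rec=0.1, w_eq=0.01):
    f_pred, img_rec = model(img)
    l_force = ((f_pred - force_gt) ** 2).mean()          # force supervision
    l_rec = ((img_rec - img.flatten(1)) ** 2).mean()     # image reconstruction
    # Toy equilibrium term: forces of a static two-finger grasp should cancel
    # (pairs of consecutive batch entries stand in for opposing fingers).
    l_eq = (f_pred[0::2] + f_pred[1::2]).pow(2).mean()
    return l_force + w_rec * l_rec + w_eq * l_eq

model = TactileForceModel()
img = torch.randn(8, 32, 32)  # stand-in for a GelSight-style tactile image
force = torch.randn(8, 3)
composite_loss(model, img, force).backward()
```

Swapping the encoder per sensor type while sharing the latent and heads is what would let a downstream force-aware policy transfer across sensors.
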
Manipulation · Jan 4, 2026

DemoBot: Efficient Learning of Bimanual Manipulation from a Single Human Video

Yucheng Xu, Xiaofeng Mao, Elle Miller et al. · Beijing Institute of Technology / University of Edinburgh

DemoBot enables a dual-arm, multi-finger robot to learn complex bimanual manipulation from a single unannotated RGB-D video demonstration. Structured motion trajectories are extracted from the video, then an RL pipeline built on three innovations — temporal-segment RL, success-gated resets, and an event-driven reward curriculum — refines those motions in contact-rich simulation before real-world deployment.

Key Finding: Successfully tackles long-horizon bimanual assembly tasks from just one video demonstration, with the RL curriculum showing significantly better convergence than direct policy learning.
Read paper on arXiv →
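
Of the three innovations, the success-gated reset is the easiest to picture in isolation: training keeps resetting to the current trajectory segment until the policy clears it reliably, and only then do resets advance to the next segment. A toy scheduler showing that gating logic; the 80% gate, 50-episode window, and segment abstraction are assumptions.

```python
import random

class SuccessGatedResets:
    """Advance the reset point along demo segments only after the policy
    succeeds on the current segment often enough."""
    def __init__(self, num_segments, gate=0.8, window=50):
        self.segment = 0
        self.gate, self.window = gate, window
        self.num_segments = num_segments
        self.results = []

    def record(self, success: bool):
        self.results.append(success)
        self.results = self.results[-self.window:]
        rate = sum(self.results) / len(self.results)
        if (len(self.results) == self.window and rate >= self.gate
                and self.segment < self.num_segments - 1):
            self.segment += 1   # gate cleared: train from the next segment
            self.results = []

    def reset_state(self, demo_segments):
        """Return the start state of the currently gated segment."""
        return demo_segments[self.segment][0]

# Usage with fake rollouts: the gate advances as the policy "improves".
sched = SuccessGatedResets(num_segments=4)
for step in range(2000):
    p_success = min(0.95, 0.3 + step / 2000)  # pretend learning progress
    sched.record(random.random() < p_success)
print("final segment:", sched.segment)
```
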
Locomotion · Feb 5, 2026

Scalable and General Whole-Body Control for Cross-Humanoid Locomotion (XHugWBC)

Yufei Xue, Yunfeng Lin, Wentao Dong et al. · Shanghai AI Lab / Shanghai Jiao Tong University

XHugWBC trains a single policy that generalizes whole-body locomotion and manipulation across diverse humanoid hardware without robot-specific retraining. Key innovations include physics-consistent morphological randomization and semantically aligned observation/action spaces. Validated across 12 simulated and 7 real-world humanoid platforms.

Key Finding: 100% zero-shot success rate across 7 real humanoid platforms despite large hardware variation. Accepted to ICML 2026.
Read paper on arXiv →
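
"Physics-consistent morphological randomization" suggests that randomized body parameters are sampled with their physical couplings intact rather than independently. A hypothetical sketch of that idea, where mass follows length cubed and inertia follows a thin-rod model; the ranges and couplings are illustrative guesses, not the paper's scheme.

```python
import random

def sample_morphology(base_links):
    """Randomize link lengths and rescale mass/inertia consistently.

    base_links: dict name -> (length_m, mass_kg).
    Mass is coupled to length**3 and inertia to mass * length**2, so a
    longer link is also proportionally heavier, never independently sampled.
    """
    sampled = {}
    for name, (length, mass) in base_links.items():
        s = random.uniform(0.8, 1.2)             # length scale for this link
        new_len = length * s
        new_mass = mass * s ** 3                 # volume-coupled mass
        inertia = new_mass * new_len ** 2 / 12   # thin-rod approximation
        sampled[name] = dict(length=new_len, mass=new_mass, inertia=inertia)
    return sampled

# Hypothetical base parameters loosely in a G1-like range.
links = {"thigh": (0.40, 3.2), "shank": (0.38, 2.1), "upper_arm": (0.26, 1.5)}
print(sample_morphology(links))
```

Resampling a morphology like this every episode forces a single policy to cover the space of plausible humanoids instead of overfitting one hardware platform.
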
VLA Models · Dec 11, 2025

WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control

Haoran Jiang, Jin Chen, Qingwen Bu et al. · OpenDriveLab / Shanghai AI Lab / AgiBot

A unified latent VLA framework for simultaneous locomotion and manipulation. The model learns from action-free egocentric video paired with a loco-manipulation RL policy, dramatically reducing training data cost. Validated on the AgiBot X2 humanoid on tasks that combine navigation and bimanual manipulation across large spaces. Accepted to ICLR 2026.

Key Finding: Outperforms prior baselines by 21.3%, autonomously pushing loads of more than 50 kg, combining bimanual grasping with navigation, and executing multi-step placement tasks in sequence.
Read paper on arXiv →
Manipulation · May 5, 2025

TWIST: Teleoperated Whole-Body Imitation System

Yanjie Ze, Zixuan Chen, João Pedro Araújo et al. · Stanford University / Simon Fraser University

TWIST retargets human motion-capture data to a humanoid to generate reference clips, then trains a single unified whole-body controller combining RL and behavior cloning. One network handles whole-body manipulation, legged manipulation, locomotion, and expressive movement. Fully open-sourced including datasets, training code, and checkpoints.

Key Finding: A single controller achieves unprecedented coordinated whole-body motor skills spanning locomotion and manipulation without task-specific sub-controllers.
Read paper on arXiv →
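
Combining RL and behavior cloning in one controller typically means optimizing an RL term alongside an imitation term that pulls the policy toward the retargeted references. A toy version of such a mixed objective follows; TWIST's actual algorithm, architecture, and weighting are not specified here, and everything in the sketch (REINFORCE-style surrogate, fixed action noise, the beta weight) is an assumption.

```python
import torch
import torch.nn as nn

OBS, ACT = 96, 29  # illustrative observation / action sizes
policy = nn.Sequential(nn.Linear(OBS, 256), nn.ELU(), nn.Linear(256, ACT))

def combined_loss(obs, advantages, ref_actions, beta=0.5):
    """Toy objective mixing a policy-gradient surrogate with behavior
    cloning toward retargeted reference actions."""
    mean = policy(obs)
    dist = torch.distributions.Normal(mean, 0.1)    # fixed exploration noise
    logp = dist.log_prob(dist.rsample()).sum(-1)
    rl_loss = -(advantages * logp).mean()           # RL term (REINFORCE-style)
    bc_loss = ((mean - ref_actions) ** 2).mean()    # BC term toward references
    return rl_loss + beta * bc_loss

obs = torch.randn(16, OBS)
advantages = torch.randn(16)           # stand-in advantage estimates
ref_actions = torch.randn(16, ACT)     # stand-in retargeted reference actions
combined_loss(obs, advantages, ref_actions).backward()
```

The BC term keeps motions human-like while the RL term makes them physically achievable on the robot, which is the usual motivation for the combination.
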
VLA Models · Mar 18, 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Johan Bjorck, Fernando Castañeda, Nikita Cherniadev et al. · NVIDIA Research

GR00T N1 is a 2.2B-parameter open foundation model built on a dual-system architecture — an Eagle-2 VLM for environmental understanding and a diffusion transformer for real-time motor generation. Trained on real-robot trajectories, human videos, and synthetic data. Fully open-sourced on GitHub and HuggingFace.

Key Finding: Outperforms SoTA imitation learning baselines and transfers zero-shot to a real Fourier GR-1 for language-conditioned bimanual manipulation.
Read paper on arXiv →
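
The dual-system design implies a dual-rate control loop: the large VLM refreshes a scene-and-language embedding at a low rate while the action module generates motor commands from the cached embedding at every control tick. The sketch below shows only that wiring; both modules are stand-in linear layers and the rates and dimensions are invented.

```python
import torch
import torch.nn as nn

# Toy stand-ins: the real System 2 is an Eagle-2 VLM and System 1 a diffusion
# transformer; two linear layers illustrate only the dual-rate wiring.
vlm = nn.Linear(128, 64)              # slow: scene + language understanding
action_head = nn.Linear(64 + 32, 29)  # fast: motor command generation

def control_loop(steps=30, vlm_every=10):
    """Refresh the VLM embedding occasionally; act from the cached
    embedding at every control tick."""
    cached = torch.zeros(64)
    for t in range(steps):
        if t % vlm_every == 0:           # slow path (low rate)
            frame = torch.randn(128)     # stand-in vision/text features
            cached = vlm(frame)
        proprio = torch.randn(32)        # fast path (every tick)
        yield action_head(torch.cat([cached, proprio]))

actions = list(control_loop())
```

Decoupling the two rates is what lets a multi-billion-parameter understanding module coexist with real-time motor control on a physical robot.
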
Sim-to-Real · Feb 27, 2025

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

Toru Lin, Kartik Sachdev, Linxi Fan et al. · UC Berkeley / NVIDIA / UT Austin

A practical sim-to-real RL recipe for training vision-based dexterous manipulation on humanoids with multi-fingered hands without relying on demonstrations. Components include automated real-to-sim tuning, contact-based reward formulation, divide-and-conquer policy distillation, and modality-specific augmentation to close the perceptual sim-to-real gap.

Key Finding: First successful sim-to-real RL transfer of vision-based dexterous manipulation to a humanoid, achieving high success on unseen objects. Published at CoRL 2025.
Read paper on arXiv →
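
A "contact-based reward formulation" generally credits making and keeping the desired contacts before crediting task progress, so the policy cannot earn reward by batting the object around. A schematic lifting reward in that spirit; the stage weights, gating threshold, and exponential shaping are illustrative choices, not the paper's terms.

```python
import numpy as np

def contact_reward(fingertip_pos, obj_pos, obj_height, contact_flags,
                   lift_target=0.2):
    """Shaped reward: approach -> contact -> lift, gated left to right."""
    # 1) Approach: pull fingertips toward the object.
    dist = np.linalg.norm(fingertip_pos - obj_pos, axis=-1).mean()
    r_approach = np.exp(-5.0 * dist)
    # 2) Contact: fraction of fingers touching the object (from the sim).
    r_contact = contact_flags.mean()
    # 3) Lift: credited only once contact is established, so the policy
    #    is never rewarded for flinging the object upward.
    r_lift = (r_contact > 0.5) * min(obj_height / lift_target, 1.0)
    return 0.2 * r_approach + 0.3 * r_contact + 0.5 * r_lift

fingers = np.random.randn(4, 3) * 0.05        # four fingertip positions (m)
print(contact_reward(fingers, np.zeros(3), obj_height=0.1,
                     contact_flags=np.array([1.0, 1.0, 0.0, 0.0])))
```
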
VLA Models · Jun 13, 2024

OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti et al. · Stanford University

OpenVLA is a 7B-parameter open-source VLA model trained on 970k robot demonstrations, achieving state-of-the-art performance on manipulation benchmarks. The open weights and training code established a community baseline for vision-language-action research across diverse robot platforms.

Key Finding: The 7B model generalizes to novel objects and environments with a 16.5% improvement over prior SoTA, and its open-source release accelerated research across the VLA community.
Read paper on arXiv →