



Integrating multimodal inputs such as vision, touch, and language to predict action outputs
Enabling a complete closed-loop capability that directly maps multimodal perceptual inputs to robotic control actions
Enhancing the reasoning and task generalization abilities of robots in complex scenarios
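The closed-loop idea above — fusing multiple perceptual modalities and mapping them directly to control actions — can be sketched minimally in plain Python. All names here (`fuse_and_act`, the feature vectors, the weight matrix) are illustrative assumptions, not part of any specific model; a real system would use learned neural encoders rather than fixed weights.

```python
def fuse_and_act(vision, touch, language, weights):
    """Toy end-to-end policy: concatenate modality features,
    then apply a linear map to produce an action vector."""
    features = vision + touch + language  # simple concatenation fusion
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

# Hypothetical per-modality feature vectors (2 vision, 1 touch, 1 language dim)
vision = [1.0, 0.0]
touch = [0.5]
language = [0.2]

# Hand-picked weights mapping the 4 fused features to a 2-dim action
weights = [
    [1.0, 0.0, 0.0, 0.0],  # action dim 0 driven by first vision feature
    [0.0, 0.0, 2.0, 0.0],  # action dim 1 driven by touch feature
]

action = fuse_and_act(vision, touch, language, weights)
print(action)  # → [1.0, 1.0]
```

In a closed loop, the action would be sent to the robot, the environment would change, and fresh perceptual features would be fed back in at the next timestep.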



A multimodal end-to-end model equips robots with multi-dimensional perception and precise control, enabling the autonomous execution of complex physical interaction tasks