



Integrate visual, tactile, linguistic, and other multimodal inputs to predict action outputs
Close the loop fully by mapping multimodal perception inputs directly to robot control actions
Improve robots' reasoning and task-generalization abilities in complex scenes
A multimodal end-to-end model gives robots multi-dimensional perception and precise control, enabling autonomous execution of complex physical-interaction tasks
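The perception-to-action loop described above can be sketched minimally: encode each modality, fuse the features, and map the fused vector directly to a control command. This is an illustrative NumPy sketch, not any specific model; the modality dimensions, the 7-DoF action space, and the random linear "policy" are all assumptions standing in for a trained end-to-end network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-modality feature dimensions (illustrative only).
DIMS = {"vision": 64, "tactile": 16, "language": 32}
ACTION_DIM = 7  # e.g. a 7-DoF arm command (assumption)

# Stand-in "policy": one linear map over the fused features.
# A real end-to-end model would be a trained multimodal network.
W = rng.normal(0.0, 0.01, size=(ACTION_DIM, sum(DIMS.values())))

def fuse(obs: dict) -> np.ndarray:
    """Fuse per-modality feature vectors by concatenation."""
    return np.concatenate([obs[m] for m in DIMS])

def policy(obs: dict) -> np.ndarray:
    """Map multimodal perception input directly to an action vector."""
    return np.tanh(W @ fuse(obs))  # tanh bounds the control output

# One step of the closed loop: observe -> fuse -> act.
obs = {m: rng.normal(size=d) for m, d in DIMS.items()}
action = policy(obs)
```

In a real system the loop would repeat: the action is executed, new multimodal observations arrive, and the policy is queried again, which is what "closing the loop" from perception to control means here.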