Figure Unveils Helix, a Breakthrough AI Brain for Humanoid Robots
Figure, a California-based robotics company, has introduced Helix, a Vision-Language-Action (VLA) model designed to enhance humanoid robots' perception, language understanding, and control.

Founder Brett Adcock described Helix as the most significant AI advancement in the company’s history.
"Helix thinks like a human… and to bring robots into homes, we need a step change in capabilities. Helix can generalise to virtually any household item," Adcock said in a social media post.
The launch follows Figure's decision to end its collaboration with OpenAI in February. At the time, Adcock said the company had achieved a major breakthrough in fully end-to-end robot AI, developed entirely in-house.
Advanced Capabilities of Helix
Helix introduces a new approach to upper-body manipulation, offering high-rate continuous control of the wrists, torso, head, and fingers. This allows for more precise movements and interactions.
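In practice, "high-rate continuous control" means the policy outputs a dense vector of joint targets many times per second rather than discrete, scripted commands. The sketch below illustrates what one such action message might look like; the joint groupings and dimensions are assumptions for illustration, not Figure's published specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UpperBodyAction:
    """One continuous control tick for the upper body (illustrative only).

    All values are joint-space targets; the per-group dimensions below
    are assumed, not taken from Figure's documentation."""
    torso: List[float] = field(default_factory=lambda: [0.0] * 3)
    head: List[float] = field(default_factory=lambda: [0.0] * 2)
    left_wrist: List[float] = field(default_factory=lambda: [0.0] * 3)
    right_wrist: List[float] = field(default_factory=lambda: [0.0] * 3)
    left_fingers: List[float] = field(default_factory=lambda: [0.0] * 12)
    right_fingers: List[float] = field(default_factory=lambda: [0.0] * 12)

    def as_vector(self) -> List[float]:
        # Flatten into the dense action vector a visuomotor policy
        # would emit at every control step.
        return (self.torso + self.head + self.left_wrist
                + self.right_wrist + self.left_fingers + self.right_fingers)
```

A policy controlling the robot would emit one such vector per control tick, smoothly updating every joint at once.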
The model also enables multi-robot collaboration, allowing two robots to work together on shared tasks involving unfamiliar objects. This expands the potential applications of humanoid robots in complex environments.
Helix-equipped robots can pick up a wide range of small household items using natural language prompts, improving ease of use. The model employs a single set of neural network weights to learn various behaviours, such as operating drawers and refrigerators, without task-specific fine-tuning.
Additionally, Helix operates on embedded low-power GPUs, making it suitable for commercial deployment.
Helix’s AI Architecture
Figure designed Helix to address challenges in robotic adaptability. Traditional robotic systems require extensive programming or demonstrations to learn new tasks. Helix overcomes this by leveraging Vision Language Models (VLMs) to generalise behaviours and execute tasks through natural language instructions.
Helix consists of two systems: System 1 (S1) and System 2 (S2). S2 is a slower, internet-pre-trained VLM that focuses on scene understanding and language comprehension. S1 is a fast visuomotor policy that translates S2’s information into real-time robot actions.
This separation allows S2 to process information thoughtfully while S1 executes actions quickly. It also enables independent improvements to each system without requiring shared observation or action spaces.
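The two-system design can be pictured as a slow planning loop that periodically refreshes a latent task representation, feeding a fast control loop that acts on it. Below is a minimal Python sketch under assumed loop rates, with hypothetical class and helper names (System1, System2, send_to_robot); Figure has not released Helix's implementation.

```python
import time

def get_images():
    # Stub for the robot's camera feed; returns placeholder frames.
    return []

def send_to_robot(action):
    # Stub for the low-level joint command interface.
    pass

class System2:
    """Slow, internet-pretrained VLM: scene and language understanding."""
    def plan(self, images, instruction):
        # In Helix this role is played by a 7-billion-parameter VLM;
        # here we just return a placeholder latent conditioning vector.
        return [0.0] * 512

class System1:
    """Fast visuomotor policy: turns S2's output into continuous actions."""
    def act(self, images, latent):
        # In Helix this role is played by an 80-million-parameter
        # transformer; here we return placeholder joint targets.
        return [0.0] * 35

def run(s1, s2, instruction, plan_hz=8, act_hz=200, plan_cycles=3):
    # Loop rates are assumptions for illustration: the planner refreshes
    # its latent a few times per second, while the controller acts on
    # the most recent latent at a much higher rate.
    for _ in range(plan_cycles):
        latent = s2.plan(get_images(), instruction)
        for _ in range(act_hz // plan_hz):
            send_to_robot(s1.act(get_images(), latent))
            time.sleep(1.0 / act_hz)

run(System1(), System2(), "pick up the cup and place it in the drawer")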
To train Helix, Figure collected around 500 hours of teleoperated behaviours and used an auto-labelling VLM to generate a natural language instruction for each demonstration. The architecture pairs a 7-billion-parameter VLM for S2 with an 80-million-parameter transformer for S1, which converts camera input and S2's output into continuous robot actions.
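The auto-labelling step can be thought of as asking a VLM, for each teleoperated clip, what instruction would have produced the demonstrated behaviour. A minimal sketch, where query_vlm is a hypothetical wrapper around whatever vision-language model performs the labelling:

```python
def auto_label(clips, query_vlm):
    """Generate natural-language instructions for teleoperated clips.

    clips: iterable of recorded teleoperation segments, each with frames.
    query_vlm: callable wrapping a vision-language model (hypothetical).
    """
    dataset = []
    for clip in clips:
        # Ask the VLM to describe the demonstrated behaviour as a
        # command the operator might have been given.
        instruction = query_vlm(
            frames=clip.frames,
            prompt="What instruction would you give the robot to "
                   "perform the action shown in this video?",
        )
        dataset.append({"frames": clip.frames, "instruction": instruction})
    return dataset
```

Pairing each clip with a generated instruction is what lets the combined 7B/80M model map free-form language to behaviour without hand-written task labels.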
Key Takeaways
Figure has launched Helix, a Vision-Language-Action model for humanoid robots.
Helix enables precise upper-body control and multi-robot collaboration.
The model allows robots to pick up household items using natural language prompts.
Source: Interesting Engineering