New Humanoid Robot Demonstrates Mastery of Task Chaining and Approaches Autonomous Function

1X, a robotics company backed by OpenAI, is advancing its mission to take on physical labor with intelligent androids. Its humanoid robot Eve recently completed a series of tasks autonomously, a milestone in 1X’s effort to build an AI system that chains simple tasks into complex actions through voice commands. The same system lets operators control multiple robots and operate them remotely.

1X’s androids follow an embodied learning approach, in which the AI software is integrated directly into the robot’s physical structure. Earlier demonstrations focused on the robots’ ability to manipulate basic objects; the team now stresses that mastering task chaining is what will turn them into efficient service robots.

Researchers at 1X ran into difficulties when consolidating multiple tasks into a single neural network model. Smaller multi-task models with fewer than 100M parameters suffered from a forgetting problem, in which improving performance on one task degraded performance on others. Increasing the parameter count could address this issue, but it also lengthened training times, slowing progress.

To tackle this issue, 1X developed a voice-controlled natural language interface that chains short-horizon capabilities, spread across several small models, into longer-horizon behaviors. With a human directing the skill chaining, these long-horizon behaviors can be achieved without first training one large model.

Eric Jang, the vice president of AI at 1X Technologies, explained, “To accomplish this, we’ve built a voice-controlled natural language interface to chain short-horizon capabilities across multiple small models into longer ones. With humans directing the skill chaining, this allows us to accomplish the long-horizon behaviors.”
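The idea Jang describes can be sketched roughly as a lookup from spoken commands to small single-skill policies that run one after another. This is only an illustration under stated assumptions: the skill names, the dictionary-based world state, and the `run_chain` helper are hypothetical and are not taken from 1X’s system.

```python
from typing import Callable, Dict, List

# Hypothetical world state and skill signature (assumptions for illustration).
State = Dict[str, str]
Skill = Callable[[State], State]


def pick_up_cup(state: State) -> State:
    # Stand-in for a small model that handles one short-horizon skill.
    return {**state, "gripper": "cup"}


def move_to_table(state: State) -> State:
    return {**state, "location": "table"}


def place_cup(state: State) -> State:
    # The cup ends up wherever the robot currently is.
    return {**state, "gripper": "empty", "cup": state["location"]}


# The operator's phrases are mapped to skills; the number of underlying
# models is hidden behind this table.
SKILLS: Dict[str, Skill] = {
    "pick up the cup": pick_up_cup,
    "go to the table": move_to_table,
    "put the cup down": place_cup,
}


def run_chain(commands: List[str], state: State) -> State:
    """Run each commanded skill in order, feeding each result to the next."""
    for command in commands:
        skill = SKILLS.get(command.strip().lower())
        if skill is None:
            raise KeyError(f"no skill registered for command: {command!r}")
        state = skill(state)
    return state


start = {"location": "kitchen", "gripper": "empty", "cup": "kitchen"}
final = run_chain(
    ["Pick up the cup", "Go to the table", "Put the cup down"], start
)
print(final)
```

In this toy version the human supplies the ordering of skills, which is the part the article says is left to the operator.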

The difficulty in chaining autonomous robot skills is that each skill must adapt to the variations left behind by the actions before it. This difficulty compounds with every additional skill in the sequence.
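One simple way to see the compounding effect is a toy model in which every skill succeeds independently with the same probability. That independence assumption and the 0.9 figure are illustrative only and do not come from 1X; real skills are likely correlated, since one skill’s output shapes the next skill’s starting conditions.

```python
def chain_success_rate(per_skill_success: float, num_skills: int) -> float:
    """Probability that every skill in a chain succeeds.

    Assumes each skill succeeds independently with the same probability,
    which is a simplification made for illustration.
    """
    if not 0.0 <= per_skill_success <= 1.0:
        raise ValueError("per_skill_success must be between 0 and 1")
    if num_skills < 0:
        raise ValueError("num_skills must be non-negative")
    return per_skill_success ** num_skills


for n in (1, 3, 5, 10):
    print(n, round(chain_success_rate(0.9, n), 3))
```

Under these assumptions, even a fairly reliable skill yields a much less reliable chain as the number of steps grows.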

The high-level language interface lets operators control multiple robots with little effort. It also streamlines data collection and evaluation: during testing, operators can compare the predictions of new models against existing baselines.

Jang further elaborated, “From the user perspective, the robot is capable of doing many natural language tasks, and the actual number of models controlling the robot is abstracted away. This allows us to merge the single-task models into goal-conditioned models over time.”

Once the goal-conditioned model’s predictions align well with those of the single-task models, researchers can switch to the unified, more robust model. Because operators keep using the same language interface, the switch does not change their workflow.
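A check of this kind might look like the sketch below, which compares a candidate model’s predicted actions with a baseline’s on the same inputs. The mean absolute difference metric, the `tolerance` value, and the function names are assumptions chosen for illustration; the article does not say how 1X measures alignment.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average absolute difference between two equal-length prediction lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def ready_to_switch(
    unified_preds: Sequence[float],
    baseline_preds: Sequence[float],
    tolerance: float = 0.05,  # hypothetical threshold
) -> bool:
    """True when the unified model tracks the single-task baseline closely."""
    return mean_abs_diff(unified_preds, baseline_preds) <= tolerance


# Hypothetical predicted actions for the same three observations.
baseline = [0.10, 0.50, -0.20]
candidate = [0.12, 0.47, -0.21]
print(ready_to_switch(candidate, baseline))
```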

Directing robots through this high-level language interface also changes how data is collected. Jang noted, “Instead of using VR to control a single robot, an operator can direct multiple robots with high-level language and let the low-level policies execute low-level actions to realize those high-level goals.”