The robot dog trained on the yoga ball is more flexible than most exercisers

2024-05-08 12:05:00

A quadruped robot staggering along as it tries to balance on a fitness ball makes for an entertaining experiment, but at its core it shows that artificial intelligence like GPT-4 can train robots to perform complex, real-world tasks more effectively than we humans can.


DrEureka is an open-source software package that anyone can use to train robots to perform real-world tasks with the help of large language models (LLMs) such as GPT-4. It's a "simulation-to-reality" system, that is, it uses simulated physics to teach the robot in a virtual environment before deploying what it has learned in the real world.

Dr. Jim Fan, one of the developers of DrEureka, made headlines when he deployed it on a Unitree Go1 quadruped robot. It's a "low-cost", well-supported, open-source bot, which is convenient because even with AI, robot pets are still vulnerable to falls. As for the "low cost," it sells for $5899 on Amazon and has a 1-star rating.

The "Dr" in DrEureka stands for "Domain Randomization", i.e., randomizing variables such as friction, mass, damping, center of gravity, etc., in a simulated environment.
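To make the idea concrete, here is a minimal sketch of domain randomization in Python. The parameter names and ranges are made up for illustration and are not taken from DrEureka; the only point is that each simulated episode gets its own randomly drawn physics.

```python
import random

# Hypothetical ranges, chosen only for illustration (not values from DrEureka).
RANDOMIZATION_RANGES = {
    "friction": (0.3, 1.2),        # ground friction coefficient
    "added_mass_kg": (-1.0, 3.0),  # extra payload on the robot's body
    "joint_damping": (0.0, 0.5),   # damping applied at each joint
    "com_shift_m": (-0.05, 0.05),  # offset of the center of gravity
}

def sample_physics(ranges, rng):
    """Draw one value per parameter, uniformly within its [low, high] range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)

# Each simulated episode runs with a different physics configuration.
episodes = [sample_physics(RANDOMIZATION_RANGES, rng) for _ in range(4)]
for params in episodes:
    print(params)
```

A policy that copes with every sampled configuration is less likely to depend on one exact set of simulated physics, which is what helps it survive the jump to a real floor or a real ball.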

Given just a few prompts, an LLM like ChatGPT can write code that creates a reward/punishment system for training the bot in a virtual space, where 0 = failure and anything above 0 is a success. The higher the score, the better.

It can also set parameters by choosing minimum and maximum values for the ball's bounciness, motor strength, limb degrees of freedom, and damping, among other things. Because it is an LLM, it can effortlessly generate these parameter sets in large quantities for the training system to run simultaneously.

After each simulation, GPT can also reflect on how well the virtual robot performed and how it could be improved. If the parameters are exceeded or violated, for example because a motor overheats or the robot tries to articulate a limb beyond its range, the result is a score of 0. No one likes to score zero, and AI is no exception.
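The scoring rule described above can be sketched in a few lines of Python. This is an illustrative toy, not code generated by DrEureka: the `StepState` fields and both limits are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass
class StepState:
    forward_velocity: float   # m/s, what the task rewards
    motor_temperature: float  # degrees Celsius
    joint_angle: float        # radians, for a single joint

# Hypothetical limits, for illustration only.
MAX_MOTOR_TEMPERATURE = 80.0
JOINT_ANGLE_LIMIT = 1.5

def reward(state):
    """Return 0 on any violated limit, otherwise a non-negative task score."""
    if state.motor_temperature > MAX_MOTOR_TEMPERATURE:
        return 0.0  # overheated motor counts as failure
    if abs(state.joint_angle) > JOINT_ANGLE_LIMIT:
        return 0.0  # limb pushed beyond its range counts as failure
    # Moving backwards earns nothing; faster forward motion scores higher.
    return max(0.0, state.forward_velocity)

print(reward(StepState(1.0, 50.0, 0.2)))  # within limits
print(reward(StepState(1.0, 90.0, 0.2)))  # motor too hot
```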

Prompting an LLM to write this code requires safety instructions. Otherwise, the research team found, GPT will strive for the best performance and will "cheat" in simulation. That is not a problem in simulation, but in real life it can cause a motor to overheat or a limb to overextend, damaging the robot, a phenomenon the researchers call "degenerate behavior."

One example of the unnatural behavior a self-taught virtual robot comes up with: it discovers that it can move faster by thrusting its hip into the ground and dragging itself across the floor with three feet. While this is an advantage in simulation, it's awkward when the robot tries it in the real world.

As a result, the researchers instructed GPT to be extra careful because the robot would be tested in the real world. In response, GPT created safety terms covering smooth movements, torso orientation, and torso height, and ensuring that the robot's motors did not apply too much torque. If the bot cheats by violating these parameters, its reward function lowers the score. Safety features can reduce degenerate and unnatural behavior, such as unnecessary pelvic thrusts.
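As a rough illustration of how such safety terms might lower a score, the Python sketch below scales a task reward down as penalties accumulate. The function name, weights, target height, and torque limit are all assumptions made for this example and do not come from DrEureka's generated code.

```python
import math

def safe_reward(task_reward, action, prev_action, torso_tilt, torso_height,
                torques, target_height=0.3, torque_limit=20.0):
    """Scale the task reward down as safety penalties grow (illustrative only)."""
    # Smooth movements: penalize large changes between consecutive actions.
    smoothness = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    # Torso orientation: penalize tilt away from upright (radians).
    orientation = torso_tilt ** 2
    # Torso height: penalize drifting from the target height (meters).
    height = (torso_height - target_height) ** 2
    # Torque: penalize only the part of each torque above the limit.
    over_torque = sum(max(0.0, abs(t) - torque_limit) ** 2 for t in torques)
    # Hypothetical weights; a real system would have to tune these.
    penalty = 0.1 * smoothness + 1.0 * orientation + 5.0 * height + 0.01 * over_torque
    # exp(-penalty) lies in (0, 1], so the score shrinks but never goes negative.
    return task_reward * math.exp(-penalty)

calm = safe_reward(1.0, [0.0, 0.0], [0.0, 0.0], 0.0, 0.3, [5.0, 5.0])
jerky = safe_reward(1.0, [1.0, -1.0], [0.0, 0.0], 0.4, 0.1, [30.0, 5.0])
print(calm, jerky)
```

Multiplying by a factor between 0 and 1 is just one possible design. It keeps the "higher is better, 0 is failure" convention while making a hip-dragging gait, with its low torso and tilted body, score worse than an upright one at the same speed.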

So how does it perform? Better than us. DrEureka beat humans at training the robot "pooch", increasing its forward speed and travel distance by 34% and 20% respectively over real-world mixed terrain.

DrEureka's GPT-based training system easily beats human-trained robots in the real world

How? Researchers believe it has something to do with the way it is taught. Humans tend to gravitate towards a curriculum-style teaching environment – breaking down tasks into small steps and trying to explain them in isolation, whereas GPT is able to effectively impart all knowledge at once. That's something we simply can't do.

DrEureka is the first of its kind. It is able to go from the simulated world to the real world "zero-shot". Imagine being pushed out of the nest with little to no knowledge of the world around you and left to fend for yourself. That is "zero-shot".

The creators of DrEureka believe that if they can provide real-world feedback to GPT, they can further improve the simulation-to-reality training. Currently, all simulation training is done using data from the robot's own proprioceptive system, but GPT can refine its instructions more effectively if it can see what went wrong through real-world video footage, rather than just reading execution failures from the robot's logs.

The average human takes about a year and a half to learn to walk, and only about one percent of humans learn to walk on a yoga ball.

You can watch an unedited 4-minute, 33-second video here of a robot dog taking a relaxing walk on a yoga ball without stopping to pee on a fire hydrant:
