Google’s Gemini Robotics AI Model Reaches Into the Physical World

In sci-fi tales, artificial intelligence often powers all sorts of clever, capable, and occasionally homicidal robots. A revealing limitation of today’s best AI is that, for now, it remains squarely trapped inside the chat window.
Google DeepMind signaled a plan to change that today—presumably minus the homicidal part—by announcing a new version of its AI model Gemini that fuses language, vision, and physical action together to power a range of more capable, adaptive, and potentially useful robots.
In a series of demonstration videos, the company showed several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands: Robot arms fold paper, hand over vegetables, gently put a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect visible objects with possible actions so they can do what they're told. The model is trained in a way that allows its behavior to generalize across very different hardware.
Google DeepMind also announced a version of its model called Gemini Robotics-ER (for embodied reasoning), which is limited to visual and spatial understanding. The idea is for other robot researchers to use this model to train their own models for controlling robots’ actions.
In a video demonstration, Google DeepMind’s researchers used the model to control a humanoid robot called Apollo, from the startup Apptronik. The robot converses with a human and moves letters around a tabletop when instructed to.
“We've been able to bring the world-understanding—the general-concept understanding—of Gemini 2.0 to robotics,” said Kanishka Rao, a robotics researcher at Google DeepMind who led the work, at a briefing ahead of today’s announcement.
Google DeepMind says the new model is able to control different robots successfully in hundreds of specific scenarios not previously included in their training. “Once the robot model has general-concept understanding, it becomes much more general and useful,” Rao said.
The breakthroughs that gave rise to powerful chatbots, including OpenAI’s ChatGPT and Google’s Gemini, have in recent years raised hope of a similar revolution in robotics, but big hurdles remain.
The large language models (LLMs) that power modern chatbots were created using more general learning algorithms, internet-scale training data, and vast amounts of computer power. While it is not yet possible to gather robot training data on that scale, LLMs can be used as a foundation for more capable robot models, because they contain a wealth of information about the physical world and can communicate so well. Robotics researchers are now combining LLMs with new approaches to learning through teleoperation or simulation that allow models to practice physical actions more efficiently.
In recent years, Google has revealed a number of robotics research projects that show the potential of these approaches. As WIRED detailed in a recent profile, several key researchers involved with this earlier work have left the company to found a startup called Physical Intelligence. As WIRED first reported, a lab run by the Toyota Research Institute is doing similar work.
In September 2024, Google DeepMind showed that it is keeping pace with these efforts, revealing a robot that combines LLMs and new training methods to perform dexterous tasks like tying shoelaces and folding clothes on command.
Rao said that Google DeepMind’s new robot model has even broader abilities. Physical Intelligence and the Toyota Research Institute have released similar demonstration videos.
Gemini Robotics also hints at where Google DeepMind expects AI to go in coming years, as the race to advance the technology continues to intensify. The company appeared to be caught flat-footed by the introduction of ChatGPT in November 2022, but since then it has ramped up efforts to gain an edge by pursuing advances that take AI beyond just text and conversation.
When Google announced Gemini in December 2023, the company emphasized the fact that the model was multimodal, meaning that it was trained from scratch to handle images and audio as well as text. Robotics will take AI into the realm of physical action as well. Some researchers argue that a form of embodiment may be needed for AI to match or exceed human capabilities.
Google said at its briefing that it is currently collaborating with a number of robotics companies, including Agility Robotics and Boston Dynamics, which make legged robots, and Enchanted Tools, which makes robots for the service industry.
OpenAI shut down a robotics research effort in 2021, but restarted it in 2024, according to The Robot Report. OpenAI currently lists several job openings for robotics researchers on its website.
Using today’s AI models to control robots introduces new risks, however. In December 2024, for example, a team of roboticists at the University of Pennsylvania showed that so-called jailbreaks that get AI models to misbehave can have unexpected and serious consequences when the model operates a robot. The researchers targeted several commercial robots, none of which use DeepMind’s technology, and were, for example, able to use such an attack to get a wheeled robot to deliver an imaginary bomb.
To mitigate such risks—as well as more sci-fi worries about supersmart robots going rogue—Google DeepMind also today announced a new benchmark for gauging risks with AI-powered robots.
The benchmark is called ASIMOV, after the science-fiction author Isaac Asimov, who envisioned a set of foundational laws for guiding robot behavior. As Asimov’s own stories showed, a few simple rules cannot account for the vast number of different scenarios that a truly capable robot might encounter in the wild.
Carolina Parada, who leads Google DeepMind’s robotics team, emphasized that the work is at an early stage and said it may take years for robots to become significantly more capable. She noted that, unlike humans, robots running the Gemini Robotics models do not learn as they go. And she said there are currently no firm plans to commercialize or deploy the technology.
What do you make of Google’s robot model? Is it the path to more advanced AI, or should we worry about today’s models operating in the physical world?