Google’s SIMA 2 agent uses Gemini to think and act in virtual worlds


Google DeepMind announced Thursday a research preview of SIMA 2, a next-generation AI agent that integrates the language and reasoning powers of Gemini, Google’s large language model, to go beyond simply following instructions toward understanding and interacting with its environment.

Like many DeepMind projects, including AlphaFold, SIMA builds on earlier versions. The first version of SIMA was trained on hundreds of hours of video game data to learn how to play multiple 3D games like a human, even some it wasn’t trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but had only a 31% success rate at completing complex tasks, compared to 71% for humans.

“SIMA 2 is a step change and improvement in capabilities compared to SIMA 1,” Joe Marino, chief research scientist at DeepMind, said in a press conference. “It’s a more general agent. It can complete complex tasks in previously unseen environments. It’s a self-improving agent. So it can actually improve itself based on its own experience, which is a step toward more general-purpose robots and AGI systems in general.”

DeepMind says SIMA 2 doubles the performance of SIMA 1. Image credits: Google DeepMind

SIMA 2 is powered by a Gemini 2.5 Flash-Lite model. AGI stands for artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks, with the ability to learn new skills and generalize knowledge across different domains.

Working with so-called “embodied agents” is crucial for generalized intelligence, DeepMind researchers say. Marino explained that an embodied agent interacts with the physical or virtual world through a body — observing inputs and taking actions much like a robot or human would — while a disembodied agent might interact with your calendar, take notes, or execute code.

SIMA 2 goes beyond gaming, Jane Wang, a research scientist at DeepMind with a background in neuroscience, told TechCrunch.

“We’re asking it to actually understand what’s going on, to understand what the user is asking it to do, and then be able to respond in a logical way which is actually very difficult,” Wang said.

By incorporating Gemini, SIMA 2 doubled the performance of its predecessor, uniting Gemini’s advanced language and reasoning abilities with the embodied skills developed through training.

Marino demonstrated SIMA 2 in No Man’s Sky, where the agent described its surroundings — the surface of a rocky planet — and determined its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to think internally. In another game, when asked to walk to a house that is the color of a ripe tomato, the agent showed its reasoning — ripe tomatoes are red, so I should go to the red house — and then found the house and approached it.

Being powered by Gemini also means that SIMA 2 follows instructions based on emojis: “Guide it 🪓🌲, and it will chop down a tree,” Marino said.

Marino also demonstrated how SIMA 2 can navigate newly created photorealistic worlds produced by Genie, a DeepMind world model, correctly identifying and interacting with objects such as benches, trees, and butterflies.

DeepMind says SIMA 2 is a self-improving agent. Image credits: Google DeepMind

Marino added that Gemini also allows for self-improvement without the need for much human data. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses that gameplay only as a baseline to bootstrap a robust initial model. When the team places the agent in a new environment, it asks a separate Gemini model to create new tasks, and a separate reward model to score the agent’s attempts. Using these self-generated experiences as training data, the agent learns from its mistakes and performs progressively better, essentially teaching itself new behaviors through trial and error as a human would, guided by AI-based feedback rather than human feedback.
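The loop Marino describes — propose tasks, attempt them, score attempts with a reward model, and train on the resulting experience — can be sketched in miniature. Everything below is an illustrative stand-in: the function names, the toy policy, and the numbers are assumptions for the sketch, not DeepMind’s actual system.

```python
import random

def generate_task(env_name):
    # Stand-in for the Gemini-based task proposer described in the article.
    return f"collect wood in {env_name}"

def attempt_task(policy, task):
    # Toy agent: success probability equals the policy's current "skill".
    return random.random() < policy["skill"]

def reward_model(task, succeeded):
    # Stand-in for the separate reward model that scores each attempt.
    return 1.0 if succeeded else 0.0

def self_improvement_loop(env_name, rounds=200, seed=0):
    random.seed(seed)
    policy = {"skill": 0.3}   # baseline bootstrapped from human gameplay
    experience = []
    for _ in range(rounds):
        task = generate_task(env_name)
        succeeded = attempt_task(policy, task)
        score = reward_model(task, succeeded)
        experience.append((task, score))
        # "Training" step: nudge the policy toward rewarded behavior.
        policy["skill"] = min(0.95, policy["skill"] + 0.003 * score)
    return policy, experience
```

The key property the article highlights is that no human labels enter the loop after the baseline: tasks and rewards are both machine-generated, so the agent’s skill can keep climbing from its own experience.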

DeepMind sees SIMA 2 as a step toward more general-purpose robots.

“If we think about what a system needs to do to perform tasks in the real world, like a robot, I think there are two components to it,” Frederic Besse, a senior research engineer at DeepMind, said during a press conference. “First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning.”

If you ask a robot in your home to check how many cans of beans are in the cupboard, the system needs to understand all the different concepts — what beans are, what a cupboard is — and then navigate to that location. Besse says SIMA 2 addresses this higher-level behavior rather than lower-level actions, by which he means controlling things like physical joints and wheels.

The team declined to share a specific timeline for bringing SIMA 2 to physical robotics systems. Besse told TechCrunch that DeepMind’s recently unveiled robotics foundation models — which can also reason about the physical world and create multi-step plans to complete a task — are trained differently and separately from SIMA.

Although there is no timeline for releasing more than a preview of SIMA 2, Wang told TechCrunch that the goal is to show the world what DeepMind is working on and see what types of collaborations and potential uses are possible.
