Gemini-Powered AI Agents Learn Real-World Skills in Virtual Worlds

Photo by Omer Unlu on Pexels

Google DeepMind is advancing the development of versatile AI agents by using its Gemini model to train them in diverse virtual environments. The latest agent, SIMA 2, showcases enhanced navigation and problem-solving capabilities across a variety of 3D simulated worlds, paving the way for more adaptable AI in robotics and real-world applications.

SIMA 2 builds on the foundation of the original SIMA, and the integration of Gemini significantly amplifies its capabilities. Researchers highlight its ability to tackle complex tasks, independently generate solutions, and engage in interactive conversations with users. Through repeated practice and learning from mistakes, SIMA 2 continually refines its skillset.

Unlike game-specific AI like AlphaZero and AlphaStar, which were trained for fixed objectives, SIMA learns open-ended gameplay based on human instructions. Users can interact with SIMA 2 through text, voice commands, or visual annotations, enabling the agent to analyze the game’s visual data frame by frame and determine the appropriate actions to achieve the assigned task.
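The perceive-and-act loop described above can be sketched in simple Python. This is a hypothetical illustration, not DeepMind's actual API: all names (`Observation`, `agent_step`, `run_episode`) and the keyword-matching "policy" are stand-ins for the Gemini-based vision-language model that actually maps frames and instructions to actions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    frame: bytes      # raw pixels of the current game frame
    instruction: str  # the user's text or transcribed voice instruction


def agent_step(obs: Observation) -> str:
    """Map one game frame plus a user instruction to an action label.

    In SIMA 2 this role is played by a Gemini-based model reading the
    screen; here a trivial keyword rule stands in for it.
    """
    if "tree" in obs.instruction.lower():
        return "move_forward"
    return "look_around"


def run_episode(frames: list[bytes], instruction: str) -> list[str]:
    """Frame-by-frame control loop: observe, decide, act, repeat."""
    return [agent_step(Observation(f, instruction)) for f in frames]
```

The key point the sketch captures is that nothing is hard-coded to one game: the same loop runs on any stream of frames, with the instruction steering behavior.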

The training process for SIMA 2 involved analyzing footage of human gameplay across eight commercial video games, including titles like No Man's Sky and Goat Simulator 3, alongside specifically designed virtual environments. The integration of Gemini has led to improvements in instruction following, proactive questioning, and autonomous problem-solving.

While SIMA 2 still faces challenges in handling intricate, multi-stage tasks and retaining information over extended periods, Google DeepMind plans to further refine its capabilities within a simulated “endless virtual training dojo.” This environment will leverage their world model, Genie 3, with Gemini providing feedback to guide the learning process.