Google’s Gemini 2.5 AI Learns Human-Like Web Navigation

Google has unveiled Gemini 2.5 Computer Use, an AI model that interacts with the web the way a human user does. Rather than calling an API, the model browses and operates inside a standard web browser. It relies on visual understanding to interpret what is on screen and carry out tasks such as filling in and submitting forms, which makes it useful for UI testing and for navigating interfaces built for people rather than programs.

The release comes shortly after OpenAI introduced new ChatGPT applications and follows Anthropic’s launch of a Claude model with similar computer use capabilities, part of a growing trend among AI developers. Google’s demonstration videos, which are sped up, show the model working on its own in a browser environment: opening the browser, typing, and dragging elements. Unlike ChatGPT Agent, Gemini 2.5 Computer Use operates exclusively through a browser.

The model is now available to developers through Google AI Studio and Vertex AI. A demo hosted on Browserbase shows it completing tasks such as playing 2048 or browsing Hacker News.