GPT-5 Excels in Computer Agent Task Performance, Outperforming GPT-4o

Early results suggest that GPT-5 performs better than GPT-4o as a computer-use agent. In recent experiments, researchers evaluated both models on tasks that required navigating to a random URL and then playing a game until reaching a target score of 5/5. GPT-5 replaced GPT-4o as the primary ‘thinking’ (planning) model. The grounding model, which translates the planner's instructions into concrete on-screen targets, was kept the same in both configurations: Salesforce GTA1-7B. Task prompts were generated with Claude from a predefined set.

Initial findings point to a notable improvement in task completion with GPT-5. Code and documentation for replicating the experiments, including the setup for composed agents, are available at https://github.com/trycua/cua and https://docs.trycua.com/docs/agent-sdk/supported-agents/composed-agents. The results have also prompted discussion on Reddit: https://old.reddit.com/r/artificial/comments/1mmlw38/gpt_5_for_computer_use_agents/.
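The setup described above splits the agent into two roles: a planning model that decides what to do next, and a grounding model that works out where on the screen to do it. The following is a minimal, hypothetical Python sketch of that division of labor. It does not use the cua SDK, and `ComposedAgent`, `scripted_planner`, `toy_grounder`, and `completion_rate` are illustrative names; the stub functions stand in for GPT-5 or GPT-4o (planner) and GTA1-7B (grounder).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not the cua API):
# a planner maps (task, history of past instructions) -> next instruction, or None when done;
# a grounder maps a natural-language instruction -> (x, y) screen coordinates.
Planner = Callable[[str, List[str]], Optional[str]]
Grounder = Callable[[str], Tuple[int, int]]
Action = Tuple[str, Tuple[int, int]]


@dataclass
class ComposedAgent:
    planner: Planner      # the "thinking" model (swapped between GPT-4o and GPT-5 in the experiment)
    grounder: Grounder    # the grounding model (held constant in the experiment)
    max_steps: int = 20   # safety cap so a confused planner cannot loop forever

    def run(self, task: str) -> List[Action]:
        history: List[str] = []
        actions: List[Action] = []
        for _ in range(self.max_steps):
            instruction = self.planner(task, history)
            if instruction is None:  # planner signals the task is finished
                break
            xy = self.grounder(instruction)  # locate the target on screen
            actions.append((instruction, xy))
            history.append(instruction)
        return actions


def completion_rate(agent: ComposedAgent, tasks: List[str],
                    succeeded: Callable[[str, List[Action]], bool]) -> float:
    """Fraction of tasks the agent completes, judged by a caller-supplied check."""
    results = [succeeded(task, agent.run(task)) for task in tasks]
    return sum(results) / len(results)


# --- Toy stand-ins for real models (assumptions for illustration only) ---
def scripted_planner(steps: List[str]) -> Planner:
    """Returns a planner that replays a fixed list of steps, then stops."""
    def plan(task: str, history: List[str]) -> Optional[str]:
        return steps[len(history)] if len(history) < len(steps) else None
    return plan


UI_ELEMENTS = {"address bar": (400, 40), "Play button": (640, 360)}


def toy_grounder(instruction: str) -> Tuple[int, int]:
    """Looks up a known UI element mentioned in the instruction."""
    for name, xy in UI_ELEMENTS.items():
        if name in instruction:
            return xy
    return (0, 0)  # fallback when nothing matches


if __name__ == "__main__":
    agent = ComposedAgent(
        scripted_planner(["click the address bar", "click the Play button"]),
        toy_grounder,
    )
    print(agent.run("open the game and reach 5/5"))
    # [('click the address bar', (400, 40)), ('click the Play button', (640, 360))]
```

In this sketch, comparing GPT-5 against GPT-4o amounts to constructing two `ComposedAgent` instances that differ only in `planner` while sharing the same `grounder`, then calling `completion_rate` on the same task list for each. Keeping the grounder fixed is what lets a difference in completion rate be attributed to the planning model.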
