Early, informal comparisons suggest that GPT-5 performs better than GPT-4o as a computer-use agent. In these experiments, both models were given the same kind of task: navigate to a randomly chosen URL, then play the game there until reaching a target score of 5/5. GPT-5 replaced GPT-4o as the "thinking" model that decides what to do next. The grounding model, which locates the on-screen elements to act on, was held fixed as Salesforce GTA1-7B for both. Task prompts were generated with Claude from a predefined set. In these initial runs, GPT-5 showed a notable improvement in task completion.

To replicate the experiments or read the related documentation, see https://github.com/trycua/cua and https://docs.trycua.com/docs/agent-sdk/supported-agents/composed-agents. The results have also prompted discussion on Reddit: https://old.reddit.com/r/artificial/comments/1mmlw38/gpt_5_for_computer_use_agents/.
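The setup above splits the agent into two roles: a thinking model that decides *what* to do and a grounding model that decides *where* on the screen to do it. The sketch below illustrates that composition under heavy assumptions. It is not the cua SDK's API, and `Action`, `ToyGame`, `scripted_planner`, `lookup_grounder` and `run_composed_agent` are all hypothetical names. The planner and grounder are stubs standing in for real models (such as GPT-5 and GTA1-7B), and the "game" simply awards a point per correct click.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical sketch: the planner and grounder are stubs, not real models.
@dataclass
class Action:
    kind: str          # "click" or "done"
    target: str = ""   # natural-language description of the UI element


Planner = Callable[[str, list], Action]   # (task, history) -> next action
Grounder = Callable[[str, dict], tuple]   # (description, screen) -> (x, y)


@dataclass
class ToyGame:
    """Hypothetical environment: clicking the 'point button' scores one point."""
    score: int = 0
    target: int = 5
    button_xy: tuple = (120, 40)

    def screen(self) -> dict:
        # A real agent would receive a screenshot; here the "screen" is a dict
        # mapping element descriptions to pixel coordinates.
        return {"point button": self.button_xy}

    def click(self, xy: tuple) -> None:
        if xy == self.button_xy:
            self.score += 1


def scripted_planner(task: str, history: list) -> Action:
    # Stand-in "thinking" model: keep clicking until the last observed score is 5.
    score = history[-1][2] if history else 0
    if score >= 5:
        return Action("done")
    return Action("click", "point button")


def lookup_grounder(description: str, screen: dict) -> tuple:
    # Stand-in grounding model: exact lookup instead of predicting (x, y) from pixels.
    return screen.get(description, (0, 0))


def run_composed_agent(task: str, planner: Planner, grounder: Grounder,
                       env: ToyGame, max_steps: int = 20) -> bool:
    """Planner chooses the next action; grounder turns it into screen coordinates."""
    history: list = []
    for _ in range(max_steps):
        action = planner(task, history)
        if action.kind == "done":
            break
        xy = grounder(action.target, env.screen())
        env.click(xy)
        history.append((action.target, xy, env.score))
    return env.score >= env.target


game = ToyGame()
print(run_composed_agent("Play until the score is 5/5", scripted_planner,
                         lookup_grounder, game), game.score)
```

In this framing, swapping GPT-4o for GPT-5 corresponds to passing a different `planner` while leaving `grounder` unchanged. That is why holding the grounding model fixed lets any difference in outcome be attributed to the thinking model.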