In a recent creative stress test run by a Reddit user, leading language models, including GPT-5, Claude Opus 4.1, GPT o3-pro, Grok 4, and Gemini 2.5 Pro, were each given the same challenge: write a 650-word scripted debate between Cleopatra and Albert Einstein, transported to 2025 to argue the pros and cons of TikTok. The task demanded creative writing, faithful character representation, and strict adherence to formatting guidelines.
The experiment, detailed on ModelArena, showed that all models met the structural requirements but differed significantly in tone, depth, and stylistic choices. The Reddit user who ran the test shared the full outputs along with summaries, allowing a side-by-side comparison of the models’ performance. The original discussion and outputs can be found on Reddit: https://old.reddit.com/r/artificial/comments/1mllllv/which_llm_is_king_right_now_i_ran_a_creative/