Aletheia, an AI agent powered by Gemini 3 Deep Think, solved 6 of the 10 novel problems in the inaugural FirstProof challenge.
A panel of experts assessed the agent’s performance and verified its solutions to problems 2, 5, 7, 8, 9, and 10 as correct. The experts did not fully agree on problem 8, but Aletheia’s overall result still marks a notable achievement in the development of AI-powered math research tools.
The FirstProof challenge is designed to evaluate the ability of current AI systems to answer research-level mathematics questions, providing a set of 10 previously unreleased math problems with encrypted answers.
Aletheia’s raw prompts and outputs from the challenge are available for review, offering valuable insights into the capabilities and limitations of AI-powered math research tools.
