Doubts Cast on GPT-4’s Cognitive Prowess: Study Methodology Questioned

Photo by Christina Morillo on Pexels

A new study assessing GPT-4's performance on cognitive psychology benchmarks is facing scrutiny over its methodology. While the research initially reported strong results, with scores ranging from roughly 83% to 91%, experts have raised concerns. Key issues include inconsistencies across the datasets used, the possibility that GPT-4 memorized test items from its training data, and the study's reliance on the ChatGPT web interface rather than the more controlled application programming interface (API). Critics argue these methodological flaws undermine the study's validity and limit how accurately GPT-4's cognitive abilities can be assessed. The debate began on the r/artificial subreddit (https://old.reddit.com/r/artificial/comments/1nf0ifw/gpt4_scores_high_on_cognitive_psychology/), which linked to the study: https://arxiv.org/abs/2303.11436.