A new method for detecting hallucinations in Large Language Models (LLMs) has emerged, cleverly named the “Pheasant Test.” Proposed by a Reddit user, the test probes an LLM’s ability to accurately recall instances of a specific, infrequent word within a well-known text. The chosen word should be peripheral to the main narrative and appear only a handful of times.
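To make the procedure concrete, here is a minimal sketch of one way such a check could be scripted. This is not the original poster’s method: the file name, the quote list, and the idea of verifying an LLM’s claimed quotes against a local copy of the text are all assumptions added for illustration; only the general idea (probe a rare, peripheral word in a well-known text and see whether the model’s citations are real) comes from the post.

```python
# Sketch: verify that quotes an LLM claims contain a rare word actually
# appear in a local copy of the source text. The text file and quote list
# below are hypothetical placeholders; plug in whatever model output you
# want to check. Standard-library Python only.

import re

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences don't matter."""
    return re.sub(r"\s+", " ", s.lower()).strip()

def verify_quotes(source_text: str, claimed_quotes: list[str], word: str) -> dict:
    """Split claimed quotes into those that genuinely appear in the source
    (and contain the probe word) and those that look hallucinated."""
    haystack = normalize(source_text)
    report = {"genuine": [], "suspect": []}
    for quote in claimed_quotes:
        q = normalize(quote)
        if word.lower() in q and q in haystack:
            report["genuine"].append(quote)
        else:
            report["suspect"].append(quote)
    return report

if __name__ == "__main__":
    # Hypothetical usage: 'source_text.txt' stands in for any well-known text
    # you have locally; the quote is a stand-in for a model's claimed citation.
    with open("source_text.txt", encoding="utf-8") as f:
        text = f.read()
    claimed = ["He stared at the pheasant by the fence."]
    print(verify_quotes(text, claimed, "pheasant"))
```

Any quote that ends up in the “suspect” bucket is either misquoted or fabricated, which is exactly the failure mode the test is designed to surface.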
The original poster (OP) found that numerous LLMs struggled with the test when asked to identify occurrences of the word “pheasant” in the works of the Strugatsky brothers, renowned science fiction authors. Perplexity was a notable exception, succeeding where others faltered. While search engines readily provide the correct results, many LLMs hallucinated quotes or invented sources, exposing their limitations.
The “Pheasant Test” underscores the need for vigilance when relying on AI-generated information. The impetus for the test stemmed from the OP’s encounter with ChatGPT fabricating historical facts. Because it is simple and can be adapted to different languages and texts, it is a handy probe of an LLM’s propensity to hallucinate, especially when specific details or citations are at stake.
The original discussion and details can be found on the r/artificial subreddit.