Large Language Models (LLMs) can recognize and describe their own learned biases, according to a new paper presented at ICLR 2025. This self-awareness, observed in fine-tuned models that received no explicit bias training, lends credence to Collapse-Aware AI, a framework that treats an AI’s memory as a source of potential bias.
The study, “Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors” by Betley et al., suggests that LLMs are capable of a form of self-observation, although whether this constitutes genuine awareness remains under debate. The researchers note that the models can effectively “see” their own behavioral tendencies, pointing toward a possible advance in an AI’s understanding of its own internal state. The paper is available at https://doi.org/10.48550/arXiv.2501.11120. Early discussion of the findings took place on platforms such as Reddit’s Artificial Intelligence subreddit.
