Large Language Models (LLMs) rely on a context window that acts as working memory: fast to access but limited in capacity. When task-relevant information exceeds this window, the model's coherence degrades. The conventional remedy, Retrieval-Augmented Generation (RAG), offloads information to a vector store and retrieves it via embedding similarity search. But this approach has a limitation: embedding similarity is semantically shallow, matching on surface-level likeness rather than reasoning about what is actually relevant.
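To make that baseline concrete, here is a minimal sketch of embedding-similarity retrieval in Python. The `embed` function is a stand-in for a real embedding model (it just produces a deterministic random vector per text), and the store is deliberately toy-sized; the point is only to show that retrieval reduces to a nearest-neighbor lookup over vectors.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: in practice this would call an embedding model
    # (a sentence-transformer or an API endpoint), not a seeded RNG.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class VectorStore:
    """Toy vector store: retrieval is cosine similarity over stored chunks."""
    def __init__(self):
        self.chunks: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def query(self, question: str, k: int = 3) -> list[str]:
        q = embed(question)
        # Unit vectors, so the dot product is cosine similarity.
        scores = [float(q @ v) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.chunks[i] for i in top]
```

Whatever the question "means," the store can only return chunks whose embeddings sit close to the query's embedding, which is exactly the shallowness described above.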
The proposed alternative is to have the LLM save the contents of its context window to a citation-grounded document store, such as NotebookLM, and then query it with natural-language questions. The model can ask questions about its own prior work, replacing vector similarity with natural-language reasoning as the retrieval mechanism. Because retrieval now draws on the full reasoning capability of the retrieval model, it handles the nuanced, context-dependent information that extended tasks depend on.
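A rough sketch of that loop, assuming a hypothetical `DocumentStore` interface: NotebookLM does not expose this API, and the names `save`, `ask`, and `GroundedAnswer` are illustrative. The model offloads its working state as documents, then later retrieves it by asking questions in plain language and receiving cited answers.

```python
from dataclasses import dataclass, field

@dataclass
class GroundedAnswer:
    text: str
    citations: list[str]  # identifiers of the saved documents the answer draws on

@dataclass
class DocumentStore:
    """Hypothetical citation-grounded store (a NotebookLM-like service)."""
    documents: dict[str, str] = field(default_factory=dict)

    def save(self, doc_id: str, content: str) -> None:
        self.documents[doc_id] = content

    def ask(self, question: str) -> GroundedAnswer:
        # Placeholder: a real implementation would send the question plus the
        # saved documents to a reasoning model and parse its cited answer.
        return GroundedAnswer(
            text=f"[grounded answer to: {question!r}]",
            citations=list(self.documents),
        )

# Agent-loop sketch: before the context window overflows, the LLM offloads
# its working state, then later queries it in natural language.
store = DocumentStore()
store.save("session-001", "Full transcript of the model's earlier reasoning...")
answer = store.ask("What constraints did I already rule out for the schema design?")
print(answer.text, answer.citations)
```

The design choice is that the "index" is just the documents themselves; relevance is decided at query time by a model that can reason, not at write time by an embedding.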
Efficiency concerns can be addressed with a vector cache layer that stores previously queried results, so repeated or near-duplicate questions skip the expensive reasoning step. This approach could meaningfully extend the cognitive architectures of LLMs, and feedback on the idea is welcome.
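A sketch of that cache layer, reusing the `embed` stub and `DocumentStore` from the sketches above; the similarity threshold is an assumed tuning knob, not a recommended value. New questions that land close enough to an already-answered one reuse the cached answer instead of re-running the reasoning retrieval.

```python
import numpy as np

class QueryCache:
    """Semantic cache: if a new question is close (cosine similarity) to one
    answered before, reuse that answer instead of re-querying the store."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []  # (query vector, answer)

    def lookup(self, question: str) -> str | None:
        q = embed(question)
        for vec, answer in self.entries:
            if float(q @ vec) >= self.threshold:
                return answer
        return None

    def store(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

def cached_ask(store: DocumentStore, cache: QueryCache, question: str) -> str:
    hit = cache.lookup(question)
    if hit is not None:
        return hit                       # cheap path: no reasoning call
    answer = store.ask(question).text    # expensive path: natural-language retrieval
    cache.store(question, answer)
    return answer
```

Note that vector similarity reappears here only as a cheap cache key, not as the retrieval mechanism itself.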
