A comprehensive evaluation of local AI models has revealed promising capabilities on Retrieval-Augmented Generation (RAG) tasks. The study, run entirely on a 16GB MacBook Air M2, simulated real-world, privacy-conscious scenarios for knowledge workers, developers, students, analysts, and consultants who regularly do document-intensive question answering and cannot send their files to a cloud API.
Over 100 RAG tasks were used to assess model performance, including fact extraction with citation, comparative evidence analysis across multiple documents, timeline generation, document summarization, and themed collection organization.
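The fact-extraction-with-citation tasks imply some way of checking whether a model's citations actually point at real source chunks. The original post does not publish its grading code, so the snippet below is only a minimal sketch of that idea, assuming a hypothetical `[chunk-N]` citation convention and illustrative document text:

```python
import re

def check_citations(answer: str, valid_chunk_ids: set) -> dict:
    """Score a fact-extraction answer by whether every citation it
    makes refers to a chunk that exists in the source documents.
    Assumes citations follow a [chunk-N] convention (an assumption,
    not the original study's format)."""
    cited = set(re.findall(r"\[(chunk-\d+)\]", answer))
    return {
        "cited": sorted(cited),
        # valid only if at least one citation exists and all resolve
        "valid": bool(cited) and cited <= valid_chunk_ids,
    }

# Illustrative model answer over fictional documents
answer = "The report was published in 2021 [chunk-3] and revised in 2022 [chunk-7]."
print(check_citations(answer, {"chunk-1", "chunk-3", "chunk-7"}))
```

A check like this only verifies that citations resolve, not that the cited chunk actually supports the claim; the study's comparative-evidence tasks would need human or LLM-based judging on top.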
LFM2-1.2B-MLX-8bit and Qwen3-1.7B-MLX-8bit emerged as the top performers. LFM2 demonstrated particular strength in comparative evidence analysis, while Qwen3 excelled at pinpointing factual information. Gemma-3-1b-it-8bit trailed across all evaluated tasks.
The study’s creator has provided a detailed workflow, enabling others to replicate the tests. This process involves setting up a local file agent, downloading the models, executing prompts, and analyzing the generated answers.
Further testing of new models is planned. More details and community contributions can be found in the Reddit thread: https://old.reddit.com/r/artificial/comments/1o60yq3/i_tested_local_models_on_100_real_rag_tasks_here/
