DeepSeek, a Chinese AI firm, is making waves with an AI model that handles memory far more efficiently through an image-based approach. Instead of storing information as traditional text tokens, the model packs it into images and relies on optical character recognition (OCR), the same kind of technology found in scanner apps, to convert those images back into machine-readable text. The OCR component itself performs competitively, but the real breakthrough lies in how the model processes and stores information as images. Because an image can carry the same information in far fewer tokens than the equivalent text, the method significantly reduces the computing resources required, which in turn lowers the carbon footprint relative to conventional text-based AI models.
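To make the idea concrete, here is a rough back-of-the-envelope sketch in Python comparing how many tokens a page of text might cost as ordinary text tokens versus as compressed visual tokens. The page length, tokenizer ratio, image size, patch size, and compression factor are all illustrative assumptions, not DeepSeek's published figures.

```python
# Back-of-the-envelope comparison: text tokens vs. vision tokens for one page.
# All constants below are illustrative assumptions, not DeepSeek's published
# architecture or benchmark numbers.

PAGE_WORDS = 800         # assume a fairly dense page of ~800 words
TOKENS_PER_WORD = 1.3    # rough English text-tokenizer ratio (assumption)

IMAGE_SIDE = 1024        # assume the page is rendered to a 1024x1024 image
PATCH_SIZE = 16          # assume a ViT-style encoder with 16x16 pixel patches
COMPRESSION = 16         # assume a 16x token compressor after the patch encoder

text_tokens = round(PAGE_WORDS * TOKENS_PER_WORD)
raw_patches = (IMAGE_SIDE // PATCH_SIZE) ** 2      # 4096 patches
vision_tokens = raw_patches // COMPRESSION         # 256 vision tokens

print(f"text tokens:  ~{text_tokens}")
print(f"vision tokens: {vision_tokens}")
print(f"reduction:    ~{text_tokens / vision_tokens:.1f}x fewer tokens")
```

Under these assumptions, the same page costs roughly a quarter of the tokens when stored as an image, the kind of saving that translates into lower compute and energy use; the actual ratio depends on the model's encoder and how dense the page is.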
Industry luminary Andrej Karpathy, formerly of Tesla and OpenAI, has praised DeepSeek's approach, even speculating that images could make better input for large language models than text. Researchers anticipate that the technique could substantially advance AI memory and reasoning. The system can also generate more than 200,000 pages of training data per day on a single GPU, which could help developers train more capable models, and better memory promises more effective assistance for users. Current research is focused on applying visual tokens to reasoning itself, extending the implications of the breakthrough.
