Particle.news

DeepSeek-OCR Turns Text Into Images to Slash Context Tokens

The open release is drawing rapid experiments, with claims of steep compression facing questions about accuracy and reasoning.

Overview

  • DeepSeek AI published code and weights for DeepSeek-OCR, an open-source system that renders long-form text as images and compresses them into a small set of vision tokens.
  • The team reports roughly 10x compression with about 97% OCR decoding accuracy, and around 60% accuracy at 20x compression, figures that await independent replication.
  • Its architecture pairs a ~380M-parameter DeepEncoder with a DeepSeek3B-MoE decoder that activates only about 570M parameters per token, aiming for high throughput by routing each token through a small subset of experts.
  • Benchmarks cited by the authors show stronger results on OmniDocBench using far fewer tokens than GOT-OCR2.0 and MinerU2.0, plus a claimed throughput of 200,000+ pages per day on a single A100-40G.
  • Developers and prominent technologists, including Andrej Karpathy, are testing the approach for long-context tasks such as large document sets and codebases, while noting open questions about robustness and downstream reasoning over visual tokens.
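The compression claims above reduce to simple arithmetic: if a page of text would normally cost N text tokens, rendering it as an image and encoding it at a given compression ratio leaves roughly N divided by that ratio in vision tokens. A back-of-envelope sketch (the function name and the 6,000-token example are illustrative, not from DeepSeek's code):

```python
import math

def estimate_vision_tokens(text_tokens: int, compression_ratio: float) -> int:
    """Rough estimate of vision tokens needed for a document that would
    otherwise cost `text_tokens` text tokens, under a claimed compression
    ratio (hypothetical helper, not part of DeepSeek-OCR's API)."""
    return math.ceil(text_tokens / compression_ratio)

# At the reported ~10x compression (≈97% decoding accuracy claimed),
# a 6,000-token document would need about 600 vision tokens:
print(estimate_vision_tokens(6000, 10))

# At the more aggressive ~20x setting (≈60% accuracy claimed),
# the same document drops to about 300 vision tokens:
print(estimate_vision_tokens(6000, 20))
```

The trade-off the article describes is visible in the two calls: halving the token budget again comes at a steep cost in decoding accuracy, which is why the 20x figure draws the most skepticism.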