Overview
- DeepSeek AI published code and weights for DeepSeek-OCR, an open-source system that renders long-form text as images and compresses them into a small set of vision tokens.
- The team reports roughly 10x compression with about 97% OCR decoding accuracy, and around 60% accuracy at 20x compression, figures that await independent replication.
- Its architecture pairs a ~380M-parameter DeepEncoder with a DeepSeek3B-MoE decoder that activates only about 570M of its parameters per step, keeping decoding fast while retaining the capacity of the full expert pool.
- Benchmarks cited by the authors show DeepSeek-OCR outperforming GOT-OCR2.0 and MinerU2.0 on OmniDocBench while using far fewer vision tokens, along with a claimed throughput of more than 200,000 pages per day on a single A100-40G GPU.
- Developers and prominent technologists, including Andrej Karpathy, are testing the approach for long-context tasks such as large document sets and codebases, while noting open questions about robustness and downstream reasoning over visual tokens.
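The compression figures above reduce to simple token arithmetic. The sketch below illustrates how such a ratio is computed; the page sizes are hypothetical round numbers chosen for illustration, not values taken from the paper.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of text tokens a page would need to the vision tokens
    actually used after rendering the page as an image."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens

# Hypothetical example: a page worth ~1000 text tokens encoded
# into 100 vision tokens corresponds to the reported ~10x regime.
print(compression_ratio(1000, 100))  # → 10.0

# Pushing the same page into 50 vision tokens would be the ~20x
# regime, where the authors report accuracy dropping toward 60%.
print(compression_ratio(1000, 50))  # → 20.0
```

The reported trade-off is between this ratio and decoding accuracy: higher compression means fewer tokens for the decoder to reconstruct the text from, so fidelity falls as the ratio rises.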