Particle.news

DeepSeek-OCR Turns Text Into Images to Slash Context Tokens

The open release is drawing rapid experiments, with claims of steep compression facing questions about accuracy and reasoning.

Overview

  • DeepSeek AI published code and weights for DeepSeek-OCR, an open-source system that renders long-form text as images and compresses them into a small set of vision tokens.
  • The team reports roughly 10x compression with about 97% OCR decoding accuracy, and around 60% accuracy at 20x compression, figures that await independent replication.
  • Its architecture pairs a ~380M-parameter DeepEncoder with a DeepSeek3B-MoE decoder that activates only about 570M parameters per token, aiming for high throughput by routing each token through a small subset of experts.
  • Benchmarks cited by the authors show stronger results on OmniDocBench using far fewer tokens than GOT-OCR2.0 and MinerU2.0, plus a claimed throughput of 200,000+ pages per day on a single A100-40G.
  • Developers and prominent technologists, including Andrej Karpathy, are testing the approach for long-context tasks such as large document sets and codebases, while noting open questions about robustness and downstream reasoning over visual tokens.
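The compression claims above reduce to simple arithmetic: if a page of text would normally cost N text tokens, rendering it as an image and encoding it at a given compression ratio leaves roughly N divided by that ratio in vision tokens. A back-of-envelope sketch (the function name and the 6,000-token example are illustrative, not from DeepSeek's code):

```python
import math

def estimate_vision_tokens(text_tokens: int, compression_ratio: float) -> int:
    """Rough estimate of vision tokens needed for a document that would
    otherwise cost `text_tokens` text tokens, under a claimed compression
    ratio (hypothetical helper, not part of DeepSeek-OCR's API)."""
    return math.ceil(text_tokens / compression_ratio)

# At the reported ~10x compression (≈97% decoding accuracy claimed),
# a 6,000-token document would need about 600 vision tokens:
print(estimate_vision_tokens(6000, 10))

# At the more aggressive ~20x setting (≈60% accuracy claimed),
# the same document drops to about 300 vision tokens:
print(estimate_vision_tokens(6000, 20))
```

The trade-off the article describes is visible in the two calls: halving the token budget again comes at a steep cost in decoding accuracy, which is why the 20x figure draws the most skepticism.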