Particle.news

DeepSeek Open-Sources OCR System That Compresses LLM Contexts With Visual Tokens

By rendering text as images and encoding them as compact vision tokens, the model targets long-context costs, with claimed 7–20× token reductions now under community testing.

Overview

  • DeepSeek released code and weights on GitHub and Hugging Face, quickly attracting thousands of stars and active developer interest.
  • The architecture combines a ~380M-parameter DeepEncoder with a 3B-parameter MoE decoder that uses about 570M active parameters.
  • On OmniDocBench, the team reports surpassing GOT-OCR 2.0 while using only 100 vision tokens per page, and outperforming MinerU 2.0 while staying under 800 vision tokens per page.
  • Throughput is reported at more than 200,000 pages per day on a single Nvidia A100, suggesting substantial processing and cost-efficiency gains.
  • Reported precision is about 97% at under 10× compression but falls to roughly 60% at 20×, highlighting trade-offs that await independent validation.
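The trade-off in the last bullet can be sketched numerically. The snippet below is illustrative only: the two anchor points (~97% precision below 10× compression, ~60% around 20×) come from the figures reported above, and the linear falloff between them is an assumption, not a curve published by DeepSeek.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens encoding them."""
    return text_tokens / vision_tokens

def estimated_precision(ratio: float) -> float:
    """Rough precision estimate from the article's two reported data points.

    ~97% at compression under 10x, falling to ~60% at 20x; the linear
    interpolation between those points is an assumption for illustration.
    """
    if ratio <= 10:
        return 0.97
    if ratio >= 20:
        return 0.60
    return 0.97 - (ratio - 10) * (0.97 - 0.60) / 10

# Example: a 7,000-token page encoded into 700 vision tokens is 10x compression,
# which sits at the edge of the high-precision regime.
r = compression_ratio(7000, 700)
print(r, estimated_precision(r))
```

Under this reading, pushing toward the 20× end of the claimed range trades roughly a third of the precision for a halved token budget, which is the trade-off independent testers are now probing.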