Particle News: DeepSeek Releases DeepSeek‑OCR 2 With Encoder That Learns Human‑Like Reading Order

Overview

DeepSeek‑OCR 2 replaces a CLIP‑style visual encoder with DeepEncoder V2 that uses learnable causal‑flow query tokens and mixes bidirectional and causal attention to reorder visual tokens by semantics.
On OmniDocBench v1.5, the model achieved a 91.09% overall score, improving by 3.73 percentage points over its predecessor.
Reading‑order accuracy improved as the reported edit distance dropped from 0.085 to 0.057 in the paper’s evaluation.
Production logs cited by DeepSeek show duplicate rates falling to 4.17% from 6.25% for online images and to 2.88% from 3.69% for batch PDFs.
The system retains an encoder–decoder pipeline, decodes with a Mixture‑of‑Experts language model, and caps per‑page visual tokens between 256 and 1120 to keep resource use comparable to prior systems.