Overview
- DeepSeek‑OCR 2 replaces a CLIP‑style visual encoder with DeepEncoder V2 that uses learnable causal‑flow query tokens and mixes bidirectional and causal attention to reorder visual tokens by semantics.
- On OmniDocBench v1.5, the model achieved a 91.09% overall score, improving by 3.73 percentage points over its predecessor.
- Reading‑order accuracy improved as the reported edit distance dropped from 0.085 to 0.057 in the paper’s evaluation.
- Production logs cited by DeepSeek show duplicate rates falling to 4.17% from 6.25% for online images and to 2.88% from 3.69% for batch PDFs.
- The system retains an encoder–decoder pipeline, decodes with a Mixture‑of‑Experts language model, and caps per‑page visual tokens between 256 and 1120 to keep resource use comparable to prior systems.