Particle.news
Download on the App Store

DeepSeek Releases DeepSeekOCR 2 With Encoder That Learns Human‑Like Reading Order

Company-reported tests show gains on OmniDocBench v1.5 with improved production stability.

Overview

  • DeepSeekOCR 2 replaces a CLIP‑style visual encoder with DeepEncoder V2 that uses learnable causal‑flow query tokens and mixes bidirectional and causal attention to reorder visual tokens by semantics.
  • On OmniDocBench v1.5, the model achieved a 91.09% overall score, improving by 3.73 percentage points over its predecessor.
  • Reading‑order accuracy improved as the reported edit distance dropped from 0.085 to 0.057 in the paper’s evaluation.
  • Production logs cited by DeepSeek show duplicate rates falling to 4.17% from 6.25% for online images and to 2.88% from 3.69% for batch PDFs.
  • The system retains an encoder–decoder pipeline, decodes with a Mixture‑of‑Experts language model, and caps per‑page visual tokens between 256 and 1120 to keep resource use comparable to prior systems.