Overview
- The open-source release is live on Hugging Face with model weights, code, documentation, and a Dockerized vLLM server example for GPU inference.
- The architecture integrates a dynamic-resolution NaViT-style visual encoder with ERNIE-4.5-0.3B to deliver a compact VLM optimized for low-resource deployment.
- Supported use cases include page and element parsing across 109 languages covering text, tables, formulas, charts, handwritten content, and historical documents.
- The project reports state-of-the-art results on OmniDocBench, supplemented by metrics from MinerU and internal evaluations, and claims to outperform some 72B-parameter multimodal models on chart recognition.
- Practical usage is provided through CLI and Python APIs, with instructions for launching an inference server; the reported performance advantages await independent verification.
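Since vLLM servers expose an OpenAI-compatible endpoint, document parsing can be driven with a standard chat-completions request carrying an image. The sketch below builds such a request payload; the model identifier, endpoint path, and prompt are placeholder assumptions, not values from the release documentation.

```python
import json

def build_parse_request(image_url: str, instruction: str,
                        model: str = "document-vlm") -> dict:
    """Build an OpenAI-compatible chat-completions payload that asks a
    vision-language model to parse a document image.

    `model` is a placeholder name; substitute the identifier the
    released weights are served under.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image part: the page to parse.
                    {"type": "image_url", "image_url": {"url": image_url}},
                    # Text part: the parsing instruction.
                    {"type": "text", "text": instruction},
                ],
            }
        ],
        # Deterministic decoding is usually preferable for parsing tasks.
        "temperature": 0.0,
    }

payload = build_parse_request(
    "https://example.com/page.png",
    "Parse this page to Markdown, preserving tables and formulas.",
)
body = json.dumps(payload)  # POST this to the server's /v1/chat/completions
```

The same payload works whether the server runs in the Dockerized vLLM container or is launched locally via the project's CLI, as both speak the OpenAI-compatible protocol.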