Baidu's Unlimited OCR parses dozens of pages in a single forward pass

Built on top of DeepSeek OCR, the open-source model uses a constant KV cache design to sidestep context-length limits rather than extend them.

Alessandro Benigni

PUBLISHED JUN 25, 2026

3 MIN READ


Baidu published Unlimited OCR on GitHub on June 22, releasing both the model weights and an arXiv paper that describes its approach to long-document parsing. The project’s core claim is that a single forward pass can process dozens of pages without running into the 32K-token ceiling that would ordinarily cut a long document short.

The key mechanism is a constant KV cache architecture. Most long-context work tries to push the context window further out, and memory and compute rise with every token added to it. Unlimited OCR takes a different path: the cache size stays fixed regardless of how many pages the model processes. That design choice is what lets the project advertise multi-page parsing under a standard 32K maximum length rather than requiring a special long-context deployment.
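To make the memory argument concrete, the sketch below compares a conventional KV cache, which grows with every page, against a cache capped at a fixed token budget. Every number in it is an illustrative assumption rather than a figure from the Unlimited OCR release: the layer count, head dimensions, tokens per page, and cache budget are all hypothetical, and the sketch does not reproduce how Baidu's model actually keeps its cache constant.

```python
from dataclasses import dataclass

# Hypothetical workload figures, chosen only for illustration.
TOKENS_PER_PAGE = 1000   # assumed tokens cached per page
CACHE_BUDGET = 4096      # assumed fixed cache size, in tokens


@dataclass(frozen=True)
class CacheConfig:
    # Hypothetical decoder dimensions, not taken from the release.
    num_layers: int = 24
    num_kv_heads: int = 8
    head_dim: int = 128
    bytes_per_value: int = 2  # fp16 or bf16


def kv_cache_bytes(cfg: CacheConfig, cached_tokens: int) -> int:
    """Bytes needed to store keys and values for `cached_tokens` tokens."""
    # Factor of 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_value
    return per_token * cached_tokens


def growing_cache_tokens(pages: int, tokens_per_page: int) -> int:
    """Conventional cache: every token from every page stays cached."""
    return pages * tokens_per_page


def fixed_cache_tokens(pages: int, tokens_per_page: int, budget: int) -> int:
    """Capped cache: never holds more than `budget` tokens."""
    return min(pages * tokens_per_page, budget)


if __name__ == "__main__":
    cfg = CacheConfig()
    for pages in (1, 10, 40):
        grow = kv_cache_bytes(cfg, growing_cache_tokens(pages, TOKENS_PER_PAGE))
        fixed = kv_cache_bytes(cfg, fixed_cache_tokens(pages, TOKENS_PER_PAGE, CACHE_BUDGET))
        print(f"{pages:>3} pages: growing {grow / 2**20:8.1f} MiB, fixed {fixed / 2**20:8.1f} MiB")
```

Under these assumptions the growing cache scales linearly with page count while the capped one plateaus once the budget is reached, which is the property the project relies on.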

The project acknowledges DeepSeek OCR as its direct foundation and explicitly frames itself as an attempt to extend that baseline. DeepSeek, the Hangzhou-based lab whose V3 model shipped at a fraction of frontier training costs, has become a common departure point for applied document-understanding work in the open-source community. Building on DeepSeek OCR is a deliberate vote of confidence in that lineage, and it also means Unlimited OCR inherits whatever limitations and biases that baseline carries.

Baidu’s documentation states that the underlying technique generalizes beyond optical character recognition. Automatic speech recognition and machine translation are cited as domains where the same constant-cache approach could apply. No results for those adjacent tasks appear in the current release; the claim is architectural and prospective.

The project ships with inference code for two backends: Hugging Face Transformers for single-GPU usage, and SGLang for higher-throughput deployments. Multi-page and PDF workflows are both supported, with the latter handled by converting pages to images via PyMuPDF before passing them to the model. Two inference modes, labeled “gundam” and “base,” differ in image resolution and cropping behavior.
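The PDF-to-image step can be sketched with PyMuPDF's public API. This is a minimal illustration, not the repository's own conversion code: the function name `pdf_to_png_pages` and the 144 DPI default are assumptions, and the resolution the model actually expects may differ.

```python
def pdf_to_png_pages(pdf_bytes: bytes, dpi: int = 144) -> list[bytes]:
    """Render every page of a PDF to PNG bytes, in page order."""
    # Imported here so the rest of a pipeline can load without PyMuPDF.
    # Newer PyMuPDF releases expose the same API under the name `pymupdf`.
    import fitz

    zoom = dpi / 72  # PDF user space is 72 units per inch
    matrix = fitz.Matrix(zoom, zoom)
    images = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)  # rasterize the page
            images.append(pix.tobytes("png"))     # encode as PNG in memory
    return images
```

A caller would read a file with `open(path, "rb").read()`, pass the bytes in, and hand the resulting images to whichever inference backend is in use.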

One thing the release announcement does not include is independent benchmark results. All performance characterizations come from Baidu’s own project documentation. The arXiv paper (2606.23050) is available for closer inspection of methodology, but external replication is not yet on record. That absence is worth noting before treating the “dozens of pages in a single pass” framing as a settled capability claim rather than a reported one.

The Hugging Face Spaces demo, added June 24 by a community contributor credited as “AK,” gives developers a low-friction way to probe actual output quality on their own documents before committing to a self-hosted deployment.

Teams running document ingestion pipelines that currently stitch together multiple short-context calls to handle long PDFs should treat this release as a candidate worth evaluating. If the constant-KV-cache behavior holds at production page counts, the operational simplification alone justifies a benchmark run against your current chunking stack.
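One way to set up that comparison is to score both pipelines against the same hand-checked transcript. The sketch below uses character error rate, a common OCR metric; it is an assumption here, not a metric the release is stated to use, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(hypothesis, reference) / len(reference)


def compare_pipelines(reference: str, single_pass: str, chunks: list[str]) -> dict[str, float]:
    """Score a single-pass transcript and a stitched chunked transcript."""
    stitched = "\n".join(chunks)  # assumes chunks are joined with newlines
    return {
        "single_pass_cer": char_error_rate(single_pass, reference),
        "chunked_cer": char_error_rate(stitched, reference),
    }
```

Running `compare_pipelines` over a sample of long PDFs gives a like-for-like error figure to weigh alongside latency and memory use.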

Source: Baidu, “Unlimited OCR Works” GitHub repository and arXiv preprint 2606.23050, published June 22-23, 2026.
