Mistral's OCR 4 Adds Bounding Boxes and 170-Language Support

The Paris-based lab ships a compact document extraction model with structured output, confidence scores, and single-container self-hosting aimed at RAG and agentic pipelines.

Alessandro Benigni

PUBLISHED JUN 25, 2026

Mistral released OCR 4, a document intelligence model that returns structured representations of scanned and digital files, not just clean text. Each extracted block carries a bounding box, a type classification (titles, tables, equations, signatures), and confidence scores at the word and page level. The addition of bounding boxes, which Mistral describes as the most-requested capability since OCR 3, is the signal that this release is aimed squarely at developers building on top of the output, not just at end users who want searchable PDFs.

The structured output design matters for at least three downstream use cases that basic text extraction supports poorly. For retrieval-augmented generation, typed blocks make cleaner retrieval units than flat text dumps: a table stays a table, a title stays a title, and semantic chunking can follow document logic rather than character count. For agentic workflows, confidence scores and block coordinates give an agent enough grounding to act on a document: fill a form, flag a redaction zone, or route low-confidence regions to a human reviewer. For data pipelines, consistent typed output means ingestion connectors do not need to reverse-engineer structure downstream.
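
To make the RAG and review-routing cases concrete, here is a minimal Python sketch. The `Block` shape, its field names, the per-block `confidence` value (the announcement describes word- and page-level scores, so a block-level score would have to be aggregated by the caller), and the 0.85 threshold are all assumptions for illustration, not Mistral's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical shape of one extracted block; NOT Mistral's real schema."""
    block_type: str   # assumed labels, e.g. "title", "table", "equation", "text"
    text: str
    bbox: tuple       # assumed (x0, y0, x1, y1) page coordinates
    confidence: float # assumed 0.0-1.0, aggregated by the caller from word scores
    page: int


def chunk_by_structure(blocks):
    """Group blocks into retrieval chunks that follow document structure.

    A title opens a new chunk, and a table always becomes its own chunk,
    so chunk boundaries track layout rather than character count.
    """
    chunks, current = [], []
    for block in blocks:
        if block.block_type == "table":
            if current:
                chunks.append(current)
                current = []
            chunks.append([block])
        elif block.block_type == "title":
            if current:
                chunks.append(current)
            current = [block]
        else:
            current.append(block)
    if current:
        chunks.append(current)
    return chunks


def split_for_review(blocks, threshold=0.85):
    """Separate blocks into (accepted, needs_review) by an assumed threshold."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


sample = [
    Block("title", "Q1 Results", (50, 40, 400, 70), 0.99, 1),
    Block("text", "Revenue grew.", (50, 80, 400, 120), 0.97, 1),
    Block("table", "Region | Revenue", (50, 130, 400, 300), 0.91, 1),
    Block("text", "Smudged footnote", (50, 310, 400, 330), 0.42, 1),
    Block("title", "Outlook", (50, 40, 400, 70), 0.98, 2),
    Block("text", "Guidance unchanged.", (50, 80, 400, 120), 0.96, 2),
]

chunks = chunk_by_structure(sample)
accepted, needs_review = split_for_review(sample)
```

With this sample, the table lands in a chunk of its own and the smudged footnote is the only block sent to a human reviewer. A real pipeline would tune the threshold per document type and keep the bounding box with each chunk so citations can point back to a page region.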

Mistral priced OCR 4 at $4 per 1,000 pages via the standard API, $2 per 1,000 pages through the Batch API, and $5 per 1,000 pages for Document AI mode, which pipes the OCR output through mistral-small-2603 to return a caller-defined JSON schema. That last option means teams can define exactly the fields they need extracted and get a structured JSON object back in one call, without writing a separate parsing layer.
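
For budgeting, the three list prices reduce to simple arithmetic. The sketch below assumes linear, pro-rata billing with no minimums, rounding rules, or volume discounts, none of which the announcement specifies; the mode names and the function name are illustrative labels, not API parameters.

```python
# List prices quoted in the announcement, in USD per 1,000 pages.
# The dictionary keys are illustrative labels, not API parameters.
PRICE_PER_1000_PAGES_USD = {
    "standard": 4.0,
    "batch": 2.0,
    "document_ai": 5.0,
}


def estimate_cost_usd(pages, mode):
    """Estimate cost assuming linear pro-rata billing (an assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_1000_PAGES_USD:
        raise ValueError(f"unknown mode: {mode!r}")
    return pages / 1000 * PRICE_PER_1000_PAGES_USD[mode]


# A 50,000-page corpus under each pricing mode.
for mode in PRICE_PER_1000_PAGES_USD:
    print(f"{mode}: ${estimate_cost_usd(50_000, mode):,.2f}")
```

Under these assumptions, a 50,000-page corpus costs $200 on the standard API, $100 through the Batch API, and $250 in Document AI mode.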

On benchmarks, the picture deserves a close read. Mistral reports a top score of 85.20 on OlmOCRBench and 93.07 on OmniDocBench. The company also conducted a human preference evaluation across 600 documents, finding its model preferred over competitors in the majority of head-to-head comparisons. What the release announcement does not include is verification from an independent testing organization. Every benchmark figure in the post, including the competitor results, comes from Mistral’s own internal runs, and the post itself flags known scoring artifacts: ground-truth errors in the reference annotations, LaTeX rendering mismatches counted as failures, and column-ordering assumptions that penalize correct output. Mistral’s candor about these limitations is unusual and useful, but it does not substitute for a third-party audit. One customer quoted in the post, Rogo AI engineer Aidan Donohue, cited accuracy equivalent to a prior agentic parser at roughly 8x lower cost and 17x lower latency. Those figures are unverified and come from a customer testimonial, not a controlled benchmark.

The 170-language coverage spans eight language groups, and Mistral says the gap over competitors is widest for specialized and low-resource languages, specifically naming Hindi, Georgian, Bengali, Armenian, and others in that category. Teams processing multilingual archives or non-European document sets have historically had to accept significant accuracy degradation from cloud OCR services on these languages. If Mistral’s internal multilingual evaluation holds under external scrutiny, that is a genuine differentiator for enterprises with global document workflows.

The self-hosting option is worth noting separately. OCR 4 is compact enough to run in a single container, which means organizations with data-residency requirements can keep documents inside their own infrastructure without routing to a cloud API. That closes a gap that has forced many regulated enterprises to maintain separate, often inferior, on-premise OCR stacks. Availability through Amazon SageMaker and Microsoft Foundry extends the reach into environments where teams already manage cloud spend through existing agreements.

OCR 4 integrates with Mistral’s Search Toolkit, the open-source composable search framework the company announced at the AI Now Summit. The connection is practical: OCR output feeds directly into Search Toolkit’s ingestion and retrieval pipeline, providing citation-ready structured blocks for enterprise search builds.

Teams evaluating document extraction infrastructure should run OCR 4 on their own document corpus, particularly if the corpus is multilingual or math-heavy. Mistral explicitly recommends this, and the Batch API price of $2 per 1,000 pages makes a substantial evaluation sample affordable before any production commitment.

Source: Mistral, published June 24, 2026, at mistral.ai/news/ocr-4/.
