LiteParse, an open-source PDF parsing tool, shipped version 2 on May 28 with a focus on offline, local-only extraction. The release positions it as an alternative to the LLM-based PDF parsing tools (Mistral OCR, Adobe Acrobat AI, the GPT-4o vision API) that have come to dominate the document-processing space over the past 12 months.

The tradeoffs are explicit. LiteParse does not use any LLM for extraction. It applies spatial text parsing with bounding box detection to produce structured output, with screenshot generation for visual verification, multilingual support, and cross-platform compatibility. Everything runs on the user’s machine. No cloud dependency. No subscription. No data leaves the device.
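To illustrate what spatial text parsing over bounding boxes can look like in general, here is a minimal sketch. It is not LiteParse's actual code or API. The `Word` type, the `group_into_lines` function, and the `y_tolerance` parameter are all hypothetical names introduced for this example. It assumes each word has already been extracted with its box coordinates.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word plus its bounding box (hypothetical type, page coordinates)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def group_into_lines(words, y_tolerance=3.0):
    """Group words into text lines by vertical position, then order left to right.

    Assumption: words whose vertical centers lie within y_tolerance of the
    first word on a line belong to that line.
    """
    lines = []
    # Visit words top to bottom, then left to right.
    for word in sorted(words, key=lambda w: (w.y0, w.x0)):
        center = (word.y0 + word.y1) / 2
        if lines and abs(lines[-1]["center"] - center) <= y_tolerance:
            lines[-1]["words"].append(word)
        else:
            lines.append({"center": center, "words": [word]})
    # Within each line, restore left-to-right reading order.
    return [
        " ".join(w.text for w in sorted(line["words"], key=lambda w: w.x0))
        for line in lines
    ]


words = [
    Word("Name:", 10, 100, 50, 112),
    Word("Total", 10, 130, 45, 142),
    Word("Jane", 60, 101, 90, 113),
    Word("42", 60, 130, 75, 142),
]
result = group_into_lines(words)
print(result)
```

A real parser would also need to handle columns, tables, and rotated text. The point is only that this kind of grouping is deterministic geometry and needs no model call.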

The use case this fits is regulated document processing, where data-residency or vendor-trust constraints rule out cloud LLM extraction: healthcare records, legal discovery, government filings. For those teams, an open-source, local-only parser can be worth the lower extraction quality on edge cases.

Posted on the LiteParse X thread on 2026-05-28.