🕰️ OCR Time Machine

Travel through time to see how OCR technology has evolved!

For decades, galleries, libraries, archives, and museums (GLAMs) have used Optical Character Recognition (OCR) to transform digitized books, newspapers, and manuscripts into machine-readable text. Traditional OCR produces complex XML formats like ALTO, packed with layout detail but difficult to work with. Now, cutting-edge Vision-Language Models (VLMs) are revolutionizing OCR with simpler, cleaner Markdown output. This Space lets you compare the two approaches side by side and see which works best for your historical documents: upload a document image and its ALTO or PAGE XML file, and we'll extract the reading-order text from the XML for an apples-to-apples comparison of the actual text content.

Available models: RolmOCR | Nanonets-OCR-s


🚀 How it works

  1. 📤 Upload Image: Select a historical document image (JPG, PNG, JP2)
  2. 📄 Upload XML (Optional): Add the corresponding ALTO or PAGE XML file for comparison
  3. 🤖 Choose Model: Select between RolmOCR (new) or Nanonets-OCR-s (even newer!)
  4. 🔍 Compare: Click 'Compare OCR Methods' to process
  5. 💾 Download: Save the results for further analysis

📥 Upload Files

📤 Step 1: Upload your document

🤖 Step 2: Select OCR Model

Choose Model

RolmOCR: Fast & general-purpose | Nanonets: Advanced with table/math support
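
If you want to try either model outside this Space, the sketch below shows one way to run them locally with Hugging Face transformers. It is a minimal sketch under stated assumptions: the Hub IDs (reducto/RolmOCR, nanonets/Nanonets-OCR-s), the prompt wording, and the generation settings are illustrative rather than this Space's actual code, and both checkpoints are assumed to load through the generic image-text-to-text interface of a recent transformers release.

```python
# Hypothetical local inference sketch -- model IDs, prompt, and settings are
# assumptions; adapt them to your own setup.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "reducto/RolmOCR"  # or "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("page.jpg")  # your scanned document
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe this historical document as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```

Because the models answer in Markdown rather than ALTO, the result is plain text you can read or post-process directly.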

📊 Results

🖼️ Document Image

🤖 Modern VLM OCR Output

📜 Traditional OCR Output


🎯 Try an Example

Examples
Historical Document Image | XML File (Optional - ALTO or PAGE format) | Choose Model

Example from 'A Medical History of British India' collection, National Library of Scotland


📚 About ALTO/PAGE XML

  • ALTO (Analyzed Layout and Text Object) and PAGE are XML formats that store OCR results with detailed layout information
  • These files are typically generated by traditional OCR software and include position data for each text element
  • This tool extracts just the reading-order text for easier comparison (see the example below)
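
For reference, here is a minimal sketch of that reading-order extraction for ALTO files. It assumes the usual ALTO layout where each <String> element holds its word in a CONTENT attribute and text lines appear in reading order; the function name and file path are illustrative, not the Space's actual code, and PAGE XML (which records an explicit ReadingOrder element) would need separate handling.

```python
# Minimal ALTO reading-order extraction sketch -- illustrative only.
import xml.etree.ElementTree as ET

def alto_reading_order_text(path: str) -> str:
    """Join the CONTENT of each <String> inside each <TextLine>, in document order."""
    root = ET.parse(path).getroot()
    lines = []
    for element in root.iter():
        # Compare local tag names so any ALTO namespace version matches
        if element.tag.rsplit("}", 1)[-1] != "TextLine":
            continue
        words = [
            child.attrib["CONTENT"]
            for child in element
            if child.tag.rsplit("}", 1)[-1] == "String" and "CONTENT" in child.attrib
        ]
        if words:
            lines.append(" ".join(words))
    return "\n".join(lines)

print(alto_reading_order_text("page.alto.xml"))  # illustrative file name
```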

🎯 Best Practices

  • Use high-resolution scans (300+ DPI) for best results
  • Historical documents with clear text work best
  • The VLMs can handle complex layouts, tables, and mathematical notation

Built with ❤️ for the GLAM community | Learn more about OCR formats | Questions? Open an issue