🕰️ OCR Time Machine

Travel through time to see how OCR technology has evolved!

For decades, galleries, libraries, archives, and museums (GLAMs) have used Optical Character Recognition (OCR) to transform digitized books, newspapers, and manuscripts into machine-readable text. Traditional OCR produces complex XML formats like ALTO, packed with layout detail but difficult to work with. Now, cutting-edge Vision-Language Models (VLMs) are revolutionizing OCR with simpler, cleaner Markdown output. This Space lets you compare the two approaches side by side and see which works best for your historical documents: upload a document image and its ALTO or PAGE XML file, and we'll extract the reading-order text from the XML for an apples-to-apples comparison of the actual text content.

Available models: RolmOCR | Nanonets-OCR-s


🚀 How it works

  1. 📤 Upload Image: Select a historical document image (JPG, PNG, JP2)
  2. 📄 Upload XML (Optional): Add the corresponding ALTO or PAGE XML file for comparison
  3. 🤖 Choose Model: Select between RolmOCR (new) or Nanonets-OCR-s (even newer!)
  4. 🔍 Compare: Click 'Compare OCR Methods' to process
  5. 💾 Download: Save the results for further analysis

📥 Upload Files

📤 Step 1: Upload your document

🤖 Step 2: Select OCR Model

Choose Model

RolmOCR: Fast & general-purpose | Nanonets: Advanced with table/math support
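
If you want to try either model outside this Space, the sketch below shows one way to run them locally with Hugging Face transformers. It is a minimal sketch under stated assumptions: the Hub IDs (reducto/RolmOCR, nanonets/Nanonets-OCR-s), the prompt wording, and the generation settings are illustrative rather than this Space's actual code, and both checkpoints are assumed to load through the generic image-text-to-text interface of a recent transformers release.

```python
# Hypothetical local inference sketch -- model IDs, prompt, and settings are
# assumptions; adapt them to your own setup.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "reducto/RolmOCR"  # or "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("page.jpg")  # your scanned document
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe this historical document as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```

Because the models answer in Markdown rather than ALTO, the result is plain text you can read or post-process directly.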

📊 Results

🖼️ Document Image

🤖 Modern VLM OCR Output

📜 Traditional OCR Output


🎯 Try an Example

Examples
Historical Document Image | XML File (Optional - ALTO or PAGE format) | Choose Model

Example from 'A Medical History of British India' collection, National Library of Scotland


📚 About ALTO/PAGE XML

  • ALTO (Analyzed Layout and Text Object) and PAGE are XML formats that store OCR results with detailed layout information
  • These files are typically generated by traditional OCR software and include position data for each text element
  • This tool extracts just the reading-order text for easier comparison (see the example below)
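
For reference, here is a minimal sketch of that reading-order extraction for ALTO files. It assumes the usual ALTO layout where each <String> element holds its word in a CONTENT attribute and text lines appear in reading order; the function name and file path are illustrative, not the Space's actual code, and PAGE XML (which records an explicit ReadingOrder element) would need separate handling.

```python
# Minimal ALTO reading-order extraction sketch -- illustrative only.
import xml.etree.ElementTree as ET

def alto_reading_order_text(path: str) -> str:
    """Join the CONTENT of each <String> inside each <TextLine>, in document order."""
    root = ET.parse(path).getroot()
    lines = []
    for element in root.iter():
        # Compare local tag names so any ALTO namespace version matches
        if element.tag.rsplit("}", 1)[-1] != "TextLine":
            continue
        words = [
            child.attrib["CONTENT"]
            for child in element
            if child.tag.rsplit("}", 1)[-1] == "String" and "CONTENT" in child.attrib
        ]
        if words:
            lines.append(" ".join(words))
    return "\n".join(lines)

print(alto_reading_order_text("page.alto.xml"))  # illustrative file name
```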

🎯 Best Practices

  • Use high-resolution scans (300+ DPI) for best results
  • Historical documents with clear text work best
  • The VLMs can handle complex layouts, tables, and mathematical notation

Built with ❤️ for the GLAM community | Learn more about OCR formats | Questions? Open an issue