
About

This initiative builds on Odia Lipi, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.

The goal is to host open OCR datasets, models, tools, and benchmarks that empower researchers, developers, linguists, and archivists to extract machine‑readable text from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

Vision

To build robust, open, and community‑driven Odia OCR datasets and models that can accurately recognize both printed and handwritten Odia script, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.

Problem Statement

Odia, like many other Indic languages, is underserved by existing OCR systems, which struggle with:

Complex ligatures and diacritics in Odia script
Limited high‑quality annotated OCR datasets
Lack of reliable handwritten text recognition
Inadequate open‑source OCR models for Indic scripts

Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.

What We Work On

Odia and Indic OCR Dataset Creation & Curation
OCR Model Training & Evaluation (Printed + Handwritten)
OCR Annotation Tools & Workflows
Benchmarks & Quality Metrics
Integration with Multimodal NLP and Language Models (text + image)

This project aims to make Odia text searchable, editable, and machine‑interpretable, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
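
As one illustration of a quality metric a benchmark could report, here is a minimal sketch of character error rate (CER) in Python. It is an assumption for illustration only, not the project's actual evaluation code: the function names `edit_distance` and `cer` are hypothetical, and the choice to apply Unicode NFC normalization before comparing is ours. Normalization matters for Odia because some letters can be encoded either as a single code point or as a base letter plus a nukta, and the two forms should not be counted as OCR errors.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Both strings are NFC-normalized first (an assumption of this sketch),
    so canonically equivalent encodings compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` is 0.25, since one of four reference characters is substituted. This sketch counts code points; a benchmark may prefer to count grapheme clusters or aksharas instead, which would change the scores for conjuncts and vowel signs.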

How to Contribute

We welcome contributions from researchers, students, linguists, and developers for:

Dataset annotation and quality verification
Model training and evaluation
Benchmark creation
Tool development for OCR preprocessing and postprocessing

Feel free to open issues, share data sources, or propose collaborations.

🧩 Visit the org page: https://huggingface.co/OdiaGenAIOCR