This initiative builds on the Odia Lipi, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.
The goal is to host open OCR datasets, models, tools, and benchmarks that empower researchers, developers, linguists, and archivists to extract machine‑readable text from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.
Our mission is to build robust, open, and community‑driven Odia OCR datasets and models that accurately recognize both printed and handwritten Odia script, overcome the limitations of existing OCR tools, and make Odia text fully searchable, editable, and usable in modern AI workflows.
Odia, like many other Indic languages, is underserved by existing OCR systems, which struggle with:
This project aims to make Odia text searchable, editable, and machine‑interpretable, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
We welcome contributions from researchers, students, linguists, and developers for:
Feel free to open issues, share data sources, or propose collaborations.
🧩 Visit the org page: https://huggingface.co/OdiaGenAIOCR