Projects

Optical Character and Hand-Written Text Recognition for Indian Languages

Co-PI: Chetan Arora, CSE, IIT Delhi
The project aimed to develop the capacity to render digital text from print and manuscripts in Indian languages through Optical Character Recognition (OCR) and Hand-Written Text Recognition (HTR). We have created a browser-based tool, Lipikar, that allows the user to upload scanned documents and obtain highly accurate machine-readable text.

At present, we have reached state-of-the-art capability for Hindi, Urdu, Kannada, Bengali, Gujarati, Malayalam, Marathi, Nepali, Sanskrit, Punjabi, Sindhi, Tamil, Kashmiri, Manipuri, Assamese, Santali (asa script), Santali (bn script), Konkani, Dogri, Maithili, Bodo, Oriya, and Telugu.

This project has been recognized as one of the Most Impactful Research Projects at IIT Delhi.

UTRNet: High-Resolution Urdu Text Recognition In Printed Documents. Chetan Arora, Abdur Rahman, Arjun Ghosh. 17th International Conference on Document Analysis and Recognition (ICDAR 2023). [Springer Link].