OCR (Optical Character Recognition) app using the ColPali implementation from the new Byaldi library and Hugging Face Transformers for Qwen2-VL. Made by Prasoon Sharma.
This project is an Optical Character Recognition (OCR) app built using Gradio and powered by deep learning models such as Qwen2-VL. The app allows users to upload an image, extract text in both Hindi and English, and optionally highlight keywords in the extracted text.
- Upload an image and extract text in Hindi and English.
- Optional keyword search: Highlights specific keywords in the extracted text.
- Uses OpenCV for image preprocessing.
- Intuitive web-based UI with Gradio for easy interaction.
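The optional keyword highlighting could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `highlight_keywords` and the choice of HTML `<mark>` tags are assumptions, and it uses only the Python standard library. It matches keywords case-insensitively and escapes the extracted text so it can be rendered safely as HTML.

```python
import html
import re


def highlight_keywords(text, keywords):
    """Return `text` as HTML with every keyword match wrapped in <mark> tags.

    Hypothetical helper for illustration; the real app may highlight differently.
    """
    # Ignore empty or whitespace-only keywords.
    terms = [k.strip() for k in keywords if k.strip()]
    if not terms:
        return html.escape(text)

    # Try longer keywords first so they win over shorter overlapping ones.
    terms.sort(key=len, reverse=True)
    pattern = re.compile(
        "(" + "|".join(re.escape(t) for t in terms) + ")",
        re.IGNORECASE,
    )

    # With one capturing group, re.split alternates between
    # non-matching text (even indices) and matched keywords (odd indices).
    pieces = pattern.split(text)
    out = []
    for i, piece in enumerate(pieces):
        safe = html.escape(piece)
        out.append(f"<mark>{safe}</mark>" if i % 2 == 1 else safe)
    return "".join(out)


if __name__ == "__main__":
    sample = "Hello world, नमस्ते दुनिया"
    print(highlight_keywords(sample, ["world", "नमस्ते"]))
```

The same approach works for Hindi and English text because the matching operates on Unicode strings. The resulting HTML string could then be shown in an HTML output component of the web UI.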
- Python 3.7+
- PyTorch
- OpenCV
- Gradio
- Transformers (Hugging Face)
Install dependencies using the following command:
pip install -r requirements.txt
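After installing, a quick check like the one below can confirm that the main dependencies are importable. This is only a sketch: the helper name `missing_packages` is hypothetical, and the module names (`torch`, `cv2`, `gradio`, `transformers`) are the usual import names for PyTorch, OpenCV, Gradio, and Transformers, not something read from this project's requirements.txt.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names assumed for the dependencies listed above.
    required = ["torch", "cv2", "gradio", "transformers"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```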
- Run the Gradio app on Hugging Face
- Run on Google Colab
- Run the Python file locally