Text extraction from images, OCR, Tesseract, and basic image manipulation are all important yet very basic scripting tasks.
This script uses pytesseract for text extraction from images. Since pytesseract only recognizes the text and returns it for printing, this script adds the ability to write the extracted text to a txt and/or csv file.
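The idea of writing the recognized text to a txt and/or csv file can be sketched roughly as follows. This is a minimal illustration and not necessarily how the script itself is written: the function names extract_text and save_results, the output file names extracted.txt and extracted.csv, and the image/text column layout are all assumptions made for the example.

```python
import csv
from pathlib import Path


def extract_text(image_path):
    """OCR a single image. Pillow and pytesseract are imported lazily,
    so the file-writing helper below works without them installed."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def save_results(results, output_dir, as_txt=True, as_csv=True):
    """Write a {image name: extracted text} mapping to disk.

    File names and layout are hypothetical choices for this sketch.
    Returns the list of files that were written.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    if as_txt:
        txt_path = out / "extracted.txt"
        with open(txt_path, "w", encoding="utf-8") as fh:
            for name, text in results.items():
                # One labelled block per image, separated by a blank line.
                fh.write(f"[{name}]\n{text.strip()}\n\n")
        written.append(txt_path)

    if as_csv:
        csv_path = out / "extracted.csv"
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["image", "text"])
            for name, text in results.items():
                writer.writerow([name, text.strip()])
        written.append(csv_path)

    return written
```

With Tesseract installed, something like save_results({"scan.png": extract_text("scan.png")}, "output") would produce both files; pass as_txt=False or as_csv=False to write only one of them.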
- Set up a Python 3.x virtual environment.
- Activate the environment.
- Install the dependencies using pip3 install -r requirements.txt
- You are all set and the script is ready to run.
- Carefully follow the instructions.
Newcomers often struggle with Tesseract the first time they use it; this is a direct link to the installer.
Instructions for setting up OCR can be found here.
Adding Tesseract to the PATH environment variable can simplify the code, because the script then no longer needs a hard-coded path to the executable. This and this link will help you achieve that.
Just make sure that Tesseract is in the proper directory, and run the code according to the comments and guidelines.
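As a hedged illustration of the PATH point, the sketch below first looks for Tesseract on PATH and only then falls back to explicit locations. The helper names find_tesseract and configure_pytesseract are hypothetical, and the Windows path shown is only a commonly used default install location, assumed here for the example; adjust it to wherever Tesseract lives on your machine.

```python
import os
import shutil


def find_tesseract(name="tesseract", candidates=()):
    """Return the path to the Tesseract executable, or None.

    Looks on PATH first, then tries each explicit candidate path.
    """
    found = shutil.which(name)
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(
    # Assumed default install location on Windows; change as needed.
    candidates=(r"C:\Program Files\Tesseract-OCR\tesseract.exe",),
):
    """Point pytesseract at the executable located by find_tesseract."""
    import pytesseract  # imported lazily so find_tesseract works on its own

    path = find_tesseract(candidates=candidates)
    if path is None:
        raise FileNotFoundError(
            "Tesseract was not found on PATH or in the candidate locations"
        )
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

If Tesseract is already on PATH, pytesseract finds it without any configuration, so a call to configure_pytesseract() is only needed as a fallback.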
Sample -
Enter the Folder name containing Images: <Name of Folder>
Enter your desired output location: <Name of Folder>
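The two prompts above could be handled along these lines. This is a sketch under assumptions rather than the script's actual code: the set of accepted extensions and the names list_images and main are invented for the example, and the loop only reports which images it would process.

```python
from pathlib import Path

# Assumed set of image types; extend it to match your own files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a folder")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def main():
    """Ask for the input and output folders, as in the sample session."""
    input_dir = input("Enter the Folder name containing Images: ").strip()
    output_dir = Path(input("Enter your desired output location: ").strip())
    output_dir.mkdir(parents=True, exist_ok=True)
    for image_path in list_images(input_dir):
        # The OCR step and the file writing would go here.
        print(f"Would extract text from {image_path.name} into {output_dir}")
```

Calling main() starts the interactive prompts; list_images can also be used on its own to check which files in a folder would be picked up.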
Output
Image containing Text
After Extraction
Made by Vybhav Chaturvedi