Add script that uses pdf2image and pytesseract to extract text from PDFs #10

jfoo1984 · 2024-09-19T19:11:39Z

The following must be installed for the script to work:

brew install poppler
pip install pdf2image
brew install tesseract
pip install pytesseract
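
For reviewers who have not used these packages, here is a minimal sketch of how pdf2image and pytesseract can be combined to turn every PDF in a directory into a text file. It is not the script from this PR: the function names `ocr_pdf` and `ocr_directory` and the `convert`/`ocr` parameters are hypothetical and exist only for illustration. The sketch assumes poppler and tesseract are on the PATH, as installed above.

```python
from pathlib import Path


def ocr_pdf(pdf_path, convert=None, ocr=None):
    """Return the OCR text of every page of one PDF, joined by newlines.

    `convert` and `ocr` are hypothetical hooks so the real libraries can be
    swapped out (for example in tests); by default the real ones are used.
    """
    if convert is None:
        # pdf2image renders each PDF page to a PIL image (requires poppler).
        from pdf2image import convert_from_path
        convert = convert_from_path
    if ocr is None:
        # pytesseract runs the tesseract binary on an image and returns a string.
        import pytesseract
        ocr = pytesseract.image_to_string
    pages = convert(str(pdf_path))
    return "\n".join(ocr(page) for page in pages)


def ocr_directory(directory, convert=None, ocr=None):
    """Write a .txt file next to each .pdf in `directory`; return the new paths."""
    written = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        text = ocr_pdf(pdf_path, convert, ocr)
        out_path = pdf_path.with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `ocr_directory("some_dir")` with the defaults would OCR each PDF in that directory. OCR is slow on large documents, so keeping the per-file function separate makes it easier to process a single file when debugging.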


jfoo1984 added 4 commits September 19, 2024 12:10

Add script that uses pdf2image and pytesseract to extract text from PDFs

7e351b3

Update script to take directory as input and process all pdfs in dire…

6169b74

…ctory into text files

Add pip packages to requirements_dev

6a66fe8

Rename ocr script, add script that uses pdftotext to extract text fro…

3e3efae

…m PDFs
