Extract text from images

Goals

  • Read PDF, JPEG, and PNG files
  • Preserve the source formatting in the output
  • Write output to a unique .html file
  • Run from the terminal
  • Minimize the number of packages to install
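
As a rough illustration of the formatting and output goals, here is a minimal standard-library sketch. The helper names `unique_html_path` and `text_to_html` are hypothetical, and using a `<pre>` element to keep spacing and line breaks is an assumption about what "formatting" means here, not a settled design.

```python
import html
from pathlib import Path


def unique_html_path(source: Path, out_dir: Path) -> Path:
    """Return an unused .html path in out_dir, named after the source file."""
    candidate = out_dir / f"{source.stem}.html"
    counter = 1
    # Append -1, -2, ... until the name does not collide with an existing file.
    while candidate.exists():
        candidate = out_dir / f"{source.stem}-{counter}.html"
        counter += 1
    return candidate


def text_to_html(text: str, title: str) -> str:
    """Wrap extracted text in <pre> so spaces and line breaks are kept."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><pre>{html.escape(text)}</pre></body></html>\n"
    )
```

Both helpers need nothing beyond the standard library, which fits the goal of keeping installs to a minimum.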

Layout

  • pdf2image and PIL (Pillow) to read files
  • pytesseract to recognize characters
    • Two niche installs is too many
  • Train a TensorFlow model
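
The pdf2image, PIL, and pytesseract steps listed above could be wired together roughly as follows. This is a sketch under assumptions, not the project's implementation: the function names `file_kind`, `load_pages`, and `extract_text` are hypothetical, and the third-party imports are deferred into the functions so the module loads even when those packages are missing. Note that pdf2image relies on the Poppler utilities and pytesseract relies on the Tesseract binary, both installed separately from the Python packages.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def file_kind(path: Path) -> str:
    """Classify a file as 'pdf' or 'image' by its extension."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"unsupported file type: {path.suffix!r}")


def load_pages(path: Path) -> list:
    """Return a list of PIL images, one per PDF page or one for an image file."""
    if file_kind(path) == "pdf":
        from pdf2image import convert_from_path  # needs Poppler installed
        return convert_from_path(str(path))
    from PIL import Image
    return [Image.open(path)]


def extract_text(path: Path) -> str:
    """Run Tesseract OCR on every page and join the results."""
    pages = load_pages(path)
    import pytesseract  # needs the Tesseract binary installed
    return "\n\n".join(pytesseract.image_to_string(page) for page in pages)
```

Calling `extract_text(Path("scan.pdf"))` would return the recognized text as one string, with pages separated by a blank line; the file name here is only a placeholder.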