PDF-Text-Extraction

Notebook showing four methods for extracting text from PDF files using the Python packages PyPDF2, pdfminer.six, PyMuPDF, and GROBID.
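As a rough illustration of what such extractors can look like, here is a minimal sketch and not the notebook's actual code. It assumes PyPDF2 2.0 or later, pdfminer.six, and PyMuPDF are installed. The function names, the `EXTRACTORS` table, and the `timed` helper are hypothetical names chosen for this example. GROBID is left out because it is typically run as a separate service instead of being called in-process.

```python
import time
from typing import Callable, Dict, Tuple


def extract_pypdf2(path: str) -> str:
    """Extract text page by page with PyPDF2 (assumes PyPDF2 >= 2.0)."""
    from PyPDF2 import PdfReader  # imported lazily so the module loads without it

    reader = PdfReader(path)
    # extract_text() can return None or an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_pdfminer(path: str) -> str:
    """Extract the whole document with pdfminer.six's high-level helper."""
    from pdfminer.high_level import extract_text

    return extract_text(path)


def extract_pymupdf(path: str) -> str:
    """Extract text page by page with PyMuPDF (imported as `fitz`)."""
    import fitz

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


# Hypothetical lookup table so every method can be run in one loop.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "PyPDF2": extract_pypdf2,
    "pdfminer.six": extract_pdfminer,
    "PyMuPDF": extract_pymupdf,
}


def timed(extractor: Callable[[str], str], path: str) -> Tuple[str, float]:
    """Run one extractor and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = extractor(path)
    return text, time.perf_counter() - start
```

For example, `text, seconds = timed(EXTRACTORS["PyMuPDF"], "sample.pdf")` would return the extracted text and the wall-clock time for a hypothetical `sample.pdf`.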

The text output of each method is compared using Levenshtein distance, cosine similarity, TF-IDF similarity, and processing time.
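The text metrics named above can be sketched with the standard library alone. This is an illustrative version, not necessarily how the notebook computes them. It assumes whitespace tokenization, lowercasing, and a smoothed IDF of the form `log((1 + N) / (1 + df)) + 1`. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def _cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over raw term counts."""
    return _cosine(Counter(a.lower().split()), Counter(b.lower().split()))


def tfidf_cosine(a: str, b: str) -> float:
    """Cosine similarity over TF-IDF weights, using the two texts as the corpus."""
    docs = [Counter(a.lower().split()), Counter(b.lower().split())]
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed IDF (an assumption of this sketch): shared terms get lower weight.
    idf = {t: math.log((1 + len(docs)) / (1 + sum(t in d for d in docs))) + 1.0
           for t in vocab}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return _cosine(vectors[0], vectors[1])
```

A lower Levenshtein distance and a cosine score closer to 1 both indicate that two extracted texts are more alike.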