[ADD] Pdfplumber support #391
Probably need to tell the tests to install pdfplumber.
65f3d3c to 61d493e
All ✔️ now!
depends upon: #395
f5e4c18 to dd29930
5b5ea25 to e07fc07
Looks good, except for the list of commits in your [...]. To my personal taste, we shouldn't merge such commits into [...]. My suggestion would be to squash all those commits into a single one with a clear subject.
@@ -56,6 +56,7 @@ Choose any of the following input readers:
- pdftotext `invoice2data --input-reader pdftotext invoice.pdf`
- tesseract `invoice2data --input-reader tesseract invoice.pdf`
- pdfminer.six `invoice2data --input-reader pdfminer invoice.pdf`
- pdf plumber `invoice2data --input-reader pdfplumber invoice.pdf`
Nitpicking: I'd replace `pdf plumber` with `pdfplumber` for consistency.
@rmilecki
This is a highly opinionated, basic implementation of pdfplumber support.
I created it in an attempt to parse invoices with an HTML-style table, as mentioned in #359.
It works, but I'm currently developing an integration with Camelot,
as Camelot allows more complex table structures to be parsed and offers a GUI to help build the templates.
I'm creating this PR in case someone needs it.
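
For readers who want a feel for what a table-oriented pdfplumber reader might look like, here is a minimal sketch. It is not the code from this PR: the helper `rows_to_text` and the choice to join cells with a single space are assumptions made for illustration, and `to_text(path)` is only assumed to be the entry point an input reader exposes. The pdfplumber calls used (`pdfplumber.open`, `pdf.pages`, `page.extract_tables`) are part of its public API.

```python
def rows_to_text(table, sep=" "):
    """Flatten one extracted table into plain text, one row per line.

    `table` is a list of rows, each row a list of cell strings;
    pdfplumber uses None for empty cells. (Hypothetical helper.)
    """
    lines = []
    for row in table:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip rows that are entirely empty
            lines.append(sep.join(cells))
    return "\n".join(lines)


def to_text(path):
    """Return the text of every table found in the PDF at `path`.

    Assumed reader entry point; pdfplumber is imported lazily so the
    helper above stays usable without it installed.
    """
    import pdfplumber

    chunks = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                chunks.append(rows_to_text(table))
    return "\n".join(chunks)
```

Keeping each table row on its own line means a template regex can match a whole invoice line at once, which is the main appeal for HTML-style tables.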