list index out of range in to_pandas() #2979
And finally, what data can be extracted:
outputs
Close, but no cigar.
I just tested my theory by adding:
which produces:
Before it fails as before.
This is what the fix achieves:
import fitz  # PyMuPDF
doc = fitz.open("118.pdf")
page = doc[0]
tab = page.find_tables()[0]  # first table detected on the page
print(tab.to_pandas())
Severity class Exposure class Controllability class Col3 Col4
0 None None C1 C2 C3
1 S1 E1 QM QM QM
2 None E2 QM QM QM
3 None E3 QM QM A
4 None E4 QM A B
5 S2 E1 QM QM QM
6 None E2 QM QM A
7 None E3 QM A B
8 None E4 A B C
9 S3 E1 QM QM Aa
10 None E2 QM A B
11 None E3 A B C
12 None E4 B C D
13 a See 6.4.3.11. None None None None
The label "fix developed" means that a rollout schedule still needs to be decided.
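In the fixed output, the None entries under "Severity class" appear to be cells covered by a vertically merged cell (S1, S2 and S3 each span four rows). If a fully populated column is wanted downstream, a forward fill is one option. Here is a minimal sketch in plain Python, so it runs without pandas; the helper forward_fill_column is hypothetical and not part of PyMuPDF. With pandas, Series.ffill() on that column does the same job.

```python
from typing import List, Optional

Row = List[Optional[str]]


def forward_fill_column(rows: List[Row], col: int) -> List[Row]:
    """Hypothetical helper (not PyMuPDF API): return a copy of `rows` in which
    every None in column `col` is replaced by the last non-None value above it.
    The input rows are left unchanged."""
    filled: List[Row] = []
    last: Optional[str] = None
    for row in rows:
        new_row = list(row)  # copy so the caller's data is not mutated
        if new_row[col] is None:
            new_row[col] = last
        else:
            last = new_row[col]
        filled.append(new_row)
    return filled


# A few data rows copied from the output above:
# Severity class, Exposure class, then the three Controllability columns.
rows = [
    ["S1", "E1", "QM", "QM", "QM"],
    [None, "E2", "QM", "QM", "QM"],
    [None, "E3", "QM", "QM", "A"],
    ["S2", "E1", "QM", "QM", "QM"],
    [None, "E2", "QM", "QM", "A"],
]
for r in forward_fill_column(rows, 0):
    print(r)
```

The sub-header row (None None C1 C2 C3) is left out on purpose: forward-filling it would have nothing above it to copy from, and it is not a data row anyway.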
Heh! Cool. Which branch is the fix in? I'll test it if I can find it :) But it can wait :) Thanks!
Thanks for your willingness!
The branch is named "Fix-table-issues".
I dropped in the file and presto! It's now a complete representation of the table! Yeah! Thanks! I'll try to take some time in the coming days to look at your changes. I think they could benefit a number of other projects, too :) I'm working with a lot of industry docs. Should I report this kind of thing as a matter of course? I'll try to determine fixes myself if I grok what's happened in this case.
Thank you for testing and confirming!
That would be great 😉! I did indeed make other changes. For example, I now identify areas that contain connected vector graphics elements and treat their rectangle hull as additional, "virtual" lines. This makes more tables detectable, like this kind of thing. Many table detectors fail because the left and right cell borders are missing.
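The "virtual lines" idea can be illustrated with a small sketch. It assumes the page's vector graphics are already available as axis-aligned rectangles (x0, y0, x1, y1); it groups rectangles that touch or overlap into connected clusters with a union-find, takes each cluster's bounding box as its hull, and emits the hull's four edges as extra line segments a table detector could use as borders. All names here (touches, cluster_hulls, virtual_lines) are hypothetical; this is not PyMuPDF's actual implementation.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), with x0 <= x1, y0 <= y1
Line = Tuple[float, float, float, float]  # segment (xa, ya, xb, yb)


def touches(a: Rect, b: Rect, tol: float = 0.0) -> bool:
    """True if rectangles a and b overlap or touch, allowing a gap of up to tol."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol and
            a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def cluster_hulls(rects: List[Rect], tol: float = 0.0) -> List[Rect]:
    """Group connected rectangles (union-find) and return each group's bounding box."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of touching rectangles (O(n^2), fine for a sketch).
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j], tol):
                parent[find(i)] = find(j)

    # Grow one bounding box per cluster root.
    hulls: Dict[int, Rect] = {}
    for i, r in enumerate(rects):
        root = find(i)
        h = hulls.get(root)
        hulls[root] = r if h is None else (
            min(h[0], r[0]), min(h[1], r[1]), max(h[2], r[2]), max(h[3], r[3]))
    return list(hulls.values())


def virtual_lines(hull: Rect) -> List[Line]:
    """The four edges of a hull (top, right, bottom, left) as 'virtual' borders."""
    x0, y0, x1, y1 = hull
    return [(x0, y0, x1, y0), (x1, y0, x1, y1), (x0, y1, x1, y1), (x0, y0, x0, y1)]


# Example: two stacked filled cells (sharing an edge) plus one unrelated box.
print(cluster_hulls([(0, 0, 100, 20), (0, 20, 100, 40), (200, 0, 250, 10)]))
# → [(0, 0, 100, 40), (200, 0, 250, 10)]
```

A union-find is used here because connectivity is transitive: A touching B and B touching C puts all three in one cluster even if A and C are far apart, which is what "connected" graphics implies.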
Fixed in 1.23.13.
Description of the bug
I am able to read the included PDF and extract tables, but the to_pandas() function produces:
118.pdf
How to reproduce the bug
Python 3.10.12
Using the uploaded file:
outputs
PyMuPDF version
1.23.8
Operating system
Linux
Python version
3.10
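The thread does not say exactly what triggered the "list index out of range" error in 1.23.8. As a generic illustration of how such an IndexError can be avoided when turning extracted rows into columns, here is a minimal sketch, assuming the table is available as a header list plus a list of rows that may contain None. The helper table_to_columns is hypothetical, not PyMuPDF's code. It mimics the "Col3"/"Col4" naming seen in the fixed output by giving unnamed header cells positional names, makes names unique, and pads or truncates every row to the header width so no index can go out of range. The resulting dict could then be passed to pandas.DataFrame.

```python
from typing import Dict, List, Optional


def table_to_columns(header: List[Optional[str]],
                     rows: List[List[Optional[str]]]) -> Dict[str, List[Optional[str]]]:
    """Hypothetical helper: build {column name: values} from a header and rows.

    - Missing or empty header names become 'Col{i}' (i = column position).
    - Duplicate names get a '_1', '_2', ... suffix so no column is overwritten.
    - Each row is padded with None or truncated to the header width.
    """
    names: List[str] = []
    for i, name in enumerate(header):
        candidate = name if name else f"Col{i}"
        unique, suffix = candidate, 1
        while unique in names:
            unique = f"{candidate}_{suffix}"
            suffix += 1
        names.append(unique)

    width = len(names)
    columns: Dict[str, List[Optional[str]]] = {n: [] for n in names}
    for row in rows:
        padded = (list(row) + [None] * width)[:width]  # exactly `width` cells
        for name, value in zip(names, padded):
            columns[name].append(value)
    return columns


# Header as in the fixed output, with a deliberately short second row.
header = ["Severity class", "Exposure class", "Controllability class", None, None]
columns = table_to_columns(header, [[None, None, "C1", "C2", "C3"], ["S1", "E1", "QM"]])
print(list(columns))
# → ['Severity class', 'Exposure class', 'Controllability class', 'Col3', 'Col4']
```

Padding with None rather than raising keeps ragged rows visible in the result, which makes a detection problem easier to spot than a bare IndexError.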