TLDR 531 pdf_txtlayer_reader table fix #380

Merged
merged 14 commits into from
Dec 1, 2023
TLDR-531 deleted my extra test
*corresponding tests are in test_module_table_detection
raxtemur committed Nov 29, 2023

commit 25afbeb8a568661f922aba5af09974ebe6669841
7 changes: 0 additions & 7 deletions tests/unit_tests/test_format_pdf_reader.py
@@ -140,10 +140,3 @@ def test_pdf_text_layer(self) -> None:
annotations = line.annotations
annotations_set = {(a.name, a.value, a.start, a.end) for a in annotations}
self.assertEqual(len(annotations_set), len(annotations))

-    def test_table_extractor(self) -> None:
-        config = {}  # Has to work without config
-        any_doc_reader = PdfTxtlayerReader(config=config)
-        path = os.path.join(os.path.dirname(__file__), "../data/pdf_with_text_layer/english_doc.pdf")
-        result = any_doc_reader.read(path, document_type=None, parameters={"need_pdf_table_analysis": "True"})
-        self.assertEqual(len(result.tables), 0)