
TLDR 531 pdf_txtlayer_reader table fix #380

Merged
14 commits merged on Dec 1, 2023


raxtemur
Collaborator

No description provided.

@NastyBoget
Collaborator

Let's also add a unit test with a simple file containing a table (you can take the one from the issue).

@NastyBoget NastyBoget changed the base branch from master to develop November 30, 2023 09:39
* deleted path_cells entirely
* fixed path creation
* one small bug fixed
* if debug_mode is not set in the test config, the test doesn't pass
* path_detect forward fix
* some debug_mode and path_debug bugs fixed
@@ -52,7 +52,7 @@ def __init__(self, *, config: dict) -> None:
         self.binarizer = AdaptiveBinarizer()
         self.ocr = OCRLineExtractor(config=config)
         self.logger = config.get("logger", logging.getLogger())
-        if self.config.get("debug_mode") and not os.path.exists(self.config["path_debug"]):
+        if self.config.get("debug_mode", False) and not os.path.exists(self.config["path_debug"]):
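The two forms of the condition in this hunk behave the same: `dict.get` returns `None` for a missing key, and `None` is falsy, so the explicit `False` default only documents intent. The `and` also short-circuits, so `config["path_debug"]` is only looked up when debug mode is on. A minimal check, using plain dicts as stand-ins for the reader's `config` (an assumption; the real config carries more keys):

```python
import os


def old_condition(config: dict) -> bool:
    # Before the change: a missing "debug_mode" makes .get() return None (falsy)
    return bool(config.get("debug_mode") and not os.path.exists(config["path_debug"]))


def new_condition(config: dict) -> bool:
    # After the change: the explicit False default only documents intent
    return bool(config.get("debug_mode", False) and not os.path.exists(config["path_debug"]))


# Stand-in configs. None of them needs "path_debug" unless debug_mode is truthy,
# because `and` short-circuits before config["path_debug"] is evaluated.
configs = [
    {},
    {"debug_mode": False},
    {"debug_mode": None},
    {"debug_mode": True, "path_debug": os.getcwd()},                # existing dir
    {"debug_mode": True, "path_debug": "/nonexistent/debug/dir"},   # missing dir
]

results = [(old_condition(c), new_condition(c)) for c in configs]
for cfg, (old, new) in zip(configs, results):
    print(cfg, old, new)
```

Neither form raises `KeyError` for an empty config; only a config that enables `debug_mode` without `path_debug` would.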
Collaborator

@NastyBoget NastyBoget Dec 1, 2023


Regarding debug_mode:

Pick whichever you prefer, config.get("debug_mode") or config.get("debug_mode", False) (or whichever is easier to change), and let's make it the same everywhere. Although during development people will write it their own way anyway.

As for path_debug, it's better to just use .get("path_debug").
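A sketch of how the `.get("debug_mode", False)` / `.get("path_debug")` convention could be applied in one place. The helper name `ensure_debug_dir` and the use of `os.makedirs` are assumptions for illustration, not code from this PR. With `.get("path_debug")`, a config that enables `debug_mode` but omits `path_debug` no longer raises `KeyError`, so the code has to decide what a missing path means; this sketch simply skips creating anything.

```python
import os
import tempfile
from typing import Optional


def ensure_debug_dir(config: dict) -> Optional[str]:
    """Hypothetical helper: create the debug directory if debug mode is on.

    Returns the directory path, or None when debug output is disabled
    or no path is configured.
    """
    # Same convention everywhere: .get() with an explicit default
    if not config.get("debug_mode", False):
        return None
    path_debug = config.get("path_debug")  # None instead of KeyError
    if path_debug is None:
        # Assumption: a missing path means "no debug output" rather than an error
        return None
    os.makedirs(path_debug, exist_ok=True)  # no-op if the directory already exists
    return path_debug


# Usage: a temporary directory stands in for a real debug output location
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "debug")
    created = ensure_debug_dir({"debug_mode": True, "path_debug": target})
    created_exists = os.path.isdir(target)
    print(created, created_exists)
```

Returning `None` for a missing `path_debug` is one design choice; raising a clear configuration error would be equally reasonable if debug output is mandatory whenever `debug_mode` is set.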

raxtemur and others added 2 commits December 1, 2023 10:25
* path_detect forward style changed
* img_processing.pt:146 - get_config()["debug_mode"] fixed
* ocr_cell_extractor.py - changed if False to if NoneType is None
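The last commit message reads as replacing a falsy check with an explicit `is None` check in `ocr_cell_extractor.py`. The variable involved is not shown on this page, so the sketch below is generic and the function names are hypothetical: the two checks agree for `None` but diverge for other falsy values such as `0`, `""` or `[]`, which matters whenever those are legitimate values.

```python
def is_missing_falsy(value) -> bool:
    # Treats every falsy value (None, 0, "", [], False) as "missing"
    return not value


def is_missing_none(value) -> bool:
    # Treats only None as "missing"; 0, "" and [] count as real values
    return value is None


samples = [None, 0, "", [], False, 1, "text"]
# Values for which the two checks give different answers
diff = [v for v in samples if is_missing_falsy(v) != is_missing_none(v)]
print(diff)
```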
@ispras ispras deleted a comment from raxtemur Dec 1, 2023
@NastyBoget NastyBoget merged commit 1fefda5 into develop Dec 1, 2023
2 checks passed
@NastyBoget NastyBoget deleted the TLDR-531_PdfTxtlayerReader_fix branch December 1, 2023 13:00
NastyBoget added a commit that referenced this pull request Dec 25, 2023
* TLDR 531 pdf_txtlayer_reader table fix (#380)

* TLDR-538 tesseract trustai (#377)

* fixed training script (#383)

* TLDR-521 Fix splittext for file names with several dots (#385)

* TLDR-527 refactor methods and parameters for all main classes (#387)

* Add attach and table annotations to PPTX (#389)

* TLDR-544 docx bugs (#382)

* TLDR-516 GPU in docker (#384)

* new version 2.0 (#390)

---------

Co-authored-by: raxtemur <[email protected]>
Co-authored-by: Oksana Belyaeva <[email protected]>
Co-authored-by: Alexander Golodkov <[email protected]>
Co-authored-by: Alexander Golodkov <[email protected]>
Co-authored-by: Nikita Shevtsov <[email protected]>

3 participants