TLDR-599 re-label datasets (#412)
* TLDR-616 add parsing parameters saving to tasker, fix bugs

* TXT images creator fixed

* Fix tasker for diplomas

* Fix tests, speed up txt_images_creator
NastyBoget authored Mar 14, 2024
1 parent 75b6720 commit c48b186
Showing 16 changed files with 8,918 additions and 4,916 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_labeling.yaml
@@ -1,4 +1,4 @@
-name: CI
+name: CI labeling

# Controls when the action will run.
on:
3 changes: 3 additions & 0 deletions dedoc/structure_extractors/abstract_structure_extractor.py
@@ -54,6 +54,9 @@ def _postprocess(self, lines: List[LineWithMeta], paragraph_type: List[str], reg
         :param excluding_regexps: list of filtering garbage regular pattern according to list of paragraph types
         :return: new post-processed list of LineWithMeta
         """
+        if self.config.get("labeling_mode", False):
+            return lines
+
         result = []
         for line in lines:
             if line.metadata.hierarchy_level.is_raw_text() and len(line.line) == 0: # skip empty raw text
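
The effect of the added guard can be illustrated with a minimal sketch. This is not dedoc's implementation: `Line` and `postprocess` are hypothetical stand-ins for `LineWithMeta` and `_postprocess`, the config is assumed to be a plain dict, and only the empty-raw-text filter visible in the hunk is modelled (the regular-expression filtering described in the docstring is omitted).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    # Hypothetical stand-in for LineWithMeta; only the fields this sketch needs.
    text: str
    is_raw_text: bool


def postprocess(lines: List[Line], config: dict) -> List[Line]:
    # Same shape as the guard added in this commit:
    # in labeling mode the lines are returned untouched.
    if config.get("labeling_mode", False):
        return lines
    # Simplified version of the filter visible in the hunk:
    # drop raw-text lines whose text is empty.
    return [line for line in lines if not (line.is_raw_text and len(line.text) == 0)]


lines = [Line("Title", False), Line("", True), Line("body", True)]
kept_default = postprocess(lines, {})
kept_labeling = postprocess(lines, {"labeling_mode": True})
```

Under these assumptions, the default call drops the empty raw-text line, while the labeling-mode call hands back the original list object, so every line (including empty ones) stays available for labeling.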