TLDR-645 added grobid #422

Merged
merged 13 commits into develop from TLDR-645_added_grobbid
Apr 17, 2024


oksidgy
Collaborator

@oksidgy oksidgy commented Apr 9, 2024

  • added ArticleReader (grobid parsing)
  • added a check that the GROBID service is running
  • added ArticleStructureExtractor
  • added grobid service into docker-compose.yml
  • added ReferenceAnnotation class
  • added title field into Table class
  • added information about grobid into documentation
  • added a test for ArticleReader
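
For illustration, a check that the GROBID service is reachable could look roughly like the sketch below. It is not the code from this PR: the helper names `fetch_text` and `is_grobid_alive`, the default address `http://localhost:8070`, and the timeout are assumptions. The only thing taken from GROBID itself is its `/api/isalive` endpoint, which answers `true` when the service is up.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_text(url: str, timeout: float = 3.0) -> str:
    """Fetch a URL and return the decoded response body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def is_grobid_alive(base_url: str = "http://localhost:8070",
                    fetch: Callable[[str], str] = fetch_text) -> bool:
    """Return True if GROBID's /api/isalive endpoint answers "true".

    base_url is an assumed default; adjust it to the host and port
    configured for the grobid service in docker-compose.yml.
    """
    url = f"{base_url.rstrip('/')}/api/isalive"
    try:
        return fetch(url).strip().lower() == "true"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or malformed URL: treat as "not running".
        return False
```

Passing `fetch` as a parameter keeps the check testable without a running container; a reader could call `is_grobid_alive()` before parsing and fall back or raise a clear error when it returns `False`.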

Additionally:

  • fixed a bug in the get_mime_extension function (the return values mime and extension were swapped)
  • renamed the function get_hl_list_using_regexp of class DefaultStructureExtractor
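
To show the kind of bug the first item describes, here is a minimal sketch of a function that returns a MIME type and an extension as a pair. It is not the project's `get_mime_extension`: the names `MimeAndExtension` and `guess_mime_and_extension` are hypothetical, and the fallback MIME type is an assumption. Returning a named tuple is one way to make a swapped order harder to miss at the call site.

```python
import mimetypes
from pathlib import Path
from typing import NamedTuple


class MimeAndExtension(NamedTuple):
    """Hypothetical result type: field names document the return order."""
    mime: str
    extension: str


def guess_mime_and_extension(file_path: str) -> MimeAndExtension:
    """Guess the MIME type and extension of a file from its name only."""
    mime, _encoding = mimetypes.guess_type(file_path)
    extension = Path(file_path).suffix.lower()
    # Keyword arguments make it explicit which value goes where; a bare
    # `return extension, mime` would silently swap the two strings.
    return MimeAndExtension(mime=mime or "application/octet-stream",
                            extension=extension)
```

Callers can still unpack the result as `mime, extension = guess_mime_and_extension(path)`, but they can also use `result.mime` and `result.extension`, which does not depend on the order.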

@oksidgy oksidgy requested a review from NastyBoget April 9, 2024 12:56
Review comment on docs/source/index.rst (outdated, resolved)
@oksidgy oksidgy merged commit dc26240 into develop Apr 17, 2024
3 checks passed
@oksidgy oksidgy deleted the TLDR-645_added_grobbid branch April 17, 2024 08:21
NastyBoget added a commit that referenced this pull request Apr 17, 2024
* Improve speed of partial PDF extraction (#418)

* Some attachment refactoring (#420)

* Add PDF performance script (#419)

* Fix infinite loop in PdfTabbyReader (#421)

* Added article type using grobid (#422)
2 participants