feature: batch ner #1541

Open
ClemDoum wants to merge 5 commits into chore/ner-only-nlp from feature/batch-ner

ClemDoum (Contributor) commented Sep 5, 2024

TODO

PR description

Implement batch text processing for NER. This change is made in the context of #1452, as batch processing is necessary for spaCy.

Changes

datashare-api ⚠️

Added

  • added the batch text processing API List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException to Pipeline
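As a rough illustration of the contract this signature implies (one list of tags per input text, in input order), here is a minimal sketch. Only the `processText` signature comes from this PR; the `Language` enum, the fields of `NlpTag`, and `CapitalizedWordPipeline` are hypothetical stand-ins, not the actual datashare types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical stand-ins for the datashare types, for illustration only.
enum Language { ENGLISH, FRENCH }

record NlpTag(String mention, String category, int offset) {}

interface Pipeline {
    // Signature as given in the PR description: the result holds one list of
    // tags per text of the batch (assumed here to follow the input order).
    List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException;
}

// Toy pipeline that tags every capitalized word as a PERSON. Not a real NER model.
class CapitalizedWordPipeline implements Pipeline {
    private static final Pattern CAPITALIZED = Pattern.compile("\\b\\p{Lu}\\p{L}*");

    @Override
    public List<List<NlpTag>> processText(Stream<String> batch, Language language) {
        // Map each text to its own tag list; Stream.map preserves encounter order.
        return batch.map(CapitalizedWordPipeline::tag).toList();
    }

    private static List<NlpTag> tag(String text) {
        List<NlpTag> tags = new ArrayList<>();
        Matcher matcher = CAPITALIZED.matcher(text);
        while (matcher.find()) {
            tags.add(new NlpTag(matcher.group(), "PERSON", matcher.start()));
        }
        return tags;
    }
}
```

A caller can then pass several texts at once and read the tags back by index, for instance `pipeline.processText(Stream.of(textA, textB), Language.ENGLISH).get(1)` for the tags of `textB`.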

Changed

  • made NlpTag a record and a JSON-serializable class

datashare-core-nlp

Added

  • implemented batch processing for Stanford CoreNLP

datashare-app

Added

  • added bool Pipeline.Type.extractFromDoc(), which indicates whether the pipeline should preferably be used on full documents or can be used on text chunks
  • implemented batch text processing inside the ExtractNlpTask for pipelines which do not require prediction on documents
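The two datashare-app changes above could interact roughly as follows. This is a sketch under assumptions, not the actual ExtractNlpTask code: the `Type` constants, the fixed-size chunking, and the `plan` helper are hypothetical; only the `extractFromDoc()` flag comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    // Hypothetical mirror of Pipeline.Type carrying the flag described in the PR.
    enum Type {
        DOC_LEVEL(true), CHUNK_LEVEL(false);

        private final boolean extractFromDoc;

        Type(boolean extractFromDoc) { this.extractFromDoc = extractFromDoc; }

        boolean extractFromDoc() { return extractFromDoc; }
    }

    // Splits a text into consecutive chunks of at most chunkSize characters (assumed strategy).
    static List<String> chunk(String text, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += chunkSize) {
            chunks.add(text.substring(i, Math.min(text.length(), i + chunkSize)));
        }
        return chunks;
    }

    // Groups chunks into batches of at most batchSize elements.
    static List<List<String>> batches(List<String> chunks, int batchSize) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i += batchSize) {
            result.add(new ArrayList<>(chunks.subList(i, Math.min(chunks.size(), i + batchSize))));
        }
        return result;
    }

    // Document-level pipelines get the whole text as a single batch of one;
    // the others get the text split into chunks and grouped into batches.
    static List<List<String>> plan(Type type, String doc, int chunkSize, int batchSize) {
        if (type.extractFromDoc()) {
            return List.of(List.of(doc));
        }
        return batches(chunk(doc, chunkSize), batchSize);
    }
}
```

Each inner list returned by `plan` would then be handed to the pipeline's batch `processText` call as one batch.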

@ClemDoum ClemDoum force-pushed the feature/batch-ner branch 7 times, most recently from e815dd3 to 15b51b8 Compare September 5, 2024 09:53
@ClemDoum ClemDoum marked this pull request as ready for review September 5, 2024 10:44
@ClemDoum ClemDoum requested a review from a team September 9, 2024 07:41
@ClemDoum ClemDoum self-assigned this Sep 9, 2024