Add recognizers (et, lt, pl) #1215

Open: wants to merge 4 commits into main

@bckamil bckamil commented Nov 20, 2023

Change Description

Added:

  • ET IK recognizer
  • LT national identification number recognizer
  • PL identity card recognizer

You can verify recognizers using:

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

omri374 commented Nov 21, 2023

/azp run

omri374 commented Nov 21, 2023

Thanks! We'll review this shortly. Please also add the new entities to docs/supported_entities.md

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 left a comment

Thanks! Left some minor comments.

from presidio_analyzer import Pattern, PatternRecognizer


class EtIkRecognizer(PatternRecognizer):

Should we create a base class that implements the logic shared by these entities, and have the country-specific recognizers inherit from it? What are your thoughts?
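One way the shared logic could be factored out is sketched below. It assumes the common part is the two-pass modulo-11 check digit that is commonly described for 11-digit Estonian and Lithuanian personal codes. The class names and the Estonian context word are hypothetical, the sketch deliberately avoids importing presidio_analyzer so that it runs standalone, and in the actual PR the base class would presumably extend PatternRecognizer and expose this check through validate_result.

```python
FIRST_PASS_WEIGHTS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
SECOND_PASS_WEIGHTS = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)


def _weighted_remainder(digits, weights):
    """Return the weighted digit sum modulo 11."""
    return sum(d * w for d, w in zip(digits, weights)) % 11


class Mod11PersonalCodeValidator:
    """Hypothetical shared base: 11-digit code with a two-pass mod-11 check digit."""

    def validate_result(self, pattern_text: str) -> bool:
        # Reject anything that is not exactly 11 ASCII digits.
        if len(pattern_text) != 11:
            return False
        if not (pattern_text.isascii() and pattern_text.isdigit()):
            return False
        digits = [int(ch) for ch in pattern_text]
        check = _weighted_remainder(digits[:10], FIRST_PASS_WEIGHTS)
        if check == 10:
            # A remainder of 10 cannot be a digit, so retry with shifted weights.
            check = _weighted_remainder(digits[:10], SECOND_PASS_WEIGHTS)
            if check == 10:
                check = 0
        return check == digits[10]


class EtIkValidator(Mod11PersonalCodeValidator):
    CONTEXT = ["isikukood"]  # assumed context word, not taken from the PR


class LtNationalIdValidator(Mod11PersonalCodeValidator):
    CONTEXT = ["asmens", "kodas", "asmens kodas"]


if __name__ == "__main__":
    # Values taken from the test cases quoted in this review.
    for code in ("37102250382", "37132250382", "33309240064", "33309240063"):
        print(code, Mod11PersonalCodeValidator().validate_result(code))
```

With this split, each country class only carries its own patterns and context words, while the checksum lives in one place.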

),
]

CONTEXT = ["asmens kodas"]

The current implementation of context works better with unigrams. Can we separate this into "asmens", "kodas" or one of those, in addition to the existing "asmens kodas"?
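A toy illustration of why a two-word context entry can be missed when matching happens token by token. This is a simplified stand-in written for this comment, not Presidio's actual context enhancer; the function name context_hits is hypothetical.

```python
def context_hits(context_words, surrounding_text):
    """Toy matcher: compare each context entry against single lowercased tokens."""
    tokens = {tok.strip(".,:;").lower() for tok in surrounding_text.split()}
    return [word for word in context_words if word in tokens]


text = "Asmens kodas: 33309240064"

# The bigram never equals a single token, so it produces no hit.
bigram_only = context_hits(["asmens kodas"], text)

# Adding the unigrams lets each word match on its own.
with_unigrams = context_hits(["asmens", "kodas", "asmens kodas"], text)

if __name__ == "__main__":
    print(bigram_only)
    print(with_unigrams)
```

Keeping "asmens kodas" in the list costs nothing and leaves room for matchers that do handle multi-word entries.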

("37102250382", 1, ((0, 11),),),
# invalid identity card scores
("37132250382", 0, ()),
("99999999999", 0, ()),

Please add a test case with surrounding text
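A sketch of what such a case might look like, using the same (text, expected_len, expected_positions) tuple shape as the existing tests. The regex below is an assumed stand-in for the recognizer's pattern, which is not shown in this excerpt, and find_spans is a hypothetical helper used only to make the example runnable without the analyzer.

```python
import re

# Assumed stand-in pattern: any standalone run of 11 digits.
ELEVEN_DIGITS = re.compile(r"\b\d{11}\b")


def find_spans(text):
    """Return (start, end) offsets of every 11-digit match in the text."""
    return tuple(match.span() for match in ELEVEN_DIGITS.finditer(text))


cases = [
    # bare identifier, as in the existing tests
    ("37102250382", 1, ((0, 11),)),
    # identifier embedded in surrounding text
    ("My code is 37102250382.", 1, ((11, 22),)),
    # text without any identifier
    ("no identifier here", 0, ()),
]

for text, expected_len, expected_positions in cases:
    spans = find_spans(text)
    assert len(spans) == expected_len
    assert spans == expected_positions
```

The embedded case checks that the reported offsets point at the identifier itself rather than at the start of the sentence.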

("33309240064", 1, ((0, 11),),),
# invalid identity card scores
("33309240063", 0, ()),
("99999999999", 0, ()),

Same here, please add a test case with surrounding text

omri374 commented Dec 24, 2023

@bckamil are you interested in completing this PR?
