Analyzer identifies Portuguese phone number as US bank account #1341

Open
Gasewtag opened this issue Mar 22, 2024 · 1 comment

Comments

@Gasewtag

Describe the bug
Analyzer identifies Portuguese phone number as US bank account

To Reproduce
Steps to reproduce the behavior:

  1. Execute analyzer with the following text: "my name is John Doe my phone number is +351000000000" (please replace zeros with random digits 0-9)

  2. Execute anonymizer and retrieve the following result:

text: my name is my phone number is <US_BANK_NUMBER>
items:
[
{'start': 41, 'end': 57, 'entity_type': 'US_BANK_NUMBER', 'text': '<US_BANK_NUMBER>', 'operator': 'replace'},
{'start': 11, 'end': 20, 'entity_type': 'PERSON', 'text': '', 'operator': 'replace'}
]

Expected behavior:
my name is my phone number is <PHONE_NUMBER>

@omri374
Contributor

omri374 commented Mar 24, 2024

The default phone number recognizer supports only a subset of countries:

DEFAULT_SUPPORTED_REGIONS = ("US", "UK", "DE", "FE", "IL", "IN", "CA", "BR")

Could you please try to add Portugal (if I got the country code right) and check again?
Example code:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import PhoneRecognizer

analyzer = AnalyzerEngine()

# Remove default phone recognizer
analyzer.registry.remove_recognizer("PhoneRecognizer")

# Add custom one (which supports numbers starting with +351)
pt_phone_recognizer = PhoneRecognizer(supported_regions=["PT"])
analyzer.registry.add_recognizer(pt_phone_recognizer)

analyzer.analyze("my name is John Doe my phone number is +351000000000", language="en")

# Note: the call above still does not detect a phone number, because
# +351000000000 is not a valid Portuguese phone number.
# With a valid Portuguese number, it works:

analyzer.analyze(text="my name is John Doe my phone number is +351210493000", language="en", score_threshold=0.4)

Output:

[type: PERSON, start: 11, end: 19, score: 0.85,
 type: PHONE_NUMBER, start: 39, end: 52, score: 0.75]
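
As a rough illustration of why the digits in the original report could end up labelled as a bank account when no phone recognizer claims them, here is a minimal standard-library sketch. It assumes a loose "8 to 17 consecutive digits" pattern; both the pattern and the helper name `find_digit_runs` are hypothetical stand-ins for illustration and are not taken from Presidio's source.

```python
import re

# Assumption: a loose pattern that accepts any run of 8 to 17 digits.
# This is a hypothetical stand-in for a weak bank-account style pattern,
# not the pattern Presidio actually ships.
LOOSE_ACCOUNT_PATTERN = re.compile(r"\b[0-9]{8,17}\b")


def find_digit_runs(text):
    """Return (start, end, matched_text) for every loose digit-run match."""
    return [(m.start(), m.end(), m.group())
            for m in LOOSE_ACCOUNT_PATTERN.finditer(text)]


text = "my name is John Doe my phone number is +351000000000"
print(find_digit_runs(text))
```

The leading "+" is not part of the match, so a pattern like this only sees twelve plain digits. Without a phone recognizer configured for the PT region, nothing more specific competes for that span.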
