Add NER PII Masking module #319
base: main
# Conflicts:
#	autorag/nodes/passagefilter/__init__.py
#	autorag/support.py
Also, are you sure NER PII masking is a 'passage filter' node? I don't think it is.
A passage filter reduces the number of contexts without modifying the content of each context.
I think we need to make a new node, something like a privacy filter.
Also, it is tricky to set up a metric for a privacy filter, isn't it?
response = model(text)
for entry in response:
    entity_group_tag = f"[{entry['entity_group']}_{entry['start']}]"
    new_text = new_text.replace(entry["word"], entity_group_tag).strip()
Are there 'start' and 'end' indices for each detected entity? Try to use those, because the same entity can appear more than once in one sentence.
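One way to act on this suggestion is to slice the text by character offsets instead of calling `str.replace`. The following is only a minimal sketch, not the code from this PR. It assumes `model` returns a list of dicts with `entity_group`, `start`, and `end` keys, as a Hugging Face token-classification pipeline does when an aggregation strategy is set. The name `mask_pii_by_span` is hypothetical.

```python
from typing import Callable, Dict, List


def mask_pii_by_span(model: Callable[[str], List[Dict]], text: str) -> str:
    """Replace each detected entity with a tag, using its character offsets.

    Assumption: every entry has 'entity_group', 'start', and 'end' keys,
    and the detected spans do not overlap.
    """
    # Process entities from right to left so that earlier offsets stay valid
    # after a span has been replaced by a tag of a different length.
    entities = sorted(model(text), key=lambda e: e["start"], reverse=True)
    masked = text
    for entity in entities:
        tag = f"[{entity['entity_group']}_{entity['start']}]"
        masked = masked[: entity["start"]] + tag + masked[entity["end"]:]
    return masked
```

Because each occurrence is replaced at its own offsets, a name that appears twice gets two distinct tags, and an entity string that also occurs inside an unrelated word is left alone.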
def mask_pii(model, text: str) -> str:
    new_text = text
    response = model(text)
    for entry in response:
Also, are you trying to mask ALL entity types? The model can tag locations, person names, and other private entity types, but this code masks every entity type it finds.
The sentence could be left with literally no knowledge in it.
I agree with you. I'll take a little more time to think about it.
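One possible direction for this concern is to mask only an allow-list of entity types and leave the rest of the sentence intact. This is a sketch under assumptions, not a decision made in this thread: the function name `mask_selected_entities` and the `entity_groups` parameter are hypothetical, and label names such as `"PER"` depend on the NER model in use.

```python
from typing import Callable, Dict, Iterable, List


def mask_selected_entities(
    model: Callable[[str], List[Dict]],
    text: str,
    entity_groups: Iterable[str] = ("PER",),  # assumed label name, model dependent
) -> str:
    """Mask only entities whose 'entity_group' is in the allow-list."""
    allowed = set(entity_groups)
    # Keep only the entity types we want to hide, then replace right to left
    # so the remaining character offsets are still correct.
    entities = [e for e in model(text) if e["entity_group"] in allowed]
    entities.sort(key=lambda e: e["start"], reverse=True)
    masked = text
    for entity in entities:
        tag = f"[{entity['entity_group']}_{entity['start']}]"
        masked = masked[: entity["start"]] + tag + masked[entity["end"]:]
    return masked
```

With an allow-list, which types count as private becomes a configuration choice, so a location that carries the answer to a query can be kept while person names are masked.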
    return masked_contents_list, ids_list, scores_list


async def mask_pii(model, contents: List[str]) -> List[str]:
Actually, this async operation is useless.
I'll make a PR to resolve this kind of issue in the model rerankers as well.
close #314
Use a Hugging Face NER model for PII Masking.