Feature Request - Presidio Medical Recognizer #1491

Open
RKapadia01 opened this issue Nov 28, 2024 · 2 comments

@RKapadia01

Is your feature request related to a problem? Please describe.
I’m working with Presidio in a context where users may input medical and dietary information into a chatbot.

Currently, Presidio does not have built-in support for detecting medical entities such as diseases, medications, and clinical procedures. This limitation required me to implement a custom recognizer to address the need for medical PII detection.

Below is an example of the custom recognizer I’ve built:

from presidio_analyzer import EntityRecognizer, RecognizerResult
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

class ClinicalBERTRecognizer(EntityRecognizer):
    def __init__(self):
        # Hugging Face Hub model ID (the weights are downloaded on first use)
        model_name = "blaze999/Medical-NER"

        # Load the model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForTokenClassification.from_pretrained(model_name)

        # Create a pipeline for named entity recognition
        self.ner_pipeline = pipeline("ner", model=self.model, tokenizer=self.tokenizer)

        # Define the supported entities
        self.supported_entities = [
            "BIOLOGICAL_ATTRIBUTE",
            "BIOLOGICAL_STRUCTURE",
            "CLINICAL_EVENT",
            "DISEASE_DISORDER",
            "FAMILY_HISTORY",
            "HISTORY",
            "MEDICATION",
            "THERAPEUTIC_PROCEDURE"
        ]

        super().__init__(supported_entities=self.supported_entities)


    def analyze(self, text, entities, nlp_artifacts=None):
        results = []

        # Perform named entity recognition on the input text
        ner_results = self.ner_pipeline(text)

        for entity in ner_results:
            # Strip the BIO prefix ("B-" or "I-") from the token-level label
            label = entity["entity"]
            entity_type = label[2:] if label.startswith(("B-", "I-")) else label

            # Keep only supported entity types that the caller requested
            if entity_type in self.supported_entities and (
                not entities or entity_type in entities
            ):
                # Create a RecognizerResult object for the entity
                results.append(
                    RecognizerResult(
                        entity_type=entity_type,
                        start=entity["start"],
                        end=entity["end"],
                        score=float(entity["score"]),  # cast to a plain float
                    )
                )

        return results
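
A note on the `analyze` loop above: with its default settings, the `ner` pipeline returns one prediction per token, so a multi-token entity produces several adjacent results. Below is a minimal sketch of merging them into one span per entity. `merge_bio_tokens` is a hypothetical helper (not part of Presidio or transformers), and using the lowest token score as the span score is an assumption. It operates on plain dicts shaped like the pipeline's token-level output.

```python
def merge_bio_tokens(token_predictions):
    """Merge token-level BIO predictions into entity-level spans.

    Each prediction is a dict with "entity", "start", "end" and "score" keys.
    Hypothetical helper for illustration only.
    """
    spans = []
    for pred in token_predictions:
        prefix, _, entity_type = pred["entity"].partition("-")
        if prefix not in ("B", "I") or not entity_type:
            # Not a BIO-tagged label (for example "O"), so skip it
            continue

        last = spans[-1] if spans else None
        continues_last = (
            prefix == "I"
            and last is not None
            and last["entity_type"] == entity_type
            # Allow at most one separating character (e.g. a space)
            and pred["start"] <= last["end"] + 1
        )

        if continues_last:
            # Extend the open span; assumption: keep the lowest token score
            last["end"] = pred["end"]
            last["score"] = min(last["score"], float(pred["score"]))
        else:
            spans.append(
                {
                    "entity_type": entity_type,
                    "start": pred["start"],
                    "end": pred["end"],
                    "score": float(pred["score"]),
                }
            )
    return spans
```

Alternatively, passing `aggregation_strategy="simple"` to `pipeline(...)` groups tokens inside the pipeline itself; the results then expose the label under `"entity_group"` instead of `"entity"`.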

Describe the solution you'd like
Would there be interest in incorporating a medical domain recognizer into Presidio? If so, I’d be happy to submit a PR with this implementation or a more generalized version.

The recognizer uses transformer-based models from Hugging Face to identify clinical entities such as diseases, medications, and procedures. This would allow Presidio to support medical and healthcare-related use cases out of the box.

Describe alternatives you've considered

Additional context

RKapadia01 changed the title from "Presidio Medical Recognizer" to "Feature Request - Presidio Medical Recognizer" on Nov 28, 2024
@omri374

omri374 commented Nov 30, 2024

Hi, I think that would be a great addition, especially for those interested in PHI and not just PII. Having said that, we should also consider computational performance and development complexity, so our suggestion would be to add this to the repo but not run it by default. In addition, we don't install transformers by default, to reduce development complexity, so the code should first check whether transformers is installed and skip the recognizer if it isn't, so that it doesn't break the rest of the package.
For example:

try:
    import transformers
except ImportError:
    transformers = None


is_available = bool(transformers)

We're doing something similar here:

If you're interested in creating a PR, we would be happy to help with anything needed. Thanks!
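
The availability check suggested above can also be done without importing the package at all. The following is a minimal sketch under stated assumptions: `is_package_available` and `optional_recognizers` are hypothetical helper names, and the mapping of recognizer names to required packages is purely illustrative, not Presidio's actual registration mechanism.

```python
import importlib.util


def is_package_available(package_name: str) -> bool:
    """Return True if a top-level package can be imported, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def optional_recognizers(candidates: dict) -> list:
    """Return the recognizer names whose required package is installed.

    `candidates` maps a (hypothetical) recognizer name to the top-level
    package it depends on.
    """
    return [
        name
        for name, package in candidates.items()
        if is_package_available(package)
    ]


# Illustrative mapping; the recognizer name comes from the example in this issue
enabled = optional_recognizers({"ClinicalBERTRecognizer": "transformers"})
```

Here `enabled` contains `"ClinicalBERTRecognizer"` only when transformers is installed, so the rest of the package keeps working either way.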

@RKapadia01

Thanks for the response. I agree with your input and will raise a PR for feedback.
