Home
Welcome to the MASK_public wiki!
MASK Framework is an open-source framework for de-identification of medical free-text data
In this project, we will develop an open-source framework for the automated de-identification of medical text. Such text contains information that can support clinical research, but in its native form it also contains sensitive personally identifiable information (PII) that should not be accessible to anyone who does not provide direct clinical care.
The project aims to improve current de-identification processes and to build an open-source platform for flexible masking of personal information, so that de-identified medical text still retains enough information to support research.
To provide this flexibility, the de-identification system must be configurable by the user in terms of:
- Types of PII that have to be identified in free-text data;
- Approaches to masking the identified data (keep, redact, map, etc.);
- Disclosure risk analysis that is performed on the data;
- The methodology that is applied for each of the steps.
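The configurable dimensions listed above could be captured in a small configuration object. The following is a minimal sketch under stated assumptions: the names `MaskingAction`, `EntityRule`, `DeidConfig` and the algorithm names used in the usage note are hypothetical illustrations, not MASK's actual configuration format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MaskingAction(Enum):
    """Masking approaches from the list above (hypothetical enum)."""
    KEEP = "keep"
    REDACT = "redact"
    MAP = "map"


@dataclass
class EntityRule:
    """How one PII type is recognized and masked (hypothetical structure)."""
    entity_type: str       # e.g. "NAME", "DATE"
    ner_algorithm: str     # name of the recognizer used for this type
    action: MaskingAction  # what to do with recognized spans


@dataclass
class DeidConfig:
    rules: List[EntityRule] = field(default_factory=list)
    risk_analysis: Optional[str] = None  # disclosure-risk method, if any

    def entity_types(self) -> List[str]:
        # The PII types the user wants identified in the free text.
        return [rule.entity_type for rule in self.rules]

    def action_for(self, entity_type: str) -> MaskingAction:
        # Types without a rule are not targeted, so their text is kept.
        for rule in self.rules:
            if rule.entity_type == entity_type:
                return rule.action
        return MaskingAction.KEEP
```

For example, `DeidConfig(rules=[EntityRule("NAME", "crf", MaskingAction.MAP), EntityRule("DATE", "regex", MaskingAction.REDACT)])` would map names to surrogates, redact dates, and keep everything else. Keeping one rule per entity type makes it possible to choose a different recognition method and masking approach for each type independently.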
The MASK framework contains two main components: one for training named entity recognition models, and one for applying the framework to de-identify texts.
The architecture of the training part is presented in the following image:
The training_framework takes input files and forwards them to the training algorithm that the user has selected from the set of available algorithms. The output of training is a named entity recognition (NER) model, which is saved to a file.
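The training flow described above (input files, a user-selected algorithm, a model saved to a file) can be sketched as follows. This is an illustration under assumptions: the tab-separated `token<TAB>label` input format, the `ALGORITHMS` registry, the toy `train_dictionary` learner and the JSON model file are all hypothetical stand-ins, not MASK's real algorithms or file formats.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, label) pairs


def read_annotated(path: Path) -> List[Sentence]:
    """Read 'token<TAB>label' lines; blank lines separate sentences (assumed format)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def train_dictionary(sentences: List[Sentence]) -> Dict[str, str]:
    """Toy NER learner: map each token to the label it carries most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for token, label in sentence:
            counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


# Hypothetical registry of the algorithms a user can select for training.
ALGORITHMS: Dict[str, Callable[[List[Sentence]], Dict[str, str]]] = {
    "dictionary": train_dictionary,
}


def train(input_files: List[Path], algorithm: str, model_path: Path) -> None:
    """Read all input files, train the selected algorithm, save the model."""
    sentences = [s for f in input_files for s in read_annotated(f)]
    model = ALGORITHMS[algorithm](sentences)
    model_path.write_text(json.dumps(model), encoding="utf-8")
```

A real deployment would register statistical or neural NER learners in place of the dictionary learner; the registry keeps the selection of the algorithm separate from the reading of the input and the saving of the model.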
The architecture of the component that applies the trained algorithms is presented in the following image:
The input is processed by the framework and forwarded to the configured named entity recognition algorithm. The recognized entities are then passed to the configured masking algorithms, and the masked text is written to the configured output location.
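The application flow (recognize entities, then mask each one with the approach configured for its type) can be sketched as below. It assumes a toy token-to-label model like the one produced by a dictionary learner and uses naive whitespace tokenization. The functions `recognize`, `redact`, `deidentify` and the `Mapper` class are hypothetical, not MASK's actual API, and writing the result to the configured output location is left to the caller.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Entity = Tuple[int, str, str]  # (token index, token text, entity type)
Masker = Callable[[str, str], str]


def recognize(tokens: List[str], model: Dict[str, str]) -> List[Entity]:
    """Toy NER step: look each token up in a token -> label model."""
    return [(i, t, model[t]) for i, t in enumerate(tokens) if model.get(t, "O") != "O"]


def redact(text: str, entity_type: str) -> str:
    """Redaction: replace the value with its entity type."""
    return f"[{entity_type}]"


class Mapper:
    """Mapping: replace each distinct value with a stable surrogate like NAME_1."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[str, str], str] = {}
        self._counters: Dict[str, "itertools.count[int]"] = {}

    def __call__(self, text: str, entity_type: str) -> str:
        key = (entity_type, text)
        if key not in self._seen:
            counter = self._counters.setdefault(entity_type, itertools.count(1))
            self._seen[key] = f"{entity_type}_{next(counter)}"
        return self._seen[key]


def deidentify(text: str, model: Dict[str, str], maskers: Dict[str, Masker]) -> str:
    """Run NER, then apply the masker configured for each entity type."""
    tokens = text.split()
    for i, token, entity_type in recognize(tokens, model):
        masker = maskers.get(entity_type)
        if masker is not None:  # no masker configured: keep the token
            tokens[i] = masker(token, entity_type)
    return " ".join(tokens)
```

With `model = {"John": "NAME", "Smith": "NAME", "Belgrade": "LOCATION"}` and `maskers = {"NAME": Mapper(), "LOCATION": redact}`, the text `"John Smith was admitted in Belgrade . John improved"` becomes `"NAME_1 NAME_2 was admitted in [LOCATION] . NAME_1 improved"`. Mapping keeps repeated mentions linked to the same surrogate, which preserves more information for research than redaction does.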