Refactor PHI deidentifier DeidentificationStep schema #168

tschaffter · 2021-02-12T22:41:41Z

Is your proposal related to a problem?

There are several issues with the current design of DeidentificationStep:

Issue 1. The user can currently specify multiple masking strategies for this step, but exactly one should be allowed.
Issue 2. The user can specify "dateOffsetConfig" for non-date annotations.

Additional point to review:

The current design applies one confidence level threshold to multiple annotators, but it is flexible enough that it could be used to specify a different threshold for each annotator.

Even though we ask annotators to output a confidence value between 0 and 100, not all annotators distribute their confidence levels across this full range. This has been observed in DREAM Challenges, where one method may center its confidence levels around 30 (arbitrary example) while another distributes them differently. Therefore, a user of the PHI Deidentifier API would need to know the distribution of a given annotator's confidence levels in order to choose a meaningful threshold for it.
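To make the threshold point concrete, here is a minimal Python sketch. It is not taken from the PHI Deidentifier code base: the `Annotation` class, the `filter_by_threshold` helper, the annotator names, and all confidence values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    annotator: str     # hypothetical annotator identifier
    text: str
    confidence: float  # 0 to 100, as annotators are asked to report


def filter_by_threshold(annotations, thresholds, default=0.0):
    """Keep annotations whose confidence meets their annotator's own threshold.

    Annotators missing from `thresholds` fall back to `default`.
    """
    return [
        a for a in annotations
        if a.confidence >= thresholds.get(a.annotator, default)
    ]


# Made-up data: annotator-a reports low values overall, annotator-b high ones.
annotations = [
    Annotation("annotator-a", "John Smith", 35.0),
    Annotation("annotator-a", "the", 12.0),
    Annotation("annotator-b", "Seattle", 88.0),
    Annotation("annotator-b", "clinic", 55.0),
]

# One shared threshold of 60 discards everything annotator-a produced.
shared = filter_by_threshold(annotations, {}, default=60.0)

# Per-annotator thresholds keep the most confident results of both.
per_annotator = filter_by_threshold(
    annotations, {"annotator-a": 30.0, "annotator-b": 60.0}
)
```

With a shared threshold, an annotator whose scores cluster low is effectively silenced, which is why a per-annotator threshold only helps if the user knows each annotator's distribution.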

Describe the solution you'd like

Come up with a schema update that fixes Issue 1 and Issue 2.
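As a rough sketch of the two constraints such an update would need to enforce, the Python check below validates a step represented as a plain dict. Only "dateOffsetConfig" comes from this issue; the other strategy keys, the "annotationTypes" field, the "date" type name, and the `validate_step` helper are assumptions made for illustration, not the actual schema.

```python
# Only "dateOffsetConfig" is named in the issue; the other keys are hypothetical.
STRATEGY_KEYS = ("maskingCharConfig", "redactConfig", "dateOffsetConfig")


def validate_step(step):
    """Return a list of error messages for one deidentification step (a dict)."""
    errors = []

    # Issue 1: exactly one masking strategy may be present.
    chosen = [key for key in STRATEGY_KEYS if key in step]
    if len(chosen) != 1:
        errors.append(
            f"expected exactly one masking strategy, found {len(chosen)}"
        )

    # Issue 2: the date offset strategy only makes sense for date annotations.
    # "annotationTypes" and the "date" type name are assumptions of this sketch.
    types = step.get("annotationTypes", [])
    if "dateOffsetConfig" in chosen and any(t != "date" for t in types):
        errors.append("dateOffsetConfig applies only to date annotations")

    return errors


valid_step = {"annotationTypes": ["date"], "dateOffsetConfig": {}}
invalid_step = {
    "annotationTypes": ["person_name"],
    "dateOffsetConfig": {},
    "redactConfig": {},
}

valid_errors = validate_step(valid_step)
invalid_errors = validate_step(invalid_step)
```

In a schema language, the same constraints could instead be expressed structurally, for example by replacing the optional strategy properties with a single property that accepts exactly one strategy object, so that invalid combinations cannot be written at all.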

tschaffter added the Enhancement (New feature or request) and Priority: Low labels on Feb 12, 2021
