feat: add fuzzy name matching during id checks #189
Branch updated from 0235a06 to 2462381.
Branch updated from 2462381 to c6d3e1c.
I was wondering whether the CloudWatch metrics will make it easy to see which names were being compared when the closeness score is logged (and maybe the TIS ID for the trainee as well)?
We could use dimensions, but I'm not sure they really fit. I don't think having the names available in CloudWatch is that important; it's more about whether we're (broadly) too lax or too strict. There's a definite argument for separating by first name and surname, though.
OK, that makes sense. I wonder whether a general log (maybe at debug level) of the names and outcome details would still be useful, to help identify any unexpected matches or mismatches?
I did intend to do that for failures 🙈
Split the identity accuracy metrics into forename and surname instead of a single combined metric. Two dimensions are used, which combine into four sets of metrics:

- `AnalysisType = phonetic|text`
- `NameType = forename|surname`

TIS21-6065 TIS21-6092
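To illustrate how the two dimensions multiply out into four series, here is a minimal plain-Java sketch. `AccuracyMetrics` is a hypothetical in-memory stand-in for a tagged meter registry, not Micrometer's API and not the PR's MetricsService. With Micrometer itself, tags would typically be attached through a meter builder such as `DistributionSummary.builder("identity.accuracy").tag("AnalysisType", "phonetic").tag("NameType", "forename")`, although the PR does not say which meter type it uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a tagged meter registry (not Micrometer's API).
final class AccuracyMetrics {
  enum AnalysisType { PHONETIC, TEXT }
  enum NameType { FORENAME, SURNAME }

  // One series per distinct meter id, i.e. per (AnalysisType, NameType) combination.
  private final Map<String, List<Double>> series = new TreeMap<>();

  // Builds an id such as "identity.accuracy{AnalysisType=phonetic,NameType=forename}".
  static String meterId(AnalysisType analysis, NameType name) {
    return "identity.accuracy{AnalysisType=" + analysis.name().toLowerCase(Locale.ROOT)
        + ",NameType=" + name.name().toLowerCase(Locale.ROOT) + "}";
  }

  void record(AnalysisType analysis, NameType name, double accuracy) {
    series.computeIfAbsent(meterId(analysis, name), id -> new ArrayList<>()).add(accuracy);
  }

  // A single identity check publishes four values, one per series.
  void recordCheck(double forenameText, double forenamePhonetic,
      double surnameText, double surnamePhonetic) {
    record(AnalysisType.TEXT, NameType.FORENAME, forenameText);
    record(AnalysisType.PHONETIC, NameType.FORENAME, forenamePhonetic);
    record(AnalysisType.TEXT, NameType.SURNAME, surnameText);
    record(AnalysisType.PHONETIC, NameType.SURNAME, surnamePhonetic);
  }

  Map<String, List<Double>> series() {
    return series;
  }
}
```

Keeping forename and surname apart means a consistently poor surname match cannot be hidden by a good forename match, which is the concern raised in the discussion above.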
Micrometer meters do not support publishing minimum values, only the maximum and average (of the statistics we care about for accuracy). As a result, if accuracy itself is reported we can only ever see the best matches, whereas monitoring the worst matches is more useful to us.

Update the meter configuration to rename the metric from `identity.accuracy` to `identity.inaccuracy`. Update the MetricsService to invert the accuracy value before publishing.

TIS21-6065 TIS21-6092
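The reasoning behind the inversion can be checked with a small sketch. It assumes accuracy lies in [0, 1] and that "invert" means `1 - accuracy`; neither is stated in the commit. Under that assumption, max(1 - a) = 1 - min(a), so the published maximum now exposes the worst match. `InaccuracySummary` is a hypothetical stand-in for a meter that, like a Micrometer distribution summary, tracks count, total and max but not min.

```java
// Hypothetical stand-in for a meter that exposes count, total (so mean) and max, but no min.
final class InaccuracySummary {
  private long count;
  private double total;
  private double max; // inaccuracies are >= 0, so 0.0 is a safe starting point

  // Assumes accuracy is in [0, 1]; "inverting" it is taken to mean 1 - accuracy.
  void recordAccuracy(double accuracy) {
    if (accuracy < 0.0 || accuracy > 1.0) {
      throw new IllegalArgumentException("accuracy must be in [0, 1]: " + accuracy);
    }
    double inaccuracy = 1.0 - accuracy;
    count++;
    total += inaccuracy;
    max = Math.max(max, inaccuracy);
  }

  long count() {
    return count;
  }

  // Mean inaccuracy, which equals 1 minus the mean accuracy.
  double mean() {
    return count == 0 ? Double.NaN : total / count;
  }

  double max() {
    return count == 0 ? Double.NaN : max;
  }

  // max(1 - a) == 1 - min(a), so the worst (lowest) accuracy is recoverable from the max.
  double worstAccuracy() {
    return count == 0 ? Double.NaN : 1.0 - max;
  }
}
```

Nothing is lost by the inversion: the average remains available as 1 minus the mean inaccuracy, while the maximum switches from "best match seen" to "worst match seen".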
Branch updated from 9c53381 to 5f82740.
LGTM
The identity checks rely on matching the forename and surname between TIS and the provided DSP credential. As there is a high chance that names will mismatch, whether through spelling mistakes or the inclusion or exclusion of middle names, the name checking needs to be more forgiving.
Update the VerificationService to check the phonetic and text accuracies and make a decision based on the similarity rather than exact matching. Split names on spaces and hyphens to allow matching when middle names or double-barrelled names are used.
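The PR does not say which algorithms back the phonetic and text accuracies, so the following is only a sketch under stated assumptions: text accuracy is taken to be normalised Levenshtein similarity, and phonetic accuracy the fraction of the four American Soundex characters that agree. The class `NameMatcher`, its two thresholds and the "best score over any token pair" rule are all hypothetical, not the VerificationService's actual logic.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of fuzzy name matching; not the PR's VerificationService.
final class NameMatcher {
  // Hypothetical thresholds; real values would be tuned using the published metrics.
  static final double TEXT_THRESHOLD = 0.8;
  static final double PHONETIC_THRESHOLD = 1.0;

  // American Soundex digit for each letter A..Z ('0' marks letters that are never emitted).
  private static final String SOUNDEX_CODES = "01230120022455012623010202";

  // Split on whitespace and hyphens: "Smith-Jones" -> [smith, jones].
  static String[] tokens(String name) {
    return Arrays.stream(name.toLowerCase(Locale.ROOT).split("[\\s-]+"))
        .filter(token -> !token.isEmpty())
        .toArray(String[]::new);
  }

  // Classic two-row dynamic-programming edit distance.
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = curr; curr = tmp; // prev now holds the row just computed
    }
    return prev[b.length()];
  }

  // Text accuracy in [0, 1]: 1 minus the edit distance over the longer length.
  static double textAccuracy(String a, String b) {
    int longest = Math.max(a.length(), b.length());
    return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
  }

  // American Soundex, e.g. "Robert" -> "R163"; characters outside A-Z are dropped.
  static String soundex(String word) {
    String letters = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
    if (letters.isEmpty()) return "";
    StringBuilder out = new StringBuilder().append(letters.charAt(0));
    char last = SOUNDEX_CODES.charAt(letters.charAt(0) - 'A');
    for (int i = 1; i < letters.length() && out.length() < 4; i++) {
      char c = letters.charAt(i);
      if (c == 'H' || c == 'W') continue; // H and W do not separate equal codes
      char code = SOUNDEX_CODES.charAt(c - 'A');
      if (code != '0' && code != last) out.append(code);
      last = code; // a vowel resets last to '0', so a repeated code after it is kept
    }
    while (out.length() < 4) out.append('0');
    return out.toString();
  }

  // Phonetic accuracy in [0, 1]: share of the four Soundex characters that agree.
  static double phoneticAccuracy(String a, String b) {
    String sa = soundex(a);
    String sb = soundex(b);
    if (sa.isEmpty() || sb.isEmpty()) return 0.0;
    int same = 0;
    for (int i = 0; i < 4; i++) if (sa.charAt(i) == sb.charAt(i)) same++;
    return same / 4.0;
  }

  // Best score over every token pair, so a middle name or second barrel is ignored.
  static double bestTokenScore(String a, String b, ToDoubleBiFunction<String, String> score) {
    double best = 0.0;
    for (String ta : tokens(a)) {
      for (String tb : tokens(b)) {
        best = Math.max(best, score.applyAsDouble(ta, tb));
      }
    }
    return best;
  }

  static boolean matches(String tisName, String credentialName) {
    double text = bestTokenScore(tisName, credentialName, NameMatcher::textAccuracy);
    double phonetic = bestTokenScore(tisName, credentialName, NameMatcher::phoneticAccuracy);
    return text >= TEXT_THRESHOLD || phonetic >= PHONETIC_THRESHOLD;
  }
}
```

With these assumptions, `matches("Anne-Marie", "Anne")` succeeds on text accuracy and `matches("Jon", "John")` succeeds on phonetics (both encode to `J500`). Taking the best token pair is the most lenient option available, which is one reason the accuracy metrics below are useful for judging whether the matching is too lax.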
Publish metrics for both phonetic and text accuracy to CloudWatch, so that the level of fuzziness allowed can be monitored and improved over time.
TIS21-6065