feat: add fuzzy name matching during id checks #189

Judge40 · 2024-08-04T08:31:56Z

The identity checks rely on matching the forename and surname between TIS and the provided DSP credential. As there is a high chance that names will mismatch, due to spelling mistakes and the inclusion or exclusion of middle names, the name checking needs to be more forgiving.

Update the VerificationService to check the phonetic and text accuracies and make a decision based on the similarity rather than an exact match. Split names on spaces and hyphens to allow matching when middle names or double-barrelled names are used.
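The description doesn't say which algorithms back the phonetic and text accuracies. As a rough sketch only, assuming American Soundex for the phonetic check, a normalised Levenshtein similarity for the text check and a hypothetical threshold of 0.8, the splitting and token matching could look like this. `FuzzyNameMatcher`, `namesMatch` and `THRESHOLD` are illustrative names, not the VerificationService's actual API. In production code, Apache Commons Codec (`Soundex`, `DoubleMetaphone`) and Apache Commons Text (`LevenshteinDistance`, `JaroWinklerSimilarity`) provide tested implementations; they are hand-rolled here only to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Sketch only: Soundex, normalised Levenshtein and the threshold are all assumptions.
final class FuzzyNameMatcher {

  // Hypothetical minimum similarity (0.0 to 1.0) for either accuracy to count as a match.
  static final double THRESHOLD = 0.8;

  // Soundex digit for each letter A..Z; vowels, H, W and Y map to '0' and are never emitted.
  private static final String CODES = "01230120022455012623010202";

  /** Lower-case and split on whitespace and hyphens: "Smith-Jones" becomes [smith, jones]. */
  static List<String> tokens(String name) {
    return Arrays.stream(name.trim().toLowerCase(Locale.ROOT).split("[\\s-]+"))
        .filter(t -> !t.isEmpty())
        .collect(Collectors.toList());
  }

  /** Classic two-row Levenshtein edit distance. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] swap = prev; prev = curr; curr = swap;
    }
    return prev[b.length()];
  }

  /** 1.0 for identical strings, falling towards 0.0 as the edit distance grows. */
  static double similarity(String a, String b) {
    int longest = Math.max(a.length(), b.length());
    return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
  }

  /** American Soundex, e.g. "Robert" and "Rupert" both give R163. Returns "" if no letters. */
  static String soundex(String word) {
    String letters = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
    if (letters.isEmpty()) return "";
    StringBuilder out = new StringBuilder().append(letters.charAt(0));
    char last = CODES.charAt(letters.charAt(0) - 'A');
    for (int i = 1; i < letters.length() && out.length() < 4; i++) {
      char c = letters.charAt(i);
      if (c == 'H' || c == 'W') continue; // H and W do not separate repeated codes
      char code = CODES.charAt(c - 'A');
      if (code != '0' && code != last) out.append(code);
      last = code; // a vowel resets 'last', so a repeated code after it is kept
    }
    while (out.length() < 4) out.append('0');
    return out.toString();
  }

  /** Best text similarity over every token pair, so one shared token is enough. */
  static double textAccuracy(String expected, String actual) {
    double best = 0.0;
    for (String e : tokens(expected))
      for (String a : tokens(actual)) best = Math.max(best, similarity(e, a));
    return best;
  }

  /** Best similarity between the Soundex codes of every token pair. */
  static double phoneticAccuracy(String expected, String actual) {
    double best = 0.0;
    for (String e : tokens(expected)) {
      for (String a : tokens(actual)) {
        String se = soundex(e);
        String sa = soundex(a);
        if (!se.isEmpty() && !sa.isEmpty()) best = Math.max(best, similarity(se, sa));
      }
    }
    return best;
  }

  /** Hypothetical decision rule: either accuracy reaching the threshold counts as a match. */
  static boolean namesMatch(String expected, String actual) {
    return phoneticAccuracy(expected, actual) >= THRESHOLD
        || textAccuracy(expected, actual) >= THRESHOLD;
  }

  public static void main(String[] args) {
    System.out.println(namesMatch("Jon", "John"));          // true: both encode to J500
    System.out.println(namesMatch("Smith-Jones", "Jones"));  // true: shared "jones" token
    System.out.println(namesMatch("Anna Maria", "Anna"));    // true: middle name tolerated
    System.out.println(namesMatch("Smith", "Taylor"));       // false
  }
}
```

Taking the best score across all token pairs is what lets "Smith-Jones" match "Jones" or "Anna Maria" match "Anna". The trade-off is that a single shared token is enough, which is exactly the kind of leniency the published accuracy metrics would help tune.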

Publish metrics for both phonetic and text accuracy to CloudWatch to allow monitoring and ongoing improvements to the level of fuzziness allowed.

TIS21-6065

Judge40 · 2024-08-05T08:05:28Z

~~Converting to draft as I've missed some changes from the commit~~


ReubenRobertsHEE · 2024-08-06T08:49:28Z

I was wondering whether the CloudWatch metrics will make it easy to see which names were being compared when the closeness score is logged (and maybe the TIS ID for the trainee as well)?

Judge40 · 2024-08-06T08:58:29Z

> I was wondering whether the CloudWatch metrics will make it easy to see which names were being compared when the closeness score is logged (and maybe the TIS ID for the trainee as well)?

We could use dimensions, but I'm not sure it really fits. I don't think having the names available in CloudWatch is that important, though; it's more about whether we're (broadly) too lax or too strict.

There's a definite argument for separating by first name and surname, though.
Maybe a switch to metric names of `identity.accuracy.forenames` and `identity.accuracy.surname`, with dimensions of `AccuracyType:text` and `AccuracyType:phonetic`.
It's hard to know when we don't actually have a reporting requirement; we're just trying to capture data for future needs.

ReubenRobertsHEE · 2024-08-06T09:01:38Z

Ok, that makes sense. I wonder if including a general log (maybe at debug level) of the names and outcome details would be useful though, to help identify any unexpected mis/matches?

Judge40 · 2024-08-06T09:05:13Z

> Ok, that makes sense. I wonder if including a general log (maybe at debug level) of the names and outcome details would be useful though, to help identify any unexpected mis/matches?

I did intend to do that for failures 🙈
I'll add a bit of debug logging.

Split the identity accuracy metrics into separate forename and surname metrics instead of a single combined one. Two dimensions should be used, giving four sets of metrics:

- `AnalysisType = phonetic|text`
- `NameType = forename|surname`

TIS21-6065 TIS21-6092
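As a hedged illustration of how two two-valued dimensions produce four series, here is a stdlib-only stand-in for a tagged metrics backend; `IdentityAccuracyMetrics`, `recordCheck` and the key format are hypothetical. With Micrometer, a tagged meter such as `DistributionSummary.builder("identity.accuracy").tag("AnalysisType", "text").tag("NameType", "forename").register(registry)` would play the role of one `Series`, although whether the MetricsService actually uses a `DistributionSummary` is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: an in-memory stand-in for a tagged metrics backend, not the project's MetricsService.
final class IdentityAccuracyMetrics {

  enum AnalysisType { PHONETIC, TEXT }

  enum NameType { FORENAME, SURNAME }

  /** Running count, total and max for one tagged series. */
  static final class Series {
    long count;
    double total;
    double max = Double.NEGATIVE_INFINITY;

    void record(double value) {
      count++;
      total += value;
      max = Math.max(max, value);
    }
  }

  // One series per distinct combination of the two dimensions, keyed by a printable tag string.
  private final Map<String, Series> series = new TreeMap<>();

  void record(AnalysisType analysis, NameType name, double value) {
    String key = "identity.accuracy{AnalysisType=" + analysis.name().toLowerCase(Locale.ROOT)
        + ",NameType=" + name.name().toLowerCase(Locale.ROOT) + "}";
    series.computeIfAbsent(key, k -> new Series()).record(value);
  }

  /** One identity check produces a value for every AnalysisType x NameType pair. */
  void recordCheck(double forenamePhonetic, double forenameText,
                   double surnamePhonetic, double surnameText) {
    record(AnalysisType.PHONETIC, NameType.FORENAME, forenamePhonetic);
    record(AnalysisType.TEXT, NameType.FORENAME, forenameText);
    record(AnalysisType.PHONETIC, NameType.SURNAME, surnamePhonetic);
    record(AnalysisType.TEXT, NameType.SURNAME, surnameText);
  }

  Map<String, Series> series() {
    return series;
  }

  public static void main(String[] args) {
    IdentityAccuracyMetrics metrics = new IdentityAccuracyMetrics();
    metrics.recordCheck(1.0, 0.75, 1.0, 1.0);
    metrics.recordCheck(0.75, 0.6, 1.0, 0.9);
    // Prints four series: 2 analysis types x 2 name types.
    metrics.series().forEach((key, s) ->
        System.out.printf("%s count=%d max=%.2f%n", key, s.count, s.max));
  }
}
```

Keeping the name type as a dimension rather than baking it into the metric name (as floated earlier with `identity.accuracy.forenames`) means CloudWatch can still aggregate across both name types when a combined view is wanted.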

The Micrometer meters do not support publishing minimum values, only max and average (of the statistics we care about for accuracy). As a result, if accuracy is reported we can only ever see the best matches, but being able to monitor the worst matches is more useful to us. Update the meter configuration to rename the metric from `identity.accuracy` to `identity.inaccuracy`. Update the MetricsService to invert the accuracy value before publishing.

TIS21-6065 TIS21-6092
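A minimal sketch of why the inversion works, assuming accuracies lie between 0.0 and 1.0: because 1 - x reverses the ordering, the maximum inaccuracy equals 1 minus the minimum accuracy, so a backend that only keeps max and mean can still surface the worst match. `InaccuracyRecorder` is a hypothetical stdlib stand-in for the Micrometer meter, not the MetricsService itself.

```java
// Sketch only: a stand-in for a meter that, like the one described, keeps max and mean but no min.
final class InaccuracyRecorder {

  private long count;
  private double sum;
  private double max = 0.0;

  /** Invert a 0.0-1.0 accuracy before recording, so a poor match becomes a high value. */
  void recordAccuracy(double accuracy) {
    if (accuracy < 0.0 || accuracy > 1.0) {
      throw new IllegalArgumentException("accuracy must be between 0.0 and 1.0: " + accuracy);
    }
    double inaccuracy = 1.0 - accuracy;
    count++;
    sum += inaccuracy;
    max = Math.max(max, inaccuracy);
  }

  /** The highest inaccuracy seen, i.e. 1 - (lowest accuracy): the worst match. */
  double maxInaccuracy() {
    return max;
  }

  /** Mean inaccuracy; the mean accuracy is still recoverable as 1 - this value. */
  double meanInaccuracy() {
    return count == 0 ? 0.0 : sum / count;
  }

  public static void main(String[] args) {
    InaccuracyRecorder recorder = new InaccuracyRecorder();
    for (double accuracy : new double[] {1.0, 0.9, 0.5, 1.0}) {
      recorder.recordAccuracy(accuracy);
    }
    // The worst match had accuracy 0.5, so this prints "max inaccuracy = 0.5".
    System.out.println("max inaccuracy = " + recorder.maxInaccuracy());
    System.out.println("mean inaccuracy = " + recorder.meanInaccuracy());
  }
}
```

The mean survives the inversion as well, since average accuracy is simply 1 minus average inaccuracy, so publishing only the inverted value loses nothing compared with publishing accuracy directly.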

sonarqubecloud · 2024-08-08T16:16:00Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

ReubenRobertsHEE

LGTM

Judge40 requested a review from a team August 4, 2024 11:40

Judge40 marked this pull request as draft August 5, 2024 08:05

Judge40 force-pushed the feat/fuzzyNameVerification branch 2 times, most recently from 0235a06 to 2462381 August 5, 2024 14:07

Judge40 force-pushed the feat/fuzzyNameVerification branch from 2462381 to c6d3e1c August 5, 2024 14:11

Judge40 marked this pull request as ready for review August 5, 2024 14:14

Judge40 added 2 commits August 8, 2024 16:06

refactor: split forename and surname metrics

9772e89


Judge40 force-pushed the feat/fuzzyNameVerification branch from 9c53381 to 5f82740 August 8, 2024 16:13

ReubenRobertsHEE approved these changes Aug 9, 2024


Judge40 merged commit 7d6f73f into main Aug 9, 2024
3 checks passed

Judge40 deleted the feat/fuzzyNameVerification branch August 9, 2024 14:18
