Skip to content
This repository has been archived by the owner on Jan 31, 2025. It is now read-only.

feat: add fuzzy name matching during id checks #189

Merged
merged 3 commits into from
Aug 9, 2024

Conversation

Judge40
Copy link
Contributor

@Judge40 Judge40 commented Aug 4, 2024

The identity checks rely on matching the forename and surname between TIS and the provided DSP credential. As there is a high percentage chance that names will mismatch between spelling mistakes and inclusive/exclusion of middle name, the name checking needs to be more forgiving.

Update the VerificationService to check the phonetic and text accuracies and make a decision based on the similarity rather than exact matching. Split names on spaces and hypens to allow matching when middle names, or double barrelled names are used.

Publish metrics for both phonetic and text accuracy to Cloudwatch to allow monitoring and ongoing improvements to be made to the level of fuzziness allowed.

TIS21-6065

@Judge40 Judge40 requested a review from a team August 4, 2024 11:40
@Judge40 Judge40 marked this pull request as draft August 5, 2024 08:05
@Judge40
Copy link
Contributor Author

Judge40 commented Aug 5, 2024

Converting to draft as I've missed some changes from the commit

@Judge40 Judge40 force-pushed the feat/fuzzyNameVerification branch 2 times, most recently from 0235a06 to 2462381 Compare August 5, 2024 14:07
The identity checks rely on matching the forename and surname between
TIS and the provided DSP credential. As there is a high percentage
chance that names will mismatch between spelling mistakes and
inclusive/exclusion of middle name, the name checking needs to be more
forgiving.

Update the VerificationService to check the phonetic and text accuracies
and make a decision based on the similarity rather than exact matching.
Split names on spaces and hypens to allow matching when middle names, or
double barrelled names are used.

Publish metrics for both phonetic and text accuracy to Cloudwatch to
allow monitoring and ongoing improvements to be made to the level of
fuzziness allowed.

TIS21-
@Judge40 Judge40 force-pushed the feat/fuzzyNameVerification branch from 2462381 to c6d3e1c Compare August 5, 2024 14:11
@Judge40 Judge40 marked this pull request as ready for review August 5, 2024 14:14
@ReubenRobertsHEE
Copy link
Contributor

I was wondering whether the cloudwatch metrics will make it easy to see what names were being compared when the closeness score is logged (and maybe the TIS id for the trainee as well)?

@Judge40
Copy link
Contributor Author

Judge40 commented Aug 6, 2024

I was wondering whether the cloudwatch metrics will make it easy to see what names were being compared when the closeness score is logged (and maybe the TIS id for the trainee as well)?

We could use dimensions, but I'm not sure if it really fits. I don't think having the names available in cloudwatch is really that important though, it's more about whether we're (broadly) too lax or too strict.

There a definite argument for separating by first name and surname though.
Maybe a switch to metric names of identity.accuracy.forenames and identity.accuracy.surname with dimensions of AccuracyType:text and AccuracyType:phonetic.
It's hard to know when we don't actually have a reporting requirement, just trying to capture data for future needs.

@ReubenRobertsHEE
Copy link
Contributor

Ok, that makes sense. I wonder if including a general log (maybe at debug level) of the names and outcome details would be useful though, to help identify any unexpected mis/matches?

@Judge40
Copy link
Contributor Author

Judge40 commented Aug 6, 2024

Ok, that makes sense. I wonder if including a general log (maybe at debug level) of the names and outcome details would be useful though, to help identify any unexpected mis/matches?

I did intend to do that for failures 🙈
I'll add a bit of debug logging.

Judge40 added 2 commits August 8, 2024 16:06
Split the identity accuracy metrics in to forename and surname instead
of combined.
Two dimensions should be used, combining in to four sets of metrics.
 - `AnalysisType = phonetic|text`
 - `NameType = forename|surname`

TIS21-6065
TIS21-6092
The micrometer meters do not support publishing of minimum values, only
max and average (of those we care about for accuracy). As a result if
accuracy is reported on then we can only ever see the best matches, but
being able to monitor the worst matches is more useful to us.

Update the meter configuration to rename the metric from
`identity.accuracy` to `identity.inaccuracy`.
Update the MetricsService to invert the accuracy value before
publishing.

TIS21-6065
TIS21-6092
@Judge40 Judge40 force-pushed the feat/fuzzyNameVerification branch from 9c53381 to 5f82740 Compare August 8, 2024 16:13
Copy link

sonarqubecloud bot commented Aug 8, 2024

Copy link
Contributor

@ReubenRobertsHEE ReubenRobertsHEE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Judge40 Judge40 merged commit 7d6f73f into main Aug 9, 2024
3 checks passed
@Judge40 Judge40 deleted the feat/fuzzyNameVerification branch August 9, 2024 14:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants