Expose similarity score in groups result #180
Comments
So the From the docstrings:
If we were only dealing with two datasets we could simply include the similarity score between the records. However in the multiparty case with
Enter @nbgl, who I think has thought about this already...?
We could output a list of scores for each group after the solver step.
One curve-ball with the multiparty greedy solver: if `merge_threshold` is smaller than 1, it might put two entities into the same group even though their pairwise similarity is under the threshold. Do we also need to output which pair each of these similarity scores belongs to?
Just as an idea: since we can compute similarities quite cheaply, we could, instead of modifying the current solver, introduce a new step after solving which computes all the required similarities again (with the threshold set to 0). This shouldn't take long, as the mappings are a small subset of the whole candidate space. It is also cleaner, and it doesn't introduce overhead into the solver when the scores aren't needed.
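To make the post-solve idea concrete, here is a minimal sketch in plain Python. It is not anonlink's API: `dice` and `group_scores` are hypothetical helpers, records are modelled as Python ints used as bit vectors, and a group is assumed to be a list of `(dataset_index, record_index)` pairs. It scores every pair inside each group with no threshold, and keys each score by the pair it belongs to.

```python
from itertools import combinations


def dice(a: int, b: int) -> float:
    """Sorensen-Dice coefficient of two bit vectors stored as Python ints."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0
    return 2 * bin(a & b).count("1") / total


def group_scores(groups, datasets):
    """Score every pair of members within each group (hypothetical helper).

    groups:   list of groups, each a list of (dataset_index, record_index)
    datasets: datasets[dataset_index][record_index] is an int bit vector
    Returns one dict per group, mapping a pair of members to its score.
    """
    result = []
    for group in groups:
        scores = {}
        # No threshold is applied: every pair in the group gets a score.
        for m1, m2 in combinations(sorted(group), 2):
            (d1, r1), (d2, r2) = m1, m2
            scores[(m1, m2)] = dice(datasets[d1][r1], datasets[d2][r2])
        result.append(scores)
    return result


# Toy data: three datasets, one solved group with a member from each.
datasets = [
    [0b1111, 0b0011],  # dataset 0
    [0b1110],          # dataset 1
    [0b0111],          # dataset 2
]
groups = [[(0, 0), (1, 0), (2, 0)]]
scores = group_scores(groups, datasets)

for pair, score in scores[0].items():
    print(pair, round(score, 3))
```

In this toy group the pair from datasets 1 and 2 scores lower than either of them does against the dataset 0 record, which is the kind of within-group variation that per-pair output would make visible.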
TODO: make this about groups rather than mapping. Plan is to deprecate mapping output type.
This feature is to output the solved mapping while also exposing their similarity scores.
This feature is to modify `anonlink` to support `anonlink-entity-service` (and library users) in calculating similarity scores between group members.
It may make sense to provide a high-level API to compute similarity scores using the group output type, and then recompute the scores between members of the group.
A good chance to refactor the high-level `solver` API.