Data validation tools to prevent duplicates/misspellings #74
Comments
Also submitted by @ChristiaanScheermeijer.
Scope for M24 is: import existing relations from IMSLP/Wikipedia/MusicBrainz.
Duplicates from external sources can be prevented by adding a unique node property constraint, for example on the source property. The identifier field could also be used, but it is currently filled with a UUID. The identifier (based on Thing) could instead hold the source URI. Are there any implications to doing this? If this is not a desirable solution, clients should check for existing nodes before inserting.
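To make the two options above concrete, here is a minimal sketch of both behaviours using an in-memory stand-in for the database. `NodeStore`, `DuplicateSourceError`, `get_or_create` and the example URI are hypothetical names made up for illustration, not part of the project or of Neo4j. `create` rejects a second node with the same source (roughly what a unique constraint would do), and `get_or_create` is the client-side "check before inserting" fallback.

```python
import uuid


class DuplicateSourceError(ValueError):
    """Raised when a node with the same source URI already exists."""


class NodeStore:
    """In-memory stand-in for a graph database with a unique constraint on `source`."""

    def __init__(self):
        # Maps source URI -> node; the dict key is what enforces uniqueness here.
        self._by_source = {}

    def create(self, label, source, **props):
        # Mimics a unique node property constraint: a second insert with the
        # same source is rejected instead of silently creating a duplicate.
        if source in self._by_source:
            raise DuplicateSourceError(f"node with source {source!r} already exists")
        node = {"identifier": str(uuid.uuid4()), "label": label, "source": source, **props}
        self._by_source[source] = node
        return node

    def get_or_create(self, label, source, **props):
        # Client-side check: look the node up first and insert only if it is absent.
        existing = self._by_source.get(source)
        if existing is not None:
            return existing, False
        return self.create(label, source, **props), True


store = NodeStore()
first, created_first = store.get_or_create(
    "Person", "https://example.org/artist/1", name="Example Artist"
)
second, created_second = store.get_or_create(
    "Person", "https://example.org/artist/1", name="Example Artist"
)
```

Running the same import twice through `get_or_create` returns the existing node the second time, so a repeated import does not add nodes. With a real database the constraint would still be worth having as a safety net, since a client-side check alone can race with other writers.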
For duplicates from the same database, we can use the
For M24 we will import data from:
If any of these sources has existing metadata links to any other source, we will use skos:exactMatch to say that these items are the same. The next part of this task (which for now will probably be out of scope for M24) is to match items when there are no existing relationships (e.g. an artist who appears on both MusicBrainz and Muziekweb, but whose entries have no links to each other or through VIAF, WorldCat, etc.). This matching will require some kind of heuristic (e.g. edit distance), or could be a crowd-sourcing task.
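As a rough illustration of the edit-distance heuristic mentioned above, the sketch below computes Levenshtein distance between names and keeps candidates above a similarity threshold. The function names, the case-folding step and the 0.85 threshold are assumptions chosen for the example, not values agreed in this issue; real matching would likely also need to compare dates or other metadata.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to 0..1, ignoring case and surrounding whitespace."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def candidate_matches(name, others, threshold=0.85):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(other, similarity(name, other)) for other in others]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = candidate_matches(
    "Johann Sebastian Bach",
    ["Johan Sebastian Bach", "Ludwig van Beethoven"],
)
```

Candidates produced this way are suggestions rather than confirmed matches, so they could feed the crowd-sourcing task instead of being written directly as skos:exactMatch links.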
@alastair, in a recent version of neo4j-graphql-js it is possible to add a |
I also suggest that we add some custom mutations to make it easier to "tag" nodes as related to each other. At the moment we need to perform multiple queries/mutations to create a bi-directional relationship between two nodes:

(p1:Person)-[:EXACT_MATCH]->(p2:Person)
(p2:Person)-[:EXACT_MATCH]->(p1:Person)

input _matchInput {
  identifier: ID!
}

type _matchResult {
  fromIdentifier: ID!
  toIdentifier: ID!
}

type Mutation {
  AddBroadMatch(from: _matchInput!, to: _matchInput!): _matchResult
  AddCloseMatch(from: _matchInput!, to: _matchInput!): _matchResult
  AddExactMatch(from: _matchInput!, to: _matchInput!): _matchResult
  AddNarrowMatch(from: _matchInput!, to: _matchInput!): _matchResult
  AddRelatedMatch(from: _matchInput!, to: _matchInput!): _matchResult
  RemoveBroadMatch(from: _matchInput!, to: _matchInput!): _matchResult
  RemoveCloseMatch(from: _matchInput!, to: _matchInput!): _matchResult
  RemoveExactMatch(from: _matchInput!, to: _matchInput!): _matchResult
  RemoveNarrowMatch(from: _matchInput!, to: _matchInput!): _matchResult
  RemoveRelatedMatch(from: _matchInput!, to: _matchInput!): _matchResult
}
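The behaviour such a mutation might have can be sketched with a small in-memory graph: one call writes both directions and returns the `fromIdentifier`/`toIdentifier` pair. `MatchGraph`, `add_match` and `remove_match` are hypothetical names, not existing resolvers. One assumption goes beyond the proposal: following SKOS, where broadMatch and narrowMatch are inverses of each other, the reverse edge of a broad match is stored as a narrow match rather than as a second broad match.

```python
# Relationship type written on the reverse edge for each match type.
# Exact, close and related matches are symmetric; broad and narrow are inverses.
INVERSE = {
    "EXACT_MATCH": "EXACT_MATCH",
    "CLOSE_MATCH": "CLOSE_MATCH",
    "RELATED_MATCH": "RELATED_MATCH",
    "BROAD_MATCH": "NARROW_MATCH",
    "NARROW_MATCH": "BROAD_MATCH",
}


class MatchGraph:
    """In-memory stand-in for the graph; edges are (from, type, to) triples."""

    def __init__(self):
        self.edges = set()

    def add_match(self, match_type, from_id, to_id):
        if match_type not in INVERSE:
            raise ValueError(f"unknown match type: {match_type}")
        # Write both directions in one call so clients need a single mutation.
        self.edges.add((from_id, match_type, to_id))
        self.edges.add((to_id, INVERSE[match_type], from_id))
        return {"fromIdentifier": from_id, "toIdentifier": to_id}

    def remove_match(self, match_type, from_id, to_id):
        if match_type not in INVERSE:
            raise ValueError(f"unknown match type: {match_type}")
        # discard() makes removal a no-op when the edge is already gone.
        self.edges.discard((from_id, match_type, to_id))
        self.edges.discard((to_id, INVERSE[match_type], from_id))
        return {"fromIdentifier": from_id, "toIdentifier": to_id}


graph = MatchGraph()
result = graph.add_match("EXACT_MATCH", "p1", "p2")
graph.add_match("BROAD_MATCH", "c1", "c2")
```

Because the edges are kept in a set, calling `add_match` twice with the same arguments does not create duplicate relationships, which is the behaviour we would also want from the real mutation.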
@CasperCDR @alastair we are now running a recent version of the neo4j-graphql-js which supports the |
We have @unique on I don't know a good way (other than being careful with our code) to ensure that we don't import the same data multiple times. |
Submitted by UPF, relevant for Scholars & enthusiasts use cases. Awarded 3 dots, assigned to @alastair.