Clarify Multi-Server Person and Participant Relationships and Globally Unique IDs #91

Open
mingward opened this issue Jan 3, 2025 · 1 comment
mingward commented Jan 3, 2025

Context

There are two primary questions regarding the current Implementation Guide (IG):

  1. Person-to-Participant Uniqueness

    • On the same server, if a Person links to a Participant resource, should we ensure that no other Person also links to that same Participant?
  2. Where NCPI Person Lives

    • In multi-server situations (such as KidsFirst, dbGaP and ImmPort), if NCPI Participant resources residing on multiple servers actually represent the same Person, where does the corresponding NCPI Person resource “live”?

Proposal

  1. Every Participant has a Person on the same server
  2. A Person only links to Participants on the same server
    • Prevents cross-server references from Person to Participant.
  3. A Person only links to Person resources on different servers
    • Cross-server references occur only between Person resources, not between Person and Participant.
  4. If a Person links to a Participant, then no other Person on the same server links to that Participant.
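
The four rules above could be checked mechanically on each server. The following is a minimal sketch, not part of the IG: it assumes Person resources are available as plain FHIR JSON dictionaries, that NCPI Participant is profiled on Patient (so a link target of type `Patient` is a Participant), and that a relative reference such as `Patient/p1` means "same server" while an absolute URL means "another server". The function names and the `immport.example` URL are hypothetical.

```python
from collections import defaultdict


def is_local(reference: str) -> bool:
    """Treat relative references (no URL scheme) as same-server references."""
    return "://" not in reference


def resource_type(reference: str) -> str:
    """Return the resource type from 'Type/id' or 'https://host/base/Type/id'."""
    return reference.rstrip("/").split("/")[-2]


def check_server(persons, participant_ids):
    """Return a list of proposal-rule violations for one server."""
    problems = []
    owners = defaultdict(list)  # Participant id -> ids of local Persons linking to it
    for person in persons:
        for link in person.get("link", []):
            ref = link["target"]["reference"]
            kind = resource_type(ref)
            if kind == "Patient":
                if is_local(ref):
                    owners[ref.split("/")[-1]].append(person["id"])
                else:  # rule 2: Person -> Participant must stay on one server
                    problems.append(f"rule 2: {person['id']} links to remote Participant {ref}")
            elif kind == "Person" and is_local(ref):
                # rule 3: Person -> Person links are reserved for other servers
                problems.append(f"rule 3: {person['id']} links to local Person {ref}")
    for pid in participant_ids:
        if not owners[pid]:  # rule 1: every Participant has a Person
            problems.append(f"rule 1: Participant {pid} has no Person")
        elif len(owners[pid]) > 1:  # rule 4: at most one Person per Participant
            problems.append(f"rule 4: Participant {pid} is linked by {owners[pid]}")
    return problems


# Hypothetical data: DG1 and DG2 both claim p1, and p2 has no Person at all.
persons = [
    {"resourceType": "Person", "id": "DG1", "link": [
        {"target": {"reference": "Patient/p1"}},
        {"target": {"reference": "https://immport.example/fhir/Person/IP1"}},
    ]},
    {"resourceType": "Person", "id": "DG2", "link": [
        {"target": {"reference": "Patient/p1"}},
    ]},
]
problems = check_server(persons, ["p1", "p2"])
for problem in problems:
    print(problem)
```

A check like this could run as a pre-load validation step. Whether violations should block loading or only be reported is a policy decision the proposal does not settle.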

Clarification of "Globally Unique ID"

"ID should be a globally unique identifier associated with the person. This practice is intended to make constructing queries for the same person compatible across different servers (such as QA vs PROD) but also to make the resource URLs more meaningful."

  • “Globally” refers to the entire NCPI ecosystem (including AnVIL, BDC, CRDC, dbGaP, ImmPort, KidsFirst, etc.), not just QA vs PROD, right?

The globally unique ID also has an interoperability impact.
For example, we know that ImmPort and dbGaP have overlapping people: "actual person X" exists in both. We propose representing this as follows:
a. "dbGaP Person-DG1" links to "ImmPort Person-IP1", and vice versa.
b. Each server can obtain this bi-directional link by running a cron job that checks external servers for Person.link entries pointing to the home server.
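
The cron step in (b) could look roughly like the sketch below. It is an illustration under assumptions, not an agreed design: the HTTP fetch from each external server is left out, so the input is a dictionary of already-fetched Person resources keyed by server base URL. The base URLs (`dbgap.example`, `immport.example`) and both function names are hypothetical. Cross-server links are assumed to be absolute URLs in `Person.link.target.reference`.

```python
def reciprocal_links(home_base, external_persons_by_base):
    """Map each home Person id to the external Person URLs that link to it."""
    prefix = home_base.rstrip("/") + "/Person/"
    found = {}
    for base, persons in external_persons_by_base.items():
        for person in persons:
            for link in person.get("link", []):
                ref = link["target"]["reference"]
                if ref.startswith(prefix):
                    home_id = ref[len(prefix):]
                    external_url = f"{base.rstrip('/')}/Person/{person['id']}"
                    found.setdefault(home_id, []).append(external_url)
    return found


def add_missing_links(home_person, external_urls):
    """Add a link for each external Person URL not already on the home Person."""
    existing = {l["target"]["reference"] for l in home_person.get("link", [])}
    for url in external_urls:
        if url not in existing:
            home_person.setdefault("link", []).append({"target": {"reference": url}})
            existing.add(url)
    return home_person


# Hypothetical run on the dbGaP side: ImmPort's IP1 already points at DG1.
home_base = "https://dbgap.example/fhir"
external = {
    "https://immport.example/fhir": [
        {"resourceType": "Person", "id": "IP1", "link": [
            {"target": {"reference": "https://dbgap.example/fhir/Person/DG1"}},
        ]},
    ],
}
found = reciprocal_links(home_base, external)
dg1 = add_missing_links({"resourceType": "Person", "id": "DG1"}, found.get("DG1", []))
print(dg1["link"])
```

Because `add_missing_links` skips URLs that are already present, repeated cron runs do not duplicate links. Removing stale links, and handling merged or split Persons, would need separate logic.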

Summary of Intended Approach

  • Within a single server: Person → Participant
  • Across multiple servers: Person → Person
  • Each server maintains a list of external servers to identify and establish cross-server Person references.

Please let me know if anything is unclear or if further details are needed. Thank you!

@RobertJCarroll
Contributor

Regarding the proposal, that makes sense to me. I don't think we have it explicitly stated that Persons on a server should behave this way, but it is the "right" approach in my opinion. We should make this clear in the documentation.

For the cross-server person linkage: there's no guarantee at this time that every Participant is contained within a Person. I think we'd either need to make this true or relax the constraint. I understand why it's helpful to architect it this way, but there are some potential issues. For example, what happens when we merge or split Persons? It's a classic MRN issue that can be challenging. We should discuss more. This also will impact AnVIL / dbGaP as data may be available in both places in the future.

We will need to adjust our NCPI Person profile to allow links to both NCPI Participants and NCPI Persons. This is in line with the base Person resource.
