-
Notifications
You must be signed in to change notification settings - Fork 7
Neurobagel Data Model limitations
During https://github.com/neurobagel/bulk_annotations/issues/10 we discovered a couple of limitations of our data model. The idea of this page is to keep a record of them, so we don't have to discover the same limitation twice. Some limiations are expected and intentional, we don't want to model everything we can because we want a balance of usefulness and complexity.
Phenotypic data is modeled at the subject level, not at the session level. This is mainly due to BIDS, as longitudinal phenotypic data in BIDS are currently not defined in the spec.
Problems this causes
- cannot model a change in diagnosis over time. If a participant has a DX of "yes autism" on session 1, but then a DX of "no autism" on session 2, we currently have to pick one. And then the participant will show up with all of their sessions for only that sessions profile.
For example, I have a column about diagnosis. The correct ontology is "SNOMED-CT". My column has four levels:
- Schizophrenia
- Alzheimer's Disease
- Sibling with Schizophrenia
The last one is not in SNOMED (one could argue that this is more a modeling issue than an ontology issue). But in my data dictionary, I need to map this value to something. The only option I currently have is to declare it as "missing value" - which is probably not a good option.
The alternative would be to "force" annotators to pick the "next closest" term from SNOMED-CT, but this comes with the cost of reducing annotation accuracy.
- relations. For example, I cannot model that my participant has a sibling with a diagnosis of "Schizophrenia"
- medication status. E.g. cannot say: "my participant received Drug XYZ"
- assessments that aren't tied to a specific tool. e.g. "Body Mass Index" is not a tool in the same way that "blood pressure" is not a tool. But still relevant info
Right now we think of assessment tools as something a participant either has (successfully completed) or doesn't have (either completely lacking or incomplete).
The way we implement it is that an assessment tool is counted as "available" for a participant if:
- there is one or multiple columns that are
partOf
the assessment tool - the row for the participant in question does not have
any
missing values
Or in other words: all columns that are partOf
the tool,
must not have missing values for the tool to be counted
as available.