
[ENH] Clearer DOCs for session and participant IDs #226

Merged: 11 commits, Sep 24, 2024
2 changes: 1 addition & 1 deletion docs/cli.md
@@ -177,7 +177,7 @@ You could run the CLI as follows:
Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph file. Breaking changes will be highlighted in the release notes!

_If you have already created `.jsonld` files for your Neurobagel graph database using the CLI_,
they can be quickly re-generated under the new data model by following the instructions [here](maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version.
they can be quickly re-generated under the new data model by following the instructions [here](guide/maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version.


## Development environment
51 changes: 44 additions & 7 deletions docs/data_prep.md
@@ -17,18 +17,39 @@ please prepare the tabular data for your dataset as a single, tab-separated file

### All datasets

A valid dataset for Neurobagel **must** include a TSV file that describes participant attributes.
The TSV must contain a minimum of two columns: at least one column must contain subject IDs,
and at least one column must describe demographic or other phenotypic information
(for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md)).
A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes.

The TSV:

- MUST contain a minimum of two columns
- MUST contain at least one column with subject IDs

??? note "Only one subject ID column can be annotated"
    Neurobagel currently does not support annotating multiple subject ID columns,
    so you must choose one as the primary ID during annotation.

- MUST contain at least one additional column that describes demographic or other phenotypic information
- MAY contain a column with session IDs if the dataset is longitudinal.

??? note "Only one session ID column can be annotated"
    Neurobagel currently does not support annotating multiple session ID columns,
    so you must choose one as the primary ID during annotation.

- MUST NOT contain any missing values in the subject ID column, or in the session ID column if present
- MUST have unique values in the subject ID column, OR in the combination of the subject ID and session ID columns

For the variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md).
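The rules above can be checked programmatically. Here is a minimal sketch using only the Python standard library; the column names `participant_id` and `session_id` are illustrative assumptions, not names required by Neurobagel:

```python
import csv
import io

# Hypothetical phenotypic TSV; the column names are illustrative only.
tsv = """participant_id\tsession_id\tage
sub-01\tses-01\t24
sub-01\tses-02\t25
sub-02\tses-01\t28
"""
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# MUST: a minimum of two columns (at least one ID column plus one phenotypic column)
assert len(rows[0]) >= 2

# MUST NOT: missing values in the subject ID (and, if present, session ID) columns
assert all(row["participant_id"] and row["session_id"] for row in rows)

# MUST: unique subject IDs, or unique (subject ID, session ID) combinations
combos = [(row["participant_id"], row["session_id"]) for row in rows]
assert len(combos) == len(set(combos))
print("TSV passes the basic Neurobagel checks")
```

Note that `sub-01` appears twice here, which is fine because each (subject ID, session ID) combination is still unique.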

### Datasets with imaging (BIDS) data

If a dataset has imaging data in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format,
Neurobagel **additionally** requires that:

- At least one column in the phenotypic TSV contains subject IDs that match the names of [BIDS subject subdirectories](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure).
If this condition is not met, you will encounter an error when [running the Neurobagel CLI](cli.md) on your dataset to generate Neurobagel graph-ready files, indicating that your BIDS directory contains subjects not found in your phenotypic file.
- At least one column in the phenotypic TSV contains subject IDs that
match the names of [BIDS subject subdirectories](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure).
If this condition is not met, you will encounter an error
when [running the Neurobagel CLI](cli.md) on your dataset to generate Neurobagel graph-ready files,
indicating that your BIDS directory contains subjects not found in your phenotypic file.

!!! note
    Subject IDs are case-sensitive and must match BIDS subject IDs exactly.
@@ -39,7 +60,13 @@ If this condition is not met, you will encounter an error when [running the Neur
- All BIDS subjects are included in the phenotypic TSV,
even if they only have BIDS imaging information.
Neurobagel does not allow for datasets where subjects have BIDS
data but are not represented in the phenotypic TSV (however, subjects who have phenotypic data but no BIDS data are allowed).
data but are not represented in the phenotypic TSV
(however, subjects who have phenotypic data but no BIDS data are allowed).
- If the dataset is longitudinal, the session IDs in the phenotypic TSV
MAY match the session IDs in the BIDS dataset, but do not have to.
If matching session IDs are present in both the phenotypic TSV and the BIDS dataset, Neurobagel will interpret
this to mean that the phenotypic data are associated with the corresponding BIDS session.
If phenotypic and BIDS session IDs do not match, Neurobagel will treat them as distinct sessions.
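The subject-matching requirement above can be sketched as a small check. The directory layout below is a toy example, and the phenotypic IDs are assumed to come from a hypothetical TSV column:

```python
import tempfile
from pathlib import Path

# Toy BIDS directory with two subject subdirectories (illustrative only).
bids_root = Path(tempfile.mkdtemp())
for sub in ("sub-01", "sub-02"):
    (bids_root / sub).mkdir()

# Subject IDs from a hypothetical phenotypic TSV column.
# sub-03 has phenotypic data but no BIDS data, which is allowed.
pheno_ids = {"sub-01", "sub-02", "sub-03"}

# BIDS subject subdirectories are the entries named sub-*.
bids_ids = {p.name for p in bids_root.iterdir() if p.is_dir() and p.name.startswith("sub-")}

# Every BIDS subject must appear in the phenotypic TSV (case-sensitive match).
missing = bids_ids - pheno_ids
assert not missing, f"BIDS directory contains subjects not found in phenotypic file: {missing}"
print("all BIDS subjects are represented in the phenotypic TSV")
```

Reversing the subtraction (`pheno_ids - bids_ids`) yields `{"sub-03"}`, which is not an error: subjects with phenotypic data but no imaging data are allowed.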

## Examples of valid phenotypic TSVs

@@ -93,6 +120,14 @@ In this case, both types of participant IDs should be recorded in the tabular fi

The only requirement is that **the combination of all ID values for a row is unique**.

!!! warning "Neurobagel currently supports only one subject ID and one session ID"
    Neurobagel does not support multiple subject or session IDs in the same TSV file.
    If you have multiple subject or session IDs, you must choose one to use as the primary ID
    and include the others as additional columns.


We are planning to support multiple IDs in the future.

Example **invalid** TSV:

| participant_id | alternative_participant_id | ... |
@@ -111,3 +146,5 @@ Example **valid** TSV:
| sub-01 | SID-1234 | ses-02 | visit-2 | 23 | |
| sub-02 | SID-2222 | ses-01 | visit-1 | 28 | |
| ... | | | | | |
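The row-uniqueness rule illustrated by the two tables can be expressed as a one-line check. The ID values below are taken from the valid example table (column order inferred from the table; phenotypic columns omitted for brevity):

```python
# ID columns from the valid example table above.
rows = [
    ("sub-01", "SID-1234", "ses-01", "visit-1"),
    ("sub-01", "SID-1234", "ses-02", "visit-2"),
    ("sub-02", "SID-2222", "ses-01", "visit-1"),
]

# Valid: the combination of all ID values in each row is unique.
assert len(rows) == len(set(rows))

# A table is invalid when two rows repeat the same ID combination.
bad_rows = rows[:1] * 2  # duplicate the first row
assert len(bad_rows) != len(set(bad_rows))
```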


6 changes: 3 additions & 3 deletions docs/config.md → docs/guide/config.md
@@ -15,14 +15,14 @@ and coordinates them to work together:

(In parentheses are the names of services within the Docker Compose stack)

- **[Neurobagel node API/n-API](api.md)** (`api`): The API that communicates with a single graph store and determines
- **[Neurobagel node API/n-API](../api.md)** (`api`): The API that communicates with a single graph store and determines
how detailed the response to a query should be from that graph.
- **Graph store** (`graph`): A third-party RDF store that stores Neurobagel-harmonized data to be queried. At the moment our recipe uses the free tier
of [GraphDB](https://db-engines.com/en/system/GraphDB) for this.
- **Neurobagel federation/f-API** (`federation`): A special API that can federate over one or more
Neurobagel nodes to provide a single point of access to multiple distributed databases.
By default it will federate over all public nodes and any local nodes you specify.
- **[Neurobagel query tool](query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a
- **[Neurobagel query tool](../query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a
federation API and view the results from one or more nodes. Because the query tool is a static app and is run locally
in the user's browser, this service simply hosts the app.
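The service layout above could be sketched in Compose form roughly as follows. This is illustrative only: the image names, tags, and port mappings are assumptions, and the actual recipe in the `neurobagel/recipes` repository is the authoritative version:

```yaml
# Illustrative sketch only -- image tags, ports, and wiring are assumptions.
services:
  api:
    image: neurobagel/api:latest          # Neurobagel node API (n-API)
    ports: ["8000:8000"]
    depends_on: [graph]
  graph:
    image: ontotext/graphdb:10.0.0-free   # third-party RDF store (GraphDB free tier)
  federation:
    image: neurobagel/federation_api:latest  # federation API (f-API)
    ports: ["8080:8000"]
  query_tool:
    image: neurobagel/query_tool:latest   # static web app, runs in the user's browser
    ports: ["3000:3000"]
```

The port choices above mirror the defaults mentioned elsewhere in these docs (n-API on 8000, f-API on 8080, query tool on 3000).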

@@ -309,7 +309,7 @@ you can use the
hierarchical relationships between concepts themselves can also be represented.
Including these relationships in a graph is important to be able to answer questions such as how many different diagnoses are represented in a graph database, to query for higher-order concepts for a given variable, and more.

The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](./term_naming_standards.md)).
The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](../term_naming_standards.md)).
This vocabulary, which defines internal relationships between vocabulary terms,
is serialized in the file [`nb_vocab.ttl`](https://github.com/neurobagel/recipes/blob/main/vocab/nb_vocab.ttl) available from the `neurobagel/recipes` repository.
If you have cloned this repository, you will already have downloaded the vocabulary file.
12 changes: 6 additions & 6 deletions docs/getting_started.md → docs/guide/getting_started.md
@@ -5,7 +5,7 @@ and a local federation API
(everything in blue in the picture below)
that lets you search across the data in your node and in public Neurobagel nodes.

![Neurobagel node](imgs/neurobagel_local_node.jpg)
![Neurobagel node](../imgs/neurobagel_local_node.jpg)

To prepare your Neurobagel node for production use (i.e., for local or other users),
and to configure your deployment according to your specific needs,
@@ -98,7 +98,7 @@ cp local_nb_nodes.template.json local_nb_nodes.json
with the URL address where the Neurobagel federation API will be accessed:

- If you are deploying Neurobagel for yourself or deploying and trying the services **on your local machine only**,
you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](./config.md#environment-variables).
you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](config.md#environment-variables).
- If you are deploying Neurobagel **on a server for other users**,
you must use the IP (and port) or URL intended for your users to access the federation API on the server with.
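For a local-only trial deployment, setting the variable could look like this (assuming, as a sketch, that the variable lives in the recipe's `.env` file):

```shell
# Local-only deployment: the query tool will call the f-API on this address.
# 8080 is the default host port for the federation API.
echo "NB_API_QUERY_URL=http://localhost:8080" >> .env
```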

@@ -151,9 +151,9 @@ our [service profile documentation](config.md#available-profiles) for details.
:tada: You are now the proud owner of a running Neurobagel node. Here are some things you can do now:

- Try the Neurobagel node you just deployed by accessing:
- your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](./query_tool.md#usage) guide
- the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](./api.md) guide
- [Prepare your own dataset](./data_prep.md) for annotation with Neurobagel
- your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](../query_tool.md#usage) guide
- the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](../api.md) guide
- [Prepare your own dataset](../data_prep.md) for annotation with Neurobagel
- [Add your own data to your Neurobagel graph](maintaining.md#updating-the-data-in-your-graph) to search
- Learn about the different [configuration options](config.md) for your Neurobagel node
- Hopefully all went well, but if you are experiencing issues, see how to [get help](./getting_help.md)
- Hopefully all went well, but if you are experiencing issues, see how to [get help](../getting_help.md)
12 changes: 6 additions & 6 deletions docs/maintaining.md → docs/guide/maintaining.md
@@ -118,35 +118,35 @@ For any of the below types of changes, you will need to regenerate a graph-ready

If new variables have been added to the dataset such that there are new columns in the phenotypic TSV you previously annotated using Neurobagel's annotation tool, you will need to:

1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](annotation_tool.md)
1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](../annotation_tool.md)

2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) on your updated TSV and data dictionary
2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) on your updated TSV and data dictionary

#### If only the imaging data have changed

If the BIDS data for a dataset have changed without changes in the corresponding phenotypic TSV (e.g., if new modalities or scans have been acquired for a subject), you have two options:

- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](cli.md) on the updated BIDS directory.
- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](../cli.md) on the updated BIDS directory.
This will generate a new graph-ready data file with updated imaging metadata of subjects.

OR

- [Rerun the CLI entirely (`pheno` and `bids` steps)](cli.md) to generate a new graph-ready data file for the dataset.
- [Rerun the CLI entirely (`pheno` and `bids` steps)](../cli.md) to generate a new graph-ready data file for the dataset.

_When in doubt, rerun both CLI commands._
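As a rough sketch, rerunning both steps might look like the following pseudocode. The flag names and file paths here are illustrative assumptions, not the CLI's documented interface; consult `bagel --help` or the [CLI docs](../cli.md) for the real invocation:

```shell
# Pseudocode sketch -- all flag names and paths below are hypothetical.
# Step 1 (pheno): regenerate the phenotypic JSONLD from the TSV + data dictionary.
bagel pheno \
    --pheno dataset.tsv \
    --dictionary dataset.json \
    --output dataset_pheno.jsonld

# Step 2 (bids): add/refresh imaging metadata from the BIDS directory.
bagel bids \
    --jsonld-path dataset_pheno.jsonld \
    --bids-dir bids/ \
    --output dataset_pheno_bids.jsonld
```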

#### If only the subjects have changed

If subjects have been added to or removed from the dataset but the phenotypic TSV is otherwise unchanged (i.e., only new or removed rows, without changes to the available variables), you will need to:

- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary
- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary

### Following a change in the _Neurobagel data model_

As Neurobagel continues developing the data model, new tool releases may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph data file.
Breaking changes will be highlighted in the release notes.

_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph.
_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](../cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph.
This will ensure that if you use the latest version of the Neurobagel CLI to process new datasets (i.e., generate new `.jsonld` files) for your database, the resulting data will not have conflicts with existing data in the graph.

Note that if upgrading to a newer version of the data model, **you should regenerate the `.jsonld` files for _all_ datasets in your existing graph**.
2 changes: 1 addition & 1 deletion docs/overview.md
@@ -22,5 +22,5 @@ You can also find official Docker images for our containerized tools on the [Neu
## What to do next

- [Learn how to run a cohort query](./query_tool.md) on publicly accessible Neurobagel nodes
- [Deploy your own Neurobagel node](./getting_started.md) using our official Docker Compose recipe
- [Deploy your own Neurobagel node](guide/getting_started.md) using our official Docker Compose recipe
- [Prepare your own dataset](./data_prep.md) for annotation and harmonization with Neurobagel
2 changes: 1 addition & 1 deletion docs/public_nodes.md
@@ -32,4 +32,4 @@ Downloading of imaging data is performed via [datalad](https://rpq-qpn.ca/en/hom
Nodes that are not purposefully made public are not accessible
outside of the institute or network where they are deployed.
If you are interested in deploying a Neurobagel node for your institution,
please refer to our [deployment documentation](./getting_started.md) for more information.
please refer to our [deployment documentation](guide/getting_started.md) for more information.
2 changes: 1 addition & 1 deletion docs/query_tool.md
@@ -65,7 +65,7 @@ Example:

If the values for all columns except `DatasetID` and `SessionPath` in the participant-level results TSV are set to `protected`, this indicates that the graph being queried has been configured (via its corresponding Neurobagel node API) to return only aggregate information about matches, for data privacy reasons.
This configuration can be modified by setting the `NB_RETURN_AGG` environment variable to `false` (the default value is `true`).
See related section of the documentation [here](config.md#environment-variables).
See related section of the documentation [here](guide/config.md#environment-variables).
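As a sketch, disabling the aggregate-only behaviour for a node could look like this (assuming the variable is set in the recipe's `.env` file):

```shell
# Return participant-level results instead of only aggregate matches.
# Consider the data-privacy implications before changing this.
echo "NB_RETURN_AGG=false" >> .env
```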

Example:

6 changes: 3 additions & 3 deletions mkdocs.yml
@@ -30,9 +30,9 @@ nav:
- How to use the query tool: "query_tool.md"
- Querying the API directly: "api.md"
- Setting up your own Neurobagel node:
- Getting started: "getting_started.md"
- Configuring a node: "config.md"
- Maintaining a node: "maintaining.md"
- Getting started: "guide/getting_started.md"
- Configuring a node: "guide/config.md"
- Maintaining a node: "guide/maintaining.md"
- Annotating your data:
- Preparing data for annotation: "data_prep.md"
- Annotation tool guide: "annotation_tool.md"