From 06c4b666de29060b5f43b53ec196d8c828558267 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 15:49:27 -0400 Subject: [PATCH 01/11] [ENH] Own DIR for getting started Fixes #99 --- docs/cli.md | 2 +- docs/{ => guide}/config.md | 6 +++--- docs/{ => guide}/getting_started.md | 12 ++++++------ docs/{ => guide}/maintaining.md | 12 ++++++------ docs/overview.md | 2 +- docs/public_nodes.md | 2 +- docs/query_tool.md | 2 +- mkdocs.yml | 6 +++--- 8 files changed, 22 insertions(+), 22 deletions(-) rename docs/{ => guide}/config.md (98%) rename docs/{ => guide}/getting_started.md (95%) rename docs/{ => guide}/maintaining.md (95%) diff --git a/docs/cli.md b/docs/cli.md index f318318d..63f20db5 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -177,7 +177,7 @@ You could run the CLI as follows: Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph file. Breaking changes will be highlighted in the release notes! _If you have already created `.jsonld` files for your Neurobagel graph database using the CLI_, -they can be quickly re-generated under the new data model by following the instructions [here](maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version. +they can be quickly re-generated under the new data model by following the instructions [here](guide/maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version. 
## Development environment diff --git a/docs/config.md b/docs/guide/config.md similarity index 98% rename from docs/config.md rename to docs/guide/config.md index 1797eb32..38c259bc 100644 --- a/docs/config.md +++ b/docs/guide/config.md @@ -15,14 +15,14 @@ and coordinates them to work together: (In parentheses are the names of services within the Docker Compose stack) -- **[Neurobagel node API/n-API](api.md)** (`api`): The API that communicates with a single graph store and determines +- **[Neurobagel node API/n-API](../api.md)** (`api`): The API that communicates with a single graph store and determines how detailed the response to a query should be from that graph. - **Graph store** (`graph`): A third-party RDF store that stores Neurobagel-harmonized data to be queried. At the moment our recipe uses the free tier of [GraphDB](https://db-engines.com/en/system/GraphDB) for this. - **Neurobagel federation/f-API** (`federation`): A special API that can federate over one or more Neurobagel nodes to provide a single point of access to multiple distributed databases. By default it will federate over all public nodes and any local nodes you specify. -- **[Neurobagel query tool](query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a +- **[Neurobagel query tool](../query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a federation API and view the results from one or more nodes. Because the query tool is a static app and is run locally in the user's browser, this service simply hosts the app. @@ -309,7 +309,7 @@ you can use the hierarchical relationships between concepts themselves can also be represented. Including these relationships in a graph is important to be able to answer questions such as how many different diagnoses are represented in a graph database, to query for higher-order concepts for a given variable, and more. 
-The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](./term_naming_standards.md)). +The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](../term_naming_standards.md)). This vocabulary, which defines internal relationships between vocabulary terms, is serialized in the file [`nb_vocab.ttl`](https://github.com/neurobagel/recipes/blob/main/vocab/nb_vocab.ttl) available from the `neurobagel/recipes` repository. If you have cloned this repository, you will already have downloaded the vocabulary file. diff --git a/docs/getting_started.md b/docs/guide/getting_started.md similarity index 95% rename from docs/getting_started.md rename to docs/guide/getting_started.md index 3a66d8d1..6725086c 100644 --- a/docs/getting_started.md +++ b/docs/guide/getting_started.md @@ -5,7 +5,7 @@ and a local federation API (everything in blue in the picture below) that lets you search across the data in your node and in public Neurobagel nodes. -![Neurobagel node](imgs/neurobagel_local_node.jpg) +![Neurobagel node](../imgs/neurobagel_local_node.jpg) To prepare your Neurobagel node for production use (i.e., for local or other users), and to configure your deployment according to your specific needs, @@ -98,7 +98,7 @@ cp local_nb_nodes.template.json local_nb_nodes.json with the URL address where the Neurobagel federation API will be accessed: - If you are deploying Neurobagel for yourself or deploying and trying the services **on your local machine only**, - you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](./config.md#environment-variables). + you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](config.md#environment-variables). 
- If you are deploying Neurobagel **on a server for other users**, you must use the IP (and port) or URL intended for your users to access the federation API on the server with. @@ -151,9 +151,9 @@ our [service profile documentation](config.md#available-profiles) for details. :tada: You are now the proud owner of a running Neurobagel node. Here are some things you can do now: - Try the Neurobagel node you just deployed by accessing: - - your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](./query_tool.md#usage) guide - - the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](./api.md) guide -- [Prepare your own dataset](./data_prep.md) for annotation with Neurobagel + - your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](../query_tool.md#usage) guide + - the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](../api.md) guide +- [Prepare your own dataset](../data_prep.md) for annotation with Neurobagel - [Add your own data to your Neurobagel graph](maintaining.md#updating-the-data-in-your-graph) to search - Learn about the different [configuration options](config.md) for your Neurobagel node -- Hopefully all went well, but if you are experiencing issues, see how to [get help](./getting_help.md) \ No newline at end of file +- Hopefully all went well, but if you are experiencing issues, see how to [get help](../getting_help.md) \ No newline at end of file diff --git a/docs/maintaining.md b/docs/guide/maintaining.md similarity index 95% rename from docs/maintaining.md rename to docs/guide/maintaining.md index 4630fd2f..02827f3e 100644 --- a/docs/maintaining.md +++ b/docs/guide/maintaining.md @@ -118,20 +118,20 @@ For any of the below types of changes, you will need to regenerate a graph-ready If new variables have been 
added to the dataset such that there are new columns in the phenotypic TSV you previously annotated using Neurobagel's annotation tool, you will need to: -1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](annotation_tool.md) +1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](../annotation_tool.md) -2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) on your updated TSV and data dictionary +2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) on your updated TSV and data dictionary #### If only the imaging data have changed If the BIDS data for a dataset have changed without changes in the corresponding phenotypic TSV (e.g., if new modalities or scans have been acquired for a subject), you have two options: -- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](cli.md) on the updated BIDS directory. +- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](../cli.md) on the updated BIDS directory. This will generate a new graph-ready data file with updated imaging metadata of subjects. OR -- [Rerun the CLI entirely (`pheno` and `bids` steps)](cli.md) to generate a new graph-ready data file for the dataset. +- [Rerun the CLI entirely (`pheno` and `bids` steps)](../cli.md) to generate a new graph-ready data file for the dataset. 
_When in doubt, rerun both CLI commands._ @@ -139,14 +139,14 @@ _When in doubt, rerun both CLI commands._ If subjects have been added to or removed from the dataset but the phenotypic TSV is otherwise unchanged (i.e., only new or removed rows, without changes to the available variables), you will need to: -- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary +- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary ### Following a change in the _Neurobagel data model_ As Neurobagel continues developing the data model, new tool releases may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph data file. Breaking changes will be highlighted in the release notes. -_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph. +_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](../cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph. This will ensure that if you use the latest version of the Neurobagel CLI to process new datasets (i.e., generate new `.jsonld` files) for your database, the resulting data will not have conflicts with existing data in the graph. Note that if upgrading to a newer version of the data model, **you should regenerate the `.jsonld` files for _all_ datasets in your existing graph**. 
diff --git a/docs/overview.md b/docs/overview.md index 3c0d925d..88f84fa7 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -22,5 +22,5 @@ You can also find official Docker images for our containerized tools on the [Neu ## What to do next - [Learn how to run a cohort query](./query_tool.md) on publicly accessible Neurobagel nodes -- [Deploy your own Neurobagel node](./getting_started.md) using our official Docker Compose recipe +- [Deploy your own Neurobagel node](guide/getting_started.md) using our official Docker Compose recipe - [Prepare your own dataset](./data_prep.md) for annotation and harmonization with Neurobagel diff --git a/docs/public_nodes.md b/docs/public_nodes.md index b141341e..cfb06d81 100644 --- a/docs/public_nodes.md +++ b/docs/public_nodes.md @@ -32,4 +32,4 @@ Downloading of imaging data is performed via [datalad](https://rpq-qpn.ca/en/hom Nodes that are not purposefully made public are not accessible outside of the institute or network where they are deployed. If you are interested in deploying a Neurobagel node for your institution, -please refer to our [deployment documentation](./getting_started.md) for more information. \ No newline at end of file +please refer to our [deployment documentation](guide/getting_started.md) for more information. \ No newline at end of file diff --git a/docs/query_tool.md b/docs/query_tool.md index e861d4f9..3bc57d54 100644 --- a/docs/query_tool.md +++ b/docs/query_tool.md @@ -65,7 +65,7 @@ Example: If the values for all columns except for `DatasetID` and `SessionPath` in the participant-level results tsv are set to `protected`, this indicates the graph being queried has been configured (via its corresponding Neurobagel node API) to return only aggregate information about matches (due to data privacy reasons). This configuration can be modified by setting the `NB_RETURN_AGG` environment variable to `false` (the value is by default `true`). 
-See related section of the documentation [here](config.md#environment-variables). +See related section of the documentation [here](guide/config.md#environment-variables). Example: diff --git a/mkdocs.yml b/mkdocs.yml index 7d40883e..8348ff5a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -30,9 +30,9 @@ nav: - How to use the query tool: "query_tool.md" - Querying the API directly: "api.md" - Setting up your own Neurobagel node: - - Getting started: "getting_started.md" - - Configuring a node: "config.md" - - Maintaining a node: "maintaining.md" + - Getting started: "guide/getting_started.md" + - Configuring a node: "guide/config.md" + - Maintaining a node: "guide/maintaining.md" - Annotating your data: - Preparing data for annotation: "data_prep.md" - Annotation tool guide: "annotation_tool.md" From 3130823cb5c0dbf23873bf3b741fe15c9b7833f4 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 16:05:24 -0400 Subject: [PATCH 02/11] [ENH] Use MUST keyword Also some overlength line breaks --- docs/data_prep.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index c2b622df..b18b7966 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -17,18 +17,24 @@ please prepare the tabular data for your dataset as a single, tab-separated file ### All datasets -A valid dataset for Neurobagel **must** include a TSV file that describes participant attributes. -The TSV must contain a minimum of two columns: at least one column must contain subject IDs, -and at least one column must describe demographic or other phenotypic information -(for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md)). +A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. 
+The TSV **MUST** contain a minimum of two columns: + +- at least one column must contain subject IDs, and +- at least one column must describe demographic or other phenotypic information + +for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). ### Datasets with imaging (BIDS) data If a dataset has imaging data in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format, Neurobagel **additionally** requires that: -- At least one column in the phenotypic TSV contains subject IDs that match the names of [BIDS subject subdirectories](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure). -If this condition is not met, you will encounter an error when [running the Neurobagel CLI](cli.md) on your dataset to generate Neurobagel graph-ready files, indicating that your BIDS directory contains subjects not found in your phenotypic file. +- At least one column in the phenotypic TSV contains subject IDs that + match the names of [BIDS subject subdirectories](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#filesystem-structure). + If this condition is not met, you will encounter an error + when [running the Neurobagel CLI](cli.md) on your dataset to generate Neurobagel graph-ready files, + indicating that your BIDS directory contains subjects not found in your phenotypic file. !!! note Subject IDs are case-sensitive and must match BIDS subject IDs exactly @@ -39,7 +45,8 @@ If this condition is not met, you will encounter an error when [running the Neur - All BIDS subjects are included in the phenotypic TSV, even if they only have BIDS imaging information. Neurobagel does not allow for datasets where subjects have BIDS - data but are not represented in the phenotypic TSV (however, subjects who have phenotypic data but no BIDS data are allowed). 
+ data but are not represented in the phenotypic TSV + (however, subjects who have phenotypic data but no BIDS data are allowed). ## Examples of valid phenotypic TSVs From c2141862945940f69f1895b8a5ba22102e915747 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 16:19:25 -0400 Subject: [PATCH 03/11] [ENH] Clarify required columns --- docs/data_prep.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index b18b7966..e49bf21d 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -18,10 +18,15 @@ please prepare the tabular data for your dataset as a single, tab-separated file ### All datasets A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. -The TSV **MUST** contain a minimum of two columns: - -- at least one column must contain subject IDs, and -- at least one column must describe demographic or other phenotypic information +The TSV: + +- MUST contain a minimum of two columns +- MUST contain exactly one column with subject IDs, i.e. at least one, and only one +- MUST contain at least one additional column that describes demographic or other phenotypic information +- MAY contain a column with session IDs if the dataset is longitudinal. + If present, MUST contain only one column about session IDs. +- MUST NOT contain any missing values in the subject ID and session ID (if availble) column +- MUST have unique values in the subject ID column OR in the combination of subject ID and session ID columns for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). 
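Review note on PATCH 03/11: the column rules added above (one annotated subject ID column, an optional session ID column, no missing ID values, unique subject or subject/session combinations) can be checked mechanically. The following is a minimal sketch rather than anything taken from the Neurobagel CLI: the function name `check_phenotypic_tsv`, its parameters, and the decision to treat only empty cells as "missing" are assumptions made for illustration.

```python
import csv
import io


def check_phenotypic_tsv(tsv_text, subject_col, session_col=None):
    """Return a list of rule violations; an empty list means none were found.

    Hypothetical helper: the caller names the ID columns, because the rules
    themselves do not say how an ID column is recognised.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    id_cols = [subject_col] + ([session_col] if session_col else [])
    problems = []

    # The named ID columns must exist in the header.
    for col in id_cols:
        if col not in columns:
            problems.append(f"missing ID column: {col}")
    # At least one column besides the ID columns must be present.
    if len(columns) <= len(id_cols):
        problems.append("no phenotypic column besides the ID columns")
    if problems:
        return problems

    seen = set()
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        key = tuple((row.get(col) or "").strip() for col in id_cols)
        if any(value == "" for value in key):
            # Assumption: only an empty cell counts as a missing value.
            problems.append(f"line {line_number}: empty ID value")
        elif key in seen:
            problems.append(f"line {line_number}: duplicate ID {key}")
        seen.add(key)
    return problems
```

Calling it with both `participant_id` and `session_id` on a longitudinal table checks uniqueness of the combination; calling it with the subject column alone on the same table reports the repeated subject, which mirrors the "OR" in the last rule.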
From a4fed8c5355b1ceac8e63921f1d2218eba0b2f52 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 16:27:03 -0400 Subject: [PATCH 04/11] [ENH] Clarify requirements for session IDs - Also include a clarification in the section on multiple participants or sessions --- docs/data_prep.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/data_prep.md b/docs/data_prep.md index e49bf21d..c342c4f9 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -52,6 +52,11 @@ Neurobagel **additionally** requires that: Neurobagel does not allow for datasets where subjects have BIDS data but are not represented in the phenotypic TSV (however, subjects who have phenotypic data but no BIDS data are allowed). +- If the dataset is longitudinal, the session IDs in the phenotypic TSV + MAY match the session IDs in the BIDS dataset, but don't have to. + If matching session IDs are present in the phenotypic TSV and the BIDS dataset, neurobagel will interpret + this to mean that the phenotypic data is associated with the corresponding BIDS session. + If phenotypic and BIDS session IDs do not match, Neurobagel will treat them as distinct sessions. ## Examples of valid phenotypic TSVs @@ -105,6 +110,14 @@ In this case, both types of participant IDs should be recorded in the tabular fi The only requirement is that **the combination of all ID values for a row is unique**. +!!! Warning "Neurobagel currently supports only one subject ID and one session ID" + Neurobagel does not support multiple subject or session IDs in the same TSV file. + If you have multiple subject or session IDs, you must choose one to use as the primary ID + and include the others as additional columns. + + + We are planning to support multiple IDs in the future. + Example **invalid** TSV: | participant_id | alternative_participant_id | ... | @@ -123,3 +136,5 @@ Example **valid** TSV: | sub-01 | SID-1234 | ses-02 | visit-2 | 23 | | | sub-02 | SID-2222 | ses-01 | visit-1 | 28 | | | ... 
| | | | | | + + From f99f213e8c94a45a366a3f5595f9733fcb9b4dc8 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 16:29:41 -0400 Subject: [PATCH 05/11] [FIX] fix a typo --- docs/data_prep.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index c342c4f9..dbe84994 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -25,7 +25,7 @@ The TSV: - MUST contain at least one additional column that describes demographic or other phenotypic information - MAY contain a column with session IDs if the dataset is longitudinal. If present, MUST contain only one column about session IDs. -- MUST NOT contain any missing values in the subject ID and session ID (if availble) column +- MUST NOT contain any missing values in the subject ID and session ID (if available) column - MUST have unique values in the subject ID column OR in the combination of subject ID and session ID columns for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). From 42e6d2abd6e23a185ac0aef91b84ddfde8f44250 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 19:22:38 -0400 Subject: [PATCH 06/11] [ENH] Clarify restrictions on annotation --- docs/data_prep.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index dbe84994..4d5ad606 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -17,14 +17,24 @@ please prepare the tabular data for your dataset as a single, tab-separated file ### All datasets -A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. +A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. + The TSV: - MUST contain a minimum of two columns -- MUST contain exactly one column with subject IDs, i.e. at least one, and only one +- MUST contain at least one colum with subject IDs + + ??? 
note "Only one subject ID column can be annotated" + Neurobagel currently does not support annotating multiple subject ID columns + so you must choose one as the primary ID during annotation + - MUST contain at least one additional column that describes demographic or other phenotypic information -- MAY contain a column with session IDs if the dataset is longitudinal. - If present, MUST contain only one column about session IDs. +- MAY contain a column with session IDs if the dataset is longitudinal. + + ??? note "Only one session ID column can be annotated" + Neurobagel currently does not support annotating multiple session ID columns + so you must choose one as the primary ID during annotation + - MUST NOT contain any missing values in the subject ID and session ID (if available) column - MUST have unique values in the subject ID column OR in the combination of subject ID and session ID columns From ca53bc0a5cc593b53a89d4bb09cc3de786c1aa5f Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Mon, 23 Sep 2024 19:23:43 -0400 Subject: [PATCH 07/11] [FIX] Typo fix We really should get codespell to run locally --- docs/data_prep.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index 4d5ad606..c3fb19c2 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -22,7 +22,7 @@ A valid dataset for Neurobagel **MUST** include a TSV file that describes partic The TSV: - MUST contain a minimum of two columns -- MUST contain at least one colum with subject IDs +- MUST contain at least one column with subject IDs ??? 
note "Only one subject ID column can be annotated" Neurobagel currently does not support annotating multiple subject ID columns From ab6393b09fb2e0bb85427f92cb0c166285074241 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Tue, 24 Sep 2024 09:16:39 -0400 Subject: [PATCH 08/11] [ENH] Apply code review suggestions --- docs/data_prep.md | 36 +++++++++++++++--------------------- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index c3fb19c2..176f881c 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -5,12 +5,12 @@ We recommend also checking out [Nipoppy](https://nipoppy.readthedocs.io/), a protocol for standardized organization and processing of clinical-neuroimaging datasets that extends [BIDS](https://bids-specification.readthedocs.io/en/stable/). Neurobagel tools are designed to be compatible with data organized according to the Nipoppy specification, although you do not need to use Nipoppy in order to use Neurobagel. -To use the Neurobagel annotation tool, +To use the Neurobagel annotation tool, please prepare the tabular data for your dataset as a single, tab-separated file (`.tsv`). !!! note In the Neurobagel context, _tabular_ or _phenotypic_ data for a dataset refers to any demographic, - clinical/behavioural, cognitive, or other non-imaging-derived data of participants + clinical/behavioural, cognitive, or other non-imaging-derived data of participants which are typically stored in a tabular file format. ## General requirements for the phenotypic TSV @@ -19,26 +19,26 @@ please prepare the tabular data for your dataset as a single, tab-separated file A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. -The TSV: +The TSV MUST: - MUST contain a minimum of two columns - MUST contain at least one column with subject IDs ??? 
note "Only one subject ID column can be annotated" - Neurobagel currently does not support annotating multiple subject ID columns + Neurobagel currently does not support annotating multiple subject ID columns so you must choose one as the primary ID during annotation -- MUST contain at least one additional column that describes demographic or other phenotypic information +- MUST contain at least one column that describes demographic or other phenotypic information - MAY contain a column with session IDs if the dataset is longitudinal. ??? note "Only one session ID column can be annotated" - Neurobagel currently does not support annotating multiple session ID columns + Neurobagel currently does not support annotating multiple session ID columns so you must choose one as the primary ID during annotation - MUST NOT contain any missing values in the subject ID and session ID (if available) column - MUST have unique values in the subject ID column OR in the combination of subject ID and session ID columns -for variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). +For all variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). ### Datasets with imaging (BIDS) data @@ -62,11 +62,8 @@ Neurobagel **additionally** requires that: Neurobagel does not allow for datasets where subjects have BIDS data but are not represented in the phenotypic TSV (however, subjects who have phenotypic data but no BIDS data are allowed). -- If the dataset is longitudinal, the session IDs in the phenotypic TSV - MAY match the session IDs in the BIDS dataset, but don't have to. - If matching session IDs are present in the phenotypic TSV and the BIDS dataset, neurobagel will interpret - this to mean that the phenotypic data is associated with the corresponding BIDS session. - If phenotypic and BIDS session IDs do not match, Neurobagel will treat them as distinct sessions. 
+- If the dataset is longitudinal, the session IDs in the phenotypic TSV + MAY match the session IDs in the BIDS dataset, but don't have to. ## Examples of valid phenotypic TSVs @@ -110,23 +107,22 @@ Example TSV: (see also the BIDS specification section on [Longitudinal and multi-site studies](https://bids-specification.readthedocs.io/en/stable/06-longitudinal-and-multi-site-studies.html#longitudinal-and-multi-site-studies)). ### Multiple participant or session identifier columns -In some cases, there may be a need for more than one set of IDs + +In some cases, there may be a need for more than one set of IDs for participants and/or sessions. For example, if a participant was first enrolled in a behavioural study -with one type of ID, +with one type of ID, and then later joined an imaging study under a different ID. In this case, both types of participant IDs should be recorded in the tabular file. The only requirement is that **the combination of all ID values for a row is unique**. !!! Warning "Neurobagel currently supports only one subject ID and one session ID" - Neurobagel does not support multiple subject or session IDs in the same TSV file. - If you have multiple subject or session IDs, you must choose one to use as the primary ID - and include the others as additional columns. - - We are planning to support multiple IDs in the future. + Neurobagel currently does not support annotating multiple subject or session ID columns in the same TSV file. + If you have multiple subject or session IDs, you must choose one to use as the primary ID. + Additional subject/session ID columns can still be included in the TSV but will be ignored by Neurobagel. Example **invalid** TSV: @@ -146,5 +142,3 @@ Example **valid** TSV: | sub-01 | SID-1234 | ses-02 | visit-2 | 23 | | | sub-02 | SID-2222 | ses-01 | visit-1 | 28 | | | ... 
| | | | | | - - From 9b62905775737d2c5636ccb6a7a0e4e58ad413d9 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Tue, 24 Sep 2024 09:18:11 -0400 Subject: [PATCH 09/11] Revert "[ENH] Own DIR for getting started" This reverts commit 06c4b666de29060b5f43b53ec196d8c828558267. We'll deal with it in #99 in one go --- docs/cli.md | 2 +- docs/{guide => }/config.md | 6 +++--- docs/{guide => }/getting_started.md | 12 ++++++------ docs/{guide => }/maintaining.md | 12 ++++++------ docs/overview.md | 2 +- docs/public_nodes.md | 2 +- docs/query_tool.md | 2 +- mkdocs.yml | 6 +++--- 8 files changed, 22 insertions(+), 22 deletions(-) rename docs/{guide => }/config.md (98%) rename docs/{guide => }/getting_started.md (95%) rename docs/{guide => }/maintaining.md (95%) diff --git a/docs/cli.md b/docs/cli.md index 63f20db5..f318318d 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -177,7 +177,7 @@ You could run the CLI as follows: Neurobagel is under active, early development and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph file. Breaking changes will be highlighted in the release notes! _If you have already created `.jsonld` files for your Neurobagel graph database using the CLI_, -they can be quickly re-generated under the new data model by following the instructions [here](guide/maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version. +they can be quickly re-generated under the new data model by following the instructions [here](maintaining.md#following-a-change-in-the-neurobagel-data-model) so that they will not conflict with dataset `.jsonld` files generated using the latest CLI version. 
## Development environment diff --git a/docs/guide/config.md b/docs/config.md similarity index 98% rename from docs/guide/config.md rename to docs/config.md index 38c259bc..1797eb32 100644 --- a/docs/guide/config.md +++ b/docs/config.md @@ -15,14 +15,14 @@ and coordinates them to work together: (In parentheses are the names of services within the Docker Compose stack) -- **[Neurobagel node API/n-API](../api.md)** (`api`): The API that communicates with a single graph store and determines +- **[Neurobagel node API/n-API](api.md)** (`api`): The API that communicates with a single graph store and determines how detailed the response to a query should be from that graph. - **Graph store** (`graph`): A third-party RDF store that stores Neurobagel-harmonized data to be queried. At the moment our recipe uses the free tier of [GraphDB](https://db-engines.com/en/system/GraphDB) for this. - **Neurobagel federation/f-API** (`federation`): A special API that can federate over one or more Neurobagel nodes to provide a single point of access to multiple distributed databases. By default it will federate over all public nodes and any local nodes you specify. -- **[Neurobagel query tool](../query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a +- **[Neurobagel query tool](query_tool.md)** (`query_tool`): A web app that provides a graphical interface for users to query a federation API and view the results from one or more nodes. Because the query tool is a static app and is run locally in the user's browser, this service simply hosts the app. @@ -309,7 +309,7 @@ you can use the hierarchical relationships between concepts themselves can also be represented. Including these relationships in a graph is important to be able to answer questions such as how many different diagnoses are represented in a graph database, to query for higher-order concepts for a given variable, and more. 
-The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](../term_naming_standards.md)). +The participant variables modeled by Neurobagel are named using Neurobagel's own vocabulary (for more information, see this page on [controlled terms](./term_naming_standards.md)). This vocabulary, which defines internal relationships between vocabulary terms, is serialized in the file [`nb_vocab.ttl`](https://github.com/neurobagel/recipes/blob/main/vocab/nb_vocab.ttl) available from the `neurobagel/recipes` repository. If you have cloned this repository, you will already have downloaded the vocabulary file. diff --git a/docs/guide/getting_started.md b/docs/getting_started.md similarity index 95% rename from docs/guide/getting_started.md rename to docs/getting_started.md index 6725086c..3a66d8d1 100644 --- a/docs/guide/getting_started.md +++ b/docs/getting_started.md @@ -5,7 +5,7 @@ and a local federation API (everything in blue in the picture below) that lets you search across the data in your node and in public Neurobagel nodes. -![Neurobagel node](../imgs/neurobagel_local_node.jpg) +![Neurobagel node](imgs/neurobagel_local_node.jpg) To prepare your Neurobagel node for production use (i.e., for local or other users), and to configure your deployment according to your specific needs, @@ -98,7 +98,7 @@ cp local_nb_nodes.template.json local_nb_nodes.json with the URL address where the Neurobagel federation API will be accessed: - If you are deploying Neurobagel for yourself or deploying and trying the services **on your local machine only**, - you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](config.md#environment-variables). + you can use `NB_API_QUERY_URL=http://localhost:8080`, where `8080` is the [default host port for the federation API](./config.md#environment-variables). 
- If you are deploying Neurobagel **on a server for other users**, you must use the IP (and port) or URL intended for your users to access the federation API on the server with. @@ -151,9 +151,9 @@ our [service profile documentation](config.md#available-profiles) for details. :tada: You are now the proud owner of a running Neurobagel node. Here are some things you can do now: - Try the Neurobagel node you just deployed by accessing: - - your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](../query_tool.md#usage) guide - - the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](../api.md) guide -- [Prepare your own dataset](../data_prep.md) for annotation with Neurobagel + - your own query tool at [http://localhost:3000](http://localhost:3000), and reading the [query tool usage](./query_tool.md#usage) guide + - the interactive docs for your node API at [http://localhost:8000/docs](http://localhost:8000/docs), and reading the [API usage](./api.md) guide +- [Prepare your own dataset](./data_prep.md) for annotation with Neurobagel - [Add your own data to your Neurobagel graph](maintaining.md#updating-the-data-in-your-graph) to search - Learn about the different [configuration options](config.md) for your Neurobagel node -- Hopefully all went well, but if you are experiencing issues, see how to [get help](../getting_help.md) \ No newline at end of file +- Hopefully all went well, but if you are experiencing issues, see how to [get help](./getting_help.md) \ No newline at end of file diff --git a/docs/guide/maintaining.md b/docs/maintaining.md similarity index 95% rename from docs/guide/maintaining.md rename to docs/maintaining.md index 02827f3e..4630fd2f 100644 --- a/docs/guide/maintaining.md +++ b/docs/maintaining.md @@ -118,20 +118,20 @@ For any of the below types of changes, you will need to regenerate a graph-ready If new variables have been 
added to the dataset such that there are new columns in the phenotypic TSV you previously annotated using Neurobagel's annotation tool, you will need to: -1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](../annotation_tool.md) +1. **Generate an updated data dictionary** by annotating the new variables in your TSV following the [annotation workflow](annotation_tool.md) -2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) on your updated TSV and data dictionary +2. **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) on your updated TSV and data dictionary #### If only the imaging data have changed If the BIDS data for a dataset have changed without changes in the corresponding phenotypic TSV (e.g., if new modalities or scans have been acquired for a subject), you have two options: -- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](../cli.md) on the updated BIDS directory. +- If you still have access to the dataset's phenotypic JSONLD generated from the `pheno` command of the `bagel-cli` (step 1), you may choose to [rerun only the `bids` CLI command](cli.md) on the updated BIDS directory. This will generate a new graph-ready data file with updated imaging metadata of subjects. OR -- [Rerun the CLI entirely (`pheno` and `bids` steps)](../cli.md) to generate a new graph-ready data file for the dataset. +- [Rerun the CLI entirely (`pheno` and `bids` steps)](cli.md) to generate a new graph-ready data file for the dataset. 
_When in doubt, rerun both CLI commands._ @@ -139,14 +139,14 @@ _When in doubt, rerun both CLI commands._ If subjects have been added to or removed from the dataset but the phenotypic TSV is otherwise unchanged (i.e., only new or removed rows, without changes to the available variables), you will need to: -- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](../cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary +- **Generate a new graph-ready data file** for the dataset by [re-running the CLI](cli.md) (`pheno` and `bids` steps) on your updated TSV and existing data dictionary ### Following a change in the _Neurobagel data model_ As Neurobagel continues developing the data model, new tool releases may introduce breaking changes to the data model for subject-level information in a `.jsonld` graph data file. Breaking changes will be highlighted in the release notes. -_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](../cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph. +_If you have already created `.jsonld` files for a Neurobagel graph database_ but want to update your graph data to the latest Neurobagel data model following such a change, you can easily do so by [rerunning the CLI](cli.md) on the existing data dictionaries and phenotypic TSVs for the dataset(s) in the graph. This will ensure that if you use the latest version of the Neurobagel CLI to process new datasets (i.e., generate new `.jsonld` files) for your database, the resulting data will not have conflicts with existing data in the graph. Note that if upgrading to a newer version of the data model, **you should regenerate the `.jsonld` files for _all_ datasets in your existing graph**. 
diff --git a/docs/overview.md b/docs/overview.md index 88f84fa7..3c0d925d 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -22,5 +22,5 @@ You can also find official Docker images for our containerized tools on the [Neu ## What to do next - [Learn how to run a cohort query](./query_tool.md) on publicly accessible Neurobagel nodes -- [Deploy your own Neurobagel node](guide/getting_started.md) using our official Docker Compose recipe +- [Deploy your own Neurobagel node](./getting_started.md) using our official Docker Compose recipe - [Prepare your own dataset](./data_prep.md) for annotation and harmonization with Neurobagel diff --git a/docs/public_nodes.md b/docs/public_nodes.md index cfb06d81..b141341e 100644 --- a/docs/public_nodes.md +++ b/docs/public_nodes.md @@ -32,4 +32,4 @@ Downloading of imaging data is performed via [datalad](https://rpq-qpn.ca/en/hom Nodes that are not purposefully made public are not accessible outside of the institute or network where they are deployed. If you are interested in deploying a Neurobagel node for your institution, -please refer to our [deployment documentation](guide/getting_started.md) for more information. \ No newline at end of file +please refer to our [deployment documentation](./getting_started.md) for more information. \ No newline at end of file diff --git a/docs/query_tool.md b/docs/query_tool.md index 3bc57d54..e861d4f9 100644 --- a/docs/query_tool.md +++ b/docs/query_tool.md @@ -65,7 +65,7 @@ Example: If the values for all columns except for `DatasetID` and `SessionPath` in the participant-level results tsv are set to `protected`, this indicates the graph being queried has been configured (via its corresponding Neurobagel node API) to return only aggregate information about matches (due to data privacy reasons). This configuration can be modified by setting the `NB_RETURN_AGG` environment variable to `false` (the value is by default `true`). 
-See related section of the documentation [here](guide/config.md#environment-variables). +See related section of the documentation [here](config.md#environment-variables). Example: diff --git a/mkdocs.yml b/mkdocs.yml index 8348ff5a..7d40883e 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -30,9 +30,9 @@ nav: - How to use the query tool: "query_tool.md" - Querying the API directly: "api.md" - Setting up your own Neurobagel node: - - Getting started: "guide/getting_started.md" - - Configuring a node: "guide/config.md" - - Maintaining a node: "guide/maintaining.md" + - Getting started: "getting_started.md" + - Configuring a node: "config.md" + - Maintaining a node: "maintaining.md" - Annotating your data: - Preparing data for annotation: "data_prep.md" - Annotation tool guide: "annotation_tool.md" From 4fcb5caa628f1cb73a69c8376f8e6076f89c43f9 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Tue, 24 Sep 2024 13:28:49 -0400 Subject: [PATCH 10/11] [REF] Changes from PR review --- docs/data_prep.md | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index 176f881c..45117439 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -19,24 +19,29 @@ please prepare the tabular data for your dataset as a single, tab-separated file A valid dataset for Neurobagel **MUST** include a TSV file that describes participant attributes. -The TSV MUST: +The TSV MUST contain: -- MUST contain a minimum of two columns -- MUST contain at least one column with subject IDs +- A minimum of two columns +- At least one column containing subject IDs ??? note "Only one subject ID column can be annotated" Neurobagel currently does not support annotating multiple subject ID columns so you must choose one as the primary ID during annotation -- MUST contain at least one column that describes demographic or other phenotypic information -- MAY contain a column with session IDs if the dataset is longitudinal. 
+- At least one column that describes demographic or other phenotypic information +- Unique values in the subject ID column or unique combinations of IDs if both subject and session ID columns are present +The TSV MAY contain: + +- A column with session IDs, e.g. if the dataset is longitudinal + ??? note "Only one session ID column can be annotated" Neurobagel currently does not support annotating multiple session ID columns so you must choose one as the primary ID during annotation - -- MUST NOT contain any missing values in the subject ID and session ID (if available) column -- MUST have unique values in the subject ID column OR in the combination of subject ID and session ID columns + +The TSV MUST **NOT** contain: + +- Missing values in the subject ID and session ID (if available) columns For all variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md). From 6af44c05b2654f65ce356d49d72a6c813d326d88 Mon Sep 17 00:00:00 2001 From: Sebastian Urchs Date: Tue, 24 Sep 2024 13:47:29 -0400 Subject: [PATCH 11/11] Update docs/data_prep.md Co-authored-by: Alyssa Dai --- docs/data_prep.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data_prep.md b/docs/data_prep.md index 45117439..d2c31b31 100644 --- a/docs/data_prep.md +++ b/docs/data_prep.md @@ -41,7 +41,7 @@ The TSV MAY contain: The TSV MUST **NOT** contain: -- Missing values in the subject ID and session ID (if available) columns +- Missing values in the columns you plan to annotate as containing the primary subject IDs and session IDs (if available) For all variables currently modeled by Neurobagel, see the [data dictionary section](dictionaries.md).
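The TSV requirements introduced in the `docs/data_prep.md` hunks above can be checked mechanically before annotation. What follows is a minimal sketch, not part of any Neurobagel tool: the function name `check_phenotypic_tsv`, the column names `participant_id` and `session_id`, and the choice to treat an empty cell as a missing value are all assumptions made for illustration.

```python
import csv
import io


def check_phenotypic_tsv(tsv_text, subject_col, session_col=None):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it mirrors the MUST / MUST NOT rules listed in
    docs/data_prep.md and is not an official Neurobagel validator.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []

    # Rule: a minimum of two columns.
    if len(columns) < 2:
        problems.append("fewer than two columns")

    # Rule: the primary subject ID column (and session ID column, if used) must exist.
    id_cols = [subject_col] + ([session_col] if session_col else [])
    absent = [col for col in id_cols if col not in columns]
    if absent:
        problems.append("missing ID column(s): " + ", ".join(absent))
        return problems

    # Rule: at least one column besides the IDs that carries phenotypic information.
    if len(columns) == len(id_cols):
        problems.append("no phenotypic column besides the ID columns")

    # Rules: no missing ID values, and unique subject (or subject + session) IDs.
    seen = set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        key = tuple((row[col] or "").strip() for col in id_cols)
        if "" in key:  # assumption: an empty cell counts as a missing value
            problems.append(f"line {line_no}: missing ID value")
            continue
        if key in seen:
            problems.append(f"line {line_no}: duplicate ID {key}")
        seen.add(key)
    return problems


if __name__ == "__main__":
    example = (
        "participant_id\tsession_id\tage\n"
        "sub-01\tses-01\t25\n"
        "sub-01\tses-02\t26\n"
    )
    # Longitudinal data: uniqueness is judged on the subject + session combination.
    print(check_phenotypic_tsv(example, "participant_id", "session_id"))
    # Without a session column, the repeated subject ID is reported as a duplicate.
    print(check_phenotypic_tsv(example, "participant_id"))
```

Passing the session column switches the uniqueness check from subject IDs alone to subject and session combinations, which matches the wording of the uniqueness rule in the patch.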