rawdata & root (top-level) BIDS dataset #1882

Open
dipterix opened this issue Aug 4, 2024 · 2 comments
Labels
question (Further information is requested)

dipterix commented Aug 4, 2024

I find the "Source vs. raw vs. derived data" section a bit confusing. It currently offers an alternative way to organize BIDS data, like this:

└─ my_dataset-1/
   ├─ sourcedata 
   ├─ ... 
   ├─ rawdata/
   │  ├─ dataset_description.json 
   │  ├─ participants.tsv 
   │  ├─ sub-01/
   │  ├─ sub-02/
   │  └─ ... 
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ... 

My question is: what is the top-level BIDS directory in this case? Or how do I programmatically and reliably find the subject raw data in general?

Let's say

  1. If I set my_dataset-1 as my BIDS folder: it has derivatives, but I can't find dataset_description.json, and there is no subject data directly within it. The other folders within it are not BIDS-compliant.
  2. If I set my_dataset-1/rawdata as my BIDS folder: then I can't find derivatives where the BIDS specification says it should be (putting derivatives inside rawdata does not make sense at all).

Initially, a dataset_description.json was required under my_dataset-1, but #1741 removes that requirement.

I wonder if it's a good idea to at least put some file in to indicate that this is the root of BIDS directory. This file could include descriptors indicating which components of my folders are BIDS-compliant. For example, ask people to provide BIDSDatasetLinks if the folder is structured this way:

{
	"BIDSDatasetLinks": {
		"rawdata": "bids::rawdata",
		"derivatives": "bids::derivatives"
	}
}

(If omitted, "rawdata" would default to the current directory; the "derivatives" value shown here is the default.)

To address #1741 (comment), people should add a .bidsignore file.

@sappelhoff
Member

> what is the top-level BIDS directory in this case?

In your example, my_dataset-1/ is not a BIDS dataset. It is a directory that houses one BIDS dataset (rawdata). In your example, the derivatives directory does not have a dataset_description.json, which means that it is not a BIDS derivatives dataset. If you added a dataset_description.json there, then my_dataset-1 would house two BIDS datasets, but it would still not be a BIDS dataset in itself.

> I wonder if it's a good idea to at least put some file in to indicate that this is the root of BIDS directory.

That is what dataset_description.json does.
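
To illustrate the point that dataset_description.json marks the root of a BIDS dataset, here is a minimal sketch that scans a project directory for that file. The function name `find_bids_datasets` is hypothetical and not part of any BIDS tool; the sketch assumes the optional `DatasetType` field is treated as "raw" when absent, as the specification describes for backwards compatibility.

```python
import json
from pathlib import Path


def find_bids_datasets(project_root):
    """Map each directory holding a dataset_description.json to its DatasetType.

    Hypothetical helper for illustration only; a real validator checks far more.
    """
    found = {}
    for desc in sorted(Path(project_root).rglob("dataset_description.json")):
        try:
            meta = json.loads(desc.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip this directory
        if not isinstance(meta, dict):
            continue  # a dataset description must be a JSON object
        # Assumption: a missing DatasetType is interpreted as "raw".
        found[desc.parent] = meta.get("DatasetType", "raw")
    return found
```

Run against the layout in the question, this would report my_dataset-1/rawdata as a raw dataset and nothing for my_dataset-1 itself or for derivatives/ until a dataset_description.json is added there.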

> Or how do I programmatically and reliably find the subject raw data in general?

That is done via DatasetLinks: if your derivatives define Sources, those sources are specified as BIDS URIs, and the BIDS URIs refer to datasets listed in DatasetLinks.
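
As a rough sketch of how such a lookup could work, the snippet below resolves a BIDS URI of the form `bids:<dataset-name>:<relative-path>` against a DatasetLinks mapping. The function name `resolve_bids_uri`, the `current_dataset` parameter, and the example link value are assumptions made for illustration; an empty dataset name is taken to mean the dataset that contains the URI.

```python
def resolve_bids_uri(uri, dataset_links, current_dataset="."):
    """Turn 'bids:<dataset-name>:<relative-path>' into a location string.

    Hypothetical helper: it only joins strings and does not check that
    the target exists.
    """
    scheme, sep, rest = uri.partition(":")
    if scheme != "bids" or not sep:
        raise ValueError(f"not a BIDS URI: {uri!r}")
    name, sep, rel_path = rest.partition(":")
    if not sep:
        raise ValueError(f"missing dataset-name separator in {uri!r}")
    if name == "":
        base = current_dataset  # empty name: the dataset containing the URI
    elif name in dataset_links:
        base = dataset_links[name]  # named dataset declared in DatasetLinks
    else:
        raise KeyError(f"dataset {name!r} is not listed in DatasetLinks")
    return base.rstrip("/") + "/" + rel_path.lstrip("/")


# Hypothetical DatasetLinks entry, as it might appear in a derivative's
# dataset_description.json.
links = {"raw": "file:///data/my_dataset-1/rawdata"}
print(resolve_bids_uri("bids:raw:sub-01/anat/sub-01_T1w.nii.gz", links))
```

With this, a tool reading a derivative's Sources would not need to guess where the raw data lives; it follows whatever location the derivative dataset declares.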

@yarikoptic
Collaborator

@dipterix please check out the soon-to-be-released latest version of the BIDS specification, in which that example has been reworked a little: https://bids-specification.readthedocs.io/en/latest/common-principles.html#source-vs-raw-vs-derived-data

└─ my_project-1/
   ├─ sourcedata/
   │  ├─ dicoms/
   │  ├─ raw/
   │  │  ├─ sub-01/
   │  │  ├─ sub-02/
   │  │  ├─ ... 
   │  │  └─ dataset_description.json 
   │  └─ ... 
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ... 

but to get closer to answering your two posed questions, please have a look at

where I argue that the entire project folder can be a BIDS dataset, and then by convention sourcedata/raw would be such a "raw BIDS dataset". Note though that in principle there could be multiple raw BIDS datasets used in a project, or used to create another "derived raw BIDS dataset" (e.g. by combining multiple datasets into one), so such a convention alone might not be sufficient in some cases.

yarikoptic added the "question" label Aug 29, 2024