diff --git a/README.md b/README.md
index 00db0ff..588ff4a 100755
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 
 Written by the OHSU ABCD site for selectively downloading ABCD Study imaging DICOM data QC'ed as good by the ABCD DAIC site, converting it to BIDS standard input data, selecting the best pair of spin echo field maps, and correcting the sidecar JSON files to meet the BIDS Validator specification.
 
+*Note: DWI has been added to the list of modalities that can be downloaded. This has resulted in a couple of important changes to the scripts included here and to the output BIDS data. Most notably, fieldmaps now include an acquisition field in their filenames to differentiate those used for functional images from those used for DWI (e.g. ..._acq-func_... or ..._acq-dwi_...). Data uploaded to [Collection 3165](https://github.com/ABCD-STUDY/nda-abcd-collection-3165), which was created using this repository, does not contain this identifier.*
+
 ## Installation
 
 Clone this repository and save it somewhere on the Linux system you want to do ABCD DICOM downloads and conversions to BIDS on.
@@ -12,6 +14,7 @@ Clone this repository and save it somewhere on the Linux system you want to do A
 1. [MathWorks MATLAB Runtime Environment (MRE) version 9.1 (R2016b)](https://www.mathworks.com/products/compiler/matlab-runtime.html)
 1. [cbedetti Dcm2Bids](https://github.com/cbedetti/Dcm2Bids) (`export` into your BASH `PATH` variable)
 1. [Rorden Lab dcm2niix](https://github.com/rordenlab/dcm2niix) (`export` into your BASH `PATH` variable)
+1. [dcmdump](https://dicom.offis.de/dcmtk.php.en) (`export` into your BASH `PATH` variable)
 1. [zlib's pigz-2.4](https://zlib.net/pigz) (`export` into your BASH `PATH` variable)
 1. Docker (see documentation for [Docker Community Edition for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/))
 1. [FMRIB Software Library (FSL) v5.0](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslInstallation)
@@ -75,6 +78,10 @@ This wrapper will create a temporary folder (`temp/` by default) with hundreds o
 
 `--temp`: By default, the temporary files will be created in the `temp/` subdirectory of the clone of this repo. If the user wants to place the temporary files anywhere else, then they can do so using the optional `--temp` flag followed by the path at which to create the directory containing temp files, e.g. `--temp /usr/home/abcd2bids-temporary-folder`. A folder will be created at the given path if one does not already exist.
 
+`--subject-list`: Path to a .txt file containing a list of the subjects to download, one subject ID per line. Note that downloading and converting every available subject may take weeks to complete, so it is recommended to run batches of subjects in parallel.
+
+`--modalities`: By default, the wrapper will download all modalities for each subject. This is equivalent to `--modalities anat func dwi`. If only certain modalities should be downloaded for a subject, then provide a space-separated subset, e.g. `--modalities anat func`.
+
 `--download`: By default, the wrapper will download the ABCD data to the `raw/` subdirectory of the cloned folder. If the user wants to download the ABCD data to a different directory, they can use the `--download` flag, e.g. `--download ~/abcd-dicom2bids/ABCD-Data-Download`. A folder will be created at the given path if one does not already exist.
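+For example, a typical batch run combining these flags might look like the following sketch, where the subject list filename is illustrative and `[...]` stands for the wrapper's other required arguments (such as the FSL and MRE directories):
+
+```sh
+python3 abcd2bids.py [...] --subject-list batch01_subjects.txt --modalities anat func
+```
+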
`--qc`: Path to the Quality Control (QC) spreadsheet file downloaded from the NDA. By default, the wrapper will use the `abcd_fastqc01.txt` file in the `spreadsheets` directory.
@@ -112,11 +119,11 @@ Next, the wrapper will produce a download list for the Python & BASH portion to
 
 **NOTE:** This step can take over two hours to complete.
 
-### 1. (Python) `good_bad_series_parser.py`
+### 1. (Python) `aws_downloader.py`
 
-Once `ABCD_good_and_bad_series_table.csv` is successfully created, the wrapper will run `src/good_bad_series_parser.py` with this repository's cloned folder as the present working directory to download the ABCD data from the NDA website. It requires the `ABCD_good_and_bad_series_table.csv` spreadsheet under a `spreadsheets` subdirectory of this repository's cloned folder.
+Once `ABCD_good_and_bad_series_table.csv` is successfully created, the wrapper will run `src/aws_downloader.py` with this repository's cloned folder as the present working directory to download the ABCD data from the NDA website. It requires the `ABCD_good_and_bad_series_table.csv` spreadsheet under a `spreadsheets` subdirectory of this repository's cloned folder.
 
-`src/good_bad_series_parser.py` also requires a valid NDA token in the `.aws/` folder in the user's `home/` directory. If successful, this will download the ABCD data from the NDA site into the `raw/` subdirectory of the clone of this repo. If the download crashes and shows errors about `awscli`, try making sure you have the [latest AWS CLI installed](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html), and that the [`aws` executable is in your BASH `PATH` variable](https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-path).
+`src/aws_downloader.py` also requires a valid NDA token in the `.aws/` folder in the user's `home/` directory. If successful, this will download the ABCD data from the NDA site into the `raw/` subdirectory of the clone of this repo. If the download crashes and shows errors about `awscli`, try making sure you have the [latest AWS CLI installed](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html), and that the [`aws` executable is in your BASH `PATH` variable](https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-path).
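+Under the hood, each QC-passing series is fetched with an AWS CLI call of roughly the following form (the placeholders are illustrative; the real S3 URLs come from the `image_file` column of the QC spreadsheet). Running one such command by hand is a quick way to verify that `awscli` and the NDA profile are set up correctly:
+
+```sh
+aws s3 cp <image_file S3 URL> raw/<subject>/<session>/ --profile NDA
+```
+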
### 2. (BASH) `unpack_and_setup.sh`
@@ -167,6 +174,7 @@ This wrapper relies on the following other projects:
 
 - [zlib's pigz-2.4](https://zlib.net/pigz)
 - [Official BIDS validator](https://github.com/bids-standard/bids-validator)
 - [NDA AWS token generator](https://github.com/NDAR/nda_aws_token_generator)
+- [dcmdump](https://dicom.offis.de/dcmtk.php.en)
 
 ## Meta
 
diff --git a/abcd2bids.py b/abcd2bids.py
index 747c90b..38e5b81 100644
--- a/abcd2bids.py
+++ b/abcd2bids.py
@@ -52,10 +52,12 @@ SERIES_TABLE_PARSER = os.path.join(PWD, "src", "good_bad_series_parser.py")
 SPREADSHEET_DOWNLOAD = os.path.join(PWD, "spreadsheets",
                                     "ABCD_good_and_bad_series_table.csv")
+
 SPREADSHEET_QC = os.path.join(PWD, "spreadsheets", "abcd_fastqc01.txt")
 TEMP_FILES_DIR = os.path.join(PWD, "temp")
 UNPACK_AND_SETUP = os.path.join(PWD, "src", "unpack_and_setup.sh")
 UNPACKED_FOLDER = os.path.join(PWD, "data")
+MODALITIES = ['anat', 'func', 'dwi']
 
 def main():
@@ -196,6 +198,30 @@ def get_cli_args():
                "spreadsheet.".format(SPREADSHEET_QC))
     )
 
+    # Required: Subject list
+    parser.add_argument(
+        "-l",
+        "--subject-list",
+        dest="subject_list",
+        type=validate_readable_file,
+        required=True,
+        help=("Path to a .txt file containing a list of subjects to "
+              "download, one subject ID per line.")
+    )
+
+    # Optional: Modalities
+    parser.add_argument(
+        "-m",
+        "--modalities",
+        choices=MODALITIES,
+        nargs="+",
+        dest="modalities",
+        default=MODALITIES,
+        help=("List of the imaging modalities that should be downloaded for "
+              "each subject. The default is to download all modalities. "
+              "The possible selections are {}".format(MODALITIES))
+    )
+
     # Optional: During unpack_and_setup, remove unprocessed data
     parser.add_argument(
         "-rm",
@@ -243,6 +269,17 @@
                 "then the user will be prompted for their NDA password.")
     )
 
+    parser.add_argument(
+        "-z",
+        "--docker-cmd",
+        type=str,
+        dest="docker_cmd",
+        default=None,
+        help=("A replacement for the 'docker' command, needed on HPCs like "
+              "the one at OHSU, which has its own special wrapper around "
+              "docker for security reasons. Example: '/opt/acc/sbin/exadocker'")
+    )
+
     # Parse, validate, and return all CLI args
     return validate_cli_args(parser.parse_args(), parser)
 
@@ -573,8 +610,12 @@ def download_nda_data(cli_args):
     with downloaded NDA data.
     :return: N/A
     """
-    subprocess.check_call(("python3", SERIES_TABLE_PARSER, cli_args.download,
-                           SPREADSHEET_DOWNLOAD))
+    subprocess.check_call(("python3",
+                           SERIES_TABLE_PARSER,
+                           "--download-dir", cli_args.download,
+                           "--subject-list", cli_args.subject_list,
+                           "--modalities", ','.join(cli_args.modalities)))
 
 
 def unpack_and_setup(cli_args):
@@ -655,9 +696,14 @@ def validate_bids(cli_args):
     :return: N/A
     """
     try:
-        subprocess.check_call(("docker", "run", "-ti", "--rm", "-v",
-                               cli_args.output + ":/data:ro", "bids/validator",
-                               "/data"))
+        if cli_args.docker_cmd:
+            subprocess.check_call(('sudo', cli_args.docker_cmd, "run", "-ti", "--rm", "-v",
+                                   cli_args.output + ":/data:ro", "bids/validator",
+                                   "/data"))
+        else:
+            subprocess.check_call(("docker", "run", "-ti", "--rm", "-v",
+                                   cli_args.output + ":/data:ro", "bids/validator",
+                                   "/data"))
     except subprocess.CalledProcessError:
         print("Error: BIDS validation failed.")
 
diff --git a/abcd_dcm2bids.conf b/abcd_dcm2bids.conf
index 9861941..e4cd013 100644
--- a/abcd_dcm2bids.conf
+++ b/abcd_dcm2bids.conf
@@ -1,9 +1,75 @@
 {
     "descriptions": [
+        {
+            "dataType": "dwi",
+            "modalityLabel": "dwi",
+            "criteria": {
+                "SeriesDescription": "ABCD-DTI_SIEMENS_mosaic_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "dwi",
+            "modalityLabel": "dwi",
+            "criteria": {
+                "SeriesDescription": "ABCD-DTI_PHILIPS_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "dwi",
+            "modalityLabel": "dwi",
+            "criteria": {
+                "SeriesDescription": "ABCD-DTI_GE_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "fmap",
+            "modalityLabel": "epi",
+            "customLabels": "acq-dwi_dir-AP",
+            "intendedFor": 0,
+            "criteria": {
+                "SeriesDescription": "ABCD-Diffusion-FM-AP_SIEMENS_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "fmap",
+            "modalityLabel": "epi",
+            "customLabels": "acq-dwi_dir-AP",
+            "intendedFor": 1,
+            "criteria": {
+                "SeriesDescription": "ABCD-Diffusion-FM-AP_PHILIPS_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "fmap",
+            "modalityLabel": "epi",
+            "customLabels": "acq-dwi_dir-AP",
+            "intendedFor": 2,
+            "criteria": {
+                "SeriesDescription": "ABCD-Diffusion-FM_GE_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "fmap",
+            "modalityLabel": "epi",
+            "customLabels": "acq-dwi_dir-PA",
+            "intendedFor": 0,
+            "criteria": {
+                "SeriesDescription": "ABCD-Diffusion-FM-PA_SIEMENS_original_(baseline_year_1_arm_1)"
+            }
+        },
+        {
+            "dataType": "fmap",
+            "modalityLabel": "epi",
+            "customLabels": "acq-dwi_dir-PA",
+            "intendedFor": 1,
+            "criteria": {
+                "SeriesDescription": "ABCD-Diffusion-FM-PA_PHILIPS_original_(baseline_year_1_arm_1)"
+            }
+        },
         {
             "dataType": "fmap",
             "modalityLabel": "epi",
-            "customLabels": "dir-AP",
+            "customLabels": "acq-func_dir-AP",
             "criteria": {
                 "SeriesDescription": "ABCD-fMRI-FM-AP_SIEMENS_original_(baseline_year_1_arm_1)"
             }
@@ -11,7 +77,7 @@
         {
             "dataType": "fmap",
             "modalityLabel": "epi",
-            "customLabels": "dir-PA",
+            "customLabels": "acq-func_dir-PA",
             "criteria": {
                 "SeriesDescription": "ABCD-fMRI-FM-PA_SIEMENS_original_(baseline_year_1_arm_1)"
             }
@@ -81,7 +147,7 @@
         {
             "dataType": "fmap",
             "modalityLabel": "epi",
-            "customLabels": "dir-PA",
+            "customLabels": "acq-func_dir-PA",
             "criteria": {
                 "SeriesDescription": "ABCD-fMRI-FM-PA_PHILIPS_original_(baseline_year_1_arm_1)"
             }
@@ -89,7 +155,7 @@
         {
             "dataType": "fmap",
             "modalityLabel": "epi",
-            "customLabels": "dir-AP",
+            "customLabels": "acq-func_dir-AP",
             "criteria": {
                 "SeriesDescription":
"ABCD-fMRI-FM-AP_PHILIPS_original_(baseline_year_1_arm_1)" } @@ -143,7 +209,7 @@ { "dataType": "fmap", "modalityLabel": "epi", - "customLabels": "dir-both", + "customLabels": "acq-func_dir-both", "criteria": { "SeriesDescription": "ABCD-fMRI-FM_GE_original_(baseline_year_1_arm_1)" } diff --git a/src/aws_downloader.py b/src/aws_downloader.py new file mode 100755 index 0000000..c02cc0d --- /dev/null +++ b/src/aws_downloader.py @@ -0,0 +1,312 @@ +#! /usr/bin/env python3 + + +import pandas as pd +import csv +import subprocess +import os +import sys +import argparse + +####################################### +# Read in ABCD_good_and_bad_series_table.csv (renamed to ABCD_operator_QC.csv) that is continually updated +# Create a log of all subjects that have been checked +# If they are not able to be processed report what is wrong with them +# +####################################### + +prog_descrip='test downloader' + +QC_CSV = os.path.join(os.path.dirname(os.path.dirname( + os.path.abspath(__file__))), "spreadsheets", + "ABCD_good_and_bad_series_table.csv") +YEARS = ['baseline_year_1_arm_1', '2_year_follow_up_y_arm_1'] +MODALITIES = ['anat', 'func', 'dwi'] + +def generate_parser(parser=None): + + if not parser: + parser = argparse.ArgumentParser( + description=prog_descrip + ) + parser.add_argument( + '-q', + '--qc-csv', + dest='qc_csv', + default=QC_CSV, + help='Path to the csv file containing aws paths and operator QC info' +) + parser.add_argument( + '-d', + '--download-dir', + dest='download_dir', + default='./new_download', + help='Path to where the subjects should be downloaded to.' +) + parser.add_argument( + '-s', + '--subject-list', + dest='subject_list', + required=True, + help='Path to a text file containing a list of subject IDs' +) + parser.add_argument( + '-y', + '--sessions', + choices=YEARS, + nargs='+', + dest='year_list', + default=['baseline_year_1_arm_1'], + help='List the years that images should be downloaded from' +) + parser.add_argument( + '-m', + '--modalities', +# choices=MODALITIES, +# nargs='+', + dest='modalities', + default=MODALITIES, + help="List the modalities that should be downloaded. 
+
+def main(argv=sys.argv):
+    parser = generate_parser()
+    args = parser.parse_args()
+
+    # Logging variables
+    num_sub_visits = 0
+    num_t1 = 0
+    num_rsfmri = 0
+    num_sst = 0
+    num_mid = 0
+    num_nback = 0
+    num_t2 = 0
+    num_dti = 0
+
+    series_csv = args.qc_csv
+    with open(args.subject_list, 'r') as f:
+        subject_list = [sub.strip() for sub in f.readlines()]
+    log = os.path.join(os.path.dirname(args.subject_list), os.path.splitext(os.path.basename(args.subject_list))[0] + "_download_log.csv")
+    year_list = args.year_list
+    modalities = args.modalities
+    if isinstance(modalities, str):
+        modalities = modalities.split(',')
+    download_dir = args.download_dir
+
+    print("aws_downloader.py command line arguments:")
+    print("    QC spreadsheet: {}".format(series_csv))
+    print("    Subject List  : {}".format(subject_list))
+    print("    Year          : {}".format(year_list))
+    print("    Modalities    : {}".format(modalities))
+
+    with open(log, 'w') as f:
+        writer = csv.writer(f)
+
+        # Read the QC spreadsheet into a pandas dataframe
+        series_df = pd.read_csv(series_csv)
+
+        uid_start = "INV"
+        for sub in subject_list:
+            uid = sub.split(uid_start, 1)[1]
+            pguid = 'NDAR_INV' + ''.join(uid)
+            bids_id = 'sub-NDARINV' + ''.join(uid)
+            subject_df = series_df[series_df['pGUID'] == pguid]
+            for year in year_list:
+                sub_ses_df = subject_df[subject_df['EventName'] == year]
+                sub_pass_QC_df = sub_ses_df[sub_ses_df['QC'] == 1.0]
+                file_paths = []
+
+                ### Logging information
+                # initialize logging variables
+                has_t1 = 0
+                has_t2 = 0
+                has_sefm = 0
+                has_rsfmri = 0
+                has_mid = 0
+                has_sst = 0
+                has_nback = 0
+                has_dti = 0
+
+                num_sub_visits += 1
+                tgz_dir = os.path.join(download_dir, bids_id, year)
+                print("Checking QC data for valid images for {} {}.".format(bids_id, year))
+                os.makedirs(tgz_dir, exist_ok=True)
+
+                if 'anat' in modalities:
+                    (file_paths, has_t1, has_t2) = add_anat_paths(sub_pass_QC_df, file_paths)
+                if 'func' in modalities:
+                    (file_paths, has_sefm, has_rsfmri, has_mid, has_sst, has_nback) = add_func_paths(sub_pass_QC_df, file_paths)
+                if 'dwi' in modalities:
+                    (file_paths, has_dti) = add_dwi_paths(sub_pass_QC_df, file_paths)
+
+                print('    t1=%s, t2=%s, sefm=%s, rsfmri=%s, mid=%s, sst=%s, nback=%s, dti=%s' % (has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback, has_dti))
+                # Log one row per subject visit: the count of QC-passing series found for each series type
+                writer.writerow([bids_id, year, has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback, has_dti])
+
+                if has_t1 != 0:
+                    num_t1 += 1
+                if has_t2 != 0:
+                    num_t2 += 1
+                if has_rsfmri != 0:
+                    num_rsfmri += 1
+                if has_mid != 0:
+                    num_mid += 1
+                if has_sst != 0:
+                    num_sst += 1
+                if has_nback != 0:
+                    num_nback += 1
+                if has_dti != 0:
+                    num_dti += 1
+                for i in file_paths:
+                    tgz_name = os.path.basename(i)
+                    tgz_path = tgz_dir + '/' + tgz_name
+                    if os.path.exists(tgz_path):
+                        print("{} already exists".format(tgz_path))
+                        continue
+                    else:
+                        aws_cmd = ["aws", "s3", "cp", i, tgz_dir + "/", "--profile", "NDA"]
+                        subprocess.call(aws_cmd)
+
+    print("There are %s subject visits" % num_sub_visits)
+    print("number of subjects with a T1 : %s" % num_t1)
print("number of subjects with a T2 : %s" % num_t2) + print("number of subjects with rest : %s" % num_rsfmri) + print("number of subjects with mid : %s" % num_mid) + print("number of subjects with sst : %s" % num_sst) + print("number of subjects with nBack: %s" % num_nback) + print("number of subjects with dti : %s" % num_dti) + + + +def add_anat_paths(passed_QC_group, file_paths): + ## Check if T1_NORM exists and download that instead of just T1 + # If there is a T1_NORM in the df of good T1s then use it. Else just use good T1 + T1_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-T1-NORM'] + if T1_df.empty: + T1_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-T1'] + if T1_df.empty: + has_t1 = 0 # No T1s. Invalid subject + else: + for file_path in T1_df['image_file']: + file_paths += [file_path] + has_t1 = T1_df.shape[0] + else: + for file_path in T1_df['image_file']: + file_paths += [file_path] + has_t1 = T1_df.shape[0] + + T2_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-T2-NORM'] + if T2_df.empty: + T2_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-T2'] + if T2_df.empty: + has_t2 = 0 # No T2s + else: + for file_path in T2_df['image_file']: + file_paths += [file_path] + has_t2 = T2_df.shape[0] + else: + for file_path in T2_df['image_file']: + file_paths += [file_path] + has_t2 = T2_df.shape[0] + return (file_paths, has_t1, has_t2) + +def add_func_paths(passed_QC_group, file_paths): + ## Pair SEFMs and only download if both pass QC + # Check first if just the FM exists + FM_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-fMRI-FM'] + if FM_df.empty: + FM_AP_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-fMRI-FM-AP'] + FM_PA_df = passed_QC_group[passed_QC_group['image_description'] == 'ABCD-fMRI-FM-PA'] + if FM_AP_df.shape[0] != FM_PA_df.shape[0] or FM_AP_df.empty: + has_sefm = 0 # No SEFMs. Invalid subject + else: + for i in range(0, FM_AP_df.shape[0]): + if FM_AP_df.iloc[i]['QC'] == 1.0 and FM_PA_df.iloc[i]['QC'] == 1.0: + FM_df = FM_df.append(FM_AP_df.iloc[i]) + FM_df = FM_df.append(FM_PA_df.iloc[i]) + if FM_df.empty: + has_sefm = 0 # No SEFMs. 
Invalid subject + else: + for file_path in FM_df['image_file']: + file_paths += [file_path] + has_sefm = FM_df.shape[0] + + + ## List all rsfMRI scans that pass QC + RS_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-rsfMRI'] + if RS_df.empty: + has_rsfmri = 0 + else: + for file_path in RS_df['image_file']: + file_paths += [file_path] + has_rsfmri = RS_df.shape[0] + + ## List only download task iff their is a pair of scans for the task that passed QC + MID_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-MID-fMRI'] + if MID_df.shape[0] != 2: + has_mid = MID_df.shape[0] + else: + for file_path in MID_df['image_file']: + file_paths += [file_path] + has_mid = MID_df.shape[0] + SST_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-SST-fMRI'] + if SST_df.shape[0] != 2: + has_sst = SST_df.shape[0] + else: + for file_path in SST_df['image_file']: + file_paths += [file_path] + has_sst = SST_df.shape[0] + nBack_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-nBack-fMRI'] + if nBack_df.shape[0] != 2: + has_nback = nBack_df.shape[0] + else: + for file_path in nBack_df['image_file']: + file_paths += [file_path] + has_nback = nBack_df.shape[0] + + return (file_paths, has_sefm, has_rsfmri, has_mid, has_sst, has_nback) + + +def add_dwi_paths(passed_QC_group, file_paths): + DTI_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-DTI'] + # There should only be a single DTI run that passes QC. If more than it requires investigation + if DTI_df.shape[0] >= 1: + # If a DTI exists then download all passing DTI fieldmaps + DTI_FM_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-Diffusion-FM'] + DTI_FM_df = DTI_FM_df.tail(1) + if DTI_FM_df.empty: + DTI_FM_AP_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-Diffusion-FM-AP'] + if DTI_FM_AP_df.empty: + return (file_paths, 0) + DTI_FM_PA_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-Diffusion-FM-PA'] + DTI_FM_df = DTI_FM_AP_df.tail(1) + DTI_FM_df = DTI_FM_df.append(DTI_FM_PA_df.tail(1)) + if not DTI_FM_df.empty: + for file_path in DTI_df.tail(1)['image_file']: + file_paths += [file_path] + for file_path in DTI_FM_df['image_file']: + file_paths += [file_path] + has_dti = DTI_df.shape[0] + else: + has_dti = DTI_df.shape[0] + + return (file_paths, has_dti) + +if __name__ == "__main__": + main() diff --git a/src/correct_jsons.py b/src/correct_jsons.py index 958a414..2f3d51e 100755 --- a/src/correct_jsons.py +++ b/src/correct_jsons.py @@ -77,6 +77,7 @@ def main(argv=sys.argv): if ext == '.json': json_path = os.path.join(root, filename) + #print(json_path) with open(json_path, 'r') as f: data = json.load(f) diff --git a/src/good_bad_series_parser.py b/src/good_bad_series_parser.py deleted file mode 100755 index b9c63f7..0000000 --- a/src/good_bad_series_parser.py +++ /dev/null @@ -1,218 +0,0 @@ -#! 
/usr/bin/env python3 - - -import pandas as pd -import csv -import subprocess -import os -import sys - -####################################### -# Read in ABCD_good_and_bad_series_table.csv that is continually updated -# Create a log of all subjects that have been checked -# If they are not able to be processed report what is wrong with them -# -####################################### - - -# Logging variables -num_sub_visits = 0 -# num_siemens = 0 -# num_ge = 0 -# num_philips = 0 -num_rsfmri = 0 -num_sst = 0 -num_mid = 0 -num_nback = 0 -num_t2 = 0 -num_invalid = 0 -num_valid = 0 -num_subjects_after_checks = 0 - -# Get download folder name. Use one entered from command line if it exists; -# otherwise use "./new_download". Added by Greg Conan 2019-06-06 -if len(sys.argv) > 1: - new_download_dir = sys.argv[1] - if len(sys.argv) > 2: # added 2019-11-07 - series_csv = sys.argv[2] - else: - series_csv = os.path.join(os.path.dirname(os.path.dirname( - os.path.abspath(__file__))), "spreadsheets", - "ABCD_good_and_bad_series_table.csv" - ) -elif len(sys.argv) < 1: - new_download_dir = './new_download/' - -with open('abcd_download_log.csv','w') as f: - writer = csv.writer(f) - - # Read csv as pandas dataframe, drop duplicate entries, sort, and group by subject/visit - series_df = pd.read_csv(series_csv) - subject_dfs = series_df.drop_duplicates().sort_values(by='SeriesTime', ascending=True).groupby(["pGUID", "EventName"]) - - for name, group in subject_dfs: - - ### Logging information - # initialize logging variables - has_t1 = 0 - has_t2 = 0 - has_sefm = 0 - has_rsfmri = 0 - has_mid = 0 - has_sst = 0 - has_nback = 0 - - # TODO: Add pGUID and EventName (Subject ID and Visit) to csv for logging information - num_sub_visits += 1 - - # TODO: Create tgz directory if it doesn't already exist - sub_id = name[0] - visit = name[1] - sub = "sub-" + sub_id.replace("_","") - #print(sub_id, visit) - tgz_dir = os.path.join('./download', sub, visit) - new_tgz_dir = os.path.join(new_download_dir, sub, visit) - if os.path.exists(tgz_dir): - print("{0} already exists from old download. Updating now.".format(name)) - #continue - elif os.path.exists(new_tgz_dir): - print("{0} already exists from the most recent download. Updating now.".format(name)) - tgz_dir = new_tgz_dir - else: - print("{0} downloading now.".format(name)) - tgz_dir = new_tgz_dir - os.makedirs(tgz_dir) - - ### Get ready to download only good QC'd data that passes all of our criteria ! - - passed_QC_group = group.loc[group['QC'] == 1.0] - - file_paths = [] - - ### Identify valid scans - # Download only T1, T2, fMRI_FM_PA, fMRI_FM_AP, fMRI_FM, rsfMRI, fMRI_MID_task, fMRI_SST_task, fMRI_nBack_task - - ## Check if T1_NORM exists and download that instead of just T1 - # If there is a T1_NORM in the df of good T1s then use it. Else just use good T1 - T1_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-T1-NORM'] - if T1_df.empty: - T1_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-T1'] - if T1_df.empty: - has_t1 = 0 # No T1s. 
Invalid subject - else: - for file_path in T1_df['image_file']: - file_paths += [file_path] - has_t1 = T1_df.shape[0] - else: - for file_path in T1_df['image_file']: - file_paths += [file_path] - has_t1 = T1_df.shape[0] - - T2_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-T2-NORM'] - if T2_df.empty: - T2_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-T2'] - if T2_df.empty: - has_t2 = 0 # No T2s - else: - for file_path in T2_df['image_file']: - file_paths += [file_path] - has_t2 = T2_df.shape[0] - else: - for file_path in T2_df['image_file']: - file_paths += [file_path] - has_t2 = T2_df.shape[0] - - ## Pair SEFMs and only download if both pass QC - # Check first if just the FM exists - FM_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-fMRI-FM'] - if FM_df.empty: - FM_AP_df = group.loc[group['image_description'] == 'ABCD-fMRI-FM-AP'] - FM_PA_df = group.loc[group['image_description'] == 'ABCD-fMRI-FM-PA'] - if FM_AP_df.shape[0] != FM_PA_df.shape[0] or FM_AP_df.empty: - has_sefm = 0 # No SEFMs. Invalid subject - else: - for i in range(0, FM_AP_df.shape[0]): - if FM_AP_df.iloc[i]['QC'] == 1.0 and FM_PA_df.iloc[i]['QC'] == 1.0: - FM_df = FM_df.append(FM_AP_df.iloc[i]) - FM_df = FM_df.append(FM_PA_df.iloc[i]) - if FM_df.empty: - has_sefm = 0 # No SEFMs. Invalid subject - else: - for file_path in FM_df['image_file']: - file_paths += [file_path] - has_sefm = FM_df.shape[0] - - - ## List all rsfMRI scans that pass QC - RS_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-rsfMRI'] - if RS_df.empty: - has_rsfmri = 0 - else: - for file_path in RS_df['image_file']: - file_paths += [file_path] - has_rsfmri = RS_df.shape[0] - - ## List only download task iff their is a pair of scans for the task that passed QC - MID_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-MID-fMRI'] - if MID_df.shape[0] != 2: - has_mid = MID_df.shape[0] - else: - for file_path in MID_df['image_file']: - file_paths += [file_path] - has_mid = MID_df.shape[0] - SST_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-SST-fMRI'] - if SST_df.shape[0] != 2: - has_sst = SST_df.shape[0] - else: - for file_path in SST_df['image_file']: - file_paths += [file_path] - has_sst = SST_df.shape[0] - nBack_df = passed_QC_group.loc[passed_QC_group['image_description'] == 'ABCD-nBack-fMRI'] - if nBack_df.shape[0] != 2: - has_nback = nBack_df.shape[0] - else: - for file_path in nBack_df['image_file']: - file_paths += [file_path] - has_nback = nBack_df.shape[0] - - # TODO: log subject level information - if has_t1 == 0: - num_invalid += 1 - print('%s: t1=%s, t2=%s, sefm=%s, rsfmri=%s, mid=%s, sst=%s, nback=%s INVALID' % (sub, has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback)) - writer.writerow([sub, has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback]) - else: - num_valid += 1 - print('%s: t1=%s, t2=%s, sefm=%s, rsfmri=%s, mid=%s, sst=%s, nback=%s' % (sub, has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback)) - writer.writerow([sub, has_t1, has_t2, has_sefm, has_rsfmri, has_mid, has_sst, has_nback]) - if has_t2 != 0: - num_t2 += 1 - if has_rsfmri != 0: - num_rsfmri += 1 - if has_mid != 0: - num_mid += 1 - if has_sst != 0: - num_sst += 1 - if has_nback != 0: - num_nback += 1 - # subprocess.call(["./nda_aws_token_maker.py", ">", "/dev/null"]) - for i in file_paths: - tgz_name = os.path.basename(i) - tgz_path = tgz_dir + '/' + tgz_name - if os.path.exists(tgz_path): - continue 
- else: - aws_cmd = ["aws", "s3", "cp", i, tgz_dir + "/", "--profile", "NDA"] - #print(aws_cmd) - subprocess.call(aws_cmd) - - -print("There are %s subject visits" % num_sub_visits) -print("%s are valid. %s are invalid" % (num_valid, num_invalid)) -# print("%s Siemens" % num_siemens) -# print("%s Philips" % num_philips) -# print("%s GE" % num_ge) -print("number of valid subjects with a T2 : %s" % num_t2) -print("number of valid subjects with rest : %s" % num_rsfmri) -print("number of valid subjects with mid : %s" % num_mid) -print("number of valid subjects with sst : %s" % num_sst) -print("number of valid subjects with nBack: %s" % num_nback) diff --git a/src/sefm_eval_and_json_editor.py b/src/sefm_eval_and_json_editor.py index 9a70c8b..87943c1 100755 --- a/src/sefm_eval_and_json_editor.py +++ b/src/sefm_eval_and_json_editor.py @@ -102,7 +102,7 @@ def sefm_select(layout, subject, sessions, base_temp_dir, fsl_dir, mre_dir, try: len(list_pos) == len(list_neg) except: - print("Error: There are a mismatched number of SEFMs. This should never happen!") + print("ERROR in SEFM select: There are a mismatched number of SEFMs. This should never happen!") pairs = [] for pair in zip(list_pos, list_neg): @@ -171,8 +171,7 @@ def sefm_select(layout, subject, sessions, base_temp_dir, fsl_dir, mre_dir, else: insert_edit_json(pos_json, "IntendedFor", []) insert_edit_json(neg_json, "IntendedFor", []) - - + # Delete the temp directory containing all the intermediate images if not debug: rm_cmd = ['rm', '-rf', temp_dir] @@ -225,14 +224,17 @@ def seperate_concatenated_fm(bids_layout, subject, session, fsl_dir): # add required fields to the orig json as well insert_edit_json(orig_json, 'IntendedFor', []) return + def insert_edit_json(json_path, json_field, value): - with open(json_path, 'r+') as f: + with open(json_path, 'r') as f: data = json.load(f) - data[json_field] = value - f.seek(0) + if json_field in data and data[json_field] != value: + print('WARNING: Replacing {}: {} with {} in {}'.format(json_field, data[json_field], value, json_path)) + data[json_field] = value + with open(json_path, 'w') as f: json.dump(data, f, indent=4) - f.truncate + return diff --git a/src/unpack_and_setup.sh b/src/unpack_and_setup.sh index d4d62c6..21295f2 100755 --- a/src/unpack_and_setup.sh +++ b/src/unpack_and_setup.sh @@ -82,6 +82,8 @@ for tgz in ${TempSubjectDir}/*.tgz; do done + + # # IMPORTANT PATH DEPENDENCY VARIABLES AT OHSU IN SLURM CLUSTER # export PATH=.../anaconda2/bin:${PATH} # relevant Python path with dcm2bids # export PATH=.../mricrogl_lx/:${PATH} # relevant dcm2niix path @@ -94,19 +96,67 @@ echo ${participant} echo `date`" :RUNNING dcm2bids" dcm2bids -d ${TempSubjectDir}/DCMs/${SUB} -p ${participant} -s ${session} -c ${ABCD2BIDS_DIR}/abcd_dcm2bids.conf -o ${TempSubjectDir}/BIDS_unprocessed --forceDcm2niix --clobber -echo `date`" :CHECKING BIDS ORDERING OF EPIs" -if [[ -e ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/func ]]; then - if [[ `${ABCD2BIDS_DIR}/src/run_order_fix.py ${TempSubjectDir}/BIDS_unprocessed ${TempSubjectDir}/bids_order_error.json ${TempSubjectDir}/bids_order_map.json --all --subject ${SUB}` == ${SUB} ]]; then - echo BIDS correctly ordered + +# replace bvals and bvecs with files supplied by the NDA +if [ -e ${TempSubjectDir}/DCMs/${SUB}/${VISIT}/dwi ]; then + echo "Replacing bvals and bvecs with files supplied by the NDA" + orig_bval=${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/dwi/${SUB}_${VISIT}_dwi.bval + 
orig_bvec=${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/dwi/${SUB}_${VISIT}_dwi.bvec
+    first_dcm=`ls ${TempSubjectDir}/DCMs/${SUB}/${VISIT}/dwi/*/*.dcm | head -n1`
+    if [[ `dcmdump --search 0008,0070 ${first_dcm} 2>/dev/null` == *GE* ]]; then
+        if dcmdump --search 0018,1020 ${first_dcm} 2>/dev/null | grep -q DV25; then
+            echo "Replacing GE DV25 bvals and bvecs"
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/GE_bvals_DV25.txt ${orig_bval}
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/GE_bvecs_DV25.txt ${orig_bvec}
+        elif dcmdump --search 0018,1020 ${first_dcm} 2>/dev/null | grep -q DV26; then
+            echo "Replacing GE DV26 bvals and bvecs"
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/GE_bvals_DV26.txt ${orig_bval}
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/GE_bvecs_DV26.txt ${orig_bvec}
+        else
+            echo "ERROR setting up DWI: GE software version not recognized"
+            exit
+        fi
+    elif [[ `dcmdump --search 0008,0070 ${first_dcm} 2>/dev/null` == *Philips* ]]; then
+        software_version=`dcmdump --search 0018,1020 ${first_dcm} 2>/dev/null | awk '{print $3}'`
+        if [[ ${software_version} == *5.3* ]]; then
+            echo "Replacing Philips s1 bvals and bvecs"
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Philips_bvals_s1.txt ${orig_bval}
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Philips_bvecs_s1.txt ${orig_bvec}
+        elif [[ ${software_version} == *5.4* ]]; then
+            echo "Replacing Philips s2 bvals and bvecs"
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Philips_bvals_s2.txt ${orig_bval}
+            cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Philips_bvecs_s2.txt ${orig_bvec}
+        else
+            echo "ERROR setting up DWI: Philips software version " ${software_version} " not recognized"
+            exit
+        fi
+    elif [[ `dcmdump --search 0008,0070 ${first_dcm} 2>/dev/null` == *SIEMENS* ]]; then
+        echo "Replacing Siemens bvals and bvecs"
+        cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Siemens_bvals.txt ${orig_bval}
+        cp `dirname $0`/ABCD_Release_2.0_Diffusion_Tables/Siemens_bvecs.txt ${orig_bvec}
     else
-        echo ERROR: BIDS incorrectly ordered even after running run_order_fix.py
+        echo "ERROR setting up DWI: Manufacturer not recognized"
         exit
     fi
-else
-    echo "No functional images found for subject ${SUB}. Skipping sefm_eval_and_json_editor to copy and rename source data."
-    exit
 fi
+
+if [[ -e ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/func ]]; then
+    echo `date`" :CHECKING BIDS ORDERING OF EPIs"
+    i=0
+    while [ "`${ABCD2BIDS_DIR}/src/run_order_fix.py ${TempSubjectDir}/BIDS_unprocessed ${TempSubjectDir}/bids_order_error.json ${TempSubjectDir}/bids_order_map.json --all --subject ${SUB}`" != ${SUB} ] && [ $i -ne 3 ]; do
+        ((i++))
+        echo `date`" : WARNING: BIDS functional scans incorrectly ordered. Attempting to reorder.
Attempt #$i" + done + if [ "`${ABCD2BIDS_DIR}/src/run_order_fix.py ${TempSubjectDir}/BIDS_unprocessed ${TempSubjectDir}/bids_order_error.json ${TempSubjectDir}/bids_order_map.json --all --subject ${SUB}`" == ${SUB} ]; then + echo `date`" : BIDS functional scans correctly ordered" + else + echo `date`" : ERROR: BIDS incorrectly ordered even after running run_order_fix.py" + exit + fi +fi # select best fieldmap and update sidecar jsons echo `date`" :RUNNING SEFM SELECTION AND EDITING SIDECAR JSONS" if [ -d ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/fmap ]; then @@ -114,46 +164,52 @@ if [ -d ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/fmap ]; then ${ABCD2BIDS_DIR}/src/sefm_eval_and_json_editor.py ${TempSubjectDir}/BIDS_unprocessed/ ${FSL_DIR} ${MRE_DIR} --participant-label=${participant} --output-dir $ROOT_BIDSINPUT fi +# Fix all json extra data errors +for j in ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/*/*.json; do + mv ${j} ${j}.temp + # print only the valid part of the json back into the original json + jq '.' ${j}.temp > ${j} + rm ${j}.temp +done + + rm ${TempSubjectDir}/BIDS_unprocessed/${SUB}/ses-baselineYear1Arm1/fmap/*dir-both* 2> /dev/null || true # rename EventRelatedInformation -echo `date`" :COPY AND RENAME SOURCE DATA" srcdata_dir=${TempSubjectDir}/BIDS_unprocessed/sourcedata/${SUB}/ses-baselineYear1Arm1/func -echo $srcdata_dir -ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*EventRelatedInformation.txt if ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*EventRelatedInformation.txt > /dev/null 2>&1; then + echo `date`" :COPY AND RENAME SOURCE DATA" mkdir -p ${srcdata_dir} - echo "Made srcdata_dir" -fi -MID_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*MID*EventRelatedInformation.txt 2>/dev/null` -SST_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*SST*EventRelatedInformation.txt 2>/dev/null` -nBack_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*nBack*EventRelatedInformation.txt 2>/dev/null` -echo ${MID_evs} -echo ${SST_evs} -echo ${nBack_evs} -if [ `echo ${MID_evs} | wc -w` -eq 2 ]; then - i=1 - for ev in ${MID_evs}; do - cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-MID_run-0${i}_bold_EventRelatedInformation.txt - ((i++)) - done -fi -if [ `echo ${SST_evs} | wc -w` -eq 2 ]; then - i=1 - for ev in ${SST_evs}; do - cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-SST_run-0${i}_bold_EventRelatedInformation.txt - ((i++)) - done -fi -if [ `echo ${nBack_evs} | wc -w` -eq 2 ]; then - i=1 - for ev in ${nBack_evs}; do - cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-nback_run-0${i}_bold_EventRelatedInformation.txt - ((i++)) - done + MID_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*MID*EventRelatedInformation.txt 2>/dev/null` + SST_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*SST*EventRelatedInformation.txt 2>/dev/null` + nBack_evs=`ls ${TempSubjectDir}/DCMs/${SUB}/ses-baselineYear1Arm1/func/*nBack*EventRelatedInformation.txt 2>/dev/null` + echo ${MID_evs} + echo ${SST_evs} + echo ${nBack_evs} + if [ `echo ${MID_evs} | wc -w` -eq 2 ]; then + i=1 + for ev in ${MID_evs}; do + cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-MID_run-0${i}_bold_EventRelatedInformation.txt + ((i++)) + done + fi + if [ `echo ${SST_evs} | wc -w` -eq 2 ]; then + i=1 + for ev in ${SST_evs}; do + cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-SST_run-0${i}_bold_EventRelatedInformation.txt + ((i++)) + done + fi + if [ 
`echo ${nBack_evs} | wc -w` -eq 2 ]; then + i=1 + for ev in ${nBack_evs}; do + cp ${ev} ${srcdata_dir}/${SUB}_ses-baselineYear1Arm1_task-nback_run-0${i}_bold_EventRelatedInformation.txt + ((i++)) + done + fi fi -echo `date`" :COPYING SOURCE AND SORTED DATA BACK: ${ROOT_BIDSINPUT}" +echo `date`" :COPYING BIDS DATA BACK: ${ROOT_BIDSINPUT}" TEMPBIDSINPUT=${TempSubjectDir}/BIDS_unprocessed/${SUB} if [ -d ${TEMPBIDSINPUT} ] ; then