MacOS Finder ._* hidden metadata files cause pybids to crash #1069

Open
shnizzedy opened this issue Jun 20, 2024 · 1 comment


shnizzedy commented Jun 20, 2024

Example

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte while trying to decode JSON from file […]/._sub-PA069_ses-V1W1_task-poke_run-2_bold.json
Traceback (most recent call last):
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 303, in load_json
    return json.load(handle)
  File "/opt/conda/envs/sdcflows/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/conda/envs/sdcflows/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/sdcflows/bin/sdcflows", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/cli/main.py", line 39, in main
    parse_args(argv)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/cli/parser.py", line 281, in parse_args
    config.from_dict(vars(opts))
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 589, in from_dict
    execution.load(settings)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 249, in load
    cls.init()
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 476, in init
    cls._layout = BIDSLayout(
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/layout.py", line 177, in __init__
    _indexer(self)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 154, in __call__
    self._index_metadata()
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 415, in _index_metadata
    file_md.update(pl())
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 305, in load_json
    raise OSError(
OSError: Error occurred while trying to decode JSON from file /ocean/projects/med220004p/shared/data_raw/vannucci/bids_raw/sub-PA069/ses-V1W1/func/._sub-PA069_ses-V1W1_task-poke_run-2_bold.json

Proposed Solution

I think these types of files (._* and .DS_Store) can be safely ignored.
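To make the proposal concrete, here is a minimal sketch of the kind of filename check that could be applied before a file is indexed. The helper name `is_macos_metadata` is hypothetical and is not part of PyBIDS; it only illustrates matching `._*` and `.DS_Store` by basename.

```python
from pathlib import PurePath


def is_macos_metadata(path):
    """Return True for macOS Finder metadata files (hypothetical helper).

    Matches AppleDouble companions (``._*``) and ``.DS_Store`` by basename.
    """
    name = PurePath(path).name
    return name == ".DS_Store" or name.startswith("._")


# Illustrative relative paths modeled on the traceback above.
candidates = [
    "sub-PA069/ses-V1W1/func/sub-PA069_ses-V1W1_task-poke_run-2_bold.json",
    "sub-PA069/ses-V1W1/func/._sub-PA069_ses-V1W1_task-poke_run-2_bold.json",
    "sub-PA069/.DS_Store",
]

# Only the real sidecar survives the filter.
kept = [p for p in candidates if not is_macos_metadata(p)]
```

The check looks only at the basename, so other dotfiles such as `.bidsignore` are left alone.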

Context

In analyzing someone else's read-only (to me) data, I hit this issue. I worked around it by creating a symlinked recreation of the data directory without the MacOS hidden metadata files, but I don't think that should have been necessary.
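For anyone hitting the same problem, the workaround can be sketched roughly as below. This is not the exact script I used; the function name `mirror_without_macos_metadata` and the directory arguments are assumptions for illustration. It recreates the directory tree in a writable location and symlinks every file except `._*` and `.DS_Store`.

```python
import os
from pathlib import Path


def mirror_without_macos_metadata(src, dst):
    """Recreate ``src`` under ``dst`` using symlinks, skipping Finder metadata.

    Hypothetical helper: directories are created for real, files are
    symlinked, and ``._*`` / ``.DS_Store`` files are left out.
    """
    src = Path(src).resolve()
    dst = Path(dst)
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        target_dir = dst / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            if name == ".DS_Store" or name.startswith("._"):
                continue  # skip macOS hidden metadata files
            link = target_dir / name
            if not os.path.lexists(link):
                # The source stays untouched; only the link is written.
                link.symlink_to(Path(dirpath) / name)
```

Because only `dst` is written to, this works even when the source dataset is read-only.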

bids-validator raised errors and warnings for the dataset, but none related to these hidden metadata files as far as I can tell:

1: [ERR] Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder. (code: 1 - NOT_INCLUDED)
  	./sub-PA028/ses-V2W2/files.txt
  		Evidence: files.txt
  	./sub-PA070/ses-V1W1/anat/sub-PA070_ses-V2W2_acq-MPR_rec-vNavNorm_T1w.nii.gz
  		Evidence: sub-PA070_ses-V2W2_acq-MPR_rec-vNavNorm_T1w.nii.gz
2: [ERR] 'IntendedFor' field needs to point to an existing file. (code: 37 - INTENDED_FOR)
3: [ERR] You have to define 'TaskName' for this file. (code: 50 - TASK_NAME_MUST_DEFINE)
4: [ERR] Session label in the filename doesn't match with the path of the file. File seems to be saved in incorrect session directory. (code: 65 - SESSION_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY)
5: [ERR] _T1w.nii[.gz] files must have exactly three dimensions.  (code: 95 - T1W_FILE_WITH_TOO_MANY_DIMENSIONS)
1: [WARN] Task scans should have a corresponding events.tsv file. If this is a resting state scan you can ignore this warning or rename the task to include the word "rest". (code: 25 - EVENTS_TSV_MISSING)
2: [WARN] Not all subjects contain the same files. Each subject should contain the same number of files with the same naming unless some files are known to be missing. (code: 38 - INCONSISTENT_SUBJECTS)
3: [WARN] Not all subjects/sessions/runs have the same scanning parameters. (code: 39 - INCONSISTENT_PARAMETERS)
4: [WARN] NIfTI file's header field for pixel dimension information empty or too short. (code: 42 - NIFTI_PIXDIM)
5: [WARN] There are files in the /stimuli directory that are not utilized in any _events.tsv file. (code: 77 - UNUSED_STIMULUS)
6: [WARN] Tabular file contains custom columns not described in a data dictionary (code: 82 - CUSTOM_COLUMN_WITHOUT_DESCRIPTION)
7: [WARN] The onset of the last event is after the total duration of the corresponding scan. This design is suspiciously long.  (code: 85 - SUSPICIOUSLY_LONG_EVENT_DESIGN)
8: [WARN] Not all subjects contain the same sessions. (code: 97 - MISSING_SESSION)
9: [WARN] The recommended file /README is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 101 - README_FILE_MISSING)
10: [WARN] The Authors field of dataset_description.json should contain an array of fields - with one author per field. This was triggered because there are no authors, which will make DOI registration from dataset metadata impossible. (code: 113 - NO_AUTHORS)

I get the same errors and warnings in my workaround data directory but avoid the issue in PyBIDS.

effigies (Collaborator) commented Jul 9, 2024

I agree that we should generally ignore dotfiles.
