EEG: multiple dataset files can be submitted with different extension #1900

Open
arnodelorme opened this issue Feb 27, 2024 · 7 comments

Comments

@arnodelorme

This dataset contains both .set and .vhdr files:

https://nemar.org/dataexplorer/detail?dataset_id=ds003190

This should not be possible. There should be either one or the other.

@Remi-Gau
Contributor

Relates to bids-standard/bids-specification#1487

Unless I am mistaken, enforcing this would depend on updating the schema (see bids-standard/bids-specification#1492) for the Deno-based validator.

I doubt the legacy validator will enforce this.

@sappelhoff
Member

I don't think we have a rule in BIDS that the data format chosen for a dataset must be consistent 🤔

Although it is a bit weird to mix formats in a single dataset.

@effigies
Collaborator

effigies commented Mar 6, 2024

We do check whether someone has both .nii and .nii.gz files, as this is a somewhat common issue.

    const duplicateNiftis = (files) => {
      // check if same file with .nii and .nii.gz extensions is present
      const issues = []
      const niftiCounts = files
        .map(function (val) {
          return { count: 1, val: val.name.split('.')[0] }
        })
        .reduce(function (a, b) {
          a[b.val] = (a[b.val] || 0) + b.count
          return a
        }, {})
      const duplicates = Object.keys(niftiCounts).filter(function (a) {
        return niftiCounts[a] > 1
      })
      for (let key of duplicates) {
        const duplicateFiles = files.filter(function (a) {
          return a.name.split('.')[0] === key
        })
        for (let file of duplicateFiles) {
          issues.push(
            new Issue({
              code: 74,
              file: file,
            }),
          )
        }
      }
      return issues
    }

A similar check could be written for other formats, although it would be more complicated: EEG has multi-file formats, so files sharing the same stem are not in themselves an error.
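To make the multi-file complication concrete, here is a rough sketch of what such a check might look like. It is not validator code: the `PRIMARY_EEG_EXTENSIONS` list and the `conflictingEegFormats` helper are hypothetical names, and the sketch assumes that each EEG format can be identified by one primary extension (`.set`, `.vhdr`, `.edf`, `.bdf`) while companion files such as `.fdt`, `.eeg`, or `.vmrk` are ignored. It returns plain objects rather than `Issue` instances, since no issue code exists for this case.

```javascript
// Hypothetical list (an assumption for this sketch, not taken from the validator):
// the primary extension that identifies each EEG format.
const PRIMARY_EEG_EXTENSIONS = ['.set', '.vhdr', '.edf', '.bdf']

// Split a file name at its first dot, mirroring the split('.')[0]
// approach used by the NIfTI check.
const splitName = (name) => {
  const dot = name.indexOf('.')
  return dot === -1
    ? { stem: name, ext: '' }
    : { stem: name.slice(0, dot), ext: name.slice(dot) }
}

// Return one entry per stem that appears with more than one primary extension.
const conflictingEegFormats = (files) => {
  const byStem = new Map()
  for (const file of files) {
    const { stem, ext } = splitName(file.name)
    // Companion files (.fdt, .eeg, .vmrk, .json, ...) legitimately share a stem,
    // so only primary extensions are counted.
    if (!PRIMARY_EEG_EXTENSIONS.includes(ext)) continue
    if (!byStem.has(stem)) byStem.set(stem, new Set())
    byStem.get(stem).add(ext)
  }
  return [...byStem]
    .filter(([, exts]) => exts.size > 1)
    .map(([stem, exts]) => ({ stem, extensions: [...exts].sort() }))
}
```

With this approach a complete BrainVision triplet (`.vhdr`, `.vmrk`, `.eeg`) passes, while the same stem carrying both `.vhdr` and `.set` is reported.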

@sappelhoff
Member

> We do check whether someone has both .nii and .nii.gz files, as this is a somewhat common issue.

But does the spec say that these shouldn't be mixed?

@effigies
Collaborator

effigies commented Mar 8, 2024

No, but it does create ambiguity about the data, and it frequently causes problems for tools that expect to retrieve a unique data file for a given collection of entities.

I would support making it an explicit part of the spec, though I would not complain if it happened after @Remi-Gau's schema changes were incorporated and supported by the schema validator.
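As an illustration of the ambiguity for tools, here is a minimal sketch of an entity-based lookup that expects exactly one data file. The `parseName` and `getUniqueDataFile` helpers are hypothetical and not part of any BIDS tool; the sketch assumes file names of the form `key-value_..._suffix.ext` and takes the extension to be everything after the first dot.

```javascript
// Parse "sub-01_task-rest_bold.nii.gz" into entities, suffix, and extension.
const parseName = (name) => {
  const dot = name.indexOf('.')
  const stem = dot === -1 ? name : name.slice(0, dot)
  const ext = dot === -1 ? '' : name.slice(dot)
  const parts = stem.split('_')
  const suffix = parts.pop()
  const entities = {}
  for (const part of parts) {
    const dash = part.indexOf('-')
    if (dash > 0) entities[part.slice(0, dash)] = part.slice(dash + 1)
  }
  return { entities, suffix, ext }
}

// Hypothetical lookup: return the single data file matching the query,
// and throw if zero or several files match.
const getUniqueDataFile = (files, query, suffix, extensions) => {
  const matches = files.filter((file) => {
    const parsed = parseName(file.name)
    return (
      parsed.suffix === suffix &&
      extensions.includes(parsed.ext) &&
      Object.entries(query).every(([key, value]) => parsed.entities[key] === value)
    )
  })
  if (matches.length !== 1) {
    throw new Error(`expected exactly 1 data file, found ${matches.length}`)
  }
  return matches[0]
}
```

If a dataset holds both `sub-01_task-rest_bold.nii` and `sub-01_task-rest_bold.nii.gz`, a query for `{ sub: '01', task: 'rest' }` with suffix `bold` has two matches and the lookup throws, which is the kind of failure described above.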

@Remi-Gau
Contributor

Remi-Gau commented Mar 8, 2024

I think we had added this section in the spec, no?

https://bids-specification.readthedocs.io/en/stable/common-principles.html#uniqueness-of-data-files

But I need to get back to finishing the schema PR.

@sappelhoff
Member

> I think we had added this section in the spec, no?

What we added there was:

  • there MUST NOT be data_a.jpg and data_a.tif in the same dataset

However, the situation here is data_a.jpg and data_b.tif 🤔
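For the data_a / data_b case, the check would have to look across the whole dataset rather than per stem. A minimal sketch under stated assumptions: `EEG_FORMAT_EXTENSIONS`, `eegFormatsInDataset`, and `hasMixedEegFormats` are hypothetical names, the extension list is an assumption, and nothing in the spec as quoted above requires this consistency.

```javascript
// Hypothetical list of primary EEG extensions (an assumption for this sketch).
const EEG_FORMAT_EXTENSIONS = ['.set', '.vhdr', '.edf', '.bdf']

// Group file names by the primary EEG extension they end with.
// Companion files (.fdt, .eeg, .vmrk, .json, ...) are ignored.
const eegFormatsInDataset = (files) => {
  const formats = {}
  for (const file of files) {
    const ext = EEG_FORMAT_EXTENSIONS.find((e) => file.name.endsWith(e))
    if (ext === undefined) continue
    if (!formats[ext]) formats[ext] = []
    formats[ext].push(file.name)
  }
  return formats
}

// True when recordings in the dataset use more than one EEG format.
const hasMixedEegFormats = (files) =>
  Object.keys(eegFormatsInDataset(files)).length > 1
```

A dataset with `data_a.set` and `data_b.vhdr` would be flagged by `hasMixedEegFormats` even though no two files share a stem.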


4 participants