Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIDSLayoutIndexer's **filters argument not working as expected #1062

Open
alperkent opened this issue May 10, 2024 · 6 comments
Open

BIDSLayoutIndexer's **filters argument not working as expected #1062

alperkent opened this issue May 10, 2024 · 6 comments

Comments

@alperkent
Copy link
Contributor

Hello again! Following the fix in PR #1049 for the error mentioned in #1050, I've noticed that the BIDSLayoutIndexer is still not functioning as expected. As an example, despite setting filters for datatype and suffix, the indexer returns all entities when running the following code:

import bids

filters = {'datatype': 'eeg', 'suffix': 'eeg'}
indexer = bids.BIDSLayoutIndexer(**filters)
layout = bids.BIDSLayout('../data/bids-examples/eeg_ds000117', indexer=indexer)
layout.get_datatypes(), layout.get_suffixes()

I expected the get_datatypes() and get_suffixes() methods to return only the entities that match the filters set in the BIDSLayoutIndexer. But, the methods return all entities:

(['anat', 'eeg'],
['description',
'participants',
'T1w',
'coordsystem',
'electrodes',
'eeg',
'events',
'channels'])

While workarounds are possible using layout.get() and BIDSLayoutIndexer(ignore=...), I wanted to raise this issue to bring it to the devs' attention.

@effigies
Copy link
Collaborator

Yes, it looks like _index_file() does not respect self.filters like _index_metadata() does:

pybids/bids/layout/index.py

Lines 243 to 280 in 20007e5

def _index_file(self, abs_fn, entities):
"""Create DB record for file and its tags. """
bf = make_bidsfile(abs_fn)
# Extract entity values
match_vals = {}
for e in entities.values():
m = e.match_file(bf)
if m is None and e.mandatory:
break
if m is not None:
match_vals[e.name] = (e, m)
# Create Entity <=> BIDSFile mappings
tag_dicts = [
_create_tag_dict(bf, ent, val, ent._dtype)
for ent, val in match_vals.values()
]
return bf, tag_dicts
def _index_metadata(self):
"""Index metadata for all files in the BIDS dataset.
"""
filters = self.filters
if filters:
# ensure we are returning objects
filters['return_type'] = 'object'
if filters.get('extension'):
filters['extension'] = listify(filters['extension'])
# ensure json files are being indexed
if '.json' not in filters['extension']:
filters['extension'].append('.json')
# Process JSON files first if we're indexing metadata
all_files = self._layout.get(absolute_paths=True, **filters)

If you're interested in addressing this, it's probably worth noting that _index_metadata() should be working with a copy of self.filters, not mutating it directly.

@alperkent
Copy link
Contributor Author

alperkent commented May 11, 2024

Interestingly, after adding the if clause below to _index_file(), layout.get_entity() functions seem to be respecting the filtering but layout.get() still does not:

        # Extract entity values
        match_vals = {}
        for e in entities.values():
            m = e.match_file(bf)
            if m is None and e.mandatory:
                break
>           # Break if entities don't match the filter
>           if self.filters and e.name and e.name in self.filters and m and m not in self.filters[e.name]:
>               break
            if m is not None:
                match_vals[e.name] = (e, m)

@effigies
Copy link
Collaborator

I can't promise I'll have time to dig into this soon. Please feel free to keep working on this if you like.

It would be good to get out a new version of pybids, with Python 3.12 support. Do you consider this a blocker, and if so, how much time would you like to try to get something in before we release?

@alperkent
Copy link
Contributor Author

I don't think it is a blocker (I already implemented a workaround in my own code by using the ignore argument instead) so please feel free to move forward with the release. Would you like me to reopen the half-baked fix (#1063) so that at least the get_<entity>s functions respect the indexer filters in the new release?

@effigies
Copy link
Collaborator

If it improves things, sure. If you want to dig deeper for a more complete fix first, that's fine, too.

I am not using this feature, so I'm unlikely to dig into it myself, and I'm inclined to follow your lead as someone who uses the feature.

@alperkent
Copy link
Contributor Author

OK, I reopened it. It restores some functionality, so I think it improves things, albeit slightly (and hopefully does not break anything). I should have time to dig into it a bit more in the upcoming weeks, so feel free to move forward with the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants