Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: cloud paths checking against patterns #1094

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

akhanf
Copy link
Contributor

@akhanf akhanf commented Oct 14, 2024

The _check_path_matches_patterns function was failing for cloud (e.g. s3, gc3) URIs (added in #1074):

  1. the path.relative_to() function doesn't work for cloud URIs
    (it just returns the original).
  2. combining Path('/') with the result of relative_to() now (ie since universal_pathlib > 0.2.3) throws
    an error when using cloud paths.

So if using universal_pathlib >0.2.3, any cloud URIs as bids datasets would throw an error message.

This fix drops the uri prefix by using .path, so that the regex check works as expected.

The _check_path_matches_patterns function was failing for cloud (e.g.
s3, gc3) URIs.

 1) the path.relative_to() function doesn't work for cloud URIs
(it just returns the original).
 2) combining Path('/') with the result of relative_to now also throws
an error when using cloud (or non-posix) paths.

This fix drops the uri prefix by using .path to get the posix-like
part of the path, so that the regex check works as expected.
@akhanf akhanf changed the title fix for cloud paths FIX: cloud paths checking against patterns Oct 14, 2024
Copy link

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.77%. Comparing base (eb2f926) to head (c9f7c1f).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
bids/layout/tests/test_remote_bids.py 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1094      +/-   ##
==========================================
- Coverage   89.79%   89.77%   -0.02%     
==========================================
  Files          63       64       +1     
  Lines        7141     7152      +11     
  Branches     1367      835     -532     
==========================================
+ Hits         6412     6421       +9     
- Misses        531      534       +3     
+ Partials      198      197       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

was just for local use
@effigies
Copy link
Collaborator

Could you add tests with examples? That will make it easier to be explicit about what was going wrong and allow us to consider other solutions quickly.

The check if in a derivatives folder was not working when the root had a
uri prefix. This fixes it so the uri prefix is stripped off for the
check.
Checks the number of files in an example openneuro s3 dataset. Also adds
s3fs as a test dependency.
@akhanf
Copy link
Contributor Author

akhanf commented Oct 26, 2024

Yes good idea, I've added one simple test using an s3 uri from openneuro, and also made another minor fix.

The test passes locally, but I see now it is failing in the CI as botocore can't find the credentials file.. I'll see if there is a good workaround for this later..

For reference: running the test on the code before this PR gives this error:


test_remote_bids.py F                                                                                                        [100%]

============================================================= FAILURES =============================================================
____________________________ test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136] ____________________________

dataset = 's3://openneuro.org/ds000102', nb_files = 136

    @pytest.mark.parametrize(
        "dataset, nb_files",
        [
            ("s3://openneuro.org/ds000102", 136),
        ],
    )
    def test_layout_on_s3_datasets_no_derivatives(dataset, nb_files):
>       layout = BIDSLayout(dataset)

test_remote_bids.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../layout.py:177: in __init__
    _indexer(self)
../index.py:148: in __call__
    all_bfs, all_tag_dicts = self._index_dir(self._layout._root, self._config)
../index.py:239: in _index_dir
    if force or self._validate_file(abs_fn):
../index.py:162: in _validate_file
    matched_patt = _validate_path(
../index.py:64: in _validate_path
    if _check_path_matches_patterns(path, excl_patt, root=root):
../index.py:49: in _check_path_matches_patterns
    path = Path("/") / path.relative_to(root)
/usr/lib/python3.12/pathlib.py:721: in __truediv__
    return self.joinpath(key)
/usr/lib/python3.12/pathlib.py:717: in joinpath
    return self.with_segments(self, *pathsegments)
../../../.venv/lib/python3.12/site-packages/upath/core.py:560: in with_segments
    return type(self)(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError('drive') raised in repr()] PosixUPath object at 0x7fca4b13de40>, protocol = ''
args = (PosixUPath('/'), S3Path('s3://openneuro.org/ds000102/.gitattributes')), storage_options = {}, base_options = {}
args0 = PosixUPath('/')

    def __init__(
        self, *args, protocol: str | None = None, **storage_options: Any
    ) -> None:
        # allow subclasses to customize __init__ arg parsing
        base_options = getattr(self, "_storage_options", {})
        args, protocol, storage_options = type(self)._transform_init_args(
            args, protocol or self._protocol, {**base_options, **storage_options}
        )
        if self._protocol != protocol and protocol:
            self._protocol = protocol
    
        # retrieve storage_options
        if args:
            args0 = args[0]
            if isinstance(args0, UPath):
                self._storage_options = {**args0.storage_options, **storage_options}
            else:
                if hasattr(args0, "__fspath__"):
                    _args0 = args0.__fspath__()
                else:
                    _args0 = str(args0)
                self._storage_options = type(self)._parse_storage_options(
                    _args0, protocol, storage_options
                )
        else:
            self._storage_options = storage_options.copy()
    
        # check that UPath subclasses in args are compatible
        # TODO:
        #   Future versions of UPath could verify that storage_options
        #   can be combined between UPath instances. Not sure if this
        #   is really necessary though. A warning might be enough...
        if not compatible_protocol(self._protocol, *args):
>           raise ValueError("can't combine incompatible UPath protocols")
E           ValueError: can't combine incompatible UPath protocols

../../../.venv/lib/python3.12/site-packages/upath/core.py:263: ValueError
========================================================= warnings summary =========================================================
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
bids/layout/tests/test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136]
  /local/scratch/pybids/.venv/lib/python3.12/site-packages/botocore/auth.py:424: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
    datetime_now = datetime.datetime.utcnow()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== short test summary info ======================================================
FAILED test_remote_bids.py::test_layout_on_s3_datasets_no_derivatives[s3://openneuro.org/ds000102-136] - ValueError: can't combine incompatible UPath protocols
================================================== 1 failed, 8 warnings in 1.00s ===================================================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants