uproot.recreate and uproot.update are using the colon-parsing of uproot.open, but they shouldn't #1251

Open
jpivarski opened this issue Jul 16, 2024 · 1 comment
Labels
bug The problem described is something that must be fixed

Comments

@jpivarski
Member

% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("/tmp/a::b.root")
<WritableDirectory '/' at 0x7ceb9ec89350>
>>> 
% ls /tmp/a*
/tmp/a
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.update("/tmp/a::b.root")
<WritableDirectory '/' at 0x759c5ff57ad0>
>>> 
% ls /tmp/a*
/tmp/a

Both calls should create (or update) a file named /tmp/a::b.root, with the colons in the filename. Instead, the path is cut at the :: and a file named /tmp/a is written. It might get even weirder if the colons are in a directory name within the full path, rather than in the final filename.
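
The expected behavior can be checked mechanically. What follows is only a minimal sketch, assuming a POSIX filesystem where colons are legal in file names. The helper `creates_expected_file`, its `create` parameter, and the two stand-in writers are hypothetical names introduced for illustration; none of them are part of uproot.

```python
import os
import tempfile


def creates_expected_file(create, filename="a::b.root"):
    """Call create(path) in a scratch directory and report what landed on disk.

    `create` is any callable that takes a path and writes a file there
    (hypothetical helper, not part of uproot).
    """
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, filename)
        handle = create(target)
        # Close the returned object if it has a close() method, so the file is flushed.
        close = getattr(handle, "close", None)
        if callable(close):
            close()
        return os.path.exists(target), sorted(os.listdir(scratch))


# Stand-in writers so the sketch runs without uproot installed (assumptions).
def well_behaved(path):
    return open(path, "wb")


def truncating(path):
    # Mimics the reported symptom by dropping everything from "::" onward.
    return open(path.split("::")[0], "wb")


print(creates_expected_file(well_behaved))
print(creates_expected_file(truncating))
```

Going by the sessions above, passing `uproot.recreate` as `create` would presumably behave like the `truncating` stand-in for as long as the bug is present.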


This isn't even fixed by a pathlib.Path. That's weird, because pathlib.Path is the way to turn off colon-parsing in uproot.open.

% rm /tmp/a*
rm: remove regular file '/tmp/a'? y
% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> import uproot
>>> uproot.recreate(pathlib.Path("/tmp/a::b.root"))
<WritableDirectory '/' at 0x7e3acc793350>
>>> 
% ls /tmp/a*
/tmp/a

I thought the culprit might be URL-parsing rather than the colon-parsing code (since files can now be written remotely). But no, that's not it:

% python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.recreate("file:///tmp/a::b.root")
<WritableDirectory '/' at 0x78f47e54e810>
>>> 
% ls /tmp/a*
/tmp/a
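
As a cross-check that ordinary URL parsing leaves the colons alone, here is a small sketch using only the standard library's `urllib.parse`. It is an illustration only and is not claimed to be the code path that uproot or fsspec takes.

```python
from urllib.parse import urlparse

# Parse the same file:// URL used in the session above.
parsed = urlparse("file:///tmp/a::b.root")

print(parsed.scheme)  # the URL scheme
print(parsed.path)    # the path component, colons included
```

Since the path component keeps the `::`, the truncation has to happen somewhere other than plain URL splitting.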
@maxgalli
Collaborator

This seems to come from fsspec, more precisely here. It seems to happen because the code doesn't distinguish between a :: that is part of the file name and a :: used as the protocol separator.
I tried changing the first lines to this

    if "::" in path:
        x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
        bits = []
        for p in path.split("::"):
            # Check if part looks like a protocol or URL
            if "://" in p or x.match(p) or p in known_implementations:
                bits.append(p)
            else:
                # Otherwise it is a bare lowercase word; mark it as protocol-like by appending "://"
                bits.append(p + "://")
        
        # If no part matches a known protocol, treat the entire path as a file name
        if not any(b for b in bits if b.strip("://") in known_implementations):
            bits = [path]
    else:
        bits = [path]

and your reproducer seems to work. I will open a PR in fsspec and get feedback from the maintainers.
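
For anyone who wants to try the heuristic outside fsspec, the snippet above can be wrapped in a standalone function. This is only a sketch under stated assumptions: `split_chain` is a hypothetical name, and `KNOWN_IMPLEMENTATIONS` is a small hand-written stand-in for fsspec's protocol registry, not the real table.

```python
import re

# Hypothetical stand-in for fsspec's protocol registry (an assumption).
KNOWN_IMPLEMENTATIONS = {"file", "simplecache", "zip", "s3"}

# Matches anything that is not a bare lowercase word.
NON_PROTOCOL = re.compile(".*[^a-z]+.*")


def split_chain(path):
    """Split on '::' only if some piece is a known protocol name."""
    if "::" not in path:
        return [path]

    bits = []
    for p in path.split("::"):
        if "://" in p or NON_PROTOCOL.match(p) or p in KNOWN_IMPLEMENTATIONS:
            bits.append(p)
        else:
            # Bare lowercase word: mark it as protocol-like.
            bits.append(p + "://")

    # No piece names a known protocol: treat the whole string as one path.
    if not any(b.strip("://") in KNOWN_IMPLEMENTATIONS for b in bits):
        return [path]
    return bits


print(split_chain("/tmp/a::b.root"))
print(split_chain("file:///tmp/a::b.root"))
print(split_chain("simplecache::s3://bucket/key.root"))
```

With this stand-in registry, the two paths from the reproducer come back as a single piece, while the chained example is still split at the `::`.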
