Recursive Globbing at GCS bucket top level results in TypeError #431
Comments
I tried this on the same version of Python that you are on (3.10.12), and I can't repro it:
Python 3.10.12 (main, Jul 5 2023, 15:02:25) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.24.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from dotenv import load_dotenv, find_dotenv; load_dotenv(find_dotenv())
Out[1]: True
In [2]: from cloudpathlib import CloudPath
In [3]: list(CloudPath("gs://cloudpathlib-test-bucket/").rglob("*"))
Out[3]:
[GSPath('gs://cloudpathlib-test-bucket/test_client-test_content_type_setting'),
GSPath('gs://cloudpathlib-test-bucket/test_caching-test_persistent_mode'),
GSPath('gs://cloudpathlib-test-bucket/test_cloudpath_file_io-test_file_discovery'),
...
]
Is there anything else peculiar about the bucket or the data in it? For example, does it have a file that starts with a …
Hm, I still can't reproduce this on buckets that we own (which include many folders and subfolders). If possible, could you provide the local variable values at the different levels of the call stack by stopping there with …
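The tool the maintainer names at the end of that request is lost from this copy of the thread. As one stdlib-only option (a sketch, not necessarily what was meant), the locals of every frame in the failing call stack can be captured programmatically with `traceback.walk_tb`. The helper name `capture_frame_locals` is hypothetical.

```python
import traceback


def capture_frame_locals(func, *args, **kwargs):
    """Call func(*args, **kwargs) and, if it raises, collect per-frame locals.

    Returns (exception, frames), where frames is a list of
    (function_name, line_number, {local_name: repr(value)}) tuples ordered
    from the outermost frame (this helper) to the innermost one that raised.
    Returns (None, []) if the call succeeds.
    """
    try:
        func(*args, **kwargs)
    except Exception as exc:
        frames = []
        # walk_tb follows tb_next, i.e. from the catching frame inward.
        for frame, lineno in traceback.walk_tb(exc.__traceback__):
            # Store reprs rather than live objects so nothing large is kept alive.
            local_reprs = {name: repr(value) for name, value in frame.f_locals.items()}
            frames.append((frame.f_code.co_name, lineno, local_reprs))
        return exc, frames
    return None, []
```

For example, `capture_frame_locals(lambda: list(CloudPath("gs://your-bucket/").rglob("*")))` would return the `TypeError` together with the locals at each level (`gs://your-bucket/` is a placeholder). Interactively, running `pdb.pm()` or IPython's `%debug` right after the traceback gives the same information frame by frame.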
Sorry for the delay. I did a bit of debugging today, and this is what I found. The following layering/file is what's causing the issue: … What's very weird is that we have plenty of other files with exactly the same directory structure, but this is the first one that causes the issue. rglobbing from the … Here are the args at the time of failure (that is, …
Further up the stack, I see:
Hopefully this helps.
Could you also try listing the contents of the bucket with the Google Cloud SDK? … Is there any chance that there is also a blob (file) called …? Unlike on a normal file system, blob storage can have both a file and a "folder" with the same name in the same location. We don't always handle this gracefully.
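A quick way to check for the file/"folder" overlap described above is to scan the blob names for any name that is also a leading path prefix of another blob. Below is a minimal sketch; `find_file_folder_collisions` is a hypothetical helper, and it assumes "/" is the only separator treated as a directory boundary.

```python
def find_file_folder_collisions(blob_names):
    """Return sorted blob names that also act as a "folder" for other blobs.

    In GCS, an object "a/b" (a file) and an object "a/b/c.txt" (which implies
    a folder "a/b/") can coexist, which a local file system cannot represent.
    """
    names = set(blob_names)
    implied_folders = set()
    for name in names:
        parts = name.split("/")
        # Every leading run of path segments implies a folder, e.g.
        # "a/b/c.txt" implies "a" and "a/b".
        for i in range(1, len(parts)):
            implied_folders.add("/".join(parts[:i]))
    return sorted(name for name in names if name in implied_folders)
```

The names could come from the google-cloud-storage client, e.g. `[b.name for b in storage.Client().list_blobs("your-bucket")]` (bucket name is a placeholder). On a bucket with many objects this lists everything, so narrowing it with `prefix=` to the suspect folder is likely faster.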
I also experienced this same error, although not at the top level of a bucket. For what it's worth, I was using Python 3.8.16 with these pip versions:
There were over a hundred thousand files under this folder prefix. I know this isn't much help, but I couldn't narrow it down to anything useful without sharing our folder structure. While we still use cloudpathlib for other things, we decided to write this part directly against the Google Cloud APIs.
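For a workaround like the one mentioned here, a flat listing through the Google Cloud Storage client avoids recursive globbing entirely: calling `list_blobs` without a `delimiter` returns every object under a prefix. The sketch below assumes the `google-cloud-storage` package's `Client.list_blobs(bucket_name, prefix=...)`; the helper name `list_blob_uris` and the choice to skip "/"-terminated placeholder objects are illustrative assumptions, not what the commenter actually wrote.

```python
from typing import Iterator, Optional


def list_blob_uris(client, bucket_name: str, prefix: Optional[str] = None) -> Iterator[str]:
    """Yield a gs:// URI for every object under `prefix`, at any depth.

    `client` is assumed to be a google.cloud.storage.Client (or any object
    with a compatible `list_blobs` method). With no delimiter, GCS returns a
    flat listing of all matching object names, so no recursion is needed.
    """
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Skip zero-byte "folder" placeholder objects whose names end in "/".
        if blob.name.endswith("/"):
            continue
        yield f"gs://{bucket_name}/{blob.name}"
```

Usage would look like `list(list_blob_uris(storage.Client(), "your-bucket"))` after `from google.cloud import storage`, with `"your-bucket"` as a placeholder. Passing the client in keeps the helper easy to test with a stand-in object.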
The following works:
However, when I perform this action at the bucket root, for example:
I get the following error:
This only happens if I do a recursive glob. If I simply do a glob("*"), it retrieves the root-level blob paths.