-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better double asterisks **
support
#1329
Better double asterisks **
support
#1329
Conversation
@martindurant @ianthomas23 I can't run all the tests on my machine locally (errors with dask and ftp). Could you trigger a CI run? |
You are failing on the range of things that we typically see break whenever you change anything about how paths work :|. In particular, you will find it difficult to reconcile what happens on windows without having a system of your own to run tests on. |
@@ -65,10 +65,6 @@ def ls(self, path, detail=False, **kwargs): | |||
else: | |||
return [posixpath.join(path, f) for f in os.listdir(path)] | |||
|
|||
def glob(self, path, **kwargs): | |||
path = self._strip_protocol(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing _strip_protocol from the flow here is likely the cause of all the windows errors, because for LocalFileSystem, it also converts windows-style paths to posix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what caused this issue: #1322
How about using make_posix_path
helper here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the windows tests by using make_path_posix
on the expected outputs in the tests when required: https://github.com/john-jam/filesystem_spec/actions/runs/5807521471/job/15742517288
@ianthomas23 , do you think you will have time to go through this? |
Yes I will, but later in the week. |
e332f05
to
8680aef
Compare
Yes, there's a lot of updates to force
|
This reverts commit 52b0e65.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good to merge now as far as I am concerned. I'll give @martindurant the final say.
When this is merged we can re-run the CI of the s3fs
and gcsfs
PRs, and I will review them properly. They look fine at initial glance. Ideally we would want to merge both of those quite quickly after this so that everything is consistent before the next releases.
I've closed my gcsfs
abstract tests PR fsspec/gcsfs#535 as fsspec/gcsfs#572 is definitely a better solution.
@ianthomas23 I changed the posix tests with python glob and bash glob to use relative paths instead of absolute paths. When using a prefix though (e.g. I fixed it in this commit. @martindurant Please let me know if there's any work still to be done in this PR for it to be merged soon? |
@ianthomas23 @martindurant I found another edge case with |
Merging now, and then we can deal with the small leftovers. |
@martindurant @ianthomas23 Thanks for the merge of this PR and the ones in Do you know approx. when will you release a new version of those 3 repos? And also if you plan to include those changes in |
Releases should be out this week, so long as lingering PRs are cleaned up. |
`fsspec` upgrade in fsspec/filesystem_spec#1329 makes 1. when using a trailing slash in globs, it returns only directories. 2. the double asterisks ** matches 0+ directories. This PR fix the tests accordingly.
…tory (#37558) ### Rationale for this change There has been some changes in the way fsspec lists the directories with new version 2023.9.0, see fsspec/filesystem_spec#1329, which caused our tests to start failing. ### What changes are included in this PR? This PR updates the `get_file_info_selector` in [FSSpecHandler](https://arrow.apache.org/docs/_modules/pyarrow/fs.html#FSSpecHandler) class to keep the behaviour of our spec. ### Are there any user-facing changes? No. * Closes: #37555 Authored-by: AlenkaF <[email protected]> Signed-off-by: AlenkaF <[email protected]>
… directory (apache#37558) ### Rationale for this change There has been some changes in the way fsspec lists the directories with new version 2023.9.0, see fsspec/filesystem_spec#1329, which caused our tests to start failing. ### What changes are included in this PR? This PR updates the `get_file_info_selector` in [FSSpecHandler](https://arrow.apache.org/docs/_modules/pyarrow/fs.html#FSSpecHandler) class to keep the behaviour of our spec. ### Are there any user-facing changes? No. * Closes: apache#37555 Authored-by: AlenkaF <[email protected]> Signed-off-by: AlenkaF <[email protected]>
… directory (apache#37558) ### Rationale for this change There has been some changes in the way fsspec lists the directories with new version 2023.9.0, see fsspec/filesystem_spec#1329, which caused our tests to start failing. ### What changes are included in this PR? This PR updates the `get_file_info_selector` in [FSSpecHandler](https://arrow.apache.org/docs/_modules/pyarrow/fs.html#FSSpecHandler) class to keep the behaviour of our spec. ### Are there any user-facing changes? No. * Closes: apache#37555 Authored-by: AlenkaF <[email protected]> Signed-off-by: AlenkaF <[email protected]>
Description
This PR:
glob
method compliant with posix rules and add better support for double asterisks**
in globs.maxdepth
for theglob
method.copy
,get
andput
methods to not create unwanted directories on some scenarios.I added some tests to check the posix rules against the python
glob.glob
implementation and the bashglobstar
support. Both do not support double asterisks with other prefix/suffix than/
so for example**.json
is the same as*.json
and not the same as**/*.json
. But since this repository already supported that well, this feature is kept and patterns likeab**cd
are still working properly (returning paths likeab/ef/cd
).New rules:
find
. It is because of the posix**
rule compliance. For example, it should return an empty string among the other paths when using**
or returnab
among the other paths when usingab/**
.**
matches 0+ directories.maxdepth
option is applied on the first**
when using a glob.References