-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asyn_wrapper: avoid creating async version of open method #1769
base: master
Are you sure you want to change the base?
Conversation
The `_open` method is part of an AbstractFileSystem interface, and it is supposed to be synchronous.
excluded_methods = {"open"} | ||
for method_name in dir(self.sync_fs): | ||
if method_name.startswith("_"): | ||
if method_name.startswith("_") or method_name in excluded_methods: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, this can be changed to following,
for method in dir(AsyncFileSystem):
if not private.match(method):
continue
if not asyncio.iscoroutinefunction(getattr(AsyncFileSystem, method, None)):
continue
smethod = method[1:]
f = getattr(self.sync_fs, smethod, None)
if callable(f) and not asyncio.iscoroutinefunction(f):
mth = async_wrapper(f, obj=self)
setattr(self, method, mth)
if not mth.__doc__:
mth.__doc__ = f.__doc__
and maybe moved to fsspec.asyn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is similar code in https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L915 to differentiate methods defined in Async versus AsbtractFilesystem.
To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future. |
I see that the current implementation async-ifies APIs that are not in If the latter, then this might be a good-enough workaround. If the former, I can push the aforementioned patch. In any case, I am only interested in the |
It's to allow using a Sync filesystem in a context where only async calls are being made. This is only used currently with in zarr (e.g., zarr-developers/zarr-python#2533 ), and its store implementation only calls a subset of the available methods. The implementation, applying automatically to many methods, is I think the simplest, and that was the main motivation; it also naturally follows the existing code in fsspec.asyn to do the inverse, providing end-user sync methods for async filesystems. |
There is a bug in `fsspec==2024.12.0` that causes the `ReferenceFileSystem` to incorrectly make `fs._open` return a coroutine object instead of a file-like object. (See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.) We have a test for the expected behavior (`test_arrow_generator_partitioned` in `tests/unit/lib/test_arrow.py`) running in the CI environment. But that does not fail because the latest version of `fsspec` does not get installed in the CI due to the upper limit set by the `datasets` library. The `datasets` library is only installed as part of the `hf` and `tests` extras, so the default installation of `datachain` will encounter this issue. Fixes #806.
The
_open
method is part of anAbstractFileSystem
interface, and is supposed to be synchronous, even in case ofAsyncFileSystem
.