Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyn_wrapper: avoid creating async version of open method #1769

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

skshetry
Copy link
Contributor

@skshetry skshetry commented Jan 9, 2025

The _open method is part of an AbstractFileSystem interface, and is supposed to be synchronous, even in case of AsyncFileSystem.

The `_open` method is part of an AbstractFileSystem interface, and it is
supposed to be synchronous.
Comment on lines +60 to 63
excluded_methods = {"open"}
for method_name in dir(self.sync_fs):
if method_name.startswith("_"):
if method_name.startswith("_") or method_name in excluded_methods:
continue
Copy link
Contributor Author

@skshetry skshetry Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this can be changed to following,

for method in dir(AsyncFileSystem):
  if not private.match(method):
      continue
  if not asyncio.iscoroutinefunction(getattr(AsyncFileSystem, method, None)):
      continue
  smethod = method[1:]
  f = getattr(self.sync_fs, smethod, None)
  if callable(f) and not asyncio.iscoroutinefunction(f):
      mth = async_wrapper(f, obj=self)
      setattr(self, method, mth)
      if not mth.__doc__:
          mth.__doc__ = f.__doc__

and maybe moved to fsspec.asyn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is similar code in https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L915 to differentiate methods defined in Async versus AsbtractFilesystem.

@martindurant
Copy link
Member

To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future.

@skshetry
Copy link
Contributor Author

To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future.

I see that the current implementation async-ifies APIs that are not in AsyncFileSystem (eg: lexists, read_bytes, etc). What's the use-case of AsyncFileSystemWrapper - is it to imitate AsyncFileSystem or is it to provide async methods for all sync methods of a filesystem (which is the current implementation)?

If the latter, then this might be a good-enough workaround. If the former, I can push the aforementioned patch.

In any case, I am only interested in the _open API at the moment. So, I'll leave that up to you to decide. :)

@martindurant
Copy link
Member

martindurant commented Jan 13, 2025

What's the use-case of AsyncFileSystemWrapper

It's to allow using a Sync filesystem in a context where only async calls are being made. This is only used currently with in zarr (e.g., zarr-developers/zarr-python#2533 ), and its store implementation only calls a subset of the available methods. The implementation, applying automatically to many methods, is I think the simplest, and that was the main motivation; it also naturally follows the existing code in fsspec.asyn to do the inverse, providing end-user sync methods for async filesystems.

skshetry added a commit to iterative/datachain that referenced this pull request Jan 16, 2025
There is a bug in `fsspec==2024.12.0` that causes the `ReferenceFileSystem` to
incorrectly make `fs._open` return a coroutine object instead of a file-like object.
(See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.)

We have a test for the expected behavior (`test_arrow_generator_partitioned` in `tests/unit/lib/test_arrow.py`)
running in the CI environment.
But that does not fail because the latest version of `fsspec` does not get installed in the CI
due to the upper limit set by the `datasets` library.

The `datasets` library is only installed as part of the `hf` and `tests` extras,
so the default installation of `datachain` will encounter this issue.

Fixes #806.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants