Opening via xarray backendentrypoint #35

Open
TomNicholas opened this issue Mar 15, 2024 · 4 comments
Labels
enhancement New feature or request xarray Requires changes to xarray upstream

Comments

@TomNicholas
Member

Some changes are needed in xarray to support using the xarray backend entrypoint system to open datasets from disk as ManifestArray-backed Variables just by passing a keyword arg to open_dataset/open_mfdataset. This requires dodging some internal array wrapping that occurs in the depths of xarray's backend machinery.

Originally posted by @TomNicholas in #14 (comment)
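To make the "internal array wrapping" problem concrete, here is a toy sketch. ManifestArray, LazyWrapper, backend_open, frontend_open and the wrap flag are all hypothetical stand-ins, not xarray's or virtualizarr's real code. LazyWrapper is only loosely analogous to the lazy-indexing and caching wrappers xarray applies to backend arrays. A generic frontend that wraps whatever the backend returns hides the special array type from the caller. The upstream change would need something like the hypothetical wrap=False path.

```python
# Toy illustration of the wrapping problem; these classes are stand-ins,
# not xarray's or virtualizarr's actual implementations.
from dataclasses import dataclass
from typing import Any


@dataclass
class ManifestArray:
    """Stand-in for a virtual array holding chunk references instead of data."""
    manifest: dict


@dataclass
class LazyWrapper:
    """Stand-in for the lazy-indexing/caching wrappers a frontend may add."""
    array: Any


def backend_open(path: str) -> ManifestArray:
    # Hypothetical chunk manifest: chunk key -> (path, byte offset, length).
    return ManifestArray(manifest={"0.0": (path, 0, 100)})


def frontend_open(path: str, wrap: bool = True) -> Any:
    data = backend_open(path)
    # A generic frontend cannot tell that the array is special, so by default
    # it wraps it; 'wrap=False' stands in for the upstream change needed.
    return LazyWrapper(data) if wrap else data


wrapped = frontend_open("air.nc")
unwrapped = frontend_open("air.nc", wrap=False)
print(type(wrapped).__name__, type(unwrapped).__name__)  # LazyWrapper ManifestArray
```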

@TomNicholas
Member Author

This would be cool, but it's actually not needed for an MVP, because we have open_dataset_via_kerchunk.

@TomNicholas
Member Author

Trying to get this to work leads down rabbit holes like the one in pydata/xarray#8712.

@TomNicholas TomNicholas added the xarray Requires changes to xarray upstream label Mar 15, 2024
@TomNicholas TomNicholas mentioned this issue Mar 15, 2024
15 tasks
@TomNicholas TomNicholas added the enhancement New feature or request label Mar 26, 2024
@TomNicholas
Member Author

TomNicholas commented Mar 26, 2024

Note that getting the syntax open_dataset(file, engine='virtualizarr', indexes={}) to work as a way of avoiding index creation does not actually require adding an indexes kwarg to xr.open_dataset upstream (xref pydata/xarray#6633). That's because any kwarg that xarray does not recognize should be passed on to our backend engine, which already supports an indexes kwarg.
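The pass-through behaviour described above can be modelled with a simplified toy frontend and backend. This is not xarray's actual code. The real xr.open_dataset also passes drop_variables and decoder options to the backend. The registry, register_engine and open_virtual names are hypothetical. The frontend consumes the kwargs it knows about (here chunks and cache) and forwards everything else. The backend's own signature then decides whether a kwarg such as indexes is valid.

```python
# Toy model (not xarray's real implementation) of a frontend that forwards
# keyword arguments it does not consume itself to the selected engine.
from typing import Any, Callable, Dict

# Hypothetical registry mapping engine names to backend open functions.
ENGINES: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register_engine(name: str):
    """Decorator that records a backend open function under an engine name."""
    def decorator(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        ENGINES[name] = func
        return func
    return decorator


def open_dataset(filename: str, *, engine: str, chunks: Any = None,
                 cache: bool = True, **kwargs: Any) -> Dict[str, Any]:
    """Frontend: 'chunks' and 'cache' are consumed here; the rest is forwarded."""
    backend = ENGINES[engine]
    return backend(filename, **kwargs)


@register_engine("virtualizarr")
def open_virtual(filename: str, *, indexes: Any = None,
                 loadable_variables: Any = None) -> Dict[str, Any]:
    """Hypothetical backend accepting an 'indexes' kwarg the frontend lacks."""
    return {"filename": filename, "indexes": indexes,
            "loadable_variables": loadable_variables}


result = open_dataset("air.nc", engine="virtualizarr", indexes={})
print(result)  # {'filename': 'air.nc', 'indexes': {}, 'loadable_variables': None}
```

Because validation happens in the backend's signature, a kwarg that neither layer understands (e.g. foo=1) raises a TypeError from the backend call.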

@TomNicholas
Member Author

I'm no longer so sure that we actually need this feature?

It's a bit of a pain to implement because it likely requires finicky upstream changes to xarray, and the benefit in the end is purely syntactic sugar, not actually new capability.

It's also weird design-wise for a few reasons:

  1. The set of kwargs to xr.open_dataset and virtualizarr.open_virtual_dataset are related, but there are some arguments that make sense on the former but not the latter (e.g. chunks, cache), and some that make sense on the latter but not the former (e.g. loadable_variables).
  2. In xarray, the engine kwarg mostly refers to the type of file you're trying to open (e.g. netcdf4, zarr). But here it would refer to the type of wrapped array that gets returned, and would be valid for a range of file types.
  3. If we wanted to allow 3rd parties to create their own kerchunk-like readers for virtualizarr without them all having to live in this codebase, we could imagine having an entrypoint system for virtualizarr. But combining that with this feature would mean xarray's BackendEntrypoint calling a virtualizarr engine, which then calls more entrypoints...
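Points 2 and 3 can be pictured as a two-level dispatch. The following is a toy sketch with hypothetical names throughout. Real plugins would more likely be discovered through package entry points than hard-coded dicts. At the first level, one xarray "engine" name is chosen. At the second level, that engine has to pick a reader per file type from its own registry.

```python
# Toy sketch of the nested dispatch in point 3; every name here is hypothetical.
from pathlib import Path
from typing import Callable, Dict

# Level 2: readers that virtualizarr might discover via its own plugin system.
READERS: Dict[str, Callable[[str], str]] = {
    ".nc": lambda path: f"netCDF4/HDF5 reader for {path}",
    ".grib": lambda path: f"GRIB reader for {path}",
}


def virtualizarr_engine(path: str) -> str:
    """A single xarray 'engine' that must handle many file types (point 2)."""
    suffix = Path(path).suffix
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"no virtual reader registered for {suffix!r}") from None
    return reader(path)


# Level 1: xarray-style engine registry, where 'engine' usually names a format.
XARRAY_ENGINES: Dict[str, Callable[[str], str]] = {"virtualizarr": virtualizarr_engine}


def open_dataset(path: str, engine: str) -> str:
    return XARRAY_ENGINES[engine](path)


print(open_dataset("air.nc", engine="virtualizarr"))     # netCDF4/HDF5 reader for air.nc
print(open_dataset("era5.grib", engine="virtualizarr"))  # GRIB reader for era5.grib
```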

I'm curious what others think, especially @ayushnag, @betolink, and @sharkinsspatial.
