Opener refactor #245

rabernat · 2021-11-29T15:01:14Z

Eventually fixes #242.

Goes on top of #238.

rabernat · 2021-12-17T22:34:11Z

As I mentioned at our latest meeting, I'm having some challenges around this refactor because our current "opener" mixes together file opening and caching. Caching is essentially a side effect which obscures the flow of data.

I am starting to wonder if we should continue to refactor things so that data only flows in one direction.

Imagine if we could create Recipes like this, in extreme pseudocode:

source = Source(file_pattern, opener, **options)
destination = ZarrDestination(storage_target, **options)

recipe = Recipe(source, destination)

This would create a one-way flow of data from source ➡️ destination.

Now, to add caching, we could do the following:

cache = FileCacheDestination(cache_target)

recipe = Recipe(source, cache, destination)

This would create a one-way flow of data from source ➡️ cache ➡️ destination.

This implies that any valid "destination" would also be a valid "source" for the next stage. In theory, we could keep chaining steps together to build more complicated pipelines. For example:

combine = CombineBlocks(time=10)

recipe = Recipe(source, cache, combine, destination)
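
To make the chaining idea a bit more concrete, here is a minimal runnable sketch. It assumes each stage can be modeled as a function from one stream of items to another; the names `source`, `make_cache`, `make_combine`, and `run_recipe` are hypothetical and are not part of pangeo-forge-recipes.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# Assumption: a stage is just a function from one stream of items to another.
Stage = Callable[[Iterable], Iterator]


def source(urls: Iterable[str]) -> Iterator[str]:
    """Hypothetical source: yields the inputs it was given, one at a time."""
    yield from urls


def make_cache(store: dict) -> Stage:
    """Hypothetical cache stage: records each item, then passes the cached copy on."""
    def cache(items: Iterable[str]) -> Iterator[str]:
        for item in items:
            store[item] = f"cached:{item}"
            yield store[item]
    return cache


def make_combine(n: int) -> Stage:
    """Hypothetical combine stage: groups consecutive items into blocks of n."""
    def combine(items: Iterable) -> Iterator[list]:
        block = []
        for item in items:
            block.append(item)
            if len(block) == n:
                yield block
                block = []
        if block:  # emit the final, possibly shorter, block
            yield block
    return combine


def run_recipe(inputs: Iterable, *stages: Stage) -> list:
    """Feed the output of each stage into the next, left to right."""
    return list(reduce(lambda data, stage: stage(data), stages, inputs))


store: dict = {}
result = run_recipe(["a.nc", "b.nc", "c.nc"], source, make_cache(store), make_combine(2))
```

In this toy model, adding or removing the cache stage does not change any other stage, which is the property the one-way flow is meant to provide.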

I'm just starting to think through the implications of this model. A key question is: what is the basic interface for one of the stages in a recipe? What methods / attributes are implemented by all of source, cache, combine, destination?

Possible answers:

Each has to have something equivalent to a FilePattern which allows the next stage to effectively iterate through a known number of steps. For the final step, it would just be a single item, the Zarr store itself.
Each stage has to know what type of thing it will produce: a file / url, an Xarray dataset, etc.

Can each stage effectively ignore everything other than the most proximate previous stage?
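
As a rough sketch of what such an interface might look like, the following assumes two things from the possible answers above: each stage can enumerate the keys it will produce given the keys of the previous stage, and each stage declares the kind of item it consumes and produces. Everything here (`Stage`, `CacheStage`, `ZarrStage`, `plan`, and the type labels) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


class Stage(Protocol):
    """Hypothetical minimal interface shared by every stage."""

    input_type: str   # kind of item the stage consumes, e.g. "url"
    output_type: str  # kind of item the stage produces, e.g. "file"

    def keys(self, upstream_keys: Sequence[str]) -> List[str]:
        """Return the known, finite list of items this stage will produce."""
        ...


@dataclass
class CacheStage:
    input_type: str = "url"
    output_type: str = "file"

    def keys(self, upstream_keys: Sequence[str]) -> List[str]:
        # One cached file per upstream input.
        return [f"cache/{k}" for k in upstream_keys]


@dataclass
class ZarrStage:
    input_type: str = "file"
    output_type: str = "zarr"

    def keys(self, upstream_keys: Sequence[str]) -> List[str]:
        # The final stage collapses everything into a single store.
        return ["store.zarr"]


def plan(source_keys: Sequence[str], source_type: str,
         stages: Sequence[Stage]) -> List[List[str]]:
    """Compute the keys of every step, looking only at adjacent stages."""
    steps = [list(source_keys)]
    produced = source_type
    for stage in stages:
        if stage.input_type != produced:
            raise TypeError(
                f"{type(stage).__name__} expects {stage.input_type!r}, got {produced!r}"
            )
        steps.append(stage.keys(steps[-1]))
        produced = stage.output_type
    return steps


steps = plan(["a.nc", "b.nc"], "url", [CacheStage(), ZarrStage()])
```

Here `plan` only ever compares a stage with the one immediately before it, so in this simplified model the answer to the last question is yes. Whether that holds for real recipes is exactly what remains open.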

TomNicholas · 2022-04-25T21:50:33Z

Having spoken to @cisaacstern, I think that this refactor might be needed in order for @RobertPincus to use datatree to open all the groups in his NASA data in one function call.

cisaacstern · 2023-08-24T22:56:34Z

This has been superseded by the beam refactor, so closing.

Cool to see how this work informed the opener transforms there!

rabernat added 3 commits November 26, 2021 17:18

made openers work

8cdc847

storage still working

0d161bd

almost all linting passed

bf7b328

rabernat mentioned this pull request Nov 29, 2021

How to handle data with mixtures of Grib 1 and Grib 2? #244

Open

rabernat added 11 commits November 29, 2021 11:53

Merge remote-tracking branch 'upstream/master' into opener-refactor

f6e6c2b

rename open_kwargs to fsspec_open_kwargs

b73893a

remove dask from test_openers

01b8c05

temporarily remove xarray from test_openers

0712c3e

big refactor of opener tests

8fe12c1

big refactor of opener tests

d42ef3d

fix temporary file bug caught by improved test

799e664

add xarray opener and test

3cb46e3

smoke test for xarray openers

6373fbf

add cache checking to test_xarray_kerchunk_opener

6e2f101

remove storage from opener classes

5bb3104

rabernat mentioned this pull request Dec 14, 2021

Proposed Recipes for ERA5 pangeo-forge/staged-recipes#92

Open

rabernat mentioned this pull request Dec 18, 2021

Should we just adopt xarray-beam as our internal data model? #256

Open

cisaacstern mentioned this pull request Jan 24, 2022

Sanitize creds passed in netloc (per Section 3.1 of RFC 1738)? #263

Open

rabernat mentioned this pull request Mar 4, 2022

limit concurrency for input downloads #45

Closed

cisaacstern mentioned this pull request Mar 7, 2022

copy_pruned doesn't work with is_opendap #321

Open

rabernat mentioned this pull request Mar 8, 2022

Replace FilePattern.is_opendap with generalized FilePattern.file_type #320

Closed

cisaacstern mentioned this pull request Mar 24, 2022

Proposed Recipes for NASA MODIS-COSP data (satellite observations of clouds) pangeo-forge/staged-recipes#125

Open

cisaacstern mentioned this pull request May 2, 2022

add chelsa recipe pangeo-forge/staged-recipes#133

Open

cisaacstern closed this Aug 24, 2023
