Our discussion today with ECMWF about ERA5 (see pangeo-forge/staged-recipes#92) surfaced the need for Pangeo Forge to be able to extract data from APIs.
Current Situation in pangeo-forge-recipes
We assume that the recipe "inputs" are either
Files that can be opened with fsspec
OPeNDAP endpoints
These inputs are generated by a FilePattern object. The file pattern must contain a function which translates "input keys" into a string. This string is then passed either to fsspec or directly to xarray.
The FilePattern interfaces directly with the recipe as follows:
fp = FilePattern(...)
rec = Recipe(fp, ...)
The fact that the FilePattern returns a string is a limitation.
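To make the string-only contract concrete, here is a minimal sketch of the current model in plain Python. It does not use the pangeo-forge-recipes API; `URL_TEMPLATE`, `make_url`, and `iter_inputs` are hypothetical names, and the URL is a placeholder rather than a real data source.

```python
from itertools import product

# Hypothetical URL template; not a real data source.
URL_TEMPLATE = "https://example.com/data/{variable}/{year}.nc"


def make_url(variable: str, year: int) -> str:
    """Translate one set of input keys into a string."""
    return URL_TEMPLATE.format(variable=variable, year=year)


def iter_inputs(variables, years):
    """Yield (key, string) pairs for every combination of input keys."""
    for variable, year in product(variables, years):
        yield (variable, year), make_url(variable, year)


inputs = dict(iter_inputs(["temp", "salt"], [2000, 2001]))
```

Every value in `inputs` is a string, so anything that cannot be expressed as a path or URL has no place in this model.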
Examples of data sources that don't fit with this model
Data from the ECMWF CDS API -> The API ultimately gives you a netCDF file based on various query parameters
MITgcm data opened with xmitgcm, e.g. from the ECCO data portal -> returns an Xarray Dataset directly
One Solution: An additional "Opener" layer
Rather than passing the FilePattern directly to the Recipe and hoping that the Recipe knows what to do with it, we could imagine a pattern in which a separate opener sits between the two.
Basically the "opener" would be guaranteed to return certain things that the recipe could use, and the inputs interface would be pretty light.
The FilePattern could then return whatever it wants, provided that the opener knows what to do with it. For example, FilePattern could return a dict of parameters to pass to an API, or any other Python object that is useful to the opener.
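One way this layering might look, as a rough sketch only: all class names, method names, and the wiring (pattern into opener, opener into recipe) are assumptions made for illustration, and a plain dict stands in for an xarray Dataset so the example runs without extra dependencies.

```python
from typing import Any, Dict, Iterator, List

Dataset = Dict[str, Any]  # stand-in for an xarray.Dataset


class QueryPattern:
    """Stand-in for a FilePattern that returns a dict instead of a string."""

    def __init__(self, years: List[int]) -> None:
        self.years = years

    def keys(self) -> List[int]:
        return list(self.years)

    def __getitem__(self, year: int) -> Dict[str, Any]:
        # Hypothetical query parameters for some API.
        return {"variable": "temperature", "year": year}


class ApiOpener:
    """Hypothetical opener: turns whatever the pattern returns into a dataset."""

    def __init__(self, pattern: QueryPattern) -> None:
        self.pattern = pattern

    def keys(self) -> List[int]:
        return self.pattern.keys()

    def open(self, key: int) -> Dataset:
        params = self.pattern[key]
        # A real opener would send `params` to the API here.
        return {"attrs": params, "values": [float(key)] * 2}


class Recipe:
    """Hypothetical recipe that depends only on the opener's interface."""

    def __init__(self, opener: ApiOpener) -> None:
        self.opener = opener

    def iter_datasets(self) -> Iterator[Dataset]:
        for key in self.opener.keys():
            yield self.opener.open(key)


recipe = Recipe(ApiOpener(QueryPattern([2000, 2001])))
datasets = list(recipe.iter_datasets())
```

The recipe never sees what the pattern returns; it only calls `keys()` and `open()`, which is the kind of light interface described above.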
We could imagine providing some default openers
XarrayFsspecOpener
XarrayOpendapOpener
PandasOpener
etc.
... plus also allowing users to define their own custom openers to handle more specialized situations.
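A custom opener could be as small as a subclass of a shared base class. The sketch below assumes a hypothetical `BaseOpener` with a single `open` method; neither it nor `InMemoryOpener` exists in pangeo-forge-recipes, and a dict again stands in for a dataset.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Dataset = Dict[str, Any]  # stand-in for an xarray.Dataset


class BaseOpener(ABC):
    """Hypothetical base class shared by default and custom openers."""

    @abstractmethod
    def open(self, item: Any) -> Dataset:
        """Turn one item from the file pattern into a dataset."""


class InMemoryOpener(BaseOpener):
    """Custom opener for a specialized source: named arrays held in memory."""

    def __init__(self, store: Dict[str, List[float]]) -> None:
        self.store = store

    def open(self, item: str) -> Dataset:
        return {"name": item, "values": self.store[item]}


opener = InMemoryOpener({"sst": [1.0, 2.0]})
ds = opener.open("sst")
```

Because `open` is abstract, a subclass that forgets to implement it fails at instantiation rather than partway through a recipe run.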
We could also consider making some of these mixins or using inheritance, such that we could do something like
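A possible shape for the mixin variant, purely as an illustration: `FsspecOpenerMixin`, `ZarrWriterBase`, and `FsspecToZarrRecipe` are hypothetical names, and the opening step is faked so the sketch stays self-contained.

```python
from typing import Any, Dict, Iterable, List


class FsspecOpenerMixin:
    """Hypothetical mixin that supplies the opening logic."""

    def open_input(self, item: str) -> Dict[str, Any]:
        # A real mixin would open `item` with fsspec and xarray here.
        return {"source": item}


class ZarrWriterBase:
    """Hypothetical recipe base; expects a mixin to provide open_input."""

    def __init__(self, inputs: Iterable[str]) -> None:
        self.inputs = list(inputs)

    def run(self) -> List[Dict[str, Any]]:
        return [self.open_input(item) for item in self.inputs]


class FsspecToZarrRecipe(FsspecOpenerMixin, ZarrWriterBase):
    """Opening behaviour comes from the mixin, writing from the base."""


result = FsspecToZarrRecipe(["a.nc", "b.nc"]).run()
```

Swapping the mixin would change how inputs are opened without touching the recipe logic, at the cost of a less explicit contract than a separate opener object.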
Pros and Cons
This would be a significant refactor. In the end, code would leave XarrayZarrRecipe and move to a new class (XarrayFsspecOpener). Overall I think this would make for better separation of concerns and more reusability of code.
I am concerned that this would create yet another layer of complexity for the users. This could be mitigated by creating some convenience functions or adaptors that would reduce the amount of boilerplate that would need to be written.
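As one idea for such a convenience adaptor, a decorator could turn an ordinary function into an opener, so users would not need to write a class at all. `FunctionOpener` and `as_opener` are hypothetical names invented for this sketch.

```python
from typing import Any, Callable, Dict

Dataset = Dict[str, Any]  # stand-in for an xarray.Dataset


class FunctionOpener:
    """Hypothetical adaptor: wraps a plain function in the opener interface."""

    def __init__(self, func: Callable[[Any], Dataset]) -> None:
        self._func = func

    def open(self, item: Any) -> Dataset:
        return self._func(item)


def as_opener(func: Callable[[Any], Dataset]) -> FunctionOpener:
    """Decorator form of the adaptor."""
    return FunctionOpener(func)


@as_opener
def open_params(params: Dict[str, Any]) -> Dataset:
    # A real function would query an API with `params` here.
    return {"attrs": dict(params)}


ds = open_params.open({"year": 2000})
```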
@martindurant mentioned that the Intake community is considering separating the backend reader component (perhaps there is a better descriptor for this) from that project into its own package. If so, some of this may be able to rely on that.
For the CDS/MARS case specifically: @alxmrs has experience with formatting efficient CDS requests and may be able to offer relevant insights (and possibly code, tbd).
I just had a very nice chat with @cisaacstern and he pointed me to this issue. From a brief reading I think this would fit our use case of processing CMIP6 data to e.g. remove control run drift.
If I understand this broadly, I could see the Opener containing logic to load/filter datasets from a catalog (using cmip6_preprocessing) and returning an xarray dataset to the actual recipe?
We set some time aside next week to hack about with this, and will report back.