Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add via_cf_namespace option #19

Closed
wants to merge 0 commits into from
Closed

Conversation

cisaacstern
Copy link
Contributor

cf_xarray "is a lightweight accessor that allows you to interpret CF attributes present on xarray objects."

In building automated STAC Item generation for Pangeo Forge, we'd prefer to access dataset attributes via cf_xarray's cf namespace, rather than need to know the actual attribute names. (See discussion here.)

This PR adds that option. If you'd prefer not to have the added dependency by default, we could make it an optional dependency, and then wrap the module-level import in a try/except.

@cisaacstern
Copy link
Contributor Author

Was typing fast this morning and now realize that my comment above was perhaps a bit presumptuous that you'd want this as part of main. So I'll rewind and ask what I should've to begin with: Is this feature something you're open to?

@TomAugspurger
Copy link
Collaborator

Thanks.

How about we

  • make cf_xarray a required dependency
  • make name optional in each of these build_* functoins
  • use the cf standard name by default (name is not provided)
  • use the name value if it's provided, to handle non-cf-compliant sources

Actually... we already use time_dimension=None to mean "no time dimension". So maybe we amend that slightly to mean "use the CF standard name if it's present, otherwise no time dimension". Make sense?

@cisaacstern
Copy link
Contributor Author

Cool, I like dropping the additional via_cf_namespace kwarg in favor of defaulting to cf.

use the CF standard name if it's present

Right so for time we can do

 def build_temporal_dimension(ds, name, extent, values, step):
-    time = ds.coords[name]
+    time = ds.cf[name] if name in ds.cf.keys() else ds.coords[name]

... otherwise no time dimension

👍 If time dimension name is None, we never make it to build_temporal_dimension.

And actually, I think we need to the same thing for X and Y, because they both use the same builder, and if name is None there, we don't know if we're trying to build X or Y

https://github.com/TomAugspurger/xstac/blob/2d794ea51472ea84edb21ee9a4dc877ddaae4277/xstac/_xstac.py#L106-L107

So would

```diff
def build_horizontal_dimension(ds, name, axis, extent, values, step, reference_system):
-    da = ds[name]
+    da = ds.cf[name] if name in ds.cf.keys() else ds[name]

be okay?

@cisaacstern
Copy link
Contributor Author

cisaacstern commented Nov 12, 2021

Via https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#coordinate-types:

The attribute axis may be attached to a coordinate variable and given one of the values X, Y, Z or T which stand for a longitude, latitude, vertical, or time axis respectively. Alternatively the standard_name attribute may be used for direct identification.

What I'm trying w/ xstac + Pangeo Forge is to identify coordinates via the axis attribute, not the standard name. This allows us to (hopefully) uniformly access X/Y axes which are not lon/lat; a circumstance which I admittedly don't fully understand, but which @rabernat identified as a concern in pangeo-forge/pangeo-forge-orchestrator#1 (comment):

However, the X axis will not always be longitude. It may be a logical index or a projected coordinate.

As such, the accessors I'm looking to use are ds.cf["X"] and ds.cf["Y"], which I think perhaps exist sufficiently infrequently even on CF-compliant data (they're optional in CF) as to make them sub-optimal defaults. (I'm working on adding them at runtime with, among other approaches, cf-xarray's guess_coord_axes.)

TL;DR is I think the approach described in #19 (comment) is a good compromise which nudges users in the direction of CF but doesn't default to names which may not exist on otherwise CF-compliant datasets.

@TomAugspurger
Copy link
Collaborator

I need to try out cf-xarray, but my ideal API is to rely on it for inferring the right variable name for each dimension coordinate.

So right now the README has this example:

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json \
    --x-dimension=lon --y-dimension=lat --reference-system=4326

I hope that can become

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json

i.e. we've inferred x-dimension, y-dimension, time-dimension, and maybe the reference system from CF.

In terms of implementation, I think that means

  1. removing the via_cf_namespace argument (it's unnecessary since CF is the default and passing x_dimension=<name> will override the default).
  2. Updating build_datacube to start with something like
if time_dimension is None:
    time_dimension = ds.cf["time"].name  # TODO: handle the case of it not being there gracefully

likewise for x_dimension, y_dimension, (and maybe reference system?) That should be it. In other words, just use cf-xarray to infer the right defaults from the data / metadata, rather than requiring the user to pass them in.

uniformly access X/Y axes which are not lon/lat; a circumstance which I admittedly don't fully understand,

In the case of a dataset like Daymet, the lat / lon variables are 2-D arrays. The latitude / longitude depends on where you're at in the projected x / y space. All the data variables have dimensions (y, x).

@cisaacstern
Copy link
Contributor Author

🤦 Oops I just reset this branch to be equal with main in anticipation of re-working it to align with the goals outlined in #19 (comment), and this resulted in it automatically merging. I guess I need to open a new PR; will follow up with that shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants