Read grid mapping and bounds as coords #2844

Merged
merged 40 commits into from
Feb 17, 2021
Changes from 29 commits
Commits
62152d0
Read and save `grid_mapping` and `bounds` as coordinates.
DWesl Mar 21, 2019
2ae8a7e
Add tests for (de)serialization of `grid_mapping` and `bounds`.
DWesl Mar 21, 2019
fff73c8
BUG: Use only encoding for tracking bounds and grid_mapping.
DWesl Mar 31, 2019
b3696d3
Address feedback on PR.
DWesl May 31, 2019
315d39d
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl May 31, 2019
c82cd47
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Feb 14, 2020
02aff73
Style fixes: newline before binary operator.
DWesl Feb 14, 2020
0721506
Style fixes: double quotes for string literals, rewrap lines.
DWesl Feb 14, 2020
239761e
Address comments from review.
DWesl Jul 9, 2020
e0b8e99
Fix style issues and complete name changes.
DWesl Jul 9, 2020
9ba7485
Add more attributes from the CF conventions.
DWesl Aug 2, 2020
bf97fe1
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 2, 2020
4274730
Remove a trailing comma in a one-element dict literal.
DWesl Aug 2, 2020
ca0f805
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 7, 2020
7027767
Stop moving ancillary_variables to coords
DWesl Aug 9, 2020
8d96a66
Expand the list of attributes in the documentation.
DWesl Aug 16, 2020
1a5b35d
Make sure to run the pip associated with the running python.
DWesl Aug 16, 2020
9f53fbb
Warn about new locations for some variables.
DWesl Aug 16, 2020
1b8218d
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 17, 2020
546b43e
Move ancillary variables back to data_vars in test.
DWesl Aug 23, 2020
8ec4af3
Update warnings to provide a more useful stack level.
DWesl Aug 23, 2020
bc0b1d1
Split the CF attribute test into multiple smaller tests.
DWesl Aug 23, 2020
5c085e1
Add a test of a roundtrip after dropping bounds.
DWesl Aug 23, 2020
a5a67d1
Merge work from github back into local branch.
DWesl Aug 23, 2020
a864b83
Run black on changes.
DWesl Aug 23, 2020
c8d1bdc
Check whether round-trip to iris breaks things.
DWesl Aug 23, 2020
478be8a
Remove trailing comma.
DWesl Aug 23, 2020
036695c
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Jan 5, 2021
b0e7a85
Style fixes from black.
DWesl Jan 5, 2021
1a9b201
Include suggestions from review.
DWesl Jan 16, 2021
6f3d55e
Update xarray/tests/test_backends.py
DWesl Jan 17, 2021
5268500
Update xarray/conventions.py
DWesl Jan 17, 2021
2edd367
Mention that there are other attributes not listed
DWesl Jan 17, 2021
948465c
Fix .rst syntax in whats-new
DWesl Jan 17, 2021
c68d372
Shorten name of another test.
DWesl Jan 17, 2021
9ee7c3a
Update docs.
dcherian Jan 17, 2021
b65e579
Merge remote-tracking branch 'upstream/master' into read_grid_mapping…
dcherian Feb 11, 2021
94b8153
fix merge.
dcherian Feb 11, 2021
c8896f3
Activate new behaviour only with `decode_coords="all"`
dcherian Feb 11, 2021
d3ec7ab
[skip-ci] fix docstrings
dcherian Feb 11, 2021
2 changes: 1 addition & 1 deletion doc/conf.py
Expand Up @@ -34,7 +34,7 @@
    subprocess.run(["conda", "list"])
else:
    print("pip environment:")
    subprocess.run(["pip", "list"])
    subprocess.run([sys.executable, "-m", "pip", "list"])

print(f"xarray: {xarray.__version__}, {xarray.__file__}")

Expand Down
22 changes: 22 additions & 0 deletions doc/weather-climate.rst
Expand Up @@ -10,6 +10,28 @@ Weather and climate data

``xarray`` can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include automatic labelling of plots with descriptive names and units if proper metadata is present (see :ref:`plotting`) and support for non-standard calendars used in climate science through the ``cftime`` module (see :ref:`CFTimeIndex`). There are also a number of geosciences-focused projects that build on xarray (see :ref:`related-projects`).

Several CF variable attributes list other variables that are
associated with the variable carrying the attribute. xarray now
parses a few of these: on read, the attribute value is moved to
``encoding`` and the variables it names are interpreted as
non-dimension coordinates:

- ``coordinates``
- ``bounds``
- ``grid_mapping``
- ``climatology``
- ``geometry``
- ``node_coordinates``
- ``node_count``
- ``part_node_count``
- ``interior_ring``
- ``cell_measures``
- ``formula_terms``

The CF attribute ``ancillary_variables`` was not included in the list
due to the variables listed there being associated primarily with the
variable with the attribute, rather than with the dimensions.
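
To make the attribute handling described above concrete, here is a minimal sketch of how the value of such an attribute can be turned into a list of referenced variable names. It mirrors the parsing in the `xarray/conventions.py` hunk of this PR, but `referenced_variable_names` and `NEEDS_PARSING` are hypothetical names used only for illustration; they are not part of xarray's API.

```python
# Hypothetical helper for illustration; not part of xarray's public API.
NEEDS_PARSING = ("cell_measures", "formula_terms")


def referenced_variable_names(attr_name, attr_value, own_name=None):
    """Return the variable names referenced by a CF attribute value."""
    names = attr_value.split()
    if attr_name in NEEDS_PARSING:
        # Values look like "area: areas" or "p0: P0 lev: ln_p": drop the
        # "key:" tokens and any reference to the variable itself.
        names = [n for n in names if not n.endswith(":") and n != own_name]
    return names


print(referenced_variable_names("bounds", "latitude_bnds"))
# ['latitude_bnds']
print(referenced_variable_names("cell_measures", "area: areas"))
# ['areas']
print(referenced_variable_names("formula_terms", "p0: P0 lev: ln_p", own_name="ln_p"))
# ['P0']
```

Attributes such as `bounds` and `grid_mapping` hold plain whitespace-separated names, while `cell_measures` and `formula_terms` interleave `key:` tokens with names, which is why only the latter two need the extra filtering step.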
Contributor Author

@DWesl DWesl Jan 5, 2021

Suggested change
variable with the attribute, rather than with the dimensions.
variable with the attribute, rather than with the dimensions
associated with that variable.

Member

@andersy005 andersy005 Feb 13, 2021

@DWesl, do you still want to commit this suggestion? Wasn't sure whether you missed it or not...


.. _Climate and Forecast (CF) conventions: http://cfconventions.org

.. _metpy_accessor:
Expand Down
5 changes: 4 additions & 1 deletion doc/whats-new.rst
Expand Up @@ -35,6 +35,10 @@ New Features
~~~~~~~~~~~~
- Performance improvement when constructing DataArrays. Significantly speeds up repr for Datasets with large number of variables.
By `Deepak Cherian <https://github.com/dcherian>`_
- Decode more CF attributes on file read (put the referenced variables
  into 'coords' instead of 'data_vars') and encode on write (write
  attributes corresponding to encoding values). The list of attributes
  is in :ref:`weather-climate` (:pull:`2844`, :issue:`3689`).
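
The decode-on-read and encode-on-write behaviour in this entry can be sketched with plain dictionaries. This is a simplified illustration, not xarray's implementation: `pop_to` below is a stand-in written for this example, `on_read` and `on_write` are hypothetical names, and the real read path additionally checks that every referenced variable exists before promoting it to a coordinate.

```python
# Attribute names copied from CF_RELATED_DATA in this PR's diff.
CF_RELATED_DATA = (
    "bounds", "grid_mapping", "climatology", "geometry", "node_coordinates",
    "node_count", "part_node_count", "interior_ring", "cell_measures",
    "formula_terms",
)


def pop_to(source, dest, key):
    """Simplified stand-in: move source[key] to dest[key] if it is present."""
    value = source.pop(key, None)
    if value is not None:
        dest[key] = value
    return value


def on_read(attrs, encoding):
    """Move CF bookkeeping attributes from attrs to encoding."""
    for key in CF_RELATED_DATA:
        pop_to(attrs, encoding, key)


def on_write(attrs, encoding):
    """Move them back so they are written to the file as attributes."""
    for key in CF_RELATED_DATA:
        pop_to(encoding, attrs, key)


attrs = {"units": "degrees_north", "bounds": "latitude_bnds"}
encoding = {}

on_read(attrs, encoding)
print(attrs)     # {'units': 'degrees_north'}
print(encoding)  # {'bounds': 'latitude_bnds'}

on_write(attrs, encoding)
print(attrs)     # {'units': 'degrees_north', 'bounds': 'latitude_bnds'}
print(encoding)  # {}
```

Keeping the value in `encoding` between read and write is what lets a round trip reproduce the original file attributes without exposing them as ordinary user-facing metadata.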

Bug fixes
~~~~~~~~~
Expand Down Expand Up @@ -276,7 +280,6 @@ New Features
- Expose ``use_cftime`` option in :py:func:`~xarray.open_zarr` (:issue:`2886`, :pull:`3229`)
By `Samnan Rahee <https://github.com/Geektrovert>`_ and `Anderson Banihirwe <https://github.com/andersy005>`_.


Bug fixes
~~~~~~~~~

Expand Down
67 changes: 66 additions & 1 deletion xarray/conventions.py
Expand Up @@ -11,6 +11,23 @@
from .core.pycompat import is_duck_dask_array
from .core.variable import IndexVariable, Variable, as_variable

CF_RELATED_DATA = (
    "bounds",
    "grid_mapping",
    "climatology",
    "geometry",
    "node_coordinates",
    "node_count",
    "part_node_count",
    "interior_ring",
    "cell_measures",
    "formula_terms",
)
CF_RELATED_DATA_NEEDS_PARSING = (
    "cell_measures",
    "formula_terms",
)


class NativeEndiannessArray(indexing.ExplicitlyIndexedNDArrayMixin):
    """Decode arrays on the fly from non-native to native endianness
Expand Down Expand Up @@ -256,6 +273,9 @@ def encode_cf_variable(var, needs_copy=True, name=None):
    var = maybe_default_fill_value(var)
    var = maybe_encode_bools(var)
    var = ensure_dtype_not_object(var, name=name)

    for attr_name in CF_RELATED_DATA:
        pop_to(var.encoding, var.attrs, attr_name)
    return var


Expand Down Expand Up @@ -508,6 +528,39 @@ def stackable(dim):
                    new_vars[k].encoding["coordinates"] = coord_str
                    del var_attrs["coordinates"]
                    coord_names.update(var_coord_names)
            for attr_name in CF_RELATED_DATA:
                if attr_name in var_attrs:
                    attr_val = var_attrs[attr_name]
                    var_names = attr_val.split()
                    if attr_name in CF_RELATED_DATA_NEEDS_PARSING:
                        var_names = [
                            name
                            for name in var_names
                            if not name.endswith(":") and not name == k
                        ]
                    if all(k in variables for k in var_names):
                        new_vars[k].encoding[attr_name] = attr_val
                        coord_names.update(var_names)
                        # Warn that some things will be coords rather
                        # than data_vars.
                        warnings.warn(
                            "Variable(s) {0!s} moved from data_vars to coords\n"
                            "based on {1:s} attribute".format(var_names, attr_name),
                            stacklevel=5,
                        )
                    else:
                        warnings.warn(
                            "Variable(s) referenced in {0:s} not in variables: {1!s}".format(
                                attr_name,
                                [
                                    proj_name
                                    for proj_name in var_names
                                    if proj_name not in variables
                                ],
                            ),
                            stacklevel=5,
                        )
                    del var_attrs[attr_name]

    if decode_coords and "coordinates" in attributes:
        attributes = dict(attributes)
Expand Down Expand Up @@ -664,6 +717,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):

    global_coordinates = non_dim_coord_names.copy()
    variable_coordinates = defaultdict(set)
    not_technically_coordinates = set()
    for coord_name in non_dim_coord_names:
        target_dims = variables[coord_name].dims
        for k, v in variables.items():
Expand All @@ -674,6 +728,13 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
            ):
                variable_coordinates[k].add(coord_name)

            if any(
                attr_name in v.encoding and coord_name in v.encoding.get(attr_name)
                for attr_name in CF_RELATED_DATA
            ):
                not_technically_coordinates.add(coord_name)
                global_coordinates.discard(coord_name)

    variables = {k: v.copy(deep=False) for k, v in variables.items()}

    # keep track of variable names written to file under the "coordinates" attributes
Expand All @@ -691,7 +752,11 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
        # we get support for attrs["coordinates"] for free.
        coords_str = pop_to(encoding, attrs, "coordinates")
        if not coords_str and variable_coordinates[name]:
            attrs["coordinates"] = " ".join(map(str, variable_coordinates[name]))
            attrs["coordinates"] = " ".join(
                str(coord_name)
                for coord_name in variable_coordinates[name]
                if coord_name not in not_technically_coordinates
            )
        if "coordinates" in attrs:
            written_coords.update(attrs["coordinates"].split())
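
The effect of `not_technically_coordinates` in the hunks above can be sketched in isolation: coordinates that are already referenced through one of the CF attributes are kept out of the `coordinates` attribute on write. The function name `split_coordinates` is hypothetical, and the sketch matches whole whitespace-separated tokens, whereas the diff uses a substring test on the encoding value, so treat it as an approximation rather than the PR's exact logic.

```python
# Attribute names copied from CF_RELATED_DATA in this PR's diff.
CF_RELATED_DATA = (
    "bounds", "grid_mapping", "climatology", "geometry", "node_coordinates",
    "node_count", "part_node_count", "interior_ring", "cell_measures",
    "formula_terms",
)


def split_coordinates(coord_names, encodings):
    """Split coords into (plain, referenced-by-CF-attribute) sets.

    `encodings` maps each variable name to its encoding dict.
    """
    referenced = {
        coord
        for coord in coord_names
        for enc in encodings.values()
        if any(coord in enc.get(attr, "").split() for attr in CF_RELATED_DATA)
    }
    return set(coord_names) - referenced, referenced


# Encodings modelled on the test dataset in this PR.
encodings = {
    "variable": {"cell_measures": "area: areas", "grid_mapping": "latlon"},
    "latitude": {"grid_mapping": "latlon", "bounds": "latitude_bnds"},
}
plain, referenced = split_coordinates(
    ["latlon", "latitude_bnds", "areas", "P0"], encodings
)
print(sorted(plain))       # ['P0']
print(sorted(referenced))  # ['areas', 'latitude_bnds', 'latlon']
```

Only the names in `plain` would remain candidates for a `coordinates` attribute; the others are recorded in the file through `bounds`, `grid_mapping`, or `cell_measures` instead.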

Expand Down
104 changes: 104 additions & 0 deletions xarray/tests/test_backends.py
Expand Up @@ -55,6 +55,7 @@
    requires_cftime,
    requires_dask,
    requires_h5netcdf,
    requires_iris,
    requires_netCDF4,
    requires_pseudonetcdf,
    requires_pydap,
Expand Down Expand Up @@ -857,6 +858,109 @@ def test_roundtrip_mask_and_scale(self, decoded_fn, encoded_fn):
                assert decoded.variables[k].dtype == actual.variables[k].dtype
            assert_allclose(decoded, actual, decode_bytes=False)

    @staticmethod
    def _create_cf_dataset():
        original = Dataset(
            dict(
                variable=(
                    ("ln_p", "latitude", "longitude"),
                    np.arange(8, dtype="f4").reshape(2, 2, 2),
                    {"ancillary_variables": "std_devs det_lim"},
                ),
                std_devs=(
                    ("ln_p", "latitude", "longitude"),
                    np.arange(0.1, 0.9, 0.1).reshape(2, 2, 2),
                    {"standard_name": "standard_error"},
                ),
                det_lim=(
                    (),
                    0.1,
                    {"standard_name": "detection_minimum"},
                ),
            ),
            dict(
                latitude=("latitude", [0, 1], {"units": "degrees_north"}),
                longitude=("longitude", [0, 1], {"units": "degrees_east"}),
                latlon=((), -1, {"grid_mapping_name": "latitude_longitude"}),
                latitude_bnds=(("latitude", "bnds2"), [[0, 1], [1, 2]]),
                longitude_bnds=(("longitude", "bnds2"), [[0, 1], [1, 2]]),
                areas=(
                    ("latitude", "longitude"),
                    [[1, 1], [1, 1]],
                    {"units": "degree^2"},
                ),
                ln_p=(
                    "ln_p",
                    [1.0, 0.5],
                    {
                        "standard_name": "atmosphere_ln_pressure_coordinate",
                        "computed_standard_name": "air_pressure",
                    },
                ),
                P0=((), 1013.25, {"units": "hPa"}),
            ),
        )
        original["variable"].encoding.update(
            {"cell_measures": "area: areas", "grid_mapping": "latlon"},
        )
        original.coords["latitude"].encoding.update(
            dict(grid_mapping="latlon", bounds="latitude_bnds")
        )
        original.coords["longitude"].encoding.update(
            dict(grid_mapping="latlon", bounds="longitude_bnds")
        )
        original.coords["ln_p"].encoding.update({"formula_terms": "p0: P0 lev: ln_p"})
        return original

    def test_grid_mapping_and_bounds_are_not_coordinates_in_file(self):
        original = self._create_cf_dataset()
        with create_tmp_file() as tmp_file:
            original.to_netcdf(tmp_file)
            with open_dataset(tmp_file, decode_coords=False) as ds:
                assert ds.coords["latitude"].attrs["bounds"] == "latitude_bnds"
                assert ds.coords["longitude"].attrs["bounds"] == "longitude_bnds"
                assert "latlon" not in ds["variable"].attrs["coordinates"]
                assert "coordinates" not in ds.attrs

    def test_grid_mapping_and_bounds_are_coordinates_after_dataset_roundtrip(self):
        original = self._create_cf_dataset()
        with pytest.warns(
            UserWarning, match=" moved from data_vars to coords\nbased on "
        ):
            with self.roundtrip(original) as actual:
                assert_identical(actual, original)

    def test_grid_mapping_and_bounds_are_coordinates_after_dataarray_roundtrip(self):
        original = self._create_cf_dataset()
        # The DataArray roundtrip should have the same warnings as the
        # Dataset, but we already tested for those, so just go for the
        # new warnings. It would appear that there is no way to tell
        # pytest "This warning and also this warning should both be
        # present".
        # xarray/tests/test_conventions.py::TestCFEncodedDataStore
        # needs the to_dataset. The other backends should be fine
        # without it.
        with pytest.warns(
            UserWarning,
            match=(
                r"Variable\(s\) referenced in bounds not in variables: "
                r"\['l(at|ong)itude_bnds'\]"
            ),
        ):
            with self.roundtrip(original["variable"].to_dataset()) as actual:
                assert_identical(actual, original["variable"].to_dataset())

    @requires_iris
    def test_grid_mapping_and_bounds_are_coordinates_after_iris_roundtrip(self):
        original = self._create_cf_dataset()
        iris_cube = original["variable"].to_iris()
        actual = DataArray.from_iris(iris_cube)
        # Bounds will be missing (xfail)
        del original.coords["latitude_bnds"], original.coords["longitude_bnds"]
        # Ancillary vars will be missing
        # Those are data_vars, and will be dropped when grabbing the variable
        assert_identical(actual, original["variable"])

    def test_coordinates_encoding(self):
        def equals_latlon(obj):
            return obj == "lat lon" or obj == "lon lat"
Expand Down