Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for temperature differences in CF attributes #1836

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
4 changes: 2 additions & 2 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ Contributors to this version: David Huard (:user:`huard`).

Internal changes
^^^^^^^^^^^^^^^^
* Changed french translation of "wet days" from "jours mouillés" to "jours pluvieux". (:issue:`1825`, :pull:`1826`).

* Add attribute ``units_metadata`` to outputs representing a difference between temperatures. This is needed to disambiguate temperature differences from absolute temperature, and affects indicators using functions ``cumulative_difference``, ``temperature_sum``, ``interday_diurnal_temperature_range`` and `` extreme_temperature_range``. Added ``pint2cfattrs`` function to convert pint units to a dictionary of CF attributes. ``units2pint`` is also modified to support ``units_metadata`` attributes in DataArrays. Some SDBA properties and measures previously returning units of ``delta_degC`` will now return the original input DataArray units accompanied with the ``units_metadata`` attribute. (:issue:`1822`, :pull:`1830`).
* Changed french translation of "wet days" from "jours mouillés" to "jours pluvieux". (:issue:`1825`, :pull:`1836`).
Copy link
Contributor

@juliettelavoie juliettelavoie Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PVI, "pluvieux" avait été remplacé par "avec précipitation" à beaucoup d'endroits dans #1123.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, bon à savoir. Je vais faire un autre PR pour harmoniser.


v0.51.0 (2024-07-04)
--------------------
Expand Down
42 changes: 42 additions & 0 deletions docs/notebooks/units.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,48 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Temperature differences vs absolute temperature\n",
"\n",
"Temperature anomalies and biases as well as degree-days indicators are all *differences* between temperatures. If we assign those differences units of degrees Celsius, then converting to Kelvins or Fahrenheits will yield nonsensical values. ``pint`` solves this using *delta* units such as ``delta_degC`` and ``delta_degF``. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If we have a DataArray storing a temperature anomaly of 2°C, converting to Kelvin will yield a nonsensical value 0f 275.15.\n",
"# Fortunately, pint has delta units to represent temperature differences.\n",
"display(units.convert_units_to(xr.DataArray([2], attrs={\"units\": \"delta_degC\"}), \"K\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The issue for ``xclim`` is that there are no equivalent delta units in the CF convention. To resolve this ambiguity, the [CF convention](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#temperature-units) recommends including a ``units_metadata`` attribute set to ``\"temperature: difference\"``, and this is supported in ``xclim`` as of version 0.52. The function ``units2pint`` interprets the ``units_metadata`` attribute and returns a ``pint`` delta unit as needed. To convert a ``pint`` delta unit to CF attributes, use the function ``pint2cfattrs``, which returns a dictionary with the ``units`` and ``units_metadata`` attributes (``pint2cfunits`` cannot support the convention because it only returns the unit string). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"delta = xr.DataArray(\n",
" [2], attrs={\"units\": \"K\", \"units_metadata\": \"temperature: difference\"}\n",
")\n",
"units.convert_units_to(delta, \"delta_degC\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"## Other utilities\n",
"\n",
"Many helper functions are defined in `xclim.core.units`, see [Unit handling module](../api.rst#units-handling-submodule).\n"
Expand Down
6 changes: 6 additions & 0 deletions tests/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,12 @@ def test_forbidden(self, tas_series):
with pytest.raises(NotImplementedError):
generic.cumulative_difference(tas, threshold="10 degC", op="!=")

def test_delta_units(self, tas_series):
tas = tas_series(np.array([-10, 15, 20, 3, 10]) + K2C)

out = generic.cumulative_difference(tas, threshold="10 degC", op=">=")
assert "units_metadata" in out.attrs


class TestFirstDayThreshold:
@pytest.mark.parametrize(
Expand Down
6 changes: 4 additions & 2 deletions tests/test_sdba/test_properties.py
Original file line number Diff line number Diff line change
Expand Up @@ -287,7 +287,8 @@ def test_annual_cycle(self, open_dataset):

assert amp.long_name.startswith("Absolute amplitude of the annual cycle")
assert phase.long_name.startswith("Phase of the annual cycle")
assert amp.units == "delta_degC"
assert amp.units == "K"
assert amp.units_metadata == "temperature: difference"
assert relamp.units == "%"
assert phase.units == ""

Expand Down Expand Up @@ -330,7 +331,8 @@ def test_annual_range(self, open_dataset):

assert amp.long_name.startswith("Average annual absolute amplitude")
assert phase.long_name.startswith("Average annual phase")
assert amp.units == "delta_degC"
assert amp.units == "K"
assert amp.units_metadata == "temperature: difference"
assert relamp.units == "%"
assert phase.units == ""

Expand Down
27 changes: 27 additions & 0 deletions tests/test_units.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
declare_units,
infer_context,
lwethickness2amount,
pint2cfattrs,
pint2cfunits,
pint_multiply,
rate2amount,
Expand Down Expand Up @@ -117,6 +118,14 @@ def test_cf_conversion_amount2lwethickness_amount2rate(self):
assert out.attrs["units"] == "kg d-1 m-2" # CF equivalent unit
assert out.attrs["standard_name"] == "rainfall_flux"

def test_temperature_difference(self):
delta = xr.DataArray(
[2], attrs={"units": "K", "units_metadata": "temperature: difference"}
)
out = convert_units_to(source=delta, target="delta_degC")
assert out == 2
assert out.attrs["units"] == "degC"


class TestUnitConversion:
def test_pint2cfunits(self):
Expand Down Expand Up @@ -347,3 +356,21 @@ def test_to_agg_units(in_u, opfunc, op, exp, exp_u):
out = to_agg_units(getattr(da, opfunc)(), da, op)
np.testing.assert_allclose(out, exp)
assert out.attrs["units"] == exp_u


def test_pint2cfattrs():
attrs = pint2cfattrs(units.degK, is_difference=True)
assert attrs == {"units": "K", "units_metadata": "temperature: difference"}


def test_temp_difference_rountrip():
"""Test roundtrip of temperature difference units."""
attrs = {"units": "degC", "units_metadata": "temperature: difference"}
da = xr.DataArray([1], attrs=attrs)
pu = units2pint(da)
# Confirm that we get delta pint units
assert pu == units.delta_degC

# and that converting those back to cf attrs gives the same result
attrs = pint2cfattrs(pu)
assert attrs == {"units": "degC", "units_metadata": "temperature: difference"}
64 changes: 57 additions & 7 deletions xclim/core/units.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,11 +134,18 @@ def units2pint(value: xr.DataArray | str | units.Quantity) -> pint.Unit:
-------
pint.Unit
Units of the data array.

Notes
-----
To avoid ambiguity related to differences in temperature vs absolute temperatures, set the `units_metadata`
attribute to `"temperature: difference"` or `"temperature: on_scale"` on the DataArray.
"""
metadata = None
if isinstance(value, str):
unit = value
elif isinstance(value, xr.DataArray):
unit = value.attrs["units"]
metadata = value.attrs.get("units_metadata", None)
elif isinstance(value, units.Quantity):
# This is a pint.PlainUnit, which is not the same as a pint.Unit
return cast(pint.Unit, value.units)
Expand All @@ -164,7 +171,10 @@ def units2pint(value: xr.DataArray | str | units.Quantity) -> pint.Unit:
"Remove white space from temperature units, e.g. use `degC`."
)

return units.parse_units(unit)
pu = units.parse_units(unit)
if metadata == "temperature: difference":
return (1 * pu - 1 * pu).units
return pu


def pint2cfunits(value: units.Quantity | units.Unit) -> str:
Expand All @@ -185,11 +195,46 @@ def pint2cfunits(value: units.Quantity | units.Unit) -> str:

# Issue originally introduced in https://github.com/hgrecco/pint/issues/1486
# Should be resolved in pint v0.24. See: https://github.com/hgrecco/pint/issues/1913
# Relies on cf_xarray's unit string formatting logic.
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=DeprecationWarning)
return f"{value:cf}".replace("dimensionless", "")


def pint2cfattrs(value: units.Quantity | units.Unit, is_difference=None) -> dict:
"""Return CF-compliant units attributes from a `pint` unit.

Parameters
----------
value : pint.Unit
Input unit.
is_difference : bool
Whether the value represent a difference in temperature, which is ambiguous in the case of absolute
temperature scales like Kelvin or Rankine. It will automatically be set to True if units are "delta_*"
units.

Returns
-------
dict
Units following CF-Convention, using symbols.
"""
s = pint2cfunits(value)
if "delta_" in s:
is_difference = True
s = s.replace("delta_", "")

attrs = {"units": s}
if "[temperature]" in value.dimensionality:
aulemahal marked this conversation as resolved.
Show resolved Hide resolved
if is_difference:
attrs["units_metadata"] = "temperature: difference"
elif is_difference is False:
attrs["units_metadata"] = "temperature: on_scale"
else:
attrs["units_metadata"] = "temperature: unknown"

return attrs


def ensure_cf_units(ustr: str) -> str:
"""Ensure the passed unit string is CF-compliant.

Expand Down Expand Up @@ -325,7 +370,7 @@ def convert_units_to( # noqa: C901

if isinstance(source, xr.DataArray):
source_unit = units2pint(source)
target_cf_unit = pint2cfunits(target_unit)
target_cf_attrs = pint2cfattrs(target_unit)

# Automatic pre-conversions based on the dimensionalities and CF standard names
standard_name = source.attrs.get("standard_name")
Expand Down Expand Up @@ -360,12 +405,12 @@ def convert_units_to( # noqa: C901
out: xr.DataArray
if source_unit == target_unit:
# The units are the same, but the symbol may not be.
out = source.assign_attrs(units=target_cf_unit)
out = source.assign_attrs(**target_cf_attrs)
return out

with units.context(context or "none"):
out = source.copy(data=units.convert(source.data, source_unit, target_unit))
out = out.assign_attrs(units=target_cf_unit)
out = out.assign_attrs(**target_cf_attrs)
return out

# TODO remove backwards compatibility of int/float thresholds after v1.0 release
Expand Down Expand Up @@ -560,20 +605,23 @@ def to_agg_units(
out.attrs.update(
units="", is_dayofyear=np.int32(1), calendar=get_calendar(orig)
)
out.attrs.pop("units_metadata", None)

elif op in ["count", "integral"]:
m, freq_u_raw = infer_sampling_units(orig[dim])
orig_u = str2pint(ensure_absolute_temperature(orig.units))
freq_u = str2pint(freq_u_raw)
out = out * m
with xr.set_options(keep_attrs=True):
out = out * m

if op == "count":
out.attrs["units"] = freq_u_raw
elif op == "integral":
if "[time]" in orig_u.dimensionality:
# We need to simplify units after multiplication
out_units = (orig_u * freq_u).to_reduced_units()
out = out * out_units.magnitude
with xr.set_options(keep_attrs=True):
out = out * out_units.magnitude
out.attrs["units"] = pint2cfunits(out_units)
else:
out.attrs["units"] = pint2cfunits(orig_u * freq_u)
Expand Down Expand Up @@ -1328,7 +1376,7 @@ def wrapper(*args, **kwargs):
return dec


def ensure_delta(unit: str) -> str:
def ensure_delta(unit: xr.DataArray | str | units.Quantity) -> str:
"""Return delta units for temperature.

For dimensions where delta exist in pint (Temperature), it replaces the temperature unit by delta_degC or
Expand All @@ -1344,6 +1392,8 @@ def ensure_delta(unit: str) -> str:
#
delta_unit = pint2cfunits(d - d)
# replace kelvin/rankine by delta_degC/F
# Note (DH): This will fail if dimension is [temperature]^-1 or [temperature]^2 (e.g. variance)
# Recent versions of pint have a `to_preferred` method that could be used here (untested).
if "kelvin" in u._units:
delta_unit = pint2cfunits(u / units2pint("K") * units2pint("delta_degC"))
if "degree_Rankine" in u._units:
Expand Down
12 changes: 7 additions & 5 deletions xclim/indices/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -701,6 +701,7 @@ def temperature_sum(

out = (data - threshold).where(cond).resample(time=freq).sum()
out = direction * out
out.attrs["units_metadata"] = "temperature: difference"
return to_agg_units(out, data, "integral")
Comment on lines +920 to 921
Copy link
Collaborator

@aulemahal aulemahal Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For degC input, this outputs:

    units_metadata:  temperature: difference
    units:           K d

because #1804 decided that we should avoid delta temperatures and put absolute ones instead.

Should we revert this ? In any case, I think to_agg_units could be the one to add units_metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean revert going through absolute temperature ? I think this is not necessary with this PR.

Not sure about to_agg_units. Is your logic "if we aggregate temperatures, then it's obviously a difference" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, going to absolute temperatures and this PR both try to solve the same problem. I guess they are compatible, but I feel both aren't needed at the same time. We could revert to make it cleaner.

My point about to_agg_units is that that function is there to manage units after a transformation. Instead of going per-indicator, you could simply infer the "units_metadata" needed for the given transformation. Like I did in #1804.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to sabotage the work you had done and it seemed prudent to keep the changes as small as possible. So no strong opinion on this one.

My thinking was that to_agg_units has no way to know whether it is aggregating a temperature or a temperature difference if we give it a DataArray with units of K.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it now does/could with your additions, no ? But in any case, isn't the logic the same ? All operations that transform an on_scale to a difference will also transform a difference to a difference. I don't think we have operations that would actually go from a difference to an on_scale ?



Expand All @@ -718,7 +719,6 @@ def interday_diurnal_temperature_range(
freq : str
Resampling frequency defining the periods as defined in :ref:`timeseries.resampling`.


Returns
-------
xr.DataArray
Expand All @@ -728,8 +728,8 @@ def interday_diurnal_temperature_range(
vdtr = abs((high_data - low_data).diff(dim="time"))
out = vdtr.resample(time=freq).mean(dim="time")

u = str2pint(low_data.units)
out.attrs["units"] = pint2cfunits(u - u)
out.attrs["units"] = low_data.attrs["units"]
out.attrs["units_metadata"] = "temperature: difference"
return out


Expand All @@ -755,8 +755,8 @@ def extreme_temperature_range(

out = high_data.resample(time=freq).max() - low_data.resample(time=freq).min()

u = str2pint(low_data.units)
out.attrs["units"] = pint2cfunits(u - u)
out.attrs["units"] = low_data.attrs["units"]
out.attrs["units_metadata"] = "temperature: difference"
return out


Expand Down Expand Up @@ -892,6 +892,8 @@ def cumulative_difference(
if freq is not None:
diff = diff.resample(time=freq).sum(dim="time")

diff.attrs["units_metadata"] = "temperature: difference"

return to_agg_units(diff, data, op="integral")


Expand Down
9 changes: 5 additions & 4 deletions xclim/sdba/measures.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
import xarray as xr

from xclim.core.indicator import Indicator, base_registry
from xclim.core.units import convert_units_to, ensure_delta, units2pint
from xclim.core.units import convert_units_to, units2pint
from xclim.core.utils import InputKind

from .base import Grouper
Expand Down Expand Up @@ -173,7 +173,8 @@ def _bias(sim: xr.DataArray, ref: xr.DataArray) -> xr.DataArray:
Absolute bias
"""
out = sim - ref
out.attrs["units"] = ensure_delta(ref.attrs["units"])
out.attrs["units"] = ref.attrs["units"]
out.attrs["units_metadata"] = "temperature: difference"
return out


Expand Down Expand Up @@ -296,7 +297,7 @@ def _rmse_internal(_sim: xr.DataArray, _ref: xr.DataArray) -> xr.DataArray:
input_core_dims=[["time"], ["time"]],
dask="parallelized",
)
out = out.assign_attrs(units=ensure_delta(ref.units))
out = out.assign_attrs(units=ref.units, units_metadata={"temperature: difference"})
huard marked this conversation as resolved.
Show resolved Hide resolved
return out


Expand Down Expand Up @@ -343,7 +344,7 @@ def _mae_internal(_sim: xr.DataArray, _ref: xr.DataArray) -> xr.DataArray:
input_core_dims=[["time"], ["time"]],
dask="parallelized",
)
out = out.assign_attrs(units=ensure_delta(ref.units))
out = out.assign_attrs(units=ref.units, units_metadata="temperature: difference")
return out


Expand Down
6 changes: 4 additions & 2 deletions xclim/sdba/properties.py
Original file line number Diff line number Diff line change
Expand Up @@ -583,7 +583,8 @@ def _annual_cycle(
# TODO: In April 2024, use a match-case.
if stat == "absamp":
out = ac.max("dayofyear") - ac.min("dayofyear")
out.attrs["units"] = xc.core.units.ensure_delta(units)
out.attrs["units"] = units
out.attrs["units_metadata"] = "temperature: difference"
huard marked this conversation as resolved.
Show resolved Hide resolved
elif stat == "relamp":
out = (ac.max("dayofyear") - ac.min("dayofyear")) * 100 / ac.mean("dayofyear")
out.attrs["units"] = "%"
Expand Down Expand Up @@ -699,7 +700,8 @@ def _annual_statistic(

if stat == "absamp":
out = yrs.max() - yrs.min()
out.attrs["units"] = xc.core.units.ensure_delta(units)
out.attrs["units"] = units
out.attrs["units_metadata"] = "temperature: difference"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

un ptit dernier!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merci ! Ça me fait penser que stat == "relamp" devrait probablement appeler ensure_absolute_units, sinon ça pourrait donner des choses bizarres (genre division par 0 degC).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je ne trouve pas ensure_absolute_units, juste ensure_absolute_temperature. Cette fonction assume un delta. ça ne fonctionne donc pas vrm ici pour empêcher une division par 0. Je pourrais ajouter un test spécifiquement pour modifier les unités du input si c'est une température, mais comme c'est une fonction généraliste, ça n'empêche pas quelqu'un de l'utiliser avec une autre variable ayant un 0. J'ai l'impression que c'est la responsabilité de l'utilisateur de mettre les bonnes unités avec relamp...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

En théorie il n'y a pas d'autres unités ayant le problème "difference" vs "on_scale". En tout cas, rien dans pint, rien dans CF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mais le problème de division par zero dans "relamp" ne dépend pas de "difference" vs "on_scale".
Et admettons qu'il n'y ai pas de 0, je ne vois pas le problème de faire (deg C-degC)/degC = %. pas besoin de spécifier "difference" ou "on_scale" dans le processus, on est sur que c'est toujours la même unité, non ?

Copy link
Collaborator

@aulemahal aulemahal Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

En effet, les unités de relamp ne dépendent pas de rien d'autre. Et je suis d'accord que la division par zéro est un autre problème.

Même que de changer les unités de degC a K pour calculer un % n'est peut-être pas quelque chose qu'on devrait faire silencieusement ? Je propose de pelleter le problème chez l'usager. Et donc de ne rien faire.

elif stat == "relamp":
out = (yrs.max() - yrs.min()) * 100 / yrs.mean()
out.attrs["units"] = "%"
Expand Down
Loading