
impossible to save in netcdf because of the time dimension #8858

Closed

jfleroux opened this issue Mar 20, 2024 · 8 comments
Labels: duplicate, plan to close (May be closeable, needs more eyeballs)

Comments

@jfleroux

What happened?

I have a dataset ds with a variable TEMP(time,level,ni,nj).
The time dimension is of type datetime64, but I get an error when I try to save a temporal selection.

What did you expect to happen?

Time dimension management should be completely transparent to the user when a datetime64 type is used.

Minimal Complete Verifiable Example

ds.time
<xarray.DataArray 'time' (time: 8760)>
array(['2015-01-01T00:00:00.000000', '2015-01-01T01:00:00.000000',
       '2015-01-01T02:00:00.000000', ..., '2015-12-31T21:00:00.000000',
       '2015-12-31T22:00:00.000000', '2015-12-31T23:00:00.000000'],
      dtype='datetime64[us]')
Coordinates:
  * time     (time) datetime64[us] 2015-01-01 ... 2015-12-31T23:00:00
Attributes:
    axis:           T
    conventions:    relative number of seconds with no decimal part
    long_name:      time in seconds (UT)
    standard_name:  time
    time_origin:    01-JAN-1900 00:00:00
    _FillValue:     nan
    units:          seconds since 1900-01-01

ds['TEMP'].isel(time=slice(0,1)).to_netcdf(f"./extract.nc")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[81], line 1
----> 1 ds['TEMP'].isel(time=slice(0,1)).to_netcdf(f"./extract__2.nc")

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/core/dataarray.py:4081, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   4077 else:
   4078     # No problems with the name - so we're fine!
   4079     dataset = self.to_dataset()
-> 4081 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   4082     dataset,
   4083     path,
   4084     mode=mode,
   4085     format=format,
   4086     group=group,
   4087     engine=engine,
   4088     encoding=encoding,
   4089     unlimited_dims=unlimited_dims,
   4090     compute=compute,
   4091     multifile=False,
   4092     invalid_netcdf=invalid_netcdf,
   4093 )

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/backends/api.py:1339, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1334 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1335 # to avoid this mess of conditionals
   1336 try:
   1337     # TODO: allow this work (setting up the file for writing array data)
   1338     # to be parallelized with dask
-> 1339     dump_to_store(
   1340         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1341     )
   1342     if autoclose:
   1343         store.close()

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/backends/api.py:1386, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1383 if encoder:
   1384     variables, attrs = encoder(variables, attrs)
-> 1386 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/backends/common.py:393, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    390 if writer is None:
    391     writer = ArrayWriter()
--> 393 variables, attributes = self.encode(variables, attributes)
    395 self.set_attributes(attributes)
    396 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/backends/common.py:482, in WritableCFDataStore.encode(self, variables, attributes)
    479 def encode(self, variables, attributes):
    480     # All NetCDF files get CF encoded by default, without this attempting
    481     # to write times, for example, would fail.
--> 482     variables, attributes = cf_encoder(variables, attributes)
    483     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    484     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/conventions.py:795, in cf_encoder(variables, attributes)
    792 # add encoding for time bounds variables if present.
    793 _update_bounds_encoding(variables)
--> 795 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    797 # Remove attrs from bounds variables (issue #2921)
    798 for var in new_vars.values():

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/conventions.py:196, in encode_cf_variable(var, needs_copy, name)
    183 ensure_not_multiindex(var, name=name)
    185 for coder in [
    186     times.CFDatetimeCoder(),
    187     times.CFTimedeltaCoder(),
   (...)
    194     variables.BooleanCoder(),
    195 ]:
--> 196     var = coder.encode(var, name=name)
    198 # TODO(kmuehlbauer): check if ensure_dtype_not_object can be moved to backends:
    199 var = ensure_dtype_not_object(var, name=name)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/coding/times.py:979, in CFDatetimeCoder.encode(self, variable, name)
    977 calendar = encoding.pop("calendar", None)
    978 dtype = encoding.get("dtype", None)
--> 979 (data, units, calendar) = encode_cf_datetime(data, units, calendar, dtype)
    981 safe_setitem(attrs, "units", units, name=name)
    982 safe_setitem(attrs, "calendar", calendar, name=name)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/coding/times.py:728, in encode_cf_datetime(dates, units, calendar, dtype)
    726     return _lazily_encode_cf_datetime(dates, units, calendar, dtype)
    727 else:
--> 728     return _eagerly_encode_cf_datetime(dates, units, calendar, dtype)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/coding/times.py:740, in _eagerly_encode_cf_datetime(dates, units, calendar, dtype, allow_units_modification)
    731 def _eagerly_encode_cf_datetime(
    732     dates: T_DuckArray,  # type: ignore
    733     units: str | None = None,
   (...)
    736     allow_units_modification: bool = True,
    737 ) -> tuple[T_DuckArray, str, str]:
    738     dates = asarray(dates)
--> 740     data_units = infer_datetime_units(dates)
    742     if units is None:
    743         units = data_units

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/coding/times.py:439, in infer_datetime_units(dates)
    437 else:
    438     reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
--> 439     reference_date = format_cftime_datetime(reference_date)
    440 unique_timedeltas = np.unique(np.diff(dates))
    441 units = _infer_time_units_from_diff(unique_timedeltas)

File /home/datawork-marc/ENVS/pangeo2024/lib/python3.12/site-packages/xarray/coding/times.py:450, in format_cftime_datetime(date)
    445 def format_cftime_datetime(date) -> str:
    446     """Converts a cftime.datetime object to a string with the format:
    447     YYYY-MM-DD HH:MM:SS.UUUUUU
    448     """
    449     return "{:04d}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}.{:06d}".format(
--> 450         date.year,
    451         date.month,
    452         date.day,
    453         date.hour,
    454         date.minute,
    455         date.second,
    456         date.microsecond,
    457     )

AttributeError: 'numpy.datetime64' object has no attribute 'year'

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2024.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: None

@jfleroux added the bug and needs triage labels Mar 20, 2024

welcome bot commented Mar 20, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@spencerkclark (Member)

Thanks for the report. Could you show how you managed to get np.datetime64[us] values into your DataArray? Xarray currently expects np.datetime64[ns] values, which I think is what is tripping things up. Xarray tries hard to cast any non-nanosecond precision np.datetime64 values to nanosecond precision, so we might consider it a bug at the moment that these slipped through.
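
A quick way to check which precision a coordinate actually carries (a sketch, not from the original report; on an affected dataset the cast to nanoseconds has not happened):

import numpy as np

print(ds["time"].dtype)                                # datetime64[us] on the affected dataset
print(ds["time"].dtype == np.dtype("datetime64[ns]"))  # False here, True for a healthy dataset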

@kmuehlbauer (Contributor)

@jfleroux This might be a regression in reading the data. In general, datetimes are always represented as numpy.datetime64[ns] when read with xarray; if that's not possible, cftime objects are used. I'm assuming that since your data is in numpy.datetime64[us] for some reason, the check in

dates = np.asarray(dates).ravel()
if np.asarray(dates).dtype == "datetime64[ns]":
    dates = to_datetime_unboxed(dates)
    dates = dates[pd.notnull(dates)]
    reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
    # TODO: the strict enforcement of nanosecond precision Timestamps can be
    # relaxed when addressing GitHub issue #7493.
    reference_date = nanosecond_precision_timestamp(reference_date)
else:
    reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
    reference_date = format_cftime_datetime(reference_date)
unique_timedeltas = np.unique(np.diff(dates))

fails, and the code enters the second path (cftime handling).
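
The traceback bottoms out in format_cftime_datetime, which expects a cftime.datetime object, while a raw numpy.datetime64 carries no year attribute. A minimal sketch of that failure mode (reconstructed here, not posted in the thread):

import numpy as np

d = np.datetime64("2015-01-01T00:00:00", "us")
d.year  # AttributeError: 'numpy.datetime64' object has no attribute 'year'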

Could you change your time data to numpy.datetime64[ns] and check if this fixes the export to netcdf? How did you read that data, BTW?

@jfleroux (Author)

My dataset comes from netCDF files opened through an intake catalog:
ds = cat[cat.name].to_dask()

The individual netCDF files have a time coordinate of type datetime64[ns].
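
Put together as a self-contained sketch of that reading path (the catalog file name is hypothetical; the exact dtype seen depends on the intake / kerchunk / zarr versions involved):

import intake

cat = intake.open_catalog("catalog.yml")  # hypothetical catalog file
ds = cat[cat.name].to_dask()              # as in the report
print(ds["time"].dtype)                   # datetime64[us] here, despite [ns] in the source files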

@kmuehlbauer removed the needs triage label Mar 20, 2024
@keewis (Collaborator)

keewis commented Mar 20, 2024

This sounds very much like #6318. @jfleroux, can you confirm that kerchunk / zarr is involved? If so, the explicit cast suggested by @kmuehlbauer should work:

ds['TEMP'].assign_coords(time=lambda ds: ds["time"].astype("datetime64[ns]")).isel(time=slice(0,1)).to_netcdf(f"./extract.nc")
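
Spelled out in separate steps, with a check before writing (a sketch of the same cast; names follow the report):

import numpy as np

ds = ds.assign_coords(time=ds["time"].astype("datetime64[ns]"))
assert ds["time"].dtype == np.dtype("datetime64[ns]")
ds["TEMP"].isel(time=slice(0, 1)).to_netcdf("./extract.nc")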

@spencerkclark (Member)

spencerkclark commented Mar 20, 2024

Ah right, I remember that issue now. Thanks @keewis. @jfleroux, let us know if that seems relevant / helps you work around this (regardless, it seems like an underlying issue, given it comes straight from intake!).

@dcherian added the duplicate and plan to close labels and removed the bug label Mar 20, 2024
@jfleroux (Author)

jfleroux commented Mar 20, 2024

Yes, it works with the workaround:
ds['time'] = ds["time"].astype("datetime64[ns]")

@keewis: yes, I use an intake catalog with kerchunk and zarr.

@spencerkclark (Member)

Awesome, thanks for confirming that it's the kerchunk-related issue @jfleroux.
