cftime.datetime serialization example failing in latest doc build #2127

Closed
spencerkclark opened this issue May 13, 2018 · 9 comments

Comments

@spencerkclark
Member

Code Sample, a copy-pastable example if possible

In [1]: from itertools import product

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: from cftime import DatetimeNoLeap

In [5]: dates = [DatetimeNoLeap(year, month, 1)
   ...:          for year, month in product(range(1, 3), range(1, 13))]

In [6]: with xr.set_options(enable_cftimeindex=True):
   ...:     da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
   ...:

In [7]: da.to_netcdf('test.nc')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-306dbf0ba669> in <module>()
----> 1 da.to_netcdf('test.nc')

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataarray.pyc in to_netcdf(self, *args, **kwargs)
   1514             dataset = self.to_dataset()
   1515
-> 1516         return dataset.to_netcdf(*args, **kwargs)
   1517
   1518     def to_dict(self):

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
   1143         return to_netcdf(self, path, mode, format=format, group=group,
   1144                          engine=engine, encoding=encoding,
-> 1145                          unlimited_dims=unlimited_dims)
   1146
   1147     def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,

/Users/spencerclark/xarray-dev/xarray/xarray/backends/api.pyc in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    681     try:
    682         dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 683                               unlimited_dims=unlimited_dims)
    684         if path_or_file is None:
    685             return target.getvalue()

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync()

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    356         """
    357
--> 358         variables, attributes = self.encode(variables, attributes)
    359
    360         self.set_attributes(attributes)

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in encode(self, variables, attributes)
    441         # All NetCDF files get CF encoded by default, without this attempting
    442         # to write times, for example, would fail.
--> 443         variables, attributes = cf_encoder(variables, attributes)
    444         variables = OrderedDict([(k, self.encode_variable(v))
    445                                  for k, v in variables.items()])

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in cf_encoder(variables, attributes)
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

python2/cyordereddict/_cyordereddict.pyx in cyordereddict._cyordereddict.OrderedDict.__init__ (python2/cyordereddict/_cyordereddict.c:1225)()

//anaconda/envs/xarray-dev/lib/python2.7/_abcoll.pyc in update(*args, **kwds)
    569                     self[key] = other[key]
    570             else:
--> 571                 for key, value in other:
    572                     self[key] = value
    573         for key, value in kwds.items():

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in <genexpr>((k, v))
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in encode_cf_variable(var, needs_copy, name)
    232                   variables.CFMaskCoder(),
    233                   variables.UnsignedIntegerCoder()]:
--> 234         var = coder.encode(var, name=name)
    235
    236     # TODO(shoyer): convert all of these to use coders, too:

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode(self, variable, name)
    384                 data,
    385                 encoding.pop('units', None),
--> 386                 encoding.pop('calendar', None))
    387             safe_setitem(attrs, 'units', units, name=name)
    388             safe_setitem(attrs, 'calendar', calendar, name=name)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode_cf_datetime(dates, units, calendar)
    338
    339     if units is None:
--> 340         units = infer_datetime_units(dates)
    341     else:
    342         units = _cleanup_netcdf_time_units(units)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in infer_datetime_units(dates)
    254         reference_date = dates[0] if len(dates) > 0 else '1970-01-01'
    255         reference_date = format_cftime_datetime(reference_date)
--> 256     unique_timedeltas = np.unique(np.diff(dates)).astype('timedelta64[ns]')
    257     units = _infer_time_units_from_diff(unique_timedeltas)
    258     return '%s since %s' % (units, reference_date)

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'

Problem description

This seems to be an edge case that was not covered by the tests I added in #1252. Strangely, if I wrap the result of np.unique(np.diff(dates)) in np.array before converting it to 'timedelta64[ns]', things work:

In [9]: np.unique(np.diff(dates)).astype('timedelta64[ns]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-5d53452b676f> in <module>()
----> 1 np.unique(np.diff(dates)).astype('timedelta64[ns]')

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'

In [10]: np.array(np.unique(np.diff(dates))).astype('timedelta64[ns]')
Out[10]: array([2419200000000000, 2592000000000000, 2678400000000000], dtype='timedelta64[ns]')

Might anyone have any ideas as to what the underlying issue is? The fix could be as simple as that, but I don't understand why that makes a difference.

Expected Output

da.to_netcdf('test.nc') should succeed without an error.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

xarray: 0.8.2+dev641.g7302d7e
pandas: 0.22.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: 1.0.0
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 5.5.0
sphinx: 1.7.1

@shoyer
Member

shoyer commented May 13, 2018

I haven't looked into this in detail, but my guess is that this is somehow related to NumPy refusing to convert a timedelta that includes years or months into nanoseconds, because years and months can have different numbers of days.

To debug this, it would help to know exactly what the input dates is.
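The casting rule named in the error message can be inspected on its own. The sketch below uses np.can_cast to show that, under the 'same_kind' rule, NumPy treats year and month timedelta units as incompatible with day-and-finer units. The helper name can_cast_timedelta is made up for illustration, and the snippet only illustrates the rule; it does not explain how a datetime.timedelta ends up tagged with [Y] metadata in the first place.

```python
import numpy as np


def can_cast_timedelta(src_unit, dst_unit):
    """Hypothetical helper: can timedelta64[src_unit] be cast to
    timedelta64[dst_unit] under NumPy's 'same_kind' casting rule?"""
    src = np.dtype('timedelta64[%s]' % src_unit)
    dst = np.dtype('timedelta64[%s]' % dst_unit)
    return np.can_cast(src, dst, casting='same_kind')


# Years and months have variable length, so they only convert among themselves.
print('Y -> M :', can_cast_timedelta('Y', 'M'))
# Days and finer units have fixed length and convert among themselves.
print('D -> ns:', can_cast_timedelta('D', 'ns'))
# Crossing from the variable-length group to the fixed-length group is refused.
print('Y -> D :', can_cast_timedelta('Y', 'D'))
print('Y -> ns:', can_cast_timedelta('Y', 'ns'))
```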

@spencerkclark
Member Author

With the dates specified in In [5] of my example, one can reproduce the error (see In [9] in the problem description). In [10] shows that wrapping the result of np.unique(np.diff(dates)) in np.array seems to make this type conversion work.

@spencerkclark
Member Author

It's confusing to me, because I don't see where NumPy is getting years or months metadata from the datetime.timedelta objects formed by np.diff(dates):

In [12]: np.diff(dates)
Out[12]:
array([datetime.timedelta(31), datetime.timedelta(28),
       datetime.timedelta(31), datetime.timedelta(30),
       datetime.timedelta(31), datetime.timedelta(30),
       datetime.timedelta(31), datetime.timedelta(31),
       datetime.timedelta(30), datetime.timedelta(31),
       datetime.timedelta(30), datetime.timedelta(31),
       datetime.timedelta(31), datetime.timedelta(28),
       datetime.timedelta(31), datetime.timedelta(30),
       datetime.timedelta(31), datetime.timedelta(30),
       datetime.timedelta(31), datetime.timedelta(31),
       datetime.timedelta(30), datetime.timedelta(31),
       datetime.timedelta(30)], dtype=object)

Unlike np.timedelta64 objects, datetime.timedelta objects cannot be composed of units whose length varies with the year (their coarsest internal resolution is days). The problem seems to occur only after calling np.unique; maybe the solution is to do the type conversion before calling np.unique?

In [19]: np.unique(np.diff(dates).astype('timedelta64[ns]'))
Out[19]: array([2419200000000000, 2592000000000000, 2678400000000000], dtype='timedelta64[ns]')
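A self-contained sketch of the "convert first, then np.unique" ordering follows. To avoid depending on cftime, it builds the timedeltas directly from the no-leap month lengths shown in Out[12]; that stand-in is an assumption for illustration, not the code path xarray uses. Whether this ordering avoids the error on an affected NumPy build is not verified here beyond the In [19] output above.

```python
import datetime

import numpy as np

# Stand-in for np.diff(dates): month-to-month gaps in a no-leap calendar,
# stored as datetime.timedelta objects in an object array.
month_gaps = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
deltas = np.array([datetime.timedelta(days=n) for n in month_gaps],
                  dtype=object)

# Convert the whole object array to timedelta64[ns] first,
# and only then reduce it to its unique values.
unique_ns = np.unique(deltas.astype('timedelta64[ns]'))

print(unique_ns.dtype)
print(unique_ns)
```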

@shoyer
Member

shoyer commented May 14, 2018

This must be a NumPy bug. Here's an even simpler reproduction:

import numpy as np
import datetime

np.array([datetime.timedelta(28)], dtype='timedelta64[D]')
# TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [D] according to the rule 'same_kind'

Any other day I've tried works. Also, the following code works (at least on my laptop):

In [113]: np.array([datetime.timedelta(27)], dtype='timedelta64[D]'), np.array([datetime.timedelta(28)], dtype='timedelta64[D]')
Out[113]: (array([27], dtype='timedelta64[D]'), array([28], dtype='timedelta64[D]'))

@shoyer
Member

shoyer commented May 14, 2018

Any multiple of 7 days (one week) seems to trigger it, e.g.,

In [133]: np.array([datetime.timedelta(7)], dtype='timedelta64[D]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-133-92f07e97d8dc> in <module>()
----> 1 np.array([datetime.timedelta(7)], dtype='timedelta64[D]')

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [D] according to the rule 'same_kind'
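One way to check which day counts are affected on a given machine is to try each conversion and record the failures. The function name failing_day_counts and the 35-day range are arbitrary choices for this sketch. On a NumPy build without the bug the returned list should simply be empty; on an affected build, the observation above suggests it would contain the multiples of 7.

```python
import datetime

import numpy as np


def failing_day_counts(max_days=35):
    """Hypothetical probe: return the day counts n in 1..max_days for which
    converting datetime.timedelta(n) to timedelta64[D] raises TypeError."""
    failures = []
    for n in range(1, max_days + 1):
        try:
            np.array([datetime.timedelta(days=n)], dtype='timedelta64[D]')
        except TypeError:
            failures.append(n)
    return failures


failures = failing_day_counts()
print('NumPy', np.__version__, 'fails for day counts:', failures)
```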

For now, maybe try casting each element individually to np.timedelta64 as a work-around? e.g.,

In [139]: np.array([np.timedelta64(d, 'ns') for d in uniques.tolist()])
Out[139]:
array([2419200000000000, 2592000000000000, 2678400000000000],
      dtype='timedelta64[ns]')

spencerkclark added a commit to spencerkclark/xarray that referenced this issue May 14, 2018
@spencerkclark
Member Author

> Any multiple of 7 days (one week) seems to trigger it

Interesting, thanks for investigating things further and confirming that it likely is a NumPy bug. I put up a fix following your suggestion in #2128 and also included a test.

@spencerkclark
Member Author

Huh... my test in #2128 is still triggering some failures due to this issue. Oddly, on my laptop the original bug doesn't appear to exist if I use Python 3.6.5 and NumPy 1.14.3 (the versions on Travis where I'm getting a failure):

$ python
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:44:09)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.14.3'
>>> import datetime
>>> np.array([datetime.timedelta(7)], dtype='timedelta64[D]')
array([7], dtype='timedelta64[D]')

and, not surprisingly, the test passes:

$ pytest -vv test_coding_times.py -k test_infer_cftime_datetime_units
========================================================= test session starts ==========================================================
platform darwin -- Python 3.6.5, pytest-3.5.1, py-1.5.3, pluggy-0.6.0 -- //anaconda/envs/xarray-docs/bin/python
cachedir: ../../.pytest_cache
rootdir: /Users/spencerclark/xarray-dev/xarray, inifile: setup.cfg
collected 269 items / 268 deselected

test_coding_times.py::test_infer_cftime_datetime_units PASSED                                                                    [100%]

So it appears to be platform-dependent. Trying this out on a Linux machine with these versions, I can reproduce the issue (which seems to persist even for individual timedeltas, explaining the test failure):

$ python
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.14.3'
>>> import datetime
>>> np.array([datetime.timedelta(7)], dtype='timedelta64[D]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [D] according to the rule 'same_kind'
>>> np.timedelta64(datetime.timedelta(7), 'D')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [D] according to the rule 'same_kind'
>>> np.timedelta64(datetime.timedelta(1), 'D')
numpy.timedelta64(1,'D')

It's not ideal, but should we use pandas to do the type conversion? It seems to work on the Linux platform:

>>> import pandas as pd
>>> pd.to_timedelta([datetime.timedelta(7)])
TimedeltaIndex(['7 days'], dtype='timedelta64[ns]', freq=None)
>>> pd.to_timedelta([datetime.timedelta(7)]).values
array([604800000000000], dtype='timedelta64[ns]')

@shoyer
Member

shoyer commented May 14, 2018

Yes, let's use pd.to_timedelta. Please mention numpy/numpy#11096 in a comment in the code -- hopefully this should be fixed in the next NumPy release (1.15?).
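A minimal sketch of what such a pandas-based conversion could look like is given below. The helper name unique_timedeltas_ns is hypothetical and this is not the code merged in #2128; it only shows the idea of letting pd.to_timedelta do the conversion and citing the NumPy issue in a comment. It uses .values, as in the session above, rather than the box=False keyword, which newer pandas releases no longer accept.

```python
import datetime

import numpy as np
import pandas as pd


def unique_timedeltas_ns(deltas):
    """Hypothetical helper: return the unique values of a sequence of
    datetime.timedelta objects as a timedelta64[ns] array."""
    # Use pandas for the conversion instead of NumPy's own casting to work
    # around the behavior reported in numpy/numpy#11096.
    converted = pd.to_timedelta(list(deltas)).values
    # Normalize to nanoseconds in case pandas returns another resolution
    # (an assumption; for a ns-resolution result this changes nothing).
    return np.unique(converted.astype('timedelta64[ns]'))


result = unique_timedeltas_ns([datetime.timedelta(days=7),
                               datetime.timedelta(days=28),
                               datetime.timedelta(days=7)])
print(result.dtype)
print(result)
```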

@spencerkclark
Member Author

Awesome, thanks for tracking that down in NumPy so quickly! I updated #2128 accordingly.

shoyer pushed a commit that referenced this issue May 14, 2018
…ts (#2128)

* Fix #2127

* Fix typo in time-series.rst

* Use pd.to_timedelta to convert to np.timedelta64 objects

* Install cftime through netcdf4 through pip

* box=False