Reading and writing a zarr dataset multiple times casts bools to int8 #4826
Comments
Apparently my proposed fix broke a bunch of other things, e.g. some writing of timedeltas with units and such. Deleting the "dtype" key in the https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L119
OK, here's the other side of the problem. The original dtype (which is i8) is set in the encoding: https://github.com/pydata/xarray/blob/v0.16.2/xarray/conventions.py#L350
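To make the interaction between the two linked spots easier to follow, here is a toy, pure-Python model of the round trip. It is a sketch for illustration only and not xarray's code: the `encode`, `decode` and `round_trip` helpers and the dict-based "variable" are hypothetical, and the logic assumes that bools are written as int8 with a "dtype" marker attribute only when no dtype is already present in the encoding, while reading stores the on-disk dtype in the encoding.

```python
# Toy model of a bool round trip through a store; NOT xarray's actual code.
# A "variable" is a plain dict holding a dtype name, attrs and encoding.

def encode(var):
    """Prepare a variable for writing (hypothetical helper)."""
    attrs = dict(var["attrs"])
    encoding = var["encoding"]
    dtype = var["dtype"]
    if dtype == "bool" and "dtype" not in encoding and "dtype" not in attrs:
        # First write: store as int8 and leave a marker so that reading
        # can restore the bool dtype.
        attrs["dtype"] = "bool"
        dtype = "int8"
    elif "dtype" in encoding:
        # A dtype inherited from an earlier read wins, and no marker is written.
        dtype = encoding["dtype"]
    return {"dtype": dtype, "attrs": attrs}


def decode(stored):
    """Turn the stored form back into an in-memory variable (hypothetical helper)."""
    attrs = dict(stored["attrs"])
    # The on-disk dtype is remembered in the encoding.
    encoding = {"dtype": stored["dtype"]}
    dtype = stored["dtype"]
    if attrs.get("dtype") == "bool":
        # Marker found: restore the logical dtype and drop the marker.
        del attrs["dtype"]
        dtype = "bool"
    return {"dtype": dtype, "attrs": attrs, "encoding": encoding}


def round_trip(var):
    return decode(encode(var))


original = {"dtype": "bool", "attrs": {}, "encoding": {}}
first = round_trip(original)   # still bool, but encoding now says int8
second = round_trip(first)     # marker is skipped, so the data comes back as int8

print(first["dtype"], first["encoding"])
print(second["dtype"], second["encoding"])
```

In this model the first round trip is lossless, and the second one loses the bool dtype only because the encoding picked up during the first read suppresses the marker attribute.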
Tagging a few maintainers: @dcherian @shoyer. Sorry to tag you directly, hope that's ok. I think I've found the issue here and would like to provide a PR to fix, but need some input on what you think would be best. To summarize, the current behavior leading to the bug is:
I can think of a few fixes:
As a local fix while we consider these options, can you confirm that, as the docs state, the
I ran into this as well with the basic netcdf backends:
import xarray as xr
# Small dataset with a single boolean variable
ds = xr.Dataset(
data_vars={"foo":(["x"], [False, True, False])},
coords={"x": [1, 2, 3]},
)
# First round trip: write, load back, check the dtype
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
# Second round trip with the reloaded dataset
ds.to_netcdf('test.nc')
ds = xr.load_dataset('test.nc')
print(ds.foo.dtype)
Gives:
There are at least two issues here:
Closed #5192 in favor of this as I think it's a duplicate. Just NB that it can occur with h5netcdf as well as netcdf4. (Thanks, @andersy005)
Please check #7720 if that fixes the conversion problems. Thanks.
@kmuehlbauer your fix fixes both the problems set up by @slevang and @amatsukawa on my laptop. I furthermore tested the double roundtrip with
@JoerivanEngelen Thanks for taking the time. Much appreciated.
What happened:
Reading and writing a zarr dataset multiple times into different paths changes bool dtype arrays to int8. I think this issue is related to #2937.
What you expected to happen:
My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.
Minimal Complete Verifiable Example:
The above snippet prints the following. In d3, the dtype of bool_field is int8, presumably because d3 inherited d2's encoding and it says int8, despite the array having a bool dtype.
Anything else we need to know?:
The current workaround is to explicitly set encodings. This fixes the problem:
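As a rough sketch of what setting encodings explicitly might look like (not necessarily the snippet the reporter used), the helper below builds an encoding mapping that pins each variable's stored dtype to its in-memory dtype. The `explicit_dtype_encoding` helper, the `other_field` name and the output path are hypothetical. Passing the result through the `encoding` argument of `to_zarr` is shown only as a comment, and whether that is sufficient may depend on the backend and the xarray version.

```python
def explicit_dtype_encoding(dtypes):
    """Build an encoding mapping that pins each variable's stored dtype.

    `dtypes` maps variable names to in-memory dtype names,
    e.g. {"bool_field": "bool"}. Hypothetical helper for illustration.
    """
    return {name: {"dtype": dtype} for name, dtype in dtypes.items()}


# In-memory dtypes of an illustrative dataset ("other_field" is made up).
in_memory = {"bool_field": "bool", "other_field": "float64"}
encoding = explicit_dtype_encoding(in_memory)
print(encoding["bool_field"])

# With a real dataset d2 this might be used as follows (assumption, not run here):
#   encoding = explicit_dtype_encoding(
#       {name: str(var.dtype) for name, var in d2.data_vars.items()}
#   )
#   d2.to_zarr("another_path.zarr", encoding=encoding)
```

The idea is that an encoding passed explicitly at write time takes precedence over the one inherited from the previous read, so the stale int8 entry no longer decides how the bool array is stored.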
Environment:
Output of xr.show_versions()