Writing np.bool8 data array reads in as int8 #5192

Closed
shaunc opened this issue Apr 19, 2021 · 2 comments

shaunc commented Apr 19, 2021

What happened:

I have a DataArray with dtype np.bool_. When I write it to netCDF (with engine h5netcdf, or the default) and then read a copy back in, the copy has dtype int8.

What you expected to happen:

The loaded DataArray should have dtype bool.

Minimal Complete Verifiable Example:

I have had a hard time reducing this to a minimal example. The DataArray comes from a larger dataset that exhibits the same problem. A copy made with copy() still exhibits the problem; however, a new DataArray built with the constructor does not. As far as I can tell, though, the original and the rebuilt DataArray are otherwise identical.

# in a pdb session
(Pdb) ci
<xarray.DataArray 'cut_inclusive' (cut: 15)>
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False])
Dimensions without coordinates: cut
(Pdb) ci.to_netcdf('foo_ci.nca', engine="h5netcdf")
(Pdb) csi = xr.open_dataarray('foo_ci.nca', engine="h5netcdf"); csi
<xarray.DataArray 'cut_inclusive' (cut: 15)>
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
Dimensions without coordinates: cut
(Pdb) ci2 = xr.DataArray(ci, dims=('cut', ))
(Pdb) ci2.equals(ci)
True
(Pdb) ci2.to_netcdf('foo_ci2.nca', engine="h5netcdf")
(Pdb) csi2 = xr.open_dataarray('foo_ci2.nca', engine="h5netcdf"); csi2
<xarray.DataArray 'cut_inclusive' (cut: 15)>
array([False, False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])
Dimensions without coordinates: cut
(Pdb) ci3 = ci.copy()
(Pdb) ci3.to_netcdf('foo_ci3.nca', engine="h5netcdf")
(Pdb) csi3 = xr.open_dataarray('foo_ci3.nca', engine="h5netcdf"); csi3
<xarray.DataArray 'cut_inclusive' (cut: 15)>
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
Dimensions without coordinates: cut

Anything else we need to know?:

I am at a loss as to how to investigate why ci and ci3 don't survive the round trip but ci2 does. Unfortunately, I have also been unable to produce a free-standing example: whenever I try, I get an object that survives the round trip intact. I suspect that xarray's internals are somewhere keeping a cached reference to the original ci (presumably still linked to the overall dataset from which ci came), and that this is what causes the problem, but I don't know where to look. (Suggestions welcome!)
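One way to look for a hidden difference is to compare the metadata that equals() does not check. The sketch below is only a diagnostic aid, not a confirmed explanation: it assumes the relevant state lives in the .encoding or .attrs dicts of the two objects, and the helper name diff_metadata and the stand-in values are hypothetical. The stand-ins use types.SimpleNamespace so the snippet runs without xarray; in the pdb session above one would call diff_metadata(ci, ci2) instead.

```python
from types import SimpleNamespace

MISSING = "<missing>"  # placeholder shown when a key exists on one side only


def diff_metadata(a, b):
    """Report keys that differ between the .encoding and .attrs of a and b.

    Hypothetical helper: works on any objects exposing dict-like
    .encoding and .attrs attributes (xarray DataArrays do).
    """
    report = {}
    for field in ("encoding", "attrs"):
        left = dict(getattr(a, field, None) or {})
        right = dict(getattr(b, field, None) or {})
        changed = {}
        for key in sorted(set(left) | set(right), key=str):
            lval = left.get(key, MISSING)
            rval = right.get(key, MISSING)
            # Compare via repr so array-valued or dtype-valued entries
            # never raise on ambiguous or unsupported comparisons.
            if repr(lval) != repr(rval):
                changed[key] = (lval, rval)
        if changed:
            report[field] = changed
    return report


# Stand-in objects with made-up values, purely to show the output shape.
original = SimpleNamespace(encoding={"dtype": "int8"}, attrs={})
rebuilt = SimpleNamespace(encoding={}, attrs={})

print(diff_metadata(original, rebuilt))
```

An empty result would suggest the difference lies elsewhere; a non-empty one points at the specific keys worth inspecting.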

Environment:

Output of xr.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: None

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: None
pydap: None
h5netcdf: 0.10.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.0.0
pip: 20.3.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: None

@andersy005
Member

This is related to #4826

@shaunc
Author

shaunc commented Apr 20, 2021

Aha -- I think that is exactly it. I'll close in favor of that (just noting though that h5netcdf also has the problem).

shaunc closed this as completed Apr 20, 2021