Code Sample
import numpy as np
import xarray as xr
np.random.seed(42)
dims = ('a', 'b', 'c', 'd')
shape = (10, 10, 500, 500)
coords = {d: np.arange(s) for d, s in zip(dims, shape)}
# Using data with non-normal distribution
data = np.random.lognormal(size=shape)
data = data.astype(np.float32)
da = xr.DataArray(data, coords=coords, dims=dims)
# Numpy method gives the correct value
print(da.values.mean())
# Explicitly specifying all axes gives the correct value
print(da.mean(axis=(0, 1, 2, 3)))
# Default DataArray mean method gives incorrect value
print(da.mean())  # <- The problem arises here
# float64 arrays produce the correct value
print(da.astype(np.float64).mean())
This is the output I see:
1.6489075
<xarray.DataArray ()>
array(1.648908, dtype=float32)
<xarray.DataArray ()>
array(1.517693)
<xarray.DataArray ()>
array(1.648907)
Problem description
DataArray.mean() called with default arguments returns a wrong mean value (1.517693 instead of roughly 1.648908 in the sample above). I have only observed the problem for float32 arrays. It also appears to be sensitive to the shape of the array: for example, a shape of (10, 10, 10, 10) seems to be fine.
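The report does not identify a cause. One possible explanation, which is an assumption and not something confirmed above, is that the default code path accumulates the sum element by element in float32 (bottleneck 1.2.1 is installed in this environment, and xarray may dispatch to it), while NumPy's own reduction uses pairwise summation. The NumPy-only sketch below illustrates that difference with an array of ones; it does not call xarray or bottleneck, and `np.cumsum` stands in for a sequential accumulator.

```python
import numpy as np

# 2**25 ones: the exact mean is 1.0. A running float32 sum stops growing
# at 2**24, because 2**24 + 1 is not representable in float32.
n = 2**25
ones = np.ones(n, dtype=np.float32)

# np.cumsum adds element by element in float32 (sequential accumulation),
# so its last entry is the naive running total.
sequential_mean = float(np.cumsum(ones)[-1]) / n

# np.mean on a contiguous float32 array uses pairwise summation, which
# keeps the partial sums exact for this input.
pairwise_mean = float(ones.mean())

# Accumulating in float64 sidesteps the issue regardless of summation order.
float64_mean = float(ones.mean(dtype=np.float64))

print(sequential_mean, pairwise_mean, float64_mean)  # 0.5 1.0 1.0
```

If the assumption holds, this would also be consistent with the shape sensitivity: a small array such as (10, 10, 10, 10) never produces a running total large enough for float32 rounding to matter.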
Expected Output
This is the output I expect for the sample above:
1.6489075
<xarray.DataArray ()>
array(1.648908, dtype=float32)
<xarray.DataArray ()>
array(1.648908, dtype=float32)
<xarray.DataArray ()>
array(1.648907)
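As a sanity check on which value is correct, the printed means can be compared with the theoretical mean of the distribution. `np.random.lognormal` defaults to `mean=0.0` and `sigma=1.0`, and a lognormal distribution has mean exp(mu + sigma**2 / 2). The sketch below uses only the numbers printed in this report; the dictionary labels are just illustrative names.

```python
import math

# Defaults of np.random.lognormal: mean=0.0, sigma=1.0
mu, sigma = 0.0, 1.0
theoretical_mean = math.exp(mu + sigma**2 / 2)

# Values copied from the observed output in this report
reported = {
    "da.values.mean()": 1.6489075,
    "da.mean(axis=(0, 1, 2, 3))": 1.648908,
    "da.mean()": 1.517693,
    "da.astype(np.float64).mean()": 1.648907,
}

# Relative deviation of each reported mean from the theoretical mean
rel_errors = {
    label: abs(value - theoretical_mean) / theoretical_mean
    for label, value in reported.items()
}

for label, err in rel_errors.items():
    print(f"{label:32s} relative error = {err:.2e}")
```

Three of the four values sit within a small sampling error of the theoretical mean, while the default `da.mean()` result is off by several percent.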
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.0
distributed: 1.23.0
matplotlib: 2.2.3
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.2.0
pip: 18.0
conda: 4.5.11
pytest: 3.7.4
IPython: 6.5.0
sphinx: None