Dropping indices doesn't work for dimensions without coordinates #3208

Open
xlambein opened this issue Aug 12, 2019 · 6 comments

@xlambein

xlambein commented Aug 12, 2019

MCVE Code Sample

Calling the following:

import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(10),
    dims=['x']
)

arr.sel(x=[1, 2])  # This works

arr.drop([1, 2], dim='x')  # Raises an exception

Produces this traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/fibricheck-py36/lib/python3.6/site-packages/xarray/core/dataset.py in drop(self, labels, dim, errors)
   3135             try:
-> 3136                 index = self.indexes[dim]
   3137             except KeyError:

~/anaconda3/envs/fibricheck-py36/lib/python3.6/site-packages/xarray/core/indexes.py in __getitem__(self, key)
     32     def __getitem__(self, key):
---> 33         return self._indexes[key]
     34

KeyError: 'x'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-1-d0f96f591c69> in <module>
      9 arr.sel(x=[1, 2])  # This works
     10
---> 11 arr.drop([1, 2], dim='x')  # Raises an exception

~/anaconda3/envs/fibricheck-py36/lib/python3.6/site-packages/xarray/core/dataarray.py in drop(self, labels, dim, errors)
   1690         if utils.is_scalar(labels):
   1691             labels = [labels]
-> 1692         ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
   1693         return self._from_temp_dataset(ds)
   1694

~/anaconda3/envs/fibricheck-py36/lib/python3.6/site-packages/xarray/core/dataset.py in drop(self, labels, dim, errors)
   3137             except KeyError:
   3138                 raise ValueError(
-> 3139                     'dimension %r does not have coordinate labels' % dim)
   3140             new_index = index.drop(labels, errors=errors)
   3141             return self.loc[{dim: new_index}]

ValueError: dimension 'x' does not have coordinate labels

Expected Output

Calling drop should return the following DataArray:

<xarray.DataArray (x: 8)>
array([0, 3, 4, 5, 6, 7, 8, 9])
Dimensions without coordinates: x

Problem Description

Since I can index arr with dimension x, I would expect drop to work as well. The current behaviour seems inconsistent to me.
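
A possible interim workaround, sketched here as an assumption rather than as the intended API: since x has no coordinate labels, the values to remove can be treated as positions and the complement selected with isel. The names to_drop, keep, and result are illustrative only.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(10), dims=["x"])

# Positions to remove along the coordinate-less dimension "x" (illustrative name).
to_drop = [1, 2]

# Keep every position along "x" that is not listed in to_drop.
keep = np.setdiff1d(np.arange(arr.sizes["x"]), to_drop)

# isel indexes by position, so it does not need coordinate labels.
result = arr.isel(x=keep)
```

The result still has x as a dimension without coordinates, which matches the expected output above.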

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.3
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.1.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.7.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: None
matplotlib: 3.0.3
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.2.1
conda: None
pytest: 4.5.0
IPython: 7.4.0
sphinx: None
@max-sixty
Collaborator

Thanks for the clear report, @xlambein.

In your example, x is a data variable. drop operates either on variables or indexes / 'coordinate labels'. There are no indexes here.

Does that make sense?

@shoyer
Member

shoyer commented Aug 12, 2019

There is still something to be said for making sel() and drop() consistent. So given that sel() is OK with integer inputs, perhaps drop() should be, too?

@max-sixty
Collaborator

max-sixty commented Aug 12, 2019

@xlambein can I clarify something you said above:

Calling drop should return the following DataArray:

<xarray.DataArray (x: 8)>
array([0, 3, 4, 5, 6, 7, 8, 9])
Dimensions without coordinates: x

Is this a mistype and actually you mean it should return arr (i.e. the random numbers) with two values removed?

@max-sixty
Collaborator

There is still something to be said for making sel() and drop() consistent. So given that sel() is OK with integer inputs, perhaps drop() should be, too?

Yes. I worry a bit about drop becoming overloaded - it already means both variables and indexes - but given we've done that on sel, I agree.

If there's any support for splitting drop's two functions apart into two different methods, I'd also be keen on that.

@xlambein
Author

xlambein commented Aug 13, 2019

@max-sixty Yep sorry! I got my testing code mixed up after making a change. I meant to have np.arange(10) as the values of arr (to highlight which values were removed). I edited the original comment.

@xlambein
Author

The way I understand it, dimensions without coordinates act as if they had np.arange(...) as their coordinates. Hence, I don't see it as overloading drop. I feel like there would still be only two behaviours to drop: on variables and on indexes.
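
To illustrate that reading, here is a minimal sketch that makes the implicit np.arange(...) labels explicit and then repeats the two steps visible in the dataset.py lines of the traceback (drop the labels from the index, then select what is left). The names labelled, new_index, and dropped are made up for the example, and this is not a claim about how drop should be implemented.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(10), dims=["x"])

# Make the implicit 0..n-1 labels explicit so that "x" gets an index.
labelled = arr.assign_coords(x=np.arange(arr.sizes["x"]))

# Same two steps as in the traceback: drop the labels from the
# pandas index, then select the remaining labels with .loc.
new_index = labelled.indexes["x"].drop([1, 2])
dropped = labelled.loc[{"x": new_index}]
```

Unlike the expected output in the original report, dropped keeps x as a coordinate, because the labels were assigned explicitly.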

That being said, it might indeed be nice to split drop into drop_variable and drop_index (or to keep the name drop for either one of them).
