A way to generate automatically-numbered coords #2067

Closed
ExpHP opened this issue Apr 19, 2018 · 4 comments

@ExpHP

ExpHP commented Apr 19, 2018

It is great that xarray supports dimensions without coords, but sometimes I think it would be useful to be able to easily opt into autogenerated coords from 0 to n-1. This can be useful to obtain DataArrays for pointwise indexing:

import xarray as xr
ds = xr.Dataset()
# a list of selected indices for each layer
ds['selected'] = (['layer', 'selected-i'], [
    [0, 1, 2],
    [1, 5, 3],
])

# normally, concatenation would drop layer data
print(xr.concat(ds['selected'], dim='selected-i'))
# <xarray.DataArray 'selected' (selected-i: 6)>
# array([0, 1, 2, 1, 5, 3])
# Dimensions without coordinates: selected-i

# if you generate coords from 0 to n-1 for layer, however, the resulting DataArray
# contains 'layer' indices for use in pointwise indexing
print(xr.concat(ds
                .assign_coords(layer=list(range(ds.sizes['layer'])))
                ['selected'], dim='selected-i'))
# <xarray.DataArray 'selected' (selected-i: 6)>
# array([0, 1, 2, 1, 5, 3])
# Coordinates:
#     layer    (selected-i) int64 0 0 0 1 1 1
# Dimensions without coordinates: selected-i

My issue with the above is that layer=list(range(ds.sizes['layer'])) is verbose and fails to be DRY. My thought for such an API is that xarray could maybe have a special constant for auto-assignment, usable in any method that takes input coords:

print(xr.concat(ds
                .assign_coords(layer=xr.AUTO)
                ['selected'], dim='selected-i'))

(Additionally, perhaps xr.AUTO could be a function/class, so that xr.AUTO(start) produces indices starting at start.)
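To make the proposal concrete, here is a minimal user-level sketch of how such a sentinel might behave. Nothing here is part of xarray: `Auto`, `AUTO`, and `assign_coords_auto` are hypothetical names invented for illustration, and a built-in version could well look different.

```python
import numpy as np
import xarray as xr


class Auto:
    """Hypothetical sentinel meaning 'number this dimension from `start` upward'."""

    def __init__(self, start=0):
        self.start = start

    def __call__(self, start):
        # AUTO(10) returns a new sentinel whose numbering begins at 10.
        return Auto(start)


AUTO = Auto()  # stand-in for the proposed xr.AUTO; not an xarray object


def assign_coords_auto(obj, **coords):
    """Like obj.assign_coords, but expands Auto sentinels into integer ranges."""
    resolved = {
        dim: np.arange(value.start, value.start + obj.sizes[dim])
        if isinstance(value, Auto) else value
        for dim, value in coords.items()
    }
    return obj.assign_coords(**resolved)


ds = xr.Dataset()
ds['selected'] = (['layer', 'selected-i'], [[0, 1, 2], [1, 5, 3]])

numbered = assign_coords_auto(ds, layer=AUTO)      # layer coords start at 0
shifted = assign_coords_auto(ds, layer=AUTO(10))   # layer coords start at 10
```

The dimension length is looked up from `obj.sizes` inside the helper, so the caller names the dimension only once.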

@shoyer
Member

shoyer commented Apr 20, 2018

If I understand correctly, instead of writing xr.Dataset({'foo': ('x', data)}, {'x': range(len(data))}), you want to be able to write something like xr.Dataset({'foo': ('x', data)}, {'x': xr.AUTO_NUMBER})?

I agree that it's sometimes convenient to automatically get coordinates 0 to n-1, but using range() is also pretty convenient, so I'm not sure we really need this shortcut.

@ExpHP
Author

ExpHP commented Apr 20, 2018

My issue is not so much with the range but the len. If done during construction from numpy arrays, it might need to be range(data.shape[0]) or range(data.shape[1]) or etc. based on the layout of the data. If done after construction with assign_coords, both the name of the dataset and the name of the dimension need to be written twice. This all creates room for silly mistakes.
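One way to sidestep the hard-coded axis number during construction, sketched here with made-up example data, is to pair the dimension names with `data.shape` once and look lengths up by name. The variable names `dims` and `sizes` are illustrative only.

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))
dims = ('layer', 'selected-i')

# Pair each dimension name with its length once, so no axis index
# (data.shape[0], data.shape[1], ...) has to be written by hand.
sizes = dict(zip(dims, data.shape))

da = xr.DataArray(
    data,
    dims=dims,
    coords={'layer': np.arange(sizes['layer'])},
)
```

This still repeats the dimension name, which is the duplication the proposal aims to remove, but it no longer depends on the memory layout of `data`.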

On the bright side, such mistakes are caught as soon as the code runs. On the other hand, they could be avoided entirely.


Of course, xarray doesn't have to do this; a user concerned about making these mistakes can easily capture this usage pattern in a wrapper function around assign_coords:

def assign_auto_coords(da, *dims):
    """Give each specified dimension of a DataArray or Dataset automatic coords from 0 to n-1."""
    # Use the `da` argument here; there is no `self` in a plain function.
    return da.assign_coords(**{dim: range(da.sizes[dim]) for dim in dims})

it's merely that first-class support during construction would be nice to have.
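For reference, a self-contained usage sketch of such a wrapper, applied to the dataset from the opening comment. It uses `np.arange` in place of `range`, a small substitution made here for explicitness rather than something the thread prescribes.

```python
import numpy as np
import xarray as xr


def assign_auto_coords(obj, *dims):
    """Give each named dimension of a DataArray or Dataset coords 0..n-1."""
    return obj.assign_coords(**{dim: np.arange(obj.sizes[dim]) for dim in dims})


ds = xr.Dataset()
ds['selected'] = (['layer', 'selected-i'], [[0, 1, 2], [1, 5, 3]])

# Dataset: number only the 'layer' dimension.
labelled = assign_auto_coords(ds, 'layer')

# DataArray: several dimensions can be numbered in one call.
both = assign_auto_coords(ds['selected'], 'layer', 'selected-i')

print(labelled)
```

Because the wrapper reads lengths from `obj.sizes`, each dimension name appears once per call and the object is never named twice.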

@fujiisoup
Member

fujiisoup commented Apr 20, 2018

I think this will share a similar API with the evenly spaced coordinate discussed in issue #1650.

I thought the possible API would be

xr.Dataset({'foo': ('x', data)}, {'x': slice(None)})

where it would construct an evenly spaced coordinate.
But I am not sure if we should add this API now or after #1650 is implemented.

@stale

stale bot commented Mar 20, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Mar 20, 2020
@stale stale bot closed this as completed Apr 19, 2020
3 participants