-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Multi-index repr #879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-index repr #879
Conversation
unique = index.levels[level_num] | ||
labels = index.labels[level_num] | ||
size = min(max_size, labels.size) | ||
filled = pd.core.algorithms.take_1d(unique.values, labels[:size], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out how to implement this only using public API (nothing prefaced with an underscore). Otherwise, pandas will almost certainly break us in a future release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest simply using index.levels[level_num][:max_size]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases it might make sense to just use index.levels[level_num][:max_size]
to show the (first) unique values for each level.
But in other cases I find this
>>> data.isel(x=range(3))
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
* x (x) object MultiIndex
- a_long_level_name object 'a' 'a' 'a'
- level_1 int64 0 1 2
much better than this:
>>> data.isel(x=range(3))
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
* x (x) object MultiIndex
- a_long_level_name object 'a' 'b'
- level_1 int64 0 1 2 3 4 5 6 7 8 9
What about just returning pd.core.algorithms.take_1d(unique.values, labels[:max_size])
or even np.take(unique.values, labels[:max_size])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good point. It's definitely more useful to display the particular index levels that are actually used at those points.
In that case I would use unique[labels[:max_size]]
. It's better to avoid the NumPy methods like np.take
on pandas.Index objects because they don't always always preserve dtypes properly.
I ended up refactoring most of the implementation. It looks cleaner I think, although I'm still not fully satisfied with this version. It isn't as straightforward to implement as I thought, especially for checking (and getting) MultiIndex and for dealing with |
|
||
|
||
def _maybe_summarize_multiindex(var, col_width, max_width): | ||
if isinstance(var.variable._data, PandasIndexAdapter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think isinstance(var, Coordinate)
would be a safer check.
Closing this. See #947. |
Another item of #719.
An example:
To be consistent with the displayed coordinates and/or data variables, it displays the actual used level values. Using the
pandas.MultiIndex.get_level_values
method would be expensive for big indexes, so I re-implemented it in xarray so that we can truncate the computation to the first x values, which is very cheap.It still needs testing.
Maybe it would be nice to align the level values.