Multi-index repr #879

benbovy · 2016-06-11T10:58:13Z

Another item of #719.

An example:

>>> import pandas as pd
>>> import xarray as xr
>>> index = pd.MultiIndex.from_product((list('ab'), range(10)))
>>> index.names = ('a_long_level_name', 'level_1')
>>> data = xr.DataArray(range(20), [('x', index)])
>>> data
<xarray.DataArray (x: 20)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
Coordinates:
  * x                    (x) object MultiIndex
    - a_long_level_name  object 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'b' ...
    - level_1            int64 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9

To stay consistent with how coordinates and data variables are displayed, the repr shows the level values actually used rather than the unique values of each level. Calling pandas.MultiIndex.get_level_values would be expensive for large indexes, so I re-implemented it in xarray in a way that computes only the first few values, which is very cheap.
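A minimal sketch of the truncation idea, using only public pandas attributes. It assumes a recent pandas, where the integer positions are exposed as `MultiIndex.codes` (the diff in this PR uses the older `labels` name). `first_level_values` is a hypothetical helper name for illustration, not the function added by this PR.

```python
import pandas as pd


def first_level_values(index, level_num, max_size):
    """Return the values of one level for the first ``max_size`` entries.

    Hypothetical helper: only ``max_size`` codes are looked up, so the cost
    does not grow with the total length of the index.
    """
    unique = index.levels[level_num]            # unique values of the level
    codes = index.codes[level_num][:max_size]   # integer positions into `unique`
    # A code of -1 marks a missing value; this sketch assumes there are none.
    return unique[codes]


index = pd.MultiIndex.from_product(
    (list("ab"), range(10)), names=("a_long_level_name", "level_1")
)

head = first_level_values(index, 0, 3)
# Same values, but get_level_values builds all 20 entries before slicing.
full = index.get_level_values(0)[:3]
```

Slicing the codes before indexing into the level is what keeps the work proportional to `max_size` instead of the index length.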

It still needs testing.

Maybe it would be nice to align the level values.

shoyer · 2016-06-14T03:46:28Z

xarray/core/formatting.py

+    unique = index.levels[level_num]
+    labels = index.labels[level_num]
+    size = min(max_size, labels.size)
+    filled = pd.core.algorithms.take_1d(unique.values, labels[:size],


We need to figure out how to implement this only using public API (nothing prefaced with an underscore). Otherwise, pandas will almost certainly break us in a future release.

I would suggest simply using index.levels[level_num][:max_size]

benbovy · 2016-06-14

In some cases it might make sense to just use index.levels[level_num][:max_size] to show the (first) unique values for each level.

But in other cases I find this

>>> data.isel(x=range(3))
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x                    (x) object MultiIndex
    - a_long_level_name  object 'a' 'a' 'a'
    - level_1            int64 0 1 2

much better than this:

>>> data.isel(x=range(3))
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x                    (x) object MultiIndex
    - a_long_level_name  object 'a' 'b'
    - level_1            int64 0 1 2 3 4 5 6 7 8 9

What about just returning pd.core.algorithms.take_1d(unique.values, labels[:max_size]) or even np.take(unique.values, labels[:max_size])?

shoyer · 2016-06-14

Yes, that's a good point. It's definitely more useful to display the particular index levels that are actually used at those points.

In that case I would use unique[labels[:max_size]]. It's better to avoid NumPy functions like np.take on pandas.Index objects because they don't always preserve dtypes properly.
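To illustrate the dtype concern, here is a small sketch comparing the two approaches. The timezone-aware level and the `labels` array are made-up example data, chosen only because the dtype difference is easy to see there.

```python
import numpy as np
import pandas as pd

# Assumed example data: a timezone-aware level and integer labels into it.
unique = pd.DatetimeIndex(["2016-06-11", "2016-06-14"], tz="UTC")
labels = np.array([0, 0, 1, 1])
max_size = 3

# Indexing the pandas.Index keeps it a DatetimeIndex with its timezone.
via_index = unique[labels[:max_size]]

# Going through the raw NumPy values returns a plain ndarray instead.
via_numpy = np.take(unique.values, labels[:max_size])

print(via_index.dtype)  # timezone-aware pandas dtype
print(via_numpy.dtype)  # plain NumPy datetime64, timezone information dropped
```

Indexing the `Index` directly also keeps the result usable with pandas formatting helpers, which matters for a repr.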

benbovy · 2016-06-15T12:05:42Z

I ended up refactoring most of the implementation.

It looks cleaner, I think, although I'm still not fully satisfied with this version. It wasn't as straightforward to implement as I thought, especially checking for (and getting) the MultiIndex and dealing with col_width and max_width.

shoyer · 2016-07-28T02:27:23Z

xarray/core/formatting.py

+
+
+def _maybe_summarize_multiindex(var, col_width, max_width):
+    if isinstance(var.variable._data, PandasIndexAdapter):


I think isinstance(var, Coordinate) would be a safer check.

benbovy · 2016-08-31T21:40:59Z

Closing this. See #947.

Benoit Bovy added 6 commits June 6, 2016 22:27

repr for multi-index showing level names/dtypes

f5bd116

display the actual used levels

7c88d41

much more efficient display of actual used levels

b624381

fixed col width calculation error with attrs

dc35362

clean up

6673138

Don't need to set default multi-index level names

73a828b

shoyer reviewed Jun 14, 2016

refactor

4e7793a

shoyer reviewed Jul 28, 2016

benbovy closed this Aug 31, 2016

benbovy deleted the multi-index_repr branch September 2, 2016 09:34
