ENH: Preserve monotonic descending index order when merging #2972

Huite · 2019-05-18T19:12:11Z

Addresses GH2947
When indexes were joined in a dataset merge, they would always get sorted in ascending order. This is awkward for geospatial grids, which are nearly always descending in the "y" coordinate.
This also caused an inconsistency: when a merge is called on datasets with identical descending indexes, the resulting index is descending. When a merge is called with non-identical descending indexes, the resulting index is ascending.
When indexes are mixed ascending and descending, or non-monotonic, the resulting index is still sorted in ascending order.

Closes xr.merge always sorts indexes ascending #2947
Tests added
Fully documented, including whats-new.rst for all changes and api.rst for new API

Comments

I was doing some work and I kept running into the issue described at #2947, so I had a try at a fix. It was somewhat of a hassle to understand the issue because I kept running into seeming inconsistencies. This is caused by the fact that the joiner doesn't sort with a single index:

def _get_joiner(join):
    if join == 'outer':
        return functools.partial(functools.reduce, operator.or_)

That makes sense, since I'm guessing pandas.Index.union isn't get called at all. (I still find the workings of functools a little hard to infer.)

I also noticed that an outer join gets called with e.g. an .isel operation, even though there's only one index (so there's not really anything to join). However, skipping the join completely in that case makes several tests fail since dimension labels end up missing (I guess the joiner call takes care of it).

It's just checking for the specific case now, but it feels like an very specific issue anyway...

The merge behavior is slightly different now, which is reflected in the updated test outcomes in test_dataset.py. These tests were turning monotonic decreasing indexes into an increasing index; now the decreasing order is maintained.

* Addresses GH2947 * When indexes were joined in a dataset merge, they would always get sorted in ascending order. This is awkward for geospatial grids, which are nearly always descending in the "y" coordinate. * This also caused an inconsistency: when a merge is called on datasets with identical descending indexes, the resulting index is descending. When a merge is called with non-identical descending indexes, the resulting index is ascending. * When indexes are mixed ascending and descending, or non-monotonic, the resulting index is still sorted in ascending order.

shoyer · 2019-05-19T19:04:42Z

I'm not entirely sure this is a good idea -- a discrepancy in the behavior of an outer join between pandas seems non-ideal. Perhaps somebody else has opinions about whether this would be worthwhile?

If we do want to do this, let's put the change inside the _get_joiner function. Specifically, instead of

def _get_joiner(join):
    if join == 'outer':
        return functools.partial(functools.reduce, operator.or_)

you could write something like:

def _outer_join(indexes):
    index = functools.reduce(indexes, operator.or_)
    # your logic for reverse sorting the result
    return index

def _get_joiner(join):
    if join == 'outer':
        return _outer_join

Note that operator.or_ on a pandas.Index is the same thing as pandas.Index.union, so this could be equivalently written (maybe more clearly) as:

def _outer_join(indexes):
    index = indexes[0]
    for other in indexes[1:]
        index = index.union(other)
    # your logic for reverse sorting the result
    return index

Huite · 2019-05-19T22:41:49Z

I'm definitely not convinced it's a great idea either; a pull request is hopefully the best way to get some discussion going! Putting it in the _get_joiner is definitely the sensible choice, and the last suggestion is definitely more clear to me.

I've had a bit more thought and maybe it's useful to consider what xarray's philosophy about these things is. I think flexibility is a primary concern and generally, things *just work*. The reason I'm running into issues here is because I'm writing some code which operates on DataArrays, which isn't nearly as robust or flexible, and doesn't just work.

The answer in this case might be that I should be using xarray's powerful alignment machinery to do the work for me, rather than to assume/demand certain features of the data. Of course, that requires some digging into how xarray does alignment. But I'd end up with a more flexible tool in the end.

Perhaps that should be the general rule: if you're extending xarray, like xarray, don't rely too much about your coordinates staying the way they are. Maybe such a description could belong in the xarray Internals page just to make it explicit.

dcherian · 2019-05-20T17:24:55Z

From a user perspective, I think it's plain weird to merge a bunch of datasets with montonically decreasing coordinates and land up with a new dataset that has monotonically increasing coordinates.

shoyer · 2019-05-27T01:52:29Z

Yes, I do think this deviation from pandas would make sense -- though it would be even better if we could convince pandas to make the same change!

We already have low-level tools for controlling how alignment works (construct the index yourself and use .reindex), but we should make the default behavior as user friendly as possible.

cheginit mentioned this pull request Mar 1, 2021

Why is the vertical axis flipped sometimes? hyriver/py3dep#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Preserve monotonic descending index order when merging #2972

ENH: Preserve monotonic descending index order when merging #2972

Huite commented May 18, 2019

shoyer commented May 19, 2019

Huite commented May 19, 2019 •

edited

Loading

dcherian commented May 20, 2019

shoyer commented May 27, 2019

ENH: Preserve monotonic descending index order when merging #2972

Are you sure you want to change the base?

ENH: Preserve monotonic descending index order when merging #2972

Conversation

Huite commented May 18, 2019

Comments

shoyer commented May 19, 2019

Huite commented May 19, 2019 • edited Loading

dcherian commented May 20, 2019

shoyer commented May 27, 2019

Huite commented May 19, 2019 •

edited

Loading