-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Preserve monotonic descending index order when merging #2972
base: main
Are you sure you want to change the base?
Conversation
* Addresses GH2947 * When indexes were joined in a dataset merge, they would always get sorted in ascending order. This is awkward for geospatial grids, which are nearly always descending in the "y" coordinate. * This also caused an inconsistency: when a merge is called on datasets with identical descending indexes, the resulting index is descending. When a merge is called with non-identical descending indexes, the resulting index is ascending. * When indexes are mixed ascending and descending, or non-monotonic, the resulting index is still sorted in ascending order.
I'm not entirely sure this is a good idea -- a discrepancy in the behavior of an outer join between pandas seems non-ideal. Perhaps somebody else has opinions about whether this would be worthwhile? If we do want to do this, let's put the change inside the def _get_joiner(join):
if join == 'outer':
return functools.partial(functools.reduce, operator.or_) you could write something like: def _outer_join(indexes):
index = functools.reduce(indexes, operator.or_)
# your logic for reverse sorting the result
return index
def _get_joiner(join):
if join == 'outer':
return _outer_join Note that def _outer_join(indexes):
index = indexes[0]
for other in indexes[1:]
index = index.union(other)
# your logic for reverse sorting the result
return index |
I'm definitely not convinced it's a great idea either; a pull request is hopefully the best way to get some discussion going! Putting it in the I've had a bit more thought and maybe it's useful to consider what xarray's philosophy about these things is. I think flexibility is a primary concern and generally, things *just work*. The reason I'm running into issues here is because I'm writing some code which operates on DataArrays, which isn't nearly as robust or flexible, and doesn't just work. The answer in this case might be that I should be using xarray's powerful alignment machinery to do the work for me, rather than to assume/demand certain features of the data. Of course, that requires some digging into how xarray does alignment. But I'd end up with a more flexible tool in the end. Perhaps that should be the general rule: if you're extending xarray, like xarray, don't rely too much about your coordinates staying the way they are. Maybe such a description could belong in the xarray Internals page just to make it explicit. |
From a user perspective, I think it's plain weird to merge a bunch of datasets with montonically decreasing coordinates and land up with a new dataset that has monotonically increasing coordinates. |
Yes, I do think this deviation from pandas would make sense -- though it would be even better if we could convince pandas to make the same change! We already have low-level tools for controlling how alignment works (construct the index yourself and use |
Addresses GH2947
When indexes were joined in a dataset merge, they would always get sorted in ascending order. This is awkward for geospatial grids, which are nearly always descending in the "y" coordinate.
This also caused an inconsistency: when a merge is called on datasets with identical descending indexes, the resulting index is descending. When a merge is called with non-identical descending indexes, the resulting index is ascending.
When indexes are mixed ascending and descending, or non-monotonic, the resulting index is still sorted in ascending order.
whats-new.rst
for all changes andapi.rst
for new APIComments
I was doing some work and I kept running into the issue described at #2947, so I had a try at a fix. It was somewhat of a hassle to understand the issue because I kept running into seeming inconsistencies. This is caused by the fact that the joiner doesn't sort with a single index:
That makes sense, since I'm guessing
pandas.Index.union
isn't get called at all. (I still find the workings offunctools
a little hard to infer.)I also noticed that an outer join gets called with e.g. an
.isel
operation, even though there's only one index (so there's not really anything to join). However, skipping the join completely in that case makes several tests fail since dimension labels end up missing (I guess thejoiner
call takes care of it).It's just checking for the specific case now, but it feels like an very specific issue anyway...
The merge behavior is slightly different now, which is reflected in the updated test outcomes in
test_dataset.py
. These tests were turning monotonic decreasing indexes into an increasing index; now the decreasing order is maintained.