Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenation automatically creates indexes where none existed #8871

Open
5 tasks done
TomNicholas opened this issue Mar 25, 2024 · 1 comment
Open
5 tasks done

Concatenation automatically creates indexes where none existed #8871

TomNicholas opened this issue Mar 25, 2024 · 1 comment
Labels
bug topic-combine combine/concat/merge

Comments

@TomNicholas
Copy link
Member

What happened?

Currently concatenation will automatically create indexes for any dimension coordinates in the output, even if there were no indexes on the input.

What did you expect to happen?

Indexes not to be created for variables which did not already have them.

Minimal Complete Verifiable Example

# TODO once passing indexes={} directly to DataArray constructor is allowed then no need to create coords object separately first
coords = Coordinates(
    {"x": np.array([1, 2, 3])}, indexes={}
)
arrays = [
    DataArray(
        np.zeros((3, 3)),
        dims=["x", "y"],
        coords=coords,
    )
    for _ in range(2)
]

combined = concat(arrays, dim="x")
assert combined.shape == (6, 3)
assert combined.dims == ("x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this fails

combined = concat(arrays, dim="z")
assert combined.shape == (2, 3, 3)
assert combined.dims == ("z", "x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this also fails

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

# nor have auto-created any indexes
>       assert combined.indexes == {}
E       AssertionError: assert Indexes:\n    x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x') == {}
E         Full diff:
E         - {
E         -  ,
E         - }
E         + Indexes:
E         +     x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x',
E         + )

Anything else we need to know?

The culprit is the call to core.indexes.create_default_index_implicit inside merge.py. If I comment out this call my concat test passes, but basic tests in test_merge.py start failing.

I would like know to how to avoid the internal call to create_default_index_implicit. I tried passing compat='override' but that made no difference, so I think we would have to change merge.collect_variables_and_indexes somehow.

Conceptually, I would have thought we should be examining what indexes exist on the objects to be concatenated, and not creating new indexes for any variable that doesn't already have one. Presumably we should therefore be making use of the indexes argument to merge.collect_variables_and_indexes, but currently that just seems to be empty.

Environment

I've been experimenting running this test on a branch that includes both #8711 and #8714, but actually this example will fail in the same way on main.

@TomNicholas TomNicholas added bug needs triage Issue that has not been reviewed by xarray team member and removed needs triage Issue that has not been reviewed by xarray team member labels Mar 25, 2024
@TomNicholas TomNicholas changed the title Concatenation Concatenation automatically creates indexes where none existed Mar 25, 2024
@TomNicholas
Copy link
Member Author

Actually the main issue just seems to come from the usage of the Dataset constructor in _dataset_concat

result = type(datasets[0])(result_vars, attrs=result_attrs)

If we pass a Coordinates object explicitly here I think we can make this work - see #8872.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-combine combine/concat/merge
Projects
Status: To do
Development

No branches or pull requests

1 participant