Allow .chunk for datasets with duplicated dimension names, e.g. Sentinel-3 OLCI files #8579
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
You should have received a warning when opening the file, with instructions on what to do (see also the issue you referenced):
In [5]: import xarray as xr
...:
...: ds = xr.Dataset({"a": (("x", "x"), [[0, 1], [2, 3]])})
...: ds
.../xarray/namedarray/core.py:487: UserWarning: Duplicate dimension names present: dimensions {'x'} appear more than once in dims=('x', 'x'). We do not yet support duplicate dimension names, but we do allow initial construction of the object. We recommend you rename the dims immediately to become distinct, as most xarray functionality is likely to fail silently if you do not. To rename the dimensions you will need to set the ``.dims`` attribute of each variable, ``e.g. var.dims=('x0', 'x1')``.
warnings.warn(
Out[5]:
<xarray.Dataset>
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x, x) int64 0 1 2 3

The warning itself is not as helpful for duplicated dimensions on a variable within a dataset, though, since for
In [6]: ds.variables["a"].dims = ("x0", "x1")
...: ds
Out[6]:
<xarray.Dataset>
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x0, x1) int64 0 1 2 3
Alright, thanks! So in this case the chunking fails unless the dimensions are renamed. The solution would therefore be something like:
import xarray as xr

ds = xr.open_dataset("instrument_data.nc", decode_cf=True, mask_and_scale=True)
# Give the duplicated dimensions distinct names before chunking.
ds.variables["relative_spectral_covariance"].dims = ("x0", "x1")
# .chunk returns a new dataset, so keep the result.
ds = ds.chunk(chunks="auto")
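
For files with more than one affected variable, the new names could be generated rather than typed by hand. Below is a minimal sketch, assuming a hypothetical helper `dedupe_dims` (not part of xarray) that follows the `('x0', 'x1')` naming suggested in the warning. It only works on plain tuples of names and does not touch xarray itself.

```python
from collections import Counter


def dedupe_dims(dims):
    """Return dims with repeated names made distinct by a numeric suffix.

    Hypothetical helper, not an xarray API. Names that occur once are kept
    unchanged; a name that occurs more than once gets 0, 1, ... appended in
    order of appearance.
    """
    counts = Counter(dims)   # how often each name occurs in total
    seen = Counter()         # how often each repeated name has been emitted
    renamed = []
    for name in dims:
        if counts[name] > 1:
            renamed.append(f"{name}{seen[name]}")
            seen[name] += 1
        else:
            renamed.append(name)
    return tuple(renamed)


print(dedupe_dims(("x", "x")))          # ('x0', 'x1')
print(dedupe_dims(("band", "x", "x")))  # ('band', 'x0', 'x1')
```

The result could then be assigned per variable in the same way as above, e.g. `var.dims = dedupe_dims(var.dims)`. This has not been tried against real Sentinel-3 files, and the helper does not check whether a generated name such as `x0` collides with a dimension that already exists.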
So am I reading this correctly that there is no way to work around this if we want to use
I think we can enable
What is your issue?
Sentinel-3 OLCI files (e.g. taken from Copernicus Data Space Ecosystem) come with duplicate dimensions, which causes xarray 2023.12.0 to raise after #8491. Specifically, instrument_data.nc cannot be opened anymore:
Results in the now expected ValueError:
ncdump -h prints:
The relative_spectral_covariance variable has duplicate dimensions. What do you suggest doing in such cases?
I guess this is related to #1378.