xr.open_mfdataset raised duplicate values #6297

shuai-zhou · 2022-02-24T04:17:16Z


I was trying to read multiple NetCDF files downloaded from here using ds = xr.open_mfdataset('file_path', decode_coords='all', decode_times=False), and I got this error: cannot reindex or align along dimension 'latitude' because the index has duplicate values.

A similar discussion can be found here. As in that discussion, the data I am trying to read has missing values, so why does a missing-value issue raise a duplicate-values error? @TomNicholas mentioned dropping duplicates, but .drop_duplicates only works on a DataArray, so how can I call .drop_duplicates in preprocess? Thanks.
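
For reference, the message itself can be reproduced without any files. The snippet below is only a minimal sketch on a made-up toy dataset (the variable name `temperature` and all values are assumptions, not taken from the files above). It suggests the error is about repeated labels in the coordinate index rather than about missing data values, although it does not show why the downloaded files contain repeated labels.

```python
import xarray as xr

# Toy dataset (hypothetical values): the label 10.0 appears twice
# in the "latitude" index.
ds = xr.Dataset(
    {"temperature": ("latitude", [1.0, 2.0, 3.0])},
    coords={"latitude": [10.0, 10.0, 20.0]},
)

# Reindexing along "latitude" requires unique labels in the existing index,
# so this call is expected to fail with a ValueError.
try:
    ds.reindex(latitude=[10.0, 20.0])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

The exact wording of the message differs slightly between xarray versions.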

Answered by TomNicholas

Feb 24, 2022


TomNicholas · 2022-02-24T15:38:24Z

Maintainer

You could either .map the drop_duplicates method over the variables in the dataset, or just use the code from the drop_duplicates method directly on the dataset. Then you can create your own function to use within preprocess, like this:

def drop_duplicates(obj, dim, keep="first"):
    if dim not in obj.dims:
        raise ValueError(f"'{dim}' not found in dimensions")
    indexes = {dim: ~obj.get_index(dim).duplicated(keep=keep)}
    return obj.isel(indexes)

Given that this works on datasets as well as dataarrays I don't know why there isn't a Dataset.drop_duplicates method - seems like we could add one.
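
Since preprocess is called with each dataset as its only argument, the dimension name has to be bound before the helper is passed in. The following is a rough sketch of one way to do that with functools.partial, checked here on a toy in-memory dataset rather than on real files; the name drop_latitude_duplicates, the variable temperature, and the values are illustrative assumptions.

```python
from functools import partial

import xarray as xr


def drop_duplicates(obj, dim, keep="first"):
    # Same helper as in the answer, repeated so the snippet runs on its own.
    if dim not in obj.dims:
        raise ValueError(f"'{dim}' not found in dimensions")
    indexes = {dim: ~obj.get_index(dim).duplicated(keep=keep)}
    return obj.isel(indexes)


# Toy stand-in for the contents of one file (hypothetical values).
ds = xr.Dataset(
    {"temperature": ("latitude", [1.0, 2.0, 3.0])},
    coords={"latitude": [10.0, 10.0, 20.0]},
)

# Bind dim up front so the result is a one-argument callable.
drop_latitude_duplicates = partial(drop_duplicates, dim="latitude")

deduplicated = drop_latitude_duplicates(ds)
print(deduplicated["latitude"].values.tolist())

# With real files, the same callable would be passed as
# preprocess=drop_latitude_duplicates to xr.open_mfdataset.
```

With keep="first", the first row for each repeated label is the one that survives.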

7 replies

shuai-zhou Feb 25, 2022
Author

Thanks for your prompt response. Both methods work (the def drop_latitude_duplicates in the second method should be def drop_duplicates). But now there is another error: cannot reindex or align along dimension 'longitude' because the index has duplicate values. So I do not know which dimensions have duplicates, or whether other dimensions have them too. Could those methods drop duplicates along every dimension? Thanks.

TomNicholas Feb 25, 2022
Maintainer

You could just loop over all dimensions, something like this:

import xarray as xr

def drop_duplicates_along_all_dims(obj, keep="first"):
    deduplicated = obj
    for dim in obj.dims:
        # Boolean mask that is False wherever an index label repeats
        indexes = {dim: ~deduplicated.get_index(dim).duplicated(keep=keep)}
        deduplicated = deduplicated.isel(indexes)
    return deduplicated

# preprocess is applied to each file's dataset before the files are combined
ds = xr.open_mfdataset('file_path_*.nc',
                       decode_coords="all",
                       decode_times=False,
                       preprocess=drop_duplicates_along_all_dims)

(though perhaps we should promote .drop_duplicates to accept a list of dims, or ... for all dims).
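
To find out which dimensions actually contain repeated labels before dropping anything, a small diagnostic along these lines might help. It is only a sketch: count_duplicate_labels is a hypothetical helper name, and the dataset is a made-up toy example.

```python
import xarray as xr


def count_duplicate_labels(obj):
    """Return {dim: number of repeated index labels} for dims that have any."""
    counts = {}
    for dim in obj.dims:
        # duplicated() marks every occurrence of a label after the first one
        n_repeats = int(obj.get_index(dim).duplicated().sum())
        if n_repeats:
            counts[dim] = n_repeats
    return counts


# Toy dataset (hypothetical values) with one repeat along each dimension.
ds = xr.Dataset(
    {
        "temperature": (
            ("latitude", "longitude"),
            [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        )
    },
    coords={"latitude": [10.0, 10.0, 20.0], "longitude": [100.0, 100.0]},
)

duplicate_counts = count_duplicate_labels(ds)
print(duplicate_counts)
```

Running this on each file opened individually with xr.open_dataset would show which files and dimensions are affected.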

TomNicholas Feb 25, 2022
Maintainer

Actually you don't even need the loop; you can just build a dict of indexes and pass it to isel once, i.e.:

def drop_duplicates_along_all_dims(obj, keep="first"):
    all_dims = obj.dims
    # One boolean mask per dimension, False wherever an index label repeats
    indexes = {dim: ~obj.get_index(dim).duplicated(keep=keep) for dim in all_dims}
    return obj.isel(indexes)
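
A quick way to sanity-check the single-isel approach is to run it on a toy dataset with repeats along two dimensions. This is a sketch under assumed data (the variable temperature and all values are made up), with the function written to operate on its obj argument so that it runs standalone.

```python
import xarray as xr


def drop_duplicates_along_all_dims(obj, keep="first"):
    # Build one boolean mask per dimension, then index once.
    indexes = {
        dim: ~obj.get_index(dim).duplicated(keep=keep) for dim in obj.dims
    }
    return obj.isel(indexes)


# Toy dataset (hypothetical values): latitude repeats 10.0,
# longitude repeats 110.0.
ds = xr.Dataset(
    {
        "temperature": (
            ("latitude", "longitude"),
            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        )
    },
    coords={
        "latitude": [10.0, 10.0, 20.0],
        "longitude": [100.0, 110.0, 110.0],
    },
)

deduplicated = drop_duplicates_along_all_dims(ds)
print(dict(deduplicated.sizes))
print(deduplicated["temperature"].values.tolist())
```

Each mask is applied along its own dimension, so the second latitude row and the third longitude column are dropped independently.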

TomNicholas Feb 25, 2022
Maintainer

(Also see #6307)

shuai-zhou Feb 26, 2022
Author

It works. Thanks!
