
xr.open_mfdataset raised duplicate values #6297

Answered by TomNicholas
shuai-zhou asked this question in Q&A

You could either .map the DataArray.drop_duplicates method over the variables in the dataset, or apply the code from that method directly to the dataset. You can then write your own function to pass to open_mfdataset's preprocess argument, like this:

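def drop_duplicates(obj, dim, keep="first"):
    # Works on both xr.Dataset and xr.DataArray.
    if dim not in obj.dims:
        raise ValueError(f"'{dim}' not found in dimensions")
    # Boolean mask that is True for the entries to keep, following
    # pandas.Index.duplicated semantics (keep="first", "last" or False).
    indexes = {dim: ~obj.get_index(dim).duplicated(keep=keep)}
    return obj.isel(indexes)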
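A minimal sketch of wiring this into open_mfdataset, assuming the duplicated dimension is called "time". open_mfdataset calls preprocess with a single argument (each file's dataset), so the extra dim argument is bound beforehand with functools.partial. To stay runnable without any files, the sketch builds two hypothetical in-memory "files" (make_file is an illustrative helper, not part of xarray), each with a repeated time stamp, applies the preprocess function to each, and combines them with xr.combine_by_coords. This only mimics the per-file step; it is not open_mfdataset itself.

```python
import functools

import numpy as np
import pandas as pd
import xarray as xr


def drop_duplicates(obj, dim, keep="first"):
    """Keep one entry for each repeated label along `dim`."""
    if dim not in obj.dims:
        raise ValueError(f"'{dim}' not found in dimensions")
    # True for the entries to keep.
    return obj.isel({dim: ~obj.get_index(dim).duplicated(keep=keep)})


def make_file(times, offset):
    """Hypothetical stand-in for one input file."""
    t = pd.to_datetime(times)
    return xr.Dataset(
        {"temp": ("time", np.arange(len(t), dtype=float) + offset)},
        coords={"time": t},
    )


# preprocess receives only the dataset, so bind `dim` up front.
preprocess = functools.partial(drop_duplicates, dim="time")

# Each "file" repeats one of its own time stamps.
parts = [
    make_file(["2000-01-01", "2000-01-02", "2000-01-02"], offset=0),
    make_file(["2000-01-03", "2000-01-03", "2000-01-04"], offset=10),
]

# Roughly what open_mfdataset does: preprocess each piece, then combine.
combined = xr.combine_by_coords([preprocess(ds) for ds in parts])
print(combined["temp"].values)

# With real files (the path is hypothetical):
# ds = xr.open_mfdataset("path/to/files_*.nc", preprocess=preprocess)
```

Note that preprocess runs per file, so this removes repeats inside each file; labels repeated across different files would still reach the combine step.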

Given that this works on Datasets as well as DataArrays, I don't know why there isn't a Dataset.drop_duplicates method; it seems like we could add one.
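A small sketch of the point that the same helper handles both object types, since it relies only on get_index and isel, which DataArray and Dataset both provide. The toy data (a variable "v" on dimension "x" with the label 20 repeated) is invented for illustration; the keep argument is passed straight to pandas.Index.duplicated.

```python
import numpy as np
import xarray as xr


def drop_duplicates(obj, dim, keep="first"):
    # Uses only get_index and isel, so it accepts a DataArray or a Dataset.
    if dim not in obj.dims:
        raise ValueError(f"'{dim}' not found in dimensions")
    return obj.isel({dim: ~obj.get_index(dim).duplicated(keep=keep)})


da = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    dims="x",
    coords={"x": [10, 20, 20, 30]},
    name="v",
)
ds = da.to_dataset()

# "first" keeps the first 20, "last" keeps the second, False drops both.
first = drop_duplicates(da, "x")                     # DataArray, v = [1, 2, 4]
last = drop_duplicates(ds, "x", keep="last")         # Dataset,   v = [1, 3, 4]
dropped_all = drop_duplicates(da, "x", keep=False)   # DataArray, x = [10, 30]
```

Recent xarray releases may already ship a built-in equivalent; check the API reference for your installed version before relying on a hand-written helper.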

Replies: 1 comment 7 replies

Answer selected by shuai-zhou
Category
Q&A
2 participants