What should Dataset.count return for missing dims? #6749

Open
@headtr1ck

What is your issue?

When calling Dataset.count("x") on a dataset with multiple variables, it returns ones for variables that lack dimension "x", e.g.:

import xarray as xr
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
ds.count("x")
# returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int32 3
#     b        (y) int32 1 1

I can understand why "1" can be a valid answer, but the question is probably a bit philosophical.

For my use case I would like it to return an array filled with ds.sizes["x"] (or with 0). I think this is also a valid return value considering the broadcasting rules, since the size of the missing dimension is actually known in the dataset.

Maybe one could make this behavior adjustable with a kwarg, e.g. missing_dim_value: {int, "size"}, default 1.
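To make the proposal concrete, here is a minimal user-side sketch of how such an option could behave. The helper name `count_with_missing_dim` and the `missing_dim_value` parameter are hypothetical and not part of xarray. It also assumes that null entries in variables lacking the dimension should still count as 0, which is one possible reading of the broadcasting argument.

```python
import xarray as xr


def count_with_missing_dim(ds, dim, missing_dim_value=1):
    """Hypothetical helper (not an xarray API): count along `dim`.

    Variables that lack `dim` are filled with `missing_dim_value`,
    or with ds.sizes[dim] when missing_dim_value == "size".
    """
    if missing_dim_value == "size":
        fill = ds.sizes[dim]
    else:
        fill = int(missing_dim_value)

    counted = {}
    for name, var in ds.data_vars.items():
        if dim in var.dims:
            # Regular case: count non-null values along the dimension.
            counted[name] = var.count(dim)
        else:
            # Assumption: each non-null element contributes `fill`,
            # each null element contributes 0.
            counted[name] = var.notnull() * fill
    return xr.Dataset(counted)


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
by_size = count_with_missing_dim(ds, "x", missing_dim_value="size")
# by_size["b"] now holds ds.sizes["x"] for every element along "y"
```

With `missing_dim_value=1` this sketch should match the current output shown above; passing 0 or "size" gives the alternatives discussed.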
