Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide better error message when dimension name matches argument #3336

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gwgundersen
Copy link
Contributor

WIP PR for #3324. Since either_dict_or_kwargs is called 57 times, I don't want to modify every function call without a preliminary review. My main question is if using locals() at the top of each function is considered acceptable. Of course, if locals() is called after some local variables are created, this code could raise an error when no conflict exists. But explicitly passing in the function argument names in all 57 functions seems brittle. I'd appreciate any thoughts on the PR before I modify all the functions and write some tests.

@max-sixty
Copy link
Collaborator

That's a clever approach. It's a bit magic, but maybe the right tradeoff. Are there any performance implications?

I agree with @gwgundersen re writing each item out being impractical, unless we're only doing for sel & isel.

Any broader thoughts re the need to apply to all methods, vs this sort of magic?

if inter:
raise ValueError(
"the dimension name '%s' matches an argument "
"to .%s" % (inter.pop(), func_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Please pass a dictionary to avoid this conflict." or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
func_args = set(locals().keys())
dims = set(self.dims)
indexers = either_dict_or_kwargs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the magic is OK for the convenience; but is there a better way than locals()? Like inspect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this works

func_args = set(inspect.signature(self.sel).parameters)

and is less brittle than locals() without much function-specific overhead. either_dict_or_kwargs could even take self.sel as an argument, and call inspect itself. I did a basic performance test, and locals() is 1/100th of a second faster on average.


That said, do we want to not raise an error on something like this?

arr = xr.DataArray(np.random.random((100, 10)),
                   dims=['foo', 'method'],
                   coords={
                       'foo': range(100),
                       'method': range(10)
                   })
 arr.sel(foo=range(0, 100, 2))

My current implementation will raise an error because while the user did not pass in a value for method, it is still a function parameter. One way to address this is something like

locals_ = locals()
params = inspect.signature(self.sel).parameters
func_args = {p for p in params if locals_[p] != params[p].default}

but now we're getting even more complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to do here. Perhaps @crusaderky has some ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it's seriously overengineered to me. You really shouldn't be using inspect.signature when you're dealing with a single, hardcoded function...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the idea is to use a helper function anytime we allow dict or kwargs. sel is the test case for now

@max-sixty
Copy link
Collaborator

I did a basic performance test, and locals() is 1/100th of a second faster on average.

Do you know what the absolute numbers are? i.e. how much performance penalty does the faster routine introduce?

@max-sixty
Copy link
Collaborator

Another alternative is a decorator that can encapsulate some of this meta-programming (could also do the work of either_dict_or_kwargs)

But if we're only doing this for a couple of methods, I would advocate we just write out the keywords. I know it's a bit of a inelegant defeat, but it's pythonic.

sel & isel are probably more important than the others

@shoyer do you have thoughts re encapsulation vs magic?

@gwgundersen
Copy link
Contributor Author

I'm almost done with classes and am happy to return to this if there is any interest. I don't want to design something brittle, though, and would love some guidance.

@max-sixty
Copy link
Collaborator

@gwgundersen I'm really sorry to have kept you waiting, especially after this PR and the others strong ones you've recently done. I temporarily paused my xarray oversight while I came up to speed with a new job and let this drop.

@dcherian any chance you could take a glance? Otherwise I'll look properly this weekend. Thanks.

@gwgundersen
Copy link
Contributor Author

The more I think about this PR, the more I dislike this approach. The solution must either be brittle or over-engineered. I discussed this issue with a friend, and other approaches don't seem better: currying the function—arr.sel(method="nearest")(method="foo")—or adding a decorator that caches the results of a stripped down inspect.signature. I think the best approach is to just raise a ValueError if certain string arguments are not in a predefined set of choices:

if method not in ["nearest", ...]:
    raise ValueError(...)

@max-sixty, @dcherian, @shoyer thoughts?


Not to open a can of worms, but the root cause of this issue is that the Xarray API accepts both **kwargs and ordinary named args. This PR—and methods like either_dict_or_kwargs—seems like workarounds to accommodate this model. Is there any interest in moving away from this API long-term?

@max-sixty
Copy link
Collaborator

It's a shame this stalled.

I think the best approach is to just raise a ValueError if certain string arguments are not in a predefined set of choices:

I think this would be a good solution without any tradeoffs. If anyone is up for implementing that, I would be keen to merge...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants