Unequal treatment of np.NaNs with operators #54
Sorry, I replied a bit too fast and said a lot of things that do not make sense at all ;) As semantique treats it now, it does indeed seem to be a bug in any case. We do have the issue of NaN values having different meanings, see #15. If a True observation in x has no corresponding observation in y, the result of the AND operator should intuitively be something like "unknown". But we do not yet have a way to distinguish between things like "the value of this observation is unknown" and "no observation was made at this location at all". Thoughts?
Even though I do see the issue of NaN values having multiple semantics (they could represent missing, unknown or invalid values), I'm not sure whether it is related to the current problem or should rather be seen as an independent one. I don't see how treating NaNs differently depending on whether they occur in x or y currently contributes to resolving the different NaN semantics. I also doubt that the NaN treatment within the operators module could be leveraged to resolve the NaN semantics issue, even if we restructured it. Two considerations that justify my point of view:
I therefore regard the current implementation as a bug in any case, and I also believe that the problematic NaN semantics cannot be solved this way, at least not in a simple way ;) For the given case, I suggest modifying the code so that correct operator algebra is guaranteed. This is already the case for the algebraic bivariate operators (add, subtract, multiply, ...), as NaNs in x and y are transferred equally to the result. The groups of functions that would have to be changed are:
a) relational bivariate operators (less, greater, equal, ...)
The modification would be a replacement by
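For context, a small plain NumPy check (not semantique code) of why the two groups of operators behave differently: arithmetic propagates a NaN from either operand, whereas any comparison involving NaN evaluates to False, so the missing value is lost unless it is masked explicitly.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([np.nan, 2.0, 1.0])

# Arithmetic: a NaN in either operand propagates to the result,
# so x + y and y + x have NaNs at the same positions.
total_xy = x + y
total_yx = y + x
print(np.isnan(total_xy))  # [ True  True False]
print(np.isnan(total_yx))  # [ True  True False]

# Comparison: any comparison with NaN is False, so the missing
# value is silently turned into an ordinary boolean.
print(x > y)  # [False False  True]
```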
Description
When evaluating the operators, np.NaNs are treated unevenly depending on whether they occur in the array x or in the comparison array y. This is due to the use of np.where(pd.notnull(x)...) without the complementary np.where(pd.notnull(y)...) in operators.py: one-sided NaNs are retained if they occur in x, but not if they occur in y. As a consequence, for example, the commutativity of Boolean operators is violated (see MRE below).
Reproducible example
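A minimal illustration of the asymmetry, assuming an operator built on the pattern np.where(pd.notnull(x), ..., np.nan). The helper and_one_sided is hypothetical and not taken from semantique's operators.py. Because NaN is truthy for np.logical_and, a NaN in y slips through, while a NaN in x is preserved:

```python
import numpy as np
import pandas as pd

def and_one_sided(x, y):
    # Hypothetical stand-in for the pattern described above: only NaNs in x
    # are masked; a NaN in y is fed straight into the logical operation.
    return np.where(pd.notnull(x), np.logical_and(x, y), np.nan)

x = np.array([1.0, 1.0, np.nan])
y = np.array([1.0, np.nan, 1.0])

r_xy = and_one_sided(x, y)  # values: 1.0, 1.0, nan
r_yx = and_one_sided(y, x)  # values: 1.0, nan, 1.0

# The NaN positions differ, so x AND y != y AND x.
print(np.isnan(r_xy), np.isnan(r_yx))
```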
Expected behavior
The correctness of the operator algebra should be ensured by consistent handling of NaNs regardless of whether they occur in x or y.
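A sketch of what consistent handling could look like, assuming the same np.where / pd.notnull pattern: mask positions where either operand is NaN. The helpers and_two_sided and greater_two_sided are hypothetical illustrations, not the exact replacement proposed for semantique:

```python
import numpy as np
import pandas as pd

def and_two_sided(x, y):
    # Hypothetical: mask positions where x OR y is NaN, so a missing value
    # in either operand propagates to the result in the same way.
    valid = pd.notnull(x) & pd.notnull(y)
    return np.where(valid, np.logical_and(x, y), np.nan)

def greater_two_sided(x, y):
    # Same masking for a relational operator; without it, a comparison
    # with NaN would silently evaluate to False.
    valid = pd.notnull(x) & pd.notnull(y)
    return np.where(valid, np.greater(x, y), np.nan)

x = np.array([1.0, 1.0, np.nan, 3.0])
y = np.array([1.0, np.nan, 1.0, 2.0])

print(and_two_sided(x, y))      # values: 1.0, nan, nan, 1.0
print(and_two_sided(y, x))      # same values, so AND is commutative again
print(greater_two_sided(x, y))  # values: 0.0, nan, nan, 1.0
```

Whether a masked position should be NaN or a dedicated "unknown" value, as discussed in #15, is a separate question; this sketch only makes the NaN handling symmetric.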
Proposed solution
Replacing the following operator definitions
with