Eliminate less than and greater than comparisons for UFloat #278

Open
jagerber48 opened this issue Dec 29, 2024 · 7 comments

@jagerber48
Contributor

The user guide already calls out this strange behavior

>>> a = ufloat(25, 10)
>>> b = ufloat(25, 8)
>>> a >= b
False
>>> a > b
False
>>> a == b
False
>>> a.nominal_value >= b.nominal_value
True

That is, the order established on UFloat objects does not obey the law of trichotomy that you would expect to be followed by a strict total order.

In the new framing from #262, UFloats are considered to model random variables. For random variables we talk about equality in distribution. This means that comparing nominal values, or even nominal values and standard deviations, does not suffice to establish equality. Rather, two UFloats should only be equal if they give the same weights to the same atomic units of uncertainty (UAtoms).

This means that comparing nominal values shouldn't suffice to establish greater-than or less-than relations either. When working with random variables I don't think it is typical to establish an ordering on them, so I propose that uncertainties not try to do so. If users want to know whether the mean of some random variable passes some fixed float threshold (including the mean of some other UFloat), then I suggest they indicate that explicitly by extracting the nominal_value from the UFloat they are working with.

Thoughts? Opinions?

@newville
Member

> That is, the order established on UFloat objects does not obey the law of trichotomy that you would expect to be followed by a strict total order.

That law applies to real numbers. UFloats are not real numbers. One should not expect that to apply to complex numbers, classes, or UFloats.

Nominal values (and std_dev) are real numbers. That law should apply. And it does.

I would assume that most people would expect ufloat1 == ufloat2 to mean
(ufloat1.n == ufloat2.n) and (ufloat1.s == ufloat2.s). "not equal" is similarly easy.

Less than and greater than are more troublesome. It seems okay for these to always return False. It would also be OK to raise a TypeError, as 8 + 2j > 8 + 1j does.
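For reference, the complex-number precedent is easy to check in plain Python. The helper name `ordering_raises` is hypothetical and used only for this illustration:

```python
def ordering_raises(a, b):
    """Return True if `a > b` raises TypeError instead of producing a bool."""
    try:
        a > b  # evaluated only to see whether it raises
    except TypeError:
        return True
    return False


print(ordering_raises(8 + 2j, 8 + 1j))  # True: complex numbers have no ordering
print(ordering_raises(8.0, 1.0))        # False: floats are ordered
print((8 + 2j) == (8 + 2j))             # True: equality is still defined for complex
```

So for complex numbers Python keeps `==` but refuses `<`, `>`, `<=`, `>=`, which is the split being discussed here for UFloat.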

@jagerber48
Contributor Author

> I would assume that most people would expect ufloat1 == ufloat2 to mean
> (ufloat1.n == ufloat2.n) and (ufloat1.s == ufloat2.s). "not equal" is similarly easy.

This expectation is incorrect (not sure if that is what you were trying to point out?). From the docs:

>>> x = ufloat(5, 0.5)
>>> y = ufloat(5, 0.5)
>>> x == x
True
>>> x == y
False

because $x$ and $y$ are uncorrelated, despite having the same standard deviation. This is what I meant above in reference to "equality in distribution". x and y, thought of as random variables, have different distributions because they are correlated differently, so they should not be equal. By contrast, x and 1*x have the same distributions, so they are equal. While I don't love comparing UFloat to float, if I think about this distribution idea I can see that a random variable with zero variance follows the same distribution as a float, so this helps justify the current behavior on comparison with float.
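A toy model makes the difference between the two notions of equality concrete. This is a sketch, not the uncertainties implementation: `Toy`, `equal_by_n_and_s`, and `same_linear_combination` are hypothetical names, and it assumes each UAtom is an independent unit-variance variable, so a UFloat reduces to a nominal value plus one weight per atom:

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: comparisons are done by the explicit functions below
class Toy:
    """Nominal value plus weights on independent, unit-variance atoms (an assumption)."""
    n: float
    weights: dict = field(default_factory=dict)

    @property
    def s(self):
        # Independent unit-variance atoms: variance is the sum of squared weights.
        return math.sqrt(sum(w * w for w in self.weights.values()))

    def scale(self, k):
        # Multiplying by a constant scales the nominal value and every weight.
        return Toy(self.n * k, {atom: w * k for atom, w in self.weights.items()})


def equal_by_n_and_s(a, b):
    # The expectation quoted above: compare nominal value and std_dev only.
    return a.n == b.n and a.s == b.s


def same_linear_combination(a, b):
    # The check described in this thread: the same weight on every atom.
    atoms = set(a.weights) | set(b.weights)
    return a.n == b.n and all(a.weights.get(k, 0.0) == b.weights.get(k, 0.0) for k in atoms)


x = Toy(5.0, {"atom_x": 0.5})
y = Toy(5.0, {"atom_y": 0.5})  # same n and s as x, but an independent atom

print(equal_by_n_and_s(x, y))                  # True
print(same_linear_combination(x, y))           # False
print(same_linear_combination(x, x.scale(1)))  # True: 1*x is the same combination
```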


> Less than and greater than are more troublesome. It seems okay for these to always return False. It would also be OK to raise a TypeError, as 8 + 2j > 8 + 1j does.

Yeah, so right now these return True if the nominal_values follow the requested order. I think this behavior is weird. I would prefer a TypeError (like your complex example) to always returning False. Again, this is guided by the idea that UFloat should model random variables, and we don't associate any ordering with random variables (just like there isn't an ordering on the complex numbers). But I'm open to being convinced of always returning False if there are arguments/people in favor.


One big question is how painful this change would be for users. It's hard for me to know... Recovery would probably be pretty simple if what the user really wants is a comparison on the nominal value. We could deprecate the __lt__ function on some schedule.
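One possible deprecation path, sketched under assumptions: keep the current nominal-value ordering for a while, but emit a DeprecationWarning that points users at `.nominal_value`. `DeprecatedOrderingMixin` and `Toy` are hypothetical names; this is not existing uncertainties code:

```python
import warnings


class DeprecatedOrderingMixin:
    """Hypothetical transition step: order by nominal value, but warn on every use."""

    def _nominal_pair(self, other):
        warnings.warn(
            "Ordering comparisons on UFloat are deprecated; "
            "compare .nominal_value explicitly instead.",
            DeprecationWarning,
            stacklevel=3,  # point at the user's comparison, not at this helper
        )
        # Plain numbers are used directly; UFloat-like objects by nominal value.
        return self.nominal_value, getattr(other, "nominal_value", other)

    def __lt__(self, other):
        a, b = self._nominal_pair(other)
        return a < b

    def __le__(self, other):
        a, b = self._nominal_pair(other)
        return a <= b

    def __gt__(self, other):
        a, b = self._nominal_pair(other)
        return a > b

    def __ge__(self, other):
        a, b = self._nominal_pair(other)
        return a >= b


class Toy(DeprecatedOrderingMixin):
    """Minimal hypothetical stand-in for a UFloat."""

    def __init__(self, nominal_value, std_dev):
        self.nominal_value = nominal_value
        self.std_dev = std_dev


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record the warning regardless of default filters
    result = Toy(8000, 4) > Toy(-2500, 200)

print(result)              # True
print(caught[0].category)  # <class 'DeprecationWarning'>
```

Code already written as `a.nominal_value > b.nominal_value` would keep working unchanged once the ordering methods are eventually removed.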

@newville
Member

> I would assume that most people would expect ufloat1 == ufloat2 to mean
> (ufloat1.n == ufloat2.n) and (ufloat1.s == ufloat2.s). "not equal" is similarly easy.

> This expectation is incorrect (not sure if that is what you were trying to point out?). From the docs:

Yes, yes, I do understand that. I think that most people will find this current behavior of "==" to be confusing. The hidden correlation with self is not obvious. Any comparison of UFloats is going to struggle with "obvious".

Worse, any comparison with ">" or "<" is basically "impossible to decide". It could be "always False", or it could raise a TypeError (citing the precedent of complex numbers).

With

>>> ufloat(5.2, 1.5) > ufloat(4.9, 3.3)
False

the ambiguity would be pretty clear, but

>>> ufloat(8000, 4) > ufloat(-2500, 200)
False

seems weird.

That probably argues for preferring "raise TypeError". If that were the case, then "raise TypeError" for "==" and "!=" would then also be defensible....and probably no more confusing than the current situation ;).

But to come back to the main topic: yeah, the law of trichotomy does not apply. It does not apply to complex numbers, sets, vectors, etc. There is no reason to expect it to apply to UFloats.

@newville
Member

... and also ... or maybe to summarize: raise TypeError for all comparisons, including equality, seems reasonable.
I think that might be what @jagerber48 is saying too.

@jagerber48
Contributor Author

@newville ok, I can't tell if I'm fully following your points or not but I think I am. You are pointing out

  • trichotomy applies to the real numbers but not other classes, so we shouldn't necessarily expect it to apply to e.g. UFloat. Yes, I agree with this.
  • In this case, we shouldn't really be guided by what is surprising or not since the appropriate behavior for equality is already surprising. This is a good point.

So I agree with everything you are saying. Here is what I'm proposing for UFloat:

  • __eq__ works as is. Two UFloat objects are equal if they have the same nominal value and their uncertainties are equal (as linear combinations of uncertain elements). In the current code this is implemented as a check that the difference between self and other has zero nominal value and zero standard deviation, which is a necessary and sufficient condition for the same.
  • __lt__, __gt__, __le__, __ge__ are all not defined on UFloat. This will have the effect that comparison using <, <=, >, >= involving UFloat objects gives a TypeError.
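A minimal sketch of this proposal, assuming a toy representation in which each independent variable gets its own unit-variance atom. `ProposedUFloat` and `independent` are hypothetical names, not the uncertainties API. `__eq__` checks that the difference has zero nominal value and zero standard deviation, and the ordering methods are simply left undefined, so Python raises TypeError:

```python
import itertools
import math

_fresh_atom = itertools.count()  # hypothetical source of unique atom ids


class ProposedUFloat:
    """Hypothetical sketch: equality defined, ordering deliberately absent."""

    def __init__(self, nominal_value, weights):
        self.nominal_value = float(nominal_value)
        self.weights = dict(weights)  # atom id -> coefficient

    @classmethod
    def independent(cls, nominal_value, std_dev):
        # A new independent variable gets its own fresh atom.
        return cls(nominal_value, {next(_fresh_atom): float(std_dev)})

    @property
    def std_dev(self):
        # Independent unit-variance atoms: variance is the sum of squared weights.
        return math.sqrt(sum(w * w for w in self.weights.values()))

    def __sub__(self, other):
        weights = dict(self.weights)
        for atom, w in other.weights.items():
            weights[atom] = weights.get(atom, 0.0) - w
        return ProposedUFloat(self.nominal_value - other.nominal_value, weights)

    def __eq__(self, other):
        if not isinstance(other, ProposedUFloat):
            return NotImplemented
        diff = self - other
        # Zero nominal value and zero std_dev of the difference <=> same linear combination.
        return diff.nominal_value == 0 and diff.std_dev == 0

    # No __lt__, __le__, __gt__, __ge__: the inherited object methods return
    # NotImplemented, so <, <=, >, >= raise TypeError.


x = ProposedUFloat.independent(5, 0.5)
y = ProposedUFloat.independent(5, 0.5)
print(x == x, x == y)  # True False
try:
    x < y
except TypeError as exc:
    print("TypeError:", exc)
```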

This proposal is fully guided by the principle that UFloat should model a random variable. On random variables it is very common to consider equality of two random variables. For mathematical random variables there are a few ways to do it. See Equivalence of random variables. For UFloat it is clear that any of these tests for equality corresponds to equality of UFloat.nominal_value and UFloat.error_components.

It is NOT standard to define any partial or total order < on random variables. Also, I don't see any programming case for wanting an ordering on UFloat. For the cases I can think of you would really want an ordering of the nominal values of UFloat, which are of course float and which of course already have an ordering. So for this reason I'm proposing eliminating __lt__, __gt__, __le__, and __ge__ (making attempts to use them raise a TypeError).


Technical note: If __eq__ is defined, then in our case __ne__ will, by Python default, be defined as the logical negation of __eq__. In many cases, including ours, this is the behavior we want. So there is no need to define or discuss __ne__.
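A quick check of the default `__ne__` behavior in Python 3; `OnlyEq` is a hypothetical class used only for illustration:

```python
class OnlyEq:
    """Hypothetical class that defines __eq__ but not __ne__."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.key == other.key


a, b, c = OnlyEq(1), OnlyEq(1), OnlyEq(2)
print(a != b)  # False: the inherited __ne__ inverts the result of __eq__
print(a != c)  # True
```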

@newville
Copy link
Member

@jagerber48 I am OK with __eq__ working as is. But I also think that the link you point to illustrates the inherent confusion: there is not a single obvious answer. The view that "a == b" is equivalent to "a - b == 0" is defensible. It is the current behavior, so any surprise can at least be explained, even if it is not "obvious" to lots of people.

@wshanks
Collaborator

wshanks commented Dec 31, 2024

It seems like the consensus is to keep the current __eq__ behavior. I don't think it is common for users to rely on doing different calculations and generating different UFloat instances with the same nominal value and same error coefficients and variables (outside of the 0 error float-like case), but when considering the __eq__ case we should keep in mind that __eq__ also gets used for hashing and that is used for dict keys and set membership. I think there could be a use for that, so I wouldn't make __eq__ raise an exception.
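In Python, dict and set lookups compare hashes first and then call `__eq__`, so objects that compare equal must also hash equally, and a class that defines `__eq__` without `__hash__` becomes unhashable. The sketch below uses a hypothetical immutable `FrozenToy` (not the uncertainties implementation) to show a hash that is consistent with "same weights on the same atoms" equality:

```python
class FrozenToy:
    """Hypothetical immutable toy: nominal value plus weights on named atoms."""

    def __init__(self, nominal_value, weights):
        self.nominal_value = float(nominal_value)
        # Store weights immutably, dropping zeros so equal combinations look identical.
        self.weights = frozenset((atom, float(w)) for atom, w in weights.items() if w != 0)

    def __eq__(self, other):
        if not isinstance(other, FrozenToy):
            return NotImplemented
        return self.nominal_value == other.nominal_value and self.weights == other.weights

    def __hash__(self):
        # Must agree with __eq__: equal objects produce equal hashes.
        return hash((self.nominal_value, self.weights))


x = FrozenToy(5, {"atom_x": 0.5})
same_as_x = FrozenToy(5, {"atom_x": 0.5})  # distinct object, same linear combination
y = FrozenToy(5, {"atom_y": 0.5})          # same n and s, different atom

cache = {x: "result for x"}
print(cache[same_as_x])        # result for x: found via hash, then __eq__
print(y in cache)              # False
print(len({x, same_as_x, y}))  # 2
```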

I mentioned in #283 that there is the question of how much a UFloat should act like a float. We could consider letting __gt__ etc. work when the UFloat has 0 standard deviation and is being compared to a float or to another UFloat with 0 standard deviation. Maybe that is also a pedantic case not likely to come up anyway, though. In general, I think users will be happiest if they just apply math transformations to UFloats and then use the nominal value and standard deviation to do comparisons.
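A sketch of the zero-standard-deviation idea, assuming ordering should be allowed only when every operand is exactly certain. `ExactOrderingToy` is a hypothetical name, not a proposed API:

```python
class ExactOrderingToy:
    """Hypothetical: ordering is defined only when both operands have zero std_dev."""

    def __init__(self, nominal_value, std_dev):
        self.nominal_value = float(nominal_value)
        self.std_dev = float(std_dev)

    def _exact_pair(self, other):
        if isinstance(other, ExactOrderingToy):
            other_n, other_s = other.nominal_value, other.std_dev
        elif isinstance(other, (int, float)):
            other_n, other_s = float(other), 0.0  # plain numbers are exact
        else:
            return None  # unknown type: let Python try the reflected operation
        if self.std_dev != 0 or other_s != 0:
            raise TypeError("ordering requires zero standard deviation on both sides")
        return self.nominal_value, other_n

    def __lt__(self, other):
        pair = self._exact_pair(other)
        return NotImplemented if pair is None else pair[0] < pair[1]

    def __le__(self, other):
        pair = self._exact_pair(other)
        return NotImplemented if pair is None else pair[0] <= pair[1]

    def __gt__(self, other):
        pair = self._exact_pair(other)
        return NotImplemented if pair is None else pair[0] > pair[1]

    def __ge__(self, other):
        pair = self._exact_pair(other)
        return NotImplemented if pair is None else pair[0] >= pair[1]


exact = ExactOrderingToy(3, 0)
print(exact < 4.0)  # True
print(2.0 < exact)  # True, via the reflected __gt__
try:
    ExactOrderingToy(3, 0.1) < 4.0
except TypeError as exc:
    print("TypeError:", exc)
```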
