Replies: 7 comments 23 replies
-
I am not sure what behavior I would expect. But considering multiprocessing is a good point. The uuid would also have the benefit that if you copy a |
-
@wshanks @NelDav I don't disagree that the conversation at #189 has sort of gone in many directions, and it could be helpful to start over. There is also a lot of overlap with #217, which is only a few days old. Discussions are inherently challenging when different interested parties are participating at very different rates. Like, honestly, you two started a discussion here, went back to #189, and continued exchanging multiple messages on a Saturday morning. These conversations are extremely hard to follow. Adding yet another conversation thread could help, but only if you slowed way, way, way down and explained the issues at hand. Communication is hard, but the goal of these discussions, issues, and pull requests is to make sure that the maintainers (and there are sort of a lot of us) all understand and agree with the expected behavior and the changes requested.

#189 should be closed. There are a) too many commits in that PR, spanning nearly a year, b) too many asides and discussions, and c) no consensus on what the behavior should be. Here, you sort of start with
To which I am going to say, as politely as possible: What?? You did not define
Please close #189 and #217, and start over by discussing and explaining the problem(s) to be solved with the current code. |
-
It seems like this discussion is totally motivated by the pickling problem. But I think we need to step back and have a discussion about what the requirements are and what we're trying to accomplish with these changes. I've also thrown in the idea that a refactor of the data objects in I'm going to say here that this is a very strange question. If I'm not sure how to design such that this is the case though. Perhaps we don't want to try to support correlations between variables that live in separate threads... |
-
I just noticed that the title is kind of confusing. It mentions two options to define
But it does not clarify that the first option is equivalent to the current implementation. |
-
Has this issue been hashed out here? I think this is a straight-up inconsistency that indicates the code needs to be reworked.
In the first example, copy breaks equality; in the second, it maintains it. I think copy should consistently keep or break equality in both cases. I think the current behavior is a bug, and the presence of this bug gives us the leeway to decide one way or the other whether copying keeps or breaks equality. I think copying should definitely keep equality. Suppose we decide copying breaks equality. Then above we have |
-
A solution to fix that would be to have one "reference" object for each UUID. The "reference" object would only have a nominal value and a std_dev. Store the assignment between UUID and "reference" variable in a static dict. Inside the linear part of
When updating the std_dev of |
-
Seems we're learning a lot quickly. Is there any need for
Stepping back and thinking conceptually, here's what makes sense to me.
One glaring issue here is that the proposal for It seems tricky to have both lazy evaluation and immutability. Especially if you want the object that the user has, I'm stumped here for the moment. edit: Actually I do think By contrast, |
-
`AffineScalarFunc` is defined as an object with a `nominal_value` and a `derivatives` dictionary with `[Variable, float]` pairs mapping `Variable`s to coefficients of derivatives with respect to those variables. `Variable` is a special subclass of `AffineScalarFunc` with `derivatives` set to `{self: 1.}`. Since `Variable` is used as a dictionary key, it must be hashable. `Variable.__hash__` is defined using `id(self)`:

uncertainties/uncertainties/core.py, lines 2779 to 2784 in ea1d664

`AffineScalarFunc.__eq__` is defined by `(self - other).nominal_value == 0 and (self - other).std_dev == 0`. The way error propagation works gives `(self - other).std_dev` like (this is not the exact form of the code) `sqrt(sum((d[var] * var.std_dev)**2 for var in d))`, where `d` is the elementwise difference of the two `derivatives` dicts.

This
`std_dev` can only be 0 when the `Variable`s in `self.derivatives` and `other.derivatives` are the same (or there are 0's for `Variable`s' `std_dev` or derivative coefficients) because of the `**2`. Since `Variable`s are defined with only themselves in their derivatives, they can only be equal to themselves or the `AffineScalarFunc` that comes from multiplying them by 1 (again ignoring the `std_dev == 0` case). (More generally, barring the `std_dev`/derivatives equal to 0 edge cases, `AffineScalarFunc`s are equal only when the nominal values are equal and the `derivatives` dict of `[Variable, coefficient]` pairs is equal.)

There is something a bit tricky happening here.
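The data model and `id()`-based hashing just described can be condensed into a small runnable sketch. This is a simplification for illustration only, not the actual `uncertainties` implementation; the class and attribute names follow the description above, but the operator support and the propagation formula are pared down:

```python
import math

class AffineScalarFunc:
    def __init__(self, nominal_value, derivatives):
        self.nominal_value = nominal_value
        self.derivatives = derivatives  # {Variable: derivative coefficient}

    @property
    def std_dev(self):
        # Errors combine in quadrature; the **2 is why the sum can only
        # vanish when every coefficient or std_dev involved is zero.
        return math.sqrt(sum((coef * var.std_dev) ** 2
                             for var, coef in self.derivatives.items()))

    def __sub__(self, other):
        derivs = dict(self.derivatives)
        for var, coef in other.derivatives.items():
            derivs[var] = derivs.get(var, 0.0) - coef
        return AffineScalarFunc(self.nominal_value - other.nominal_value,
                                derivs)

    def __rmul__(self, factor):
        return AffineScalarFunc(factor * self.nominal_value,
                                {v: factor * c
                                 for v, c in self.derivatives.items()})

    def __eq__(self, other):
        diff = self - other
        return diff.nominal_value == 0 and diff.std_dev == 0


class Variable(AffineScalarFunc):
    def __init__(self, nominal_value, std_dev):
        self._std_dev = std_dev
        # The Variable is its own (only) derivatives key, so it must be hashable.
        super().__init__(nominal_value, {self: 1.0})

    @property
    def std_dev(self):
        return self._std_dev

    def __hash__(self):
        # Identity-based, like the real Variable.__hash__.
        return id(self)


x = Variable(1.0, 0.1)
y = Variable(1.0, 0.1)
print(x == x)          # True: (x - x) has zero nominal value and std_dev
print(x == y)          # False: distinct Variables never cancel in quadrature
print(2 * x == 2 * x)  # True: same derivatives contents
```

Even in this toy version, two `Variable`s with identical nominal values and standard deviations are unequal, because their difference still carries nonzero uncertainty.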
`Variable.__eq__` ends up calling a method that calls `self.derivatives[self]`. That lookup in `self.derivatives` is performed by calling `Variable.__hash__` and checking the dict's hash table. If a matching hash is found, Python checks the corresponding matched key against the lookup key for equality. Since we are already inside of `Variable.__eq__`, that would trigger an infinite recursion... fortunately, Python checks `id()` of the two keys first and skips `__eq__` if they match. So this lookup is fine as long as there is no hash collision. Since `Variable.__hash__` is based on `id()`, and `uncertainties` controls the construction of `AffineScalarFunc.derivatives` and only allows `Variable` keys there, there is no possibility for such a hash collision (also, on x86_64 at least, memory addresses usually stay below `2**48` while the Python hash modulus is `2**61`).

So
`Variable.__eq__` is tied to `id()`. A `Variable` will be equal to itself, and there is no way to construct a second `Variable` instance that will be equal to the first (ignoring zero `std_dev`).

An alternative to using
`id()` would be to assign `Variable`s a random id using `uuid.uuid4()` from the standard library during `Variable.__init__`. With this method, each call to `Variable()` will still create a new independent variable instance, but the identity of that variable is not tied to the current Python process. (For reference, in Qiskit, another project I work with, there is a class called `Parameter` which works this way.)

Here is an example of how it might be desirable to split
`Variable` identity from Python identity:

Here, we create a
`Variable` named `a`, pass it to a subprocess for calculation (multiply by 2), and receive the result. We might expect that the result we got back would be equal to `2 * a`. However, because `multiprocessing` uses `pickle` to send data between processes, the result `AffineScalarFunc` and the `Variable` inside of it are created as new objects with new `id()` values, and that `Variable` is no longer the same as the original `a`, so the correlation is lost.

A similar example would be to calculate
`a2 = 2 * a` in a single process and then serialize `a` and `a2` with `pickle`. If they are serialized and deserialized together in the same object, like `b, b2 = pickle.loads(pickle.dumps([a, a2]))`, then `b2 == 2 * b`. If instead they are processed separately, like `b = pickle.loads(pickle.dumps(a)); b2 = pickle.loads(pickle.dumps(a2))`, then `b2 != 2 * b`. `pickle` preserves the correlation only when the objects are serialized together.

Another question that was raised was the behavior of
`copy`. I do not have much intuition for how `copy` should work. With `id()`-based equality, it has to produce a `Variable` that is not equal to the original. With `uuid`, it could work either way, depending on the implementation.

Another issue that comes up for the
`uuid` approach is that `Variable` is currently documented as being intentionally mutable. The documentation suggests changing the value of `Variable.std_dev` to see how the overall error on an `AffineScalarFunc` changes. That doesn't work with the `uuid` method, since with that method you can have different `Variable` instances representing the same independent variable, and changing the `std_dev` on one instance might not change the value on the instance actually used inside a particular `AffineScalarFunc`.

Regarding this last point, I wonder if it would not be better to deprecate that mutable
`std_dev` feature in favor of a helper function that could calculate the error on an `AffineScalarFunc` with a modified `std_dev` for a `Variable`, so that we could treat `Variable` as immutable. Then we could handle the 0 `std_dev` edge case of `__eq__` more cleanly.

Here is the original opening post, which provided less context:

I wanted to branch off the discussion started here and here into a separate thread, since #189 is already long enough and I don't want to make it harder to follow. I might edit this post later to give more of an introduction, but for now see the preceding linked comments. Here I respond to the second comment:
I agree that the current behavior matches the Python data model. That is why I said that switching to a uuid for `Variable` could be a follow-up after #189. The question is what behavior a user would want and expect. If I create a set of `AffineScalarFunc`s and serialize them, I would expect (naively, without looking at the current implementation) that deserializing them would preserve their correlations. With the current implementation, that is only true if they are serialized together within a single payload. I don't think it is unreasonable to imagine serializing `AffineScalarFunc`s in multiple payloads, though -- for example, using `multiprocessing` to generate some `AffineScalarFunc`s, send them to a subprocess, and then send the results back to the main process (`multiprocessing` uses pickle to serialize data sent between processes). The returned results would have lost correlations with the original values, which I think is surprising.

Here is an example:
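The code example did not survive rendering here. The mechanism can be sketched with a toy stand-in class (a hypothetical `Var`, not the `uncertainties` API): with no `__eq__` defined, equality falls back to object identity, just as `Variable`'s hash is tied to `id()`:

```python
import pickle

class Var:
    """Toy stand-in for Variable: no __eq__, so equality falls back to id()."""
    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev

a = Var(1.0, 0.1)
# One payload: pickle's memo preserves shared identity within it.
b, b2 = pickle.loads(pickle.dumps([a, a]))
print(b is b2)   # True
# Separate payload: deserialization creates a brand-new object.
c = pickle.loads(pickle.dumps(a))
print(c == a)    # False: a new object with a new id()
```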
I think it would be reasonable to have guessed that the last line would be True.
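For illustration, the `uuid`-based identity discussed above could look roughly like this. This is a minimal hypothetical sketch, not the current `uncertainties` implementation; a real change would also have to thread the uuid through `derivatives`, `copy`, and so on:

```python
import pickle
import uuid

class Variable:
    """Sketch of uuid-based identity (hypothetical, not the real class)."""
    def __init__(self, nominal_value, std_dev):
        self.nominal_value = nominal_value
        self.std_dev = std_dev
        # Ordinary instance state, so it survives pickling, unlike id().
        self._uuid = uuid.uuid4()

    def __hash__(self):
        return hash(self._uuid)

    def __eq__(self, other):
        return isinstance(other, Variable) and self._uuid == other._uuid

a = Variable(1.0, 0.1)
b = pickle.loads(pickle.dumps(a))
print(b == a)                   # True: identity is carried by the uuid
print(Variable(1.0, 0.1) == a)  # False: each call still makes a new variable
```

With this design, a pickle round trip through a separate payload no longer breaks equality, while each `Variable()` call still produces a new independent variable.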