Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the TimeIndependentMDCObjectiveFunction class #515

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

rileyjmurray
Copy link
Contributor

@rileyjmurray rileyjmurray commented Dec 16, 2024

TimeIndependentMDCObjectiveFunction is one of the most complicated classes I've encountered in pyGSTi. Right now ...

  • It's 1000 lines long.
  • It has 12 private instance methods.
  • It has hard-to-track dependencies between functions for computing "lsvec-related" quantities and "terms-related" quantities.

This PR simplifies TimeIndependentMDCObjectiveFunction so that

  • It's 668 lines long.
  • It has 4 private instance methods. It has 5 private instances as of 12/24. One of them is there to fix a stupid sign issue that I can resolve if I change some RawObjectiveFunction class' lsvec and dlsvec methods.
  • All "lsvec-related" quantities are computed by applying simple transformations to "terms-related" quantities.

These changes should materially improve maintainability of TimeIndependentMDCObjectiveFunction, as well as its 8 subclasses that collectively span ~500 lines. The outward behavior of this class should be the same as before up to floating point rounding errors.

For the curious, I went down this rabbit hole when working on PR #506.

Incidental changes related to MPI testing: moved some lightweight MPI tests from test/test_packages to test/unit. This move was prompted by this PR because this PR touches code that involves a lot of MPI. While I was at it I renamed an MPI performance profiling file so that it didn't get misinterpreted as a test file.

@rileyjmurray rileyjmurray changed the title Simplify the TimeIndependentMDCObjectiveFunction class WIP: simplify the TimeIndependentMDCObjectiveFunction class Dec 16, 2024
@rileyjmurray rileyjmurray marked this pull request as ready for review December 17, 2024 11:48
@rileyjmurray rileyjmurray requested a review from a team as a code owner December 17, 2024 11:48
@rileyjmurray rileyjmurray requested a review from sserita December 17, 2024 11:48
@rileyjmurray rileyjmurray changed the title WIP: simplify the TimeIndependentMDCObjectiveFunction class Simplify the TimeIndependentMDCObjectiveFunction class Dec 17, 2024
@rileyjmurray
Copy link
Contributor Author

rileyjmurray commented Dec 18, 2024

Investigating possible changes in numerical behavior

During the 12/17 dev meeting Erik raised the question of how numerics might be different by focusing on "terms" and computing "lsvec" as a function of terms.

changes to TimeIndependentMDCObjectiveFunction.dterms

The screenshot-of-screenshots below shows the new code and old code.
Code that's hidden in folded regions (or otherwise off-screen) is changed only in incidental ways.
From what is shown, the operations are equivalent even in finite-precision arithmetic.
Nothing to worry about here!
image

changes to TimeIndependentMDCObjectiveFunction.dlsvec

This one is trickier to explain. Let's start with new code and old code.
image
In order to assess differences we need to dig into the call that the old code makes to self.raw_objfn.dlsvec_and_lsvec, and specifically into the nature of the first of this function's two returned arrays.
There are three definitions of this function in objectivefns.py.
image
Clearly, only the first of these definitions is nontrivial.
To dig further we need to investigate raw_objfn.dlsvec functions. There is a default implementation of this function and an implementation in RawChi2Function.

Let's dig into the default implementation first.
image
I'll call attention to line 635. The definition of pt5_over_lsvec looks odd, but it's actually equivalent to the definition of p5over_lsvec at lines 4605-4606 in the new code objectivefns.py.
Putting things all together, the old code computes dlsvec by evaluating

dprobs * (raw_dterms * pt5_over_raw_lsvec)[:, None]

while the new code computes dlsvec as

(dprobs * raw_dterms[:, None]) * pt5_over_raw_lsvec[:, None]

The only differences that could arise here are from non-associativity of floating point multiplication. Such differences can certainly happen. However, there is no fundamental reason to prefer the current approach over the new approach. Overall conclusion: no cause for worry when we end up calling RawObjectiveFunction.dlsvec.

So ... what about RawChi2Function.dlsvec? The implementation there is nontrivial, unfortunately.
image

changes to TimeIndependentMDCObjectiveFunction.terms

There are no numerical changes here.

changes to TimeIndependentMDCObjectiveFunction.lsvec

The old code called raw_objfn.lsvec directly, while the new code calls raw_objfn.terms and then takes a square root. In order to understand the differences between these approaches we need to look at how raw_objfn might implement these functions.

The default implementations of these functions in RawObjectiveFunction actually have a circular dependency (see below) so it's necessary for subclasses to override one of these functions.
image

Two subclasses of RawObjectiveFunction end up overriding RawObjectiveFunction.lsvec. The implementation in RawChi2Function can be seen two screenshots ago. The main takeaway there is that its lsvec can contain negative components. Meanwhile, the RawPoissonPicDeltaLogLFunction class uses
image

Based on this, the relationship between the new and old outputs of TimeIndependentMDCObjectiveFunction.lsvec when using these raw objective functions is lsvec_new = ( lsvec_old ** 2) ** 0.5. This kind of transformation might be unsettling. We know that preferred methods for accurately computing the 2-norm of a vector don't literally take the square root of the sum of the squares. However, in that context things are delicate because there is a reduction operation (a sum) in between the evaluation of the squares and the square-root. Since we're applying the square and then immediately applying the square-root, this is less of an issue. The only real issue is that this transformation loses the sign information stored in lsvec_old.

…t commit will simpify further, which may have consequences from the perspective of floating point arithmetic. Checking in NOW in case we want to revert to this version for numerical reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant