Replacing Fire minimization with LocalEnergyMinimizer #672

ijpulidos · 2023-03-28T21:50:40Z

Description

Replaces the FIRE minimizer with `openmm.LocalEnergyMinimizer` for replica minimization.

This should enhance the stability of minimization. Supersedes #557

Resolves #668

Todos

- Implement feature / fix bug
- Add tests
- Update documentation as needed
- Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

Ready to go

Changelog message

The L-BFGS-based `openmm.LocalEnergyMinimizer` is now used instead of `FIREMinimizationIntegrator` for energy minimization of replicas.

ijpulidos · 2023-03-28T21:51:15Z

I'll also be running a pass of the perses tyk2 benchmark, just to check that this doesn't change its results.

jchodera

Thanks! This looks good except for the question of whether we should `del context` at the end of this; see comment.

jchodera · 2023-03-29T15:00:14Z

openmmtools/multistate/multistatesampler.py

```diff
         # TODO if energy > 0, use slower openmm minimizer
-
         # Clean up the integrator
         del context
```


@ijpulidos : I think we might need to remove the `del context` if we're using `self.energy_context_cache.get_context()`, since this would delete a Context object under the management of the context cache. I can't recall what the correct behavior is here; I think it was to leave the Context under management.

This was perhaps different for the FIRE minimizer, since a different integrator would lead to a different Context that we would not reuse.

That is a good point. I think we should be okay removing the `del context` here.

ijpulidos · 2023-03-29T20:14:09Z

I found that with these changes the minimization is significantly slower and does not use anywhere near 100% of the GPU. For the tyk2 benchmark, the times for minimizing the complex-phase replicas are:

- Using `FIREMinimizationIntegrator`: 32.290 s
- Using `LocalEnergyMinimizer`: 500.754 s

I could reproduce this both locally on my workstation (where I can track GPU usage) and on lilac (HPC). Is there something special about using our own custom integrators (such as the `FIREMinimizationIntegrator` or the `GradientDescentMinimizationIntegrator` used in #557) that may explain these differences compared to just using `openmm.LocalEnergyMinimizer`?

@jchodera Maybe you have some suggestions on why this is the case. Thanks!

ijpulidos · 2023-03-29T20:19:53Z

In case it helps, `LocalEnergyMinimizer` ends up using the `GeodesicBAOABIntegrator` from openmmtools.

ijpulidos · 2023-03-29T21:56:04Z

FYI, I'm having a bit of trouble understanding the tolerance units here. We are passing `Quantity(value=1.0, unit=kilojoule/(nanometer*mole))` as the tolerance, but as far as I understand, an energy tolerance should just be in kJ/mol, so why does it have nanometers in the denominator? And how does this translate between `FIREMinimizationIntegrator` and `LocalEnergyMinimizer`, given that we pass the same value to both?

jchodera · 2023-03-29T23:06:14Z

The tolerance for LocalEnergyMinimizer is a force tolerance, while it is an energy tolerance for FIRE. We probably need to relax the force tolerance considerably so that we don't max out the number of iterations. Can you try timing with something much less stringent, like 10 or 100 kJ/mol/nm?
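To make the force-versus-energy distinction concrete: assuming OpenMM 8 semantics, where `LocalEnergyMinimizer.minimize` stops once the root-mean-square of all force components falls below `tolerance` (in kJ/mol/nm; earlier OpenMM releases used a different force norm), the stopping test looks roughly like the sketch below. `rms_force` and `is_converged` are illustrative names, not OpenMM API. Because the quantity is a force, going from 10 to 1 kJ/mol/nm demands a tenfold smaller RMS force, which can cost many more L-BFGS iterations on a large solvated system.

```python
import numpy as np


def rms_force(forces):
    """RMS over all force components of an (N, 3) array, assumed in kJ/mol/nm."""
    f = np.asarray(forces, dtype=float)
    return float(np.sqrt(np.mean(f ** 2)))


def is_converged(forces, tolerance=10.0):
    """Illustrative stopping test: RMS force at or below `tolerance` (kJ/mol/nm)."""
    return rms_force(forces) <= tolerance


# Same forces, two tolerances: loose passes, tight does not.
forces = np.full((4, 3), 5.0)
print(is_converged(forces, tolerance=10.0), is_converged(forces, tolerance=1.0))
```

With a live Context, the forces can be obtained via `context.getState(getForces=True).getForces(asNumpy=True).value_in_unit(unit.kilojoule_per_mole / unit.nanometer)` and passed to `rms_force`.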

mikemhenry · 2023-03-30T20:09:03Z

Another thing to consider is how often we run this minimizer. It is just at the start of the simulation, right? So adding 9 minutes to a 12-hour-long simulation isn't too bad if this is more correct and robust, but we should first explore whether any of the slowdown is something we can improve (e.g., John's comment above about relaxing the force tolerance).

jchodera · 2023-03-31T17:27:11Z

It's just at the beginning of the calculation, but (1) it makes for a terrible user experience if startup times are really long, (2) it makes testing really slow, and (3) it shouldn't take 9 minutes to get to the point where we can simulate something stably.

ijpulidos · 2023-04-03T18:36:59Z

Timings are as follows (initial and final energies are just for the first replica):

| Minimizer | Tolerance | Time to minimize all replicas | Initial energy (kT) | Final energy (kT) |
|---|---|---|---|---|
| FIRE | 1 kJ/mol/nm | 32.290 s | -306594.761 | -309780.056 |
| LocalEnergyMinimizer | 1 kJ/mol/nm | 489.617 s | -306554.708 | -326469.138 |
| LocalEnergyMinimizer | 10 kJ/mol/nm | 126.885 s | -313782.497 | -330901.743 |
| LocalEnergyMinimizer | 100 kJ/mol/nm | 3.570 s | -313902.645 | -314011.075 |
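Because the runs start from different initial energies, comparing final energies directly is misleading. A small sketch that recomputes each row using only the numbers in the timing table: the energy change E_final - E_initial (kT) and the wall time as a multiple of the FIRE run. Labels such as `LEM tol=1` are shorthand introduced here, with tolerances read as force tolerances in kJ/mol/nm.

```python
# Values copied from the timing table (energies in kT, first replica only).
# Each entry: (seconds to minimize all replicas, initial energy, final energy).
RUNS = {
    "FIRE tol=1": (32.290, -306594.761, -309780.056),
    "LEM tol=1": (489.617, -306554.708, -326469.138),
    "LEM tol=10": (126.885, -313782.497, -330901.743),
    "LEM tol=100": (3.570, -313902.645, -314011.075),
}


def summarize(runs, reference="FIRE tol=1"):
    """Time relative to `reference` and energy change for each run."""
    ref_seconds = runs[reference][0]
    summary = {}
    for name, (seconds, e_initial, e_final) in runs.items():
        summary[name] = {
            "slowdown_vs_reference": seconds / ref_seconds,
            "delta_E_kT": e_final - e_initial,  # negative = energy lowered
        }
    return summary


for name, row in summarize(RUNS).items():
    print(f"{name:<12} {row['slowdown_vs_reference']:6.2f}x time, "
          f"dE = {row['delta_E_kT']:10.3f} kT")
```

By this measure the tolerance-1 LocalEnergyMinimizer run takes about 15× the FIRE time, while the tolerance-100 run is faster than FIRE but lowers the energy by only about 108 kT versus about 3185 kT for FIRE. The differing starting structures still confound the comparison.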

jchodera · 2023-04-03T18:43:15Z

Hm, this is more difficult since we're not starting from the same initial energy. Any idea what is causing the non-determinism?

I think we want to compare the quick (<60 second) methods: FIRE and LocalEnergyMinimizer with a loose tolerance (e.g. 50 kJ/mol/nm or so).

I'll look at whether I can fix the FIRE minimizer since it is really fast.

jchodera · 2023-04-03T18:44:00Z

@ijpulidos : Were you able to sort out whether we should or should not delete the Context at the end of minimization?

ijpulidos · 2023-04-03T18:47:45Z

> @ijpulidos : Were you able to sort out whether we should or should not delete the Context at the end of minimization?

Deleting or not deleting does not seem to affect the outcome in terms of energies. As for timing, I see only a small difference: 500 s with deletion versus 520 s without. This could just be due to the load on my workstation at the time; I don't think the difference is statistically significant.

ijpulidos · 2023-04-03T18:49:45Z

> Hm, this is more difficult since we're not starting from the same initial energy. Any idea what is causing the non-determinism?

Yeah, I also noticed that. The only thing I can tell so far is that the first two runs (which start from similar energies) were run on the same day and at around the same time of day, whereas the other two were run on a different day, but at around the same time of day. Maybe there is some correlation with the random seed? (This shouldn't happen, though, right?)

jchodera · 2023-04-04T00:35:16Z

> Deleting or not deleting does not seem to affect the outcome in terms of energies.

Deleting Contexts is a different issue:

- If we do not delete Context objects that we will never reuse, we consume GPU resources and can end up with new simulations accidentally being run on the CPU.
- If we do delete a Context under the control of the context cache, we could potentially corrupt the ContextCache.

We may want to make sure to create a new Context and clean it up, rather than use the ContextCache.

jfennick · 2023-08-09T02:01:52Z

This PR only 'supersedes' my PR #557 coincidentally, because L-BFGS does not (currently) modify the unit cell vectors, even if pressure is enabled. I still maintain that the proper fix is to simply disable the pressure temporarily during initial minimization, as I have done here: f8dd5c7. That said, it has been more than a year without any movement on this issue, and I have moved on to other projects. If you want to merge any of my PRs, now is the time; I will be deleting my fork soon.

IAlibay · 2023-11-30T17:33:42Z

@ijpulidos I think this switch would still be really useful for us. Is there anything we can help out with to get this to a merged state?

Replacing FireMinimizer with LocalEnergyMinimizer

88bd0bf

ijpulidos requested review from jchodera and mikemhenry March 28, 2023 22:33

jchodera requested changes Mar 29, 2023


Removing unnecessary import

0782749

ijpulidos changed the title ~~Replacing FireMinimizer with LocalEnergyMinimizer~~ Replacing Fire minimization with LocalEnergyMinimizer Mar 29, 2023

ijpulidos modified the milestones: 0.22.0, 0.22.1 Apr 4, 2023

shouldn't need to delete the context here

cc49fbc

This was referenced Apr 18, 2023

FIREMinimizationIntegrator fails on explicitly solvated system using CPU platform #686

Open

Cleanup contexts upon calculation completion, failure OpenFreeEnergy/openfe#354

Merged

ijpulidos removed this from the 0.22.1 milestone May 31, 2023
