Adding Simulation and MultistateSimulation reporter #18

wiederm · 2024-01-10T14:23:30Z

Description

For a given MCMC sequence, we want to obtain statistics, snapshots (for a set of specified indices), and logged properties (e.g., density) for each MCMC sampler. The multistate sampler needs its reporter to provide the potential energies and state indices used to calculate free energies. Additionally, the multistate sampler needs to save regular checkpoint files from which the MCMC chain can be restarted.

Note: This PR also contains the improved PRNG implementation. Now, the random number stream is a broadcaster in the sampler state of each MCMC move. This was necessary to ensure that random number streams were separated, which became an issue during this PR since trajectories were synchronizing.

Todos

BaseReporter class
LangevinReporter
MultistateReporter

Status

Ready to go

chrisiacovella

A few comments on how we are implementing the RNG in sampler_state. I also think we should open the HDF5 file only when writing data (and close it when done) to avoid corrupt data and to allow examination during a run.

chrisiacovella · 2024-01-16T16:39:36Z

chiron/utils.py

+        cls._key = random.PRNGKey(seed)
+
+    @classmethod
+    def get_random_key(cls) -> int:


If we want to use this as a wrapper, this should also probably take in an optional number of keys to split.

But bigger question: do we want to have this separate wrapper or do we want to just keep this functionality within the SamplerState class?

I agree that there might be a better way to do this. I wanted a solution that (a) requires only a single random seed, provided by the user, for any number of SamplerState instances and (b) passes the responsibility for the PRN stream to the SamplerState (which can then live on a different machine). This lets us reproduce the random numbers consistently (if so desired), but also ensures that each SamplerState has a unique PRN stream.
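As a rough illustration of the wrapper discussed above, here is a minimal sketch of a class-level key holder whose get_random_key takes an optional number of keys to split. The n_keys parameter, the set_seed name, and the carry-key bookkeeping are assumptions for illustration, not the code in chiron/utils.py.

```python
from jax import random


class PRNG:
    """Hypothetical wrapper that holds one global JAX PRNG key."""

    _key = None

    @classmethod
    def set_seed(cls, seed: int) -> None:
        # A single user-provided seed initializes the global key.
        cls._key = random.PRNGKey(seed)

    @classmethod
    def get_random_key(cls, n_keys: int = 1):
        """Return n_keys fresh keys and advance the stored key."""
        if cls._key is None:
            raise RuntimeError("Call PRNG.set_seed(seed) first.")
        # Split into one carry key (kept internally) plus n_keys subkeys.
        keys = random.split(cls._key, n_keys + 1)
        cls._key = keys[0]
        return keys[1:]


# Usage: one seed, three independent streams (e.g. one per SamplerState).
PRNG.set_seed(1234)
stream_keys = PRNG.get_random_key(n_keys=3)
samples = [random.normal(k, shape=(2,)) for k in stream_keys]
```

Because the stored key is replaced on every call, repeated calls never hand out the same key twice, while resetting the seed reproduces the same sequence of keys.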

chrisiacovella · 2024-01-16T17:02:16Z

chiron/states.py

@@ -22,6 +23,7 @@ class SamplerState:
    def __init__(
        self,
        x0: unit.Quantity,
+        random_seed: random.PRNGKey,


random.PRNGKey is the current state of the sampler, so I don't think random_seed is the correct name. This should probably be called current_key or current_PRNG_key.

I think the sampler state init function should accept two things:

random_seed, which would be used to generate the first key.

current_key, which would just set self._current_key.

If current_key is defined, we would either ignore random_seed (if set) or throw an error.

According to https://jax.readthedocs.io/en/latest/jax-101/05-random-numbers.html, seed and key are synonyms :-)
But, I agree that it is clearer if we call it current_PRNG_key.

I would prefer to only pass current_PRNG_key and use it to generate the first key of the new stream. Is there a use case in which we would like to manually set the current_key?

Yes, I agree they are interchangeable in jax, but I think confusion could come from how people normally use a seed in a stateful RNG (i.e., the seed sets up your initial state and is not something passed around all the time). I think calling it current_PRNG_key is fine.

The only use case I can think of where we'd want to set this manually would be restarting a workflow, but even then it would depend on the order in which things are read in and initialized (e.g., the code could execute the same script as before but automatically look in the cached directory for a restart file, in which case the internal classes would already be initialized and would just need their current states and stats updated). I'm not sure it is necessary right now, but it is easy to add that code later depending on how we implement other things.
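The two-argument constructor proposed in this thread could look roughly like the sketch below. The class name SamplerStateSketch, the new_PRNG_key method, and the choice to raise an error (rather than silently ignore random_seed) are assumptions for illustration only.

```python
from typing import Optional

from jax import random


class SamplerStateSketch:
    """Hypothetical stand-in for SamplerState, showing only key handling."""

    def __init__(self, x0, random_seed: Optional[int] = None, current_PRNG_key=None):
        if random_seed is not None and current_PRNG_key is not None:
            raise ValueError("Pass either random_seed or current_PRNG_key, not both.")
        if random_seed is None and current_PRNG_key is None:
            raise ValueError("One of random_seed or current_PRNG_key is required.")
        self.x0 = x0
        if current_PRNG_key is None:
            # Fresh start: the seed generates the first key of the stream.
            current_PRNG_key = random.PRNGKey(random_seed)
        # Restart case: the key restored from a checkpoint is used directly.
        self._current_PRNG_key = current_PRNG_key

    def new_PRNG_key(self):
        """Advance the stream and return a subkey for one random draw."""
        self._current_PRNG_key, subkey = random.split(self._current_PRNG_key)
        return subkey


# Usage: a fresh state, and a state "restarted" from the same initial key.
fresh = SamplerStateSketch(x0=[0.0], random_seed=0)
restarted = SamplerStateSketch(x0=[0.0], current_PRNG_key=random.PRNGKey(0))
first_fresh = fresh.new_PRNG_key()
first_restarted = restarted.new_PRNG_key()
```

Both states above produce the same first subkey, which is the property a restart would rely on.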

chrisiacovella · 2024-01-16T17:26:08Z

chiron/mcmc.py

@@ -420,7 +411,7 @@ def apply(
            log.debug(
                f"Move accepted. Energy change: {delta_energy:.3f} kT. Number of accepted moves: {self.n_accepted}."
            )
-            reporter.report(
+            self.reporter.report(


Add some output frequency logic here.

Done! That said, I think we want to log every successful MCMove, so report_frequency should be 1.

I'm sure there are cases where we would not want to log every move (certainly not in a lot of the simple test cases), but setting the default to 1 is probably reasonable.
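The output frequency logic discussed here can be sketched as follows. ListReporter, MoveSketch, and on_accept are hypothetical names that only mimic the shape of the apply/report call in the diff; the default report_frequency of 1 follows the thread.

```python
class ListReporter:
    """Hypothetical reporter that just collects records in memory."""

    def __init__(self):
        self.records = []

    def report(self, data: dict) -> None:
        self.records.append(data)


class MoveSketch:
    """Hypothetical MC move that reports every report_frequency-th accepted move."""

    def __init__(self, reporter, report_frequency: int = 1):
        if report_frequency < 1:
            raise ValueError("report_frequency must be a positive integer.")
        self.reporter = reporter
        self.report_frequency = report_frequency
        self.n_accepted = 0

    def on_accept(self, delta_energy: float) -> None:
        self.n_accepted += 1
        # Only write to the reporter on every report_frequency-th acceptance.
        if self.n_accepted % self.report_frequency == 0:
            self.reporter.report(
                {"n_accepted": self.n_accepted, "delta_energy": delta_energy}
            )


# Usage: five accepted moves, reported every second acceptance.
reporter = ListReporter()
move = MoveSketch(reporter, report_frequency=2)
for i in range(5):
    move.on_accept(delta_energy=-0.1 * i)
reported_counts = [record["n_accepted"] for record in reporter.records]
```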

chrisiacovella · 2024-01-16T17:27:55Z

chiron/mcmc.py

-            self.apply(
-                thermodynamic_state, sampler_state, self.simulation_reporter, nbr_list
-            )
+            self.apply(thermodynamic_state, sampler_state, nbr_list)
            if trials % 100 == 0:


I think this is fine for now, because the MC moves are getting a bit of refactoring in PR #14, but it will cause a bit of "conflict": the "apply" function also writes to the reporter, but at a different frequency, so this call writes out duplicate and potentially less useful information than apply does.

chrisiacovella · 2024-01-16T17:38:34Z

chiron/reporters.py

        self.buffer = {}
-        self.h5file = h5py.File(filename, "a")
-        log.info(f"Writing simulation data to {filename}")
+        self.h5file = h5py.File(self.file_path, "a")


We might want to change this so that the file is only opened when writing to it. HDF5 files can be rather grumpy. If we kill the simulation while the file is still open, it may not be readable afterwards (opening it gives an error). Also, unlike a text file, you can't even make a temporary copy of an open HDF5 file and examine the copy (that also throws an error because the file is open).

That is a good point! I have switched to a context manager for the read and write operations.

I'm not seeing the context manager change in the code. Did you forget to commit it?
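For reference, opening the HDF5 file only while writing could look roughly like the sketch below. HDF5ReporterSketch, its buffer_size parameter, and the flush/read methods are hypothetical and are not the reporter in chiron/reporters.py; the sketch assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np


class HDF5ReporterSketch:
    """Hypothetical reporter that buffers data and opens the file only to flush."""

    def __init__(self, file_path: str, buffer_size: int = 10):
        self.file_path = file_path
        self.buffer_size = buffer_size
        self.buffer = {}

    def report(self, data: dict) -> None:
        for key, value in data.items():
            self.buffer.setdefault(key, []).append(value)
        if any(len(values) >= self.buffer_size for values in self.buffer.values()):
            self.flush()

    def flush(self) -> None:
        # The file is open only inside this block and is closed on exit,
        # even if an exception is raised while writing.
        with h5py.File(self.file_path, "a") as h5file:
            for key, values in self.buffer.items():
                arr = np.asarray(values)
                if key in h5file:
                    dset = h5file[key]
                    old_len = dset.shape[0]
                    dset.resize(old_len + arr.shape[0], axis=0)
                    dset[old_len:] = arr
                else:
                    # Resizable along axis 0 so later flushes can append.
                    h5file.create_dataset(
                        key, data=arr, maxshape=(None,) + arr.shape[1:], chunks=True
                    )
        self.buffer = {}

    def read(self, key: str):
        with h5py.File(self.file_path, "r") as h5file:
            return h5file[key][:]


# Usage: write five values in small batches, then read them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    reporter = HDF5ReporterSketch(os.path.join(tmp_dir, "traj.h5"), buffer_size=2)
    for step in range(5):
        reporter.report({"potential_energy": float(step)})
    reporter.flush()  # write whatever is left in the buffer
    energies = reporter.read("potential_energy")
```

Between flushes the file is closed, so it can be copied or inspected while the simulation is still running, and a killed run loses at most the unflushed buffer.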

…tems.py, chiron/tests/test_multistate.py, chiron/tests/test_potential.py, Examples/LJ_langevin.py, chiron/tests/test_integrators.py, chiron/tests/test_convergence_tests.py, chiron/tests/test_minization.py, chiron/multistate.py, chiron/states.py, chiron/tests/test_pairs.py

chrisiacovella

I think this looks good and can be merged.

wiederm added 7 commits January 9, 2024 18:28

Add MBAREstimator class and MultistateReporter class

c3bd752

Add MBAREstimator class for performing mbar analysis

84a85be

Remove online analysis and add offline estimator

6ce2d70

Fix initialization of MBAREstimator and update offline free energy estimation

1006be5

Add MBAR class and update free energy estimators

0d7c944

Refactor MBAREstimator class in analysis.py

f3340a3

Update reporters in integrators and mcmc modules

060947a

wiederm changed the base branch from main to multistage January 10, 2024 14:23

wiederm marked this pull request as draft January 10, 2024 14:24

wiederm added 21 commits January 11, 2024 10:13

Update LangevinIntegrator class in integrators.py

4a42807

Refactor LangevinDynamicsReporter constructor signature

faa027e

Remove save_traj_in_memory parameter from LangevinIntegrator constructor

36507ab

Add MCReporter class for Monte Carlo simulations

2128015

Update MCMCMove and LangevinDynamicsMove constructors

bf1d3c9

Update chiron.reporters import

46c116f

Fix MBAREstimator initialization in MultiStateSampler

cdc664c

Refactor reporters and tests

5f760c5

Add new reporters and update tests

398c869

Add MultistateReporter class and update MultiStateSampler constructor

083ace2

Refactor reporters in test files

6c83aed

Add MultistateReporter to MultiStateSampler

46e4cc3

Refactor energy reporting and add debug logs

0f61230

Wrap and rebuild neighborlist in LangevinIntegrator

e472867

Refactor MultiStateSampler and reset reporter file

b3b1ca1

Refactor code to transpose u_kn array in MultiStateSampler and _SimulationReporter

d53fbc2

Refactor _SimulationReporter class to improve code readability and maintainability

b02317c

Remove unused variable 'a' in MultiStateSampler class

918395c

Refactor position reporting in MultiStateSampler

4c6993c

Add save_traj_in_memory flag to LangevinIntegrator

9ebac30

Remove unused seed variable and update LangevinDynamicsMove constructor

4eb4599

wiederm added 3 commits January 14, 2024 23:40

Refactor code and add random seed functionality

7d593eb

Remove debug print statements and logging

eef82d1

Remove seed parameter from MetropolizedMove and MetropolisDisplacementMove constructors

8152b9f

wiederm marked this pull request as ready for review January 15, 2024 07:56

wiederm requested a review from chrisiacovella January 15, 2024 07:56

wiederm self-assigned this Jan 15, 2024

wiederm added the enhancement New feature or request label Jan 15, 2024

Add PRNG seed initialization to test files

f6e2bae

chrisiacovella requested changes Jan 16, 2024

View reviewed changes

wiederm mentioned this pull request Jan 17, 2024

Consistent treatment of frequency reporters #19

Open

wiederm added 6 commits January 17, 2024 18:38

Add PRNG seed initialization to test files

5455143

Merge branch 'rep' of github.com:choderalab/chiron into rep

bd3cc7c

refactoring

79634ff

Refactor reporter logging in integrators and mcmc

f203aca

Refactor MultiStateSampler and reporters

20fabb6

chrisiacovella approved these changes Jan 18, 2024

View reviewed changes

wiederm added 3 commits January 18, 2024 08:28

Add PRNG class with seed functionality

337eec1

Fix LangevinIntegrator bug and update test_multistate.py

99ee26b

Fix reporter visibility and add test for multistate reporter

ebc27d3

wiederm merged commit ab0d114 into multistage Jan 18, 2024

wiederm mentioned this pull request Jan 18, 2024

Multistate sampling #8

Merged

3 tasks
