[WIP] SAMS samplers and documentation #214

jchodera · 2017-06-05T14:15:14Z

Supercedes #203.

This is an initial import of the SAMS sampler infrastructure.

Docs preview can be found here.

Remaining tasks:

document current storage layer
add rudimentary free energy analysis via MBAR

We will switch the storage layer to be more similar to repex in a subsequent PR.

jchodera · 2017-06-05T15:39:44Z

@andrrizzi : I'm running into an issue I don't quite understand here that involves the new ThermodynamicState:

lski1962:openmmtools choderaj$ nosetests -vv --nocapture openmmtools/samplers/sams/testsystems.py 
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
Failure: ValueError (<openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list) ... ERROR

======================================================================
ERROR: Failure: ValueError (<openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/nose/loader.py", line 252, in generate
    for test in g():
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/testsystems.py", line 582, in test_testsystems
    test = testsystem()
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/testsystems.py", line 170, in __init__
    self.exen_sampler = ExpandedEnsemble(self.mcmc_sampler, self.thermodynamic_states)
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/expanded_ensemble.py", line 133, in __init__
    self.thermodynamic_state_index = thermodynamic_states.index(sampler.thermodynamic_state)
ValueError: <openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list

----------------------------------------------------------------------

What is happening here is that MCMCSampler makes a deep copy of ThermodynamicState, so I can no longer identify the current thermodynamic state via

        self.thermodynamic_state_index = thermodynamic_states.index(sampler.thermodynamic_state)

Any ideas here? Does MCMCSampler need to make a deep copy? Can we check if two thermodynamic states are the same in a loop instead?

andrrizzi · 2017-06-05T15:44:47Z

Does MCMCSampler need to make a deep copy?

I don't think that's necessary for now. It's just a precautionary thing in case the thermodynamic_state becomes private in the future. Feel free remove the deepcopy if it's causing problems! repex isn't using it, so there should be no repercussions on YANK.

jchodera · 2017-06-05T17:48:05Z

@andrrizzi : I'm trying to reconcile the sams.MCMCSampler and openmmtools.mcmc.MCMCSampler approaches to timing.

The sams version stores a ._timing dict that allows the timing statistics to be written to disk later:
https://github.com/choderalab/sams/blob/master/sams/samplers.py#L1109

The openmmtools version uses the Timer class, which seems to only support writing to the logger:
https://github.com/choderalab/openmmtools/blob/master/openmmtools/mcmc.py#L274-L292

What do you think our approach should be here?

jchodera · 2017-06-05T17:56:42Z

Also, what should we call the method that updates the sampler chain by one Monte Carlo step? Should this be apply(), as in MCMCMove; only run(), as in MCMCSampler, or update(), as in sams?

…mpler

andrrizzi · 2017-06-05T18:11:32Z

What do you think our approach should be here?

The Timer object is essentially a dict that compute elapsed times automatically, so I think one way can be converted to the other very easily. I'd try to use the Timer if it doesn't complicate things, but it's just a helper class, so I don't have a strong preference.

apply(), as in MCMCMove; only run(), as in MCMCSampler, or update(), as in sams

I would reserve apply for the moves. I'm not sure what update in sams does actually, but in mcmc.MCMCSampler the internal states are always updated when you call run() so it may be redundant.

jchodera · 2017-06-05T18:15:23Z

The Timer object is essentially a dict that compute elapsed times automatically, so I think one way can be converted to the other very easily. I'd try to use the Timer if it doesn't complicate things, but it's just a helper class, so I don't have a strong preference.

OK, I'll just do the minimal amount of refactoring to have MCMCSampler store a Timer object as ._timer to the higher-level samplers can make use of this.

I would reserve apply for the moves. I'm not sure what update in sams does actually, but in mcmc.MCMCSampler the internal states are always updated when you call run() so it may be redundant.

update() runs a single iteration, run() can run multiple iterations. This was to allow the logic for a single iteration to be encapsulated in a single method.

andrrizzi · 2017-06-05T18:23:27Z

I'll just do the minimal amount of refactoring

Ok. I think it's all internal anyway, so we can easily postpone it to later modifications if you have higher priority things to do.

update() runs a single iteration

Ah, I see, thanks. Both sounds good then.

jchodera · 2017-06-05T18:44:16Z

OK, I'll just do the minimal amount of refactoring to have MCMCSampler store a Timer object as ._timer to the higher-level samplers can make use of this.

Is it OK if, instead of creating/destroying a new Timer object in each method, I have an optional Timer object passed to each MCMCMove apply() call so that we can aggregate timing data across calls within an iteration?

andrrizzi · 2017-06-05T18:56:40Z

Hmm. That'd be a good feature to have, but I'd avoid making changes to the API of MCMCMoves just to accomodate debugging features if possible. What about just use the Timer outside the apply call? Something like

for move in mcmc_moves:
    timer.start(move.__class__.__name__)
    move.apply(...)
    timer.stop(...)

andrrizzi · 2017-06-05T19:02:10Z

Otherwise we can have a centralized and more sophisticated timing utility, but I'd still try to keep it outside the API of the single objects.

jchodera · 2017-08-02T04:00:18Z

I'm revisiting what needs to be done here to finish up this PR.
Most of what I wanted to do is implemented, but it would be useful to talk about how we wanted to handle storage. The current implementation uses NetCDF throughout, but we probably want to introduce a layer of abstraction between the simulation and the storage. We have several choices

Keep the current NetCDF integration, which would make it harder to swap out a different storage layer later
Go straight to a database as the storage layer
Use MDTraj plus something else (since MDTraj doesn't allow arbitrary variables to be attached to trajectory data)
Use a more general abstraction layer like what @Lnaden implemented in New storage system with auto-magical variable handling #135
Use something else entirely

jchodera · 2017-08-02T04:08:31Z

This question is relevant to adding optional storage backing to the MCMCSampler as well.

Standard could involve creating a stack of samplers that can each bind to storage:

mcmc_sampler = MCMCSampler(thermodynamic_state, sampler_state, move=mcmc_move, storage=storage)
exen_sampler = ExpandedEnsemble(mcmc_sampler, thermodynamic_states, storage=storage)
sams_sampler = SAMS(exen_sampler, storage=storage)

Alternatively, we could detect if the lowest-level sampler has a storage layer bound and use that if so:

# Use storage defined in MCMCSampler for all layers
mcmc_sampler = MCMCSampler(thermodynamic_state, sampler_state, move=mcmc_move, storage=storage)
exen_sampler = ExpandedEnsemble(mcmc_sampler, thermodynamic_states)
sams_sampler = SAMS(exen_sampler)

Lnaden · 2017-08-03T21:00:42Z

For storage: We'll merge in the storage PR I have open (after I fix its clashes)
Fix the tests here
Possibly move to new type of storage beyond netcdf

jchodera added 4 commits June 5, 2017 10:03

Initial import of SAMS samplers and documentation

df02715

Fix typo in netcdf4 conda package requirements

7fba8d3

Add analysis module temporarily

f0ee1e6

Rework MCMCSampler usage to match openmmtools.mcmc

e366db5

jchodera added 2 commits June 5, 2017 13:58

Don't make a deep copy of ThermodynamicState when constructing MCMCSa…

10e2eca

…mpler

Comment out timing and ncfile use until interface is replaced

070a894

jchodera added 3 commits June 9, 2017 09:15

Draft storage and timing infrastructure.

5df3dd5

Update docs

f691f53

Merge remote-tracking branch 'origin/master' into sams

f77417b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] SAMS samplers and documentation #214

[WIP] SAMS samplers and documentation #214

jchodera commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Aug 2, 2017

jchodera commented Aug 2, 2017

Lnaden commented Aug 3, 2017

[WIP] SAMS samplers and documentation #214

Are you sure you want to change the base?

[WIP] SAMS samplers and documentation #214

Conversation

jchodera commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

andrrizzi commented Jun 5, 2017

jchodera commented Aug 2, 2017

jchodera commented Aug 2, 2017

Lnaden commented Aug 3, 2017