Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] SAMS samplers and documentation #214

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

[WIP] SAMS samplers and documentation #214

wants to merge 9 commits into from

Conversation

jchodera
Copy link
Member

@jchodera jchodera commented Jun 5, 2017

Supercedes #203.

This is an initial import of the SAMS sampler infrastructure.

Docs preview can be found here.

Remaining tasks:

  • document current storage layer
  • add rudimentary free energy analysis via MBAR

We will switch the storage layer to be more similar to repex in a subsequent PR.

@jchodera
Copy link
Member Author

jchodera commented Jun 5, 2017

@andrrizzi : I'm running into an issue I don't quite understand here that involves the new ThermodynamicState:

lski1962:openmmtools choderaj$ nosetests -vv --nocapture openmmtools/samplers/sams/testsystems.py 
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
Failure: ValueError (<openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list) ... ERROR

======================================================================
ERROR: Failure: ValueError (<openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/nose/loader.py", line 252, in generate
    for test in g():
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/testsystems.py", line 582, in test_testsystems
    test = testsystem()
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/testsystems.py", line 170, in __init__
    self.exen_sampler = ExpandedEnsemble(self.mcmc_sampler, self.thermodynamic_states)
  File "/Users/choderaj/github/choderalab/openmmtools/openmmtools/openmmtools/samplers/sams/expanded_ensemble.py", line 133, in __init__
    self.thermodynamic_state_index = thermodynamic_states.index(sampler.thermodynamic_state)
ValueError: <openmmtools.states.ThermodynamicState object at 0x10ee62898> is not in list

----------------------------------------------------------------------

What is happening here is that MCMCSampler makes a deep copy of ThermodynamicState, so I can no longer identify the current thermodynamic state via

        self.thermodynamic_state_index = thermodynamic_states.index(sampler.thermodynamic_state)

Any ideas here? Does MCMCSampler need to make a deep copy? Can we check if two thermodynamic states are the same in a loop instead?

@andrrizzi
Copy link
Contributor

Does MCMCSampler need to make a deep copy?

I don't think that's necessary for now. It's just a precautionary thing in case the thermodynamic_state becomes private in the future. Feel free remove the deepcopy if it's causing problems! repex isn't using it, so there should be no repercussions on YANK.

@jchodera
Copy link
Member Author

jchodera commented Jun 5, 2017

@andrrizzi : I'm trying to reconcile the sams.MCMCSampler and openmmtools.mcmc.MCMCSampler approaches to timing.

The sams version stores a ._timing dict that allows the timing statistics to be written to disk later:
https://github.com/choderalab/sams/blob/master/sams/samplers.py#L1109

The openmmtools version uses the Timer class, which seems to only support writing to the logger:
https://github.com/choderalab/openmmtools/blob/master/openmmtools/mcmc.py#L274-L292

What do you think our approach should be here?

@jchodera
Copy link
Member Author

jchodera commented Jun 5, 2017

Also, what should we call the method that updates the sampler chain by one Monte Carlo step? Should this be apply(), as in MCMCMove; only run(), as in MCMCSampler, or update(), as in sams?

@andrrizzi
Copy link
Contributor

What do you think our approach should be here?

The Timer object is essentially a dict that compute elapsed times automatically, so I think one way can be converted to the other very easily. I'd try to use the Timer if it doesn't complicate things, but it's just a helper class, so I don't have a strong preference.

apply(), as in MCMCMove; only run(), as in MCMCSampler, or update(), as in sams

I would reserve apply for the moves. I'm not sure what update in sams does actually, but in mcmc.MCMCSampler the internal states are always updated when you call run() so it may be redundant.

@jchodera
Copy link
Member Author

jchodera commented Jun 5, 2017

The Timer object is essentially a dict that compute elapsed times automatically, so I think one way can be converted to the other very easily. I'd try to use the Timer if it doesn't complicate things, but it's just a helper class, so I don't have a strong preference.

OK, I'll just do the minimal amount of refactoring to have MCMCSampler store a Timer object as ._timer to the higher-level samplers can make use of this.

I would reserve apply for the moves. I'm not sure what update in sams does actually, but in mcmc.MCMCSampler the internal states are always updated when you call run() so it may be redundant.

update() runs a single iteration, run() can run multiple iterations. This was to allow the logic for a single iteration to be encapsulated in a single method.

@andrrizzi
Copy link
Contributor

I'll just do the minimal amount of refactoring

Ok. I think it's all internal anyway, so we can easily postpone it to later modifications if you have higher priority things to do.

update() runs a single iteration

Ah, I see, thanks. Both sounds good then.

@jchodera
Copy link
Member Author

jchodera commented Jun 5, 2017

OK, I'll just do the minimal amount of refactoring to have MCMCSampler store a Timer object as ._timer to the higher-level samplers can make use of this.

Is it OK if, instead of creating/destroying a new Timer object in each method, I have an optional Timer object passed to each MCMCMove apply() call so that we can aggregate timing data across calls within an iteration?

@andrrizzi
Copy link
Contributor

Hmm. That'd be a good feature to have, but I'd avoid making changes to the API of MCMCMoves just to accomodate debugging features if possible. What about just use the Timer outside the apply call? Something like

for move in mcmc_moves:
    timer.start(move.__class__.__name__)
    move.apply(...)
    timer.stop(...)

@andrrizzi
Copy link
Contributor

Otherwise we can have a centralized and more sophisticated timing utility, but I'd still try to keep it outside the API of the single objects.

@jchodera
Copy link
Member Author

jchodera commented Aug 2, 2017

I'm revisiting what needs to be done here to finish up this PR.
Most of what I wanted to do is implemented, but it would be useful to talk about how we wanted to handle storage. The current implementation uses NetCDF throughout, but we probably want to introduce a layer of abstraction between the simulation and the storage. We have several choices

  • Keep the current NetCDF integration, which would make it harder to swap out a different storage layer later
  • Go straight to a database as the storage layer
  • Use MDTraj plus something else (since MDTraj doesn't allow arbitrary variables to be attached to trajectory data)
  • Use a more general abstraction layer like what @Lnaden implemented in New storage system with auto-magical variable handling #135
  • Use something else entirely

@jchodera
Copy link
Member Author

jchodera commented Aug 2, 2017

This question is relevant to adding optional storage backing to the MCMCSampler as well.

Standard could involve creating a stack of samplers that can each bind to storage:

mcmc_sampler = MCMCSampler(thermodynamic_state, sampler_state, move=mcmc_move, storage=storage)
exen_sampler = ExpandedEnsemble(mcmc_sampler, thermodynamic_states, storage=storage)
sams_sampler = SAMS(exen_sampler, storage=storage)

Alternatively, we could detect if the lowest-level sampler has a storage layer bound and use that if so:

# Use storage defined in MCMCSampler for all layers
mcmc_sampler = MCMCSampler(thermodynamic_state, sampler_state, move=mcmc_move, storage=storage)
exen_sampler = ExpandedEnsemble(mcmc_sampler, thermodynamic_states)
sams_sampler = SAMS(exen_sampler)

@Lnaden
Copy link
Contributor

Lnaden commented Aug 3, 2017

For storage: We'll merge in the storage PR I have open (after I fix its clashes)
Fix the tests here
Possibly move to new type of storage beyond netcdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants