bayes-implicit-solvent

experiments with Bayesian calibration of implicit solvent models

Highlights

Colab notebook illustrating continuous parameter sampling

Evaluating our likelihood function requires comparing ~600 calculated hydration free energies against experimental values. This is computationally expensive and must be repeated at every sampling iteration.

Gradients of this likelihood are computed efficiently using Jax, and used to compare Langevin Monte Carlo with gradient descent.
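The sketch below contrasts the two update rules. Gradient descent follows the gradient of the potential U = -log p deterministically toward a single mode. Unadjusted Langevin Monte Carlo takes the same kind of step but adds Gaussian noise scaled by sqrt(2 * step_size), so its iterates approximately sample the posterior instead of collapsing to a point. Assumptions: the correlated 2D Gaussian target is a hypothetical stand-in for the hydration-free-energy likelihood, and its gradient is written out analytically so the example runs with NumPy alone. In the repository, Jax supplies this gradient.

```python
import numpy as np

# Hypothetical stand-in posterior: a correlated 2D Gaussian, U(theta) = -log p(theta).
PRECISION = np.array([[2.0, 0.3], [0.3, 1.0]])
MEAN = np.array([1.0, -0.5])

def grad_potential(theta):
    """Gradient of U(theta) = 0.5 (theta - MEAN)^T PRECISION (theta - MEAN)."""
    return PRECISION @ (theta - MEAN)

def gradient_descent(theta, step_size, n_steps):
    """Deterministic descent on U: converges to the posterior mode."""
    for _ in range(n_steps):
        theta = theta - step_size * grad_potential(theta)
    return theta

def unadjusted_langevin(theta, step_size, n_steps, rng):
    """Gradient step plus Gaussian noise; samples approximately from exp(-U)."""
    samples = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step_size * grad_potential(theta) + np.sqrt(2 * step_size) * noise
        samples[t] = theta
    return samples

rng = np.random.default_rng(0)
theta0 = np.zeros(2)
mode = gradient_descent(theta0, step_size=0.1, n_steps=500)
samples = unadjusted_langevin(theta0, step_size=0.01, n_steps=20000, rng=rng)
```

Without a Metropolis correction, the Langevin chain carries a small step-size-dependent bias. The Metropolis-adjusted variant (MALA) removes it.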

Demonstration of automatic parameterization

A few Markov Chain Monte Carlo algorithms (implemented in samplers.py) are applied to the task of sampling the continuous parameters of implicit solvent models.
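The simplest of these samplers is random-walk Metropolis-Hastings. Here is a minimal sketch of the general algorithm. It does not reproduce the interface in samplers.py: the function name, its arguments, and the toy target over a hypothetical (radius, scale) pair are all assumptions made for illustration.

```python
import numpy as np

def random_walk_metropolis(log_prob, theta0, step_size, n_steps, rng):
    """Random-walk Metropolis-Hastings with isotropic Gaussian proposals."""
    theta = np.asarray(theta0, dtype=float)
    current_lp = log_prob(theta)
    samples = np.empty((n_steps, theta.size))
    n_accepted = 0
    for t in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.shape)
        proposal_lp = log_prob(proposal)
        # The proposal is symmetric, so the acceptance ratio is just the density ratio.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
            n_accepted += 1
        samples[t] = theta
    return samples, n_accepted / n_steps

# Hypothetical target: independent Gaussians over one type's radius (nm) and scale.
def toy_log_prob(theta):
    return -0.5 * np.sum(((theta - np.array([0.15, 0.8])) / 0.05) ** 2)

rng = np.random.default_rng(1)
samples, acceptance_rate = random_walk_metropolis(toy_log_prob, [0.1, 0.7], 0.05, 20000, rng)
```

Each iteration here needs only one evaluation of `log_prob`. In this project that single evaluation is the expensive ~600-molecule comparison.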

Comparisons of Gaussian and Student-t likelihood behavior

One observation from this study is that the tail behavior of the likelihood function comparing experimental and predicted free energies has a pronounced effect on the behavior of samplers.
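The sketch below shows one way to see why tail behavior matters. Consider the derivative of each log-density with respect to a residual (predicted minus experimental free energy). Under a Gaussian likelihood it grows linearly, so a single badly predicted molecule can dominate the gradient that Langevin-type samplers follow. Under a Student-t likelihood it is bounded and decays for large residuals. The residual size, sigma, and nu are hypothetical illustration values, not settings from the repository.

```python
import math
import numpy as np

def gaussian_log_likelihood(residuals, sigma):
    """Sum of Gaussian log-densities of residuals (predicted minus experimental)."""
    z = np.asarray(residuals, dtype=float) / sigma
    return float(np.sum(-0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z ** 2))

def student_t_log_likelihood(residuals, sigma, nu):
    """Sum of Student-t log-densities with scale sigma and nu degrees of freedom."""
    z = np.asarray(residuals, dtype=float) / sigma
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return float(np.sum(log_norm - 0.5 * (nu + 1) * np.log1p(z ** 2 / nu)))

def gaussian_score(residual, sigma):
    """d/dr of the Gaussian log-density: grows linearly with the residual."""
    return -residual / sigma ** 2

def student_t_score(residual, sigma, nu):
    """d/dr of the Student-t log-density: bounded, and decays for large residuals."""
    return -(nu + 1) * residual / (nu * sigma ** 2 + residual ** 2)

# Hypothetical 10 kcal/mol outlier, sigma = 1 kcal/mol, nu = 3.
gauss_pull = gaussian_score(10.0, 1.0)
t_pull = student_t_score(10.0, 1.0, 3.0)
```

As nu grows, the Student-t log-likelihood approaches the Gaussian one.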

RJMC experiments

Atom-typing schemes are represented using trees of SMIRKS patterns, implemented in the file typers.py, along with uniform cross-model proposals that elaborate upon or delete the types within these schemes.
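This is a rough sketch of how a tree of patterns can assign types. Starting from a wildcard root, each atom descends into the first child whose pattern matches and receives the type of the deepest node reached. The sketch also lists the non-root leaves, which are natural candidates for a uniform "delete type" move. Several details are assumptions and may not match the GBTypingTree class in typers.py: the `TypeNode` class, the first-match ordering, and leaf-only deletion. Real SMIRKS matching (done with OpenEye in this repository) is replaced by hypothetical predicates over toy atom records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TypeNode:
    name: str                          # e.g. a SMIRKS string
    matches: Callable[[Dict], bool]    # hypothetical stand-in for a SMIRKS match
    children: List["TypeNode"] = field(default_factory=list)

def assign_type(atom, node):
    """Return the name of the deepest node reached by following matching children."""
    for child in node.children:
        if child.matches(atom):
            return assign_type(atom, child)
    return node.name

def deletable_leaves(node):
    """Non-root leaves: candidates for a uniform 'delete type' proposal."""
    found = []
    for child in node.children:
        found.extend(deletable_leaves(child) if child.children else [child])
    return found

root = TypeNode("*", lambda atom: True, [
    TypeNode("[#1]", lambda atom: atom["element"] == "H"),
    TypeNode("[#8]", lambda atom: atom["element"] == "O", [
        TypeNode("[#8X2H1]", lambda atom: atom["element"] == "O"
                 and atom["n_bonds"] == 2 and atom["n_H"] == 1),
    ]),
])

hydroxyl_oxygen = {"element": "O", "n_bonds": 2, "n_H": 1}
carbonyl_oxygen = {"element": "O", "n_bonds": 1, "n_H": 0}
carbon = {"element": "C", "n_bonds": 4, "n_H": 3}
```

An elaboration move would add a more specific child under an existing node. Any atoms it matches then move to the new type.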

Scripts for numerical experiments with RJMC and various choices of prior, likelihood, within-model sampler, and constraints on the discrete model space are in bayes_implicit_solvent/rjmc_experiments/.

Using Langevin Monte Carlo for within-model sampling, and enforcing that elemental types are retained (see the script tree_rjmc_w_elements.py), we obtain the following result.

In ongoing work, we are attempting to define better-informed cross-model proposals and to use more reasonable prior restraints, to improve the chance that cross-model sampling converges. Priors and cross-model proposals informed by the number of atoms that fall into each type are being prototyped in informed_tree_proposals.py.

Differentiable atom-typing experiments

As an alternative to assigning parameters using trees of SMIRKS, we also briefly considered assigning parameters using differentiable functions of SMIRKS features. This would allow uncertainty in the parameter-assignment scheme to be represented using a posterior distribution over continuous variables only, rather than a challenging mixed continuous/discrete space.

Linear functions of SMIRKS fingerprints to radii and scales (notebook)

Multilayer perceptron from SMIRKS fingerprints to radii, scales, and parameters controlling charge-hydration asymmetry (notebook)

(Convolutional typing schemes appeared more difficult to optimize numerically, but may be an interesting direction for future work (notebook))
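The linear and multilayer-perceptron variants can be sketched as follows. Each atom is described by a binary fingerprint indicating which SMIRKS patterns match it. A continuous function maps that fingerprint to GB parameters, so the whole parameter-assignment scheme is described by continuous weights. Assumptions: the fingerprint matrix, the layer sizes, the tanh activation, and the baseline radius and scale are all hypothetical, and the notebooks may also constrain outputs (for example to keep radii positive) in ways not shown here.

```python
import numpy as np

# Hypothetical fingerprints: rows are atoms, columns flag whether each of four
# SMIRKS patterns matches that atom.
fingerprints = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

def linear_typer(fingerprints, weights, bias):
    """Linear map from fingerprints to per-atom (radius, scale)."""
    return fingerprints @ weights + bias

def mlp_typer(fingerprints, w1, b1, w2, b2):
    """One tanh hidden layer; extra output columns can carry more parameters."""
    hidden = np.tanh(fingerprints @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
n_features, n_hidden = fingerprints.shape[1], 8
weights = 0.01 * rng.standard_normal((n_features, 2))
bias = np.array([0.15, 0.8])  # hypothetical baseline radius (nm) and scale
params_linear = linear_typer(fingerprints, weights, bias)

# Three outputs: radius, scale, and a hypothetical charge-hydration-asymmetry parameter.
w1 = 0.1 * rng.standard_normal((n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = 0.1 * rng.standard_normal((n_hidden, 3))
b2 = np.array([0.15, 0.8, 0.0])
params_mlp = mlp_typer(fingerprints, w1, b1, w2, b2)
```

One consequence of this design: atoms with identical fingerprints always receive identical parameters. The fingerprint therefore plays the role that the discrete type plays in the tree representation.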

Detailed contents

`bayes_implicit_solvent`

gb_models/ -- Clones the OpenMM GBSA OBC force using autodiff frameworks such as jax and HIPS autograd, to allow differentiating w.r.t. per-particle parameters.
molecule.py -- Defines a class Molecule that predicts solvation free energy as a function of GB parameters and compares it to an experimental value, for use in posterior sampling.
prior_checking.py -- methods for checking whether a typing scheme is legal
samplers.py -- defines parameter samplers: random-walk Metropolis-Hastings, Langevin (unadjusted and Metropolis-Adjusted), RJMC
smarts.py -- definitions of SMARTS primitives and decorators
solvation_free_energy.py -- functions for computing solvation free energy using GB models
typers.py -- defines the following classes: DiscreteProposal, BondProposal, AtomSpecificationProposal, BondSpecificationProposal, SMIRKSElaborationProposal, SMARTSTyper, FlatGBTyper, GBTypingTree, which hopefully encapsulate the bookkeeping needed to sample typing schemes using RJMC
utils.py -- un-filed utilities for: interacting with OpenEye, getting or applying GB parameters in OpenMM systems, caching substructure matches
constants.py -- temperature, unit conventions, etc.

(Currently contains some code that needs to be removed or refactored. proposals.py defines the following classes: Proposal, RadiusInheritanceProposal, AddOrDeletePrimitiveAtEndOfList, AddOrDeletePrimitiveAtRandomPositionInList, SwapTwoPatterns, MultiProposal, which were used in initial experiments that did not use a tree representation of the typing scheme. prepare_freesolv.py uses OpenEye to construct OEMol objects, assign partial charges, etc. starting from a list of SMILES strings.)
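The `gb_models/` entry above clones the OpenMM GBSA OBC force so it can be differentiated with respect to per-particle parameters. This rough NumPy sketch covers only the final step: the generalized Born polar energy computed with Still's pairwise function f_GB for a given set of Born radii. Assumptions: the OBC step that converts per-particle radii and scaling factors into Born radii is omitted (Born radii are inputs here), the surface-area term is left out, the dielectric constants and toy geometry are illustrative, and a central finite difference stands in for the jax or autograd gradient.

```python
import numpy as np

COULOMB_CONSTANT = 138.935456  # approx. 1/(4 pi eps0) in kJ nm / (mol e^2)

def gb_energy(positions, charges, born_radii, solute_dielectric=1.0, solvent_dielectric=78.5):
    """E = -1/2 k (1/eps_in - 1/eps_out) sum_ij q_i q_j / f_GB(r_ij), in kJ/mol.

    f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j))); on the diagonal f_GB = R_i.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)
    rr = born_radii[:, None] * born_radii[None, :]
    f_gb = np.sqrt(r2 + rr * np.exp(-r2 / (4.0 * rr)))
    prefactor = -0.5 * COULOMB_CONSTANT * (1.0 / solute_dielectric - 1.0 / solvent_dielectric)
    return prefactor * np.sum(charges[:, None] * charges[None, :] / f_gb)

def finite_difference_grad(fn, x, eps=1e-6):
    """Central-difference gradient; a stand-in for automatic differentiation."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad

# Hypothetical three-atom, net-neutral toy molecule (nm, elementary charges).
positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
charges = np.array([-0.8, 0.4, 0.4])
born_radii = np.array([0.17, 0.12, 0.12])
energy = gb_energy(positions, charges, born_radii)
grad_radii = finite_difference_grad(lambda r: gb_energy(positions, charges, r), born_radii)
```

For a single ion, the expression reduces to the Born energy -1/2 k (1/eps_in - 1/eps_out) q^2 / R.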

`bayes_implicit_solvent/continuous-parameter-experiments/`

elemental_types_mala.py -- Use Metropolis-adjusted Langevin to sample the radii and scales in the elemental-types-only model
hydrogen_or_not.py -- Toy model containing just two "types" -- "hydrogen" vs "not hydrogen" so we can plot the parameter space in 2D for inspection. Attempt to fit GB radii using this restricted typing scheme on subsets of FreeSolv. Also check how the results depend on the number of configuration-samples used in the hydration free energy estimates.
smirnoff_types.py -- Use random-walk Metropolis-Hastings to sample GB radii for models restricted to use the same types defined in the nonbonded force section of smirnoff99frosst.

...and many more scripts, still to be documented
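Metropolis-adjusted Langevin (MALA), used in elemental_types_mala.py, adds an accept/reject step to the Langevin update. Because the Langevin proposal is not symmetric, the acceptance ratio must include the forward and reverse proposal densities. The minimal sketch below assumes a hypothetical independent-Gaussian target over one radius and one scale, with an analytic gradient standing in for the Jax gradient. The function name and signature are illustrative, not the repository's API.

```python
import numpy as np

def mala(log_prob, grad_log_prob, theta0, step_size, n_steps, rng):
    """Metropolis-adjusted Langevin: proposal N(theta + eps*grad, 2*eps*I) plus MH test."""
    theta = np.asarray(theta0, dtype=float)
    lp, g = log_prob(theta), grad_log_prob(theta)
    samples = np.empty((n_steps, theta.size))
    n_accepted = 0

    def log_q(to, frm, grad_frm):
        # Log proposal density (up to a constant) of moving frm -> to.
        mean = frm + step_size * grad_frm
        return -np.sum((to - mean) ** 2) / (4.0 * step_size)

    for t in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        proposal = theta + step_size * g + np.sqrt(2.0 * step_size) * noise
        lp_prop, g_prop = log_prob(proposal), grad_log_prob(proposal)
        log_accept = (lp_prop - lp) + log_q(theta, proposal, g_prop) - log_q(proposal, theta, g)
        if np.log(rng.uniform()) < log_accept:
            theta, lp, g = proposal, lp_prop, g_prop
            n_accepted += 1
        samples[t] = theta
    return samples, n_accepted / n_steps

# Hypothetical target over (radius in nm, scale).
TARGET_MEAN, TARGET_STD = np.array([0.15, 0.8]), 0.05

def toy_log_prob(theta):
    return -0.5 * np.sum(((theta - TARGET_MEAN) / TARGET_STD) ** 2)

def toy_grad_log_prob(theta):
    return -(theta - TARGET_MEAN) / TARGET_STD ** 2

rng = np.random.default_rng(2)
samples, acceptance_rate = mala(toy_log_prob, toy_grad_log_prob, [0.1, 0.7], 0.001, 5000, rng)
```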

`bayes_implicit_solvent/rjmc_experiments/`

informed_tree_proposals.py -- Experiments with constructing guided discrete-model proposals, as well as with defining more effective priors for the discrete models
tree_rjmc_start_from_wildcard.py -- Experiments running RJMC on GB typing trees starting from wildcard type and building up from there.
tree_rjmc_w_elements.py -- Experiments running RJMC on GB typing trees, keeping elemental types as un-delete-able nodes.

`bayes_implicit_solvent/data`

Contains FreeSolv and some numerical results in pickle or numpy archives; see its readme.

`bayes_implicit_solvent/hierarchical_typing`

Outdated -- initial experiments where types were introduced by truncating the smirnoff nonbonded definitions

`bayes_implicit_solvent/tests`

test_bayes_implicit_solvent.py
test_rjmc.py -- unit tests and integration tests that RJMC on typing trees is correct

`bayes_implicit_solvent/vacuum_samples`

Scripts to generate configuration samples of FreeSolv set in vacuum, for use in reweighting.
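One common way to use fixed vacuum samples in this kind of reweighting is exponential averaging (the Zwanzig estimator): ΔG = -kT ln ⟨exp(-ΔU/kT)⟩_vacuum, where ΔU is the solvation energy (for example the GB energy) of each vacuum configuration. The estimator is named here as an assumption; the source does not say which reweighting estimator the likelihood uses. The sketch below evaluates it in a numerically stable log-sum-exp form. The temperature and the energies are hypothetical.

```python
import numpy as np

KT = 0.0083144626 * 298.15  # k_B T in kJ/mol at an assumed 298.15 K (about 2.479)

def exp_free_energy(delta_u, kT=KT):
    """Zwanzig estimate -kT ln mean(exp(-dU/kT)) over configuration samples."""
    x = -np.asarray(delta_u, dtype=float) / kT
    x_max = np.max(x)
    # Shift by the maximum before exponentiating to avoid overflow.
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kT * log_mean

# Hypothetical per-configuration solvation energies (kJ/mol) for one molecule.
rng = np.random.default_rng(3)
delta_u = -20.0 + 3.0 * rng.standard_normal(500)
delta_g = exp_free_energy(delta_u)
```

Because exponential averaging is dominated by rare low-energy configurations, the estimate depends on how many vacuum samples are used. This is one reason to check that dependence explicitly, as hydrogen_or_not.py does.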

`devtools`

Copied from MolSSI's cookiecutter-compchem. Requirements listed in devtools/conda-recipe/meta.yaml.

`docs`

To-do

`notebooks`

Exploratory or visualization-focused notebooks.

`elaborate_typing_animation/`

animated GIFs from the initial, slow typing-tree sampling code. This code was also affected by a since-corrected bug in which the charges for some molecules were incorrectly prepared. The number of types sampled increased much more than expected, and the sampler became slower as more types were present. (notebook)

`bugfixed_typing_animation/`

animated GIF of early tree-RJMC run

`extended-sim-projections/`

projections of slow-relaxing torsions in some molecules from FreeSolv

`carboxyl-torsion-plots/`

diagnostic plots for some slow torsional degrees of freedom involving carboxylic acids, encountered when preparing vacuum samples for use in the reweighting-based likelihood estimator

`projections/`

diagnostic tICA projections of gas-phase simulations

`nelder_mead_plots/`

baseline of using Nelder-Mead simplex minimization rather than gradient-informed optimization or sampling (notebook)

`rjmc-figures/`

illustrative example of using RJMC to sample Gaussian mixture models (notebook)

`rjmc_animation_march30/`, `rjmc_animation_march30_running/`, `rjmc_animation_march30_weighted/`

animated GIFs inspecting a long run of tree RJMC, plotting either raw RMSE, RMSE of running-median prediction, or uncertainty-weighted RMSE (notebook)

TODO in this section: describe freesolvbest-case-rmse.png, AIS results, AIS vs RJMC diagnostic tests
