experiments with Bayesian calibration of implicit solvent models
Our likelihood function depends on comparing ~600 calculated and experimental hydration free energies, which is computationally expensive and must be done at each sampling iteration.
Gradients of this likelihood are computed efficiently using JAX and are used to compare Langevin Monte Carlo with gradient descent.
A few Markov chain Monte Carlo algorithms (implemented in samplers.py) are applied to the task of sampling the continuous parameters of implicit solvent models.
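As a rough illustration of the gradient-informed samplers mentioned above, here is a minimal sketch of Metropolis-adjusted Langevin on a toy one-dimensional target. It is not the code in samplers.py: the target density, the step size, and all function names are assumptions made for this example, and the gradient is written by hand rather than obtained by automatic differentiation.

```python
import math
import random


def log_density(x):
    """Toy target (assumed for illustration): unnormalized standard normal."""
    return -0.5 * x * x


def grad_log_density(x):
    """Hand-written gradient of the toy log density."""
    return -x


def log_proposal(x_to, x_from, step):
    """Log density, up to a constant, of the Langevin proposal x_from -> x_to."""
    mean = x_from + step * grad_log_density(x_from)
    return -((x_to - mean) ** 2) / (4.0 * step)


def mala(x0, step, n_steps, rng):
    """Gradient drift plus Gaussian noise, followed by a Metropolis-Hastings test."""
    x, samples, n_accepted = x0, [], 0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        proposal = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * noise
        log_ratio = (
            log_density(proposal) + log_proposal(x, proposal, step)
            - log_density(x) - log_proposal(proposal, x, step)
        )
        # Accept with probability min(1, exp(log_ratio)); otherwise keep x.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, n_accepted = proposal, n_accepted + 1
        samples.append(x)
    return samples, n_accepted / n_steps


samples, acceptance_rate = mala(x0=3.0, step=0.5, n_steps=5000, rng=random.Random(0))
sample_mean = sum(samples) / len(samples)
print(f"acceptance rate {acceptance_rate:.2f}, sample mean {sample_mean:.3f}")
```

Dropping the accept/reject test gives the unadjusted Langevin variant, and dropping the noise term as well reduces the update to plain gradient ascent on the log density.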
One observation from this study is that the tail behavior of the likelihood function comparing experimental and predicted free energies has a pronounced effect on the behavior of samplers.
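One way to see why tails matter is to compare how two candidate likelihoods respond to a large residual between a predicted and an experimental free energy. The sketch below contrasts a Gaussian with a Student-t likelihood. Both the choice of Student-t as the heavy-tailed alternative and the numerical values are assumptions for illustration, not a description of the likelihoods used in the repository.

```python
import math


def gaussian_logpdf(residual, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (residual / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def student_t_logpdf(residual, sigma, dof):
    """Log density of a zero-mean Student-t with scale sigma and dof degrees of freedom."""
    z = residual / sigma
    log_norm = (
        math.lgamma(0.5 * (dof + 1.0))
        - math.lgamma(0.5 * dof)
        - 0.5 * math.log(dof * math.pi)
        - math.log(sigma)
    )
    return log_norm - 0.5 * (dof + 1.0) * math.log1p(z * z / dof)


def gaussian_grad(residual, sigma):
    """Derivative of the Gaussian log density w.r.t. the residual: linear in the residual."""
    return -residual / sigma ** 2


def student_t_grad(residual, sigma, dof):
    """Derivative of the Student-t log density w.r.t. the residual: bounded, decays for outliers."""
    return -(dof + 1.0) * residual / (dof * sigma ** 2 + residual ** 2)


sigma, dof = 1.0, 3.0  # illustrative values, not taken from the repository
for residual in (0.5, 10.0):
    print(
        f"residual {residual:5.1f}: "
        f"Gaussian grad {gaussian_grad(residual, sigma):8.3f}, "
        f"Student-t grad {student_t_grad(residual, sigma, dof):8.3f}"
    )
```

Under the Gaussian, a single badly predicted molecule contributes a gradient that grows without bound, so it can dominate a gradient-informed proposal. Under the heavier-tailed alternative, the same outlier contributes a bounded pull.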
Atom-typing schemes are represented using trees of SMIRKS patterns, implemented in the file typers.py, along with uniform cross-model proposals that elaborate upon or delete the types within these schemes.
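To make the tree idea concrete, here is a toy sketch in which each node refines its parent and an atom receives the type of the most specific node it matches. This is not the GBTypingTree class from typers.py: the TypeNode class, the precedence rule (later children win), and the use of plain Python predicates in place of real SMIRKS matching are all assumptions for illustration. The SMIRKS-like strings serve only as labels.

```python
class TypeNode:
    """One node in a toy typing tree: a label, a match predicate, and child refinements."""

    def __init__(self, name, matches):
        self.name = name
        self.matches = matches  # callable: atom dict -> bool (stand-in for a SMIRKS match)
        self.children = []

    def add_child(self, child):
        """Elaborate this type with a more specific child type."""
        self.children.append(child)
        return child

    def delete_child(self, child):
        """Remove a child type; atoms it matched fall back to this node or a sibling."""
        self.children.remove(child)

    def assign(self, atom):
        """Return the label of the most specific matching node, or None if no match."""
        if not self.matches(atom):
            return None
        for child in reversed(self.children):  # assumed rule: later children take precedence
            assigned = child.assign(atom)
            if assigned is not None:
                return assigned
        return self.name


root = TypeNode("[*]", lambda atom: True)
hydrogen = root.add_child(TypeNode("[#1]", lambda atom: atom["element"] == "H"))
carbon = root.add_child(TypeNode("[#6]", lambda atom: atom["element"] == "C"))
methyl_carbon = carbon.add_child(TypeNode("[#6;H3]", lambda atom: atom["n_hydrogens"] == 3))

atoms = [
    {"element": "C", "n_hydrogens": 3},
    {"element": "C", "n_hydrogens": 0},
    {"element": "H", "n_hydrogens": 0},
    {"element": "O", "n_hydrogens": 1},
]
assigned_types = [root.assign(atom) for atom in atoms]
print(assigned_types)
```

A cross-model move in this picture either calls add_child on some node (an elaboration) or delete_child on a leaf (a deletion), changing the number of continuous parameters that the within-model sampler must handle.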
Scripts for numerical experiments with RJMC and various choices of prior, likelihood, within-model sampler, and constraints on the discrete model space are in bayes_implicit_solvent/rjmc_experiments/.
Using Langevin Monte Carlo for within-model sampling, and enforcing that elemental types are retained (see the script tree_rjmc_w_elements.py), we obtain the following result.
In ongoing work, we are attempting to define better-informed cross-model proposals and use more reasonable prior restraints to improve the chance of converging cross-model sampling. Priors and cross-model proposals that are informed by the number of atoms that fall into each type are being prototyped in informed_tree_proposals.py.
As an alternative to assigning parameters using trees of SMIRKS, we also briefly considered assigning parameters using differentiable functions of SMIRKS features. This would allow uncertainty in the parameter-assignment scheme to be represented using a posterior distribution over continuous variables only, rather than a challenging mixed continuous/discrete space.
- Linear functions of SMIRKS fingerprints to radii and scales (notebook)
- Multilayer perceptron from SMIRKS fingerprints to radii, scales, and parameters controlling charge-hydration asymmetry (notebook)
- Convolutional typing schemes appeared more difficult to optimize numerically, but may be an interesting direction for future work (notebook)
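The simplest of these alternatives, a linear map from a feature vector to parameters, can be sketched as follows. The feature vector, weights, and biases are made-up numbers, and assign_parameters is a hypothetical helper rather than a function from the notebooks.

```python
def assign_parameters(fingerprint, weights, biases):
    """Map a binary feature vector to (radius, scale) using one linear function per parameter."""
    return tuple(
        bias + sum(w * f for w, f in zip(row, fingerprint))
        for row, bias in zip(weights, biases)
    )


# Each entry says whether a hypothetical SMIRKS feature matches the atom.
fingerprint = [1, 0, 1]

# One row of weights per output parameter: first row for the radius, second for the scale.
weights = [
    [0.10, -0.20, 0.05],
    [0.00, 0.10, -0.10],
]
biases = [1.5, 0.8]  # illustrative baseline radius and scale

radius, scale = assign_parameters(fingerprint, weights, biases)
print(radius, scale)
```

Because the outputs are linear in the weights, the derivative of each parameter with respect to its weights is the fingerprint itself, so the whole assignment scheme lives in a continuous space that gradient-informed samplers can explore.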
gb_models/ -- Clones the OpenMM GBSA OBC force using autodiff frameworks such as jax and HIPS autograd, to allow differentiating w.r.t. per-particle parameters.
molecule.py -- Defines a class Molecule that predicts solvation free energy as a function of GB parameters and compares it to an experimental value, for use in posterior sampling.
prior_checking.py -- methods for checking whether a typing scheme is legal
samplers.py -- defines parameter samplers: random-walk Metropolis-Hastings, Langevin (unadjusted and Metropolis-adjusted), RJMC
smarts.py -- definitions of SMARTS primitives and decorators
solvation_free_energy.py -- functions for computing solvation free energy using GB models
typers.py -- defines the following classes: DiscreteProposal, BondProposal, AtomSpecificationProposal, BondSpecificationProposal, SMIRKSElaborationProposal, SMARTSTyper, FlatGBTyper, GBTypingTree, which hopefully encapsulate the bookkeeping needed to sample typing schemes using RJMC
utils.py -- un-filed utilities for: interacting with OpenEye, getting or applying GB parameters in OpenMM systems, caching substructure matches
constants.py -- temperature, unit conventions, etc.
(Currently contains some code that needs to be removed or refactored. proposals.py defines the following classes: Proposal, RadiusInheritanceProposal, AddOrDeletePrimitiveAtEndOfList, AddOrDeletePrimitiveAtRandomPositionInList, SwapTwoPatterns, MultiProposal, which were used in initial experiments that did not use a tree representation of the typing scheme. prepare_freesolv.py uses OpenEye to construct OEMol objects, assign partial charges, etc. starting from a list of SMILES strings.)
elemental_types_mala.py -- Use Metropolis-adjusted Langevin to sample the radii and scales in the elemental-types-only model
hydrogen_or_not.py -- Toy model containing just two "types" -- "hydrogen" vs "not hydrogen" -- so we can plot the parameter space in 2D for inspection. Attempt to fit GB radii using this restricted typing scheme on subsets of FreeSolv. Also check how the results depend on the number of configuration-samples used in the hydration free energy estimates.
smirnoff_types.py -- Use random-walk Metropolis-Hastings to sample GB radii for models restricted to use the same types defined in the nonbonded force section of smirnoff99frosst.
and many more to be documented further
informed_tree_proposals.py -- Experiments with constructing guided discrete-model proposals, as well as with defining more effective priors for the discrete models
tree_rjmc_start_from_wildcard.py -- Experiments running RJMC on GB typing trees starting from wildcard type and building up from there.
tree_rjmc_w_elements.py -- Experiments running RJMC on GB typing trees, keeping elemental types as un-delete-able nodes.
See its readme: contains freesolv and some numerical results in pickle or numpy archives.
Out-dated -- initial experiments where types were introduced by truncating the smirnoff nonbonded definitions
test_bayes_implicit_solvent.py
test_rjmc.py
-- unit tests and integration tests that RJMC on typing trees is correct
Scripts to generate configuration samples of FreeSolv set in vacuum, for use in reweighting.
Copied from MolSSI's cookiecutter-compchem. Requirements listed in devtools/conda-recipe/meta.yaml.
To-do
Exploratory or visualization-focused notebooks.
- animated GIFs of initial slow typing-tree sampling code (also affected by a bug, later corrected, in which incorrect preparation drastically affected the charges for some molecules). The number of types sampled increased much more than expected, and the sampler became slower the more types were present. (notebook)
- animated GIF of early tree-RJMC run
- diagnostic plots for some slow torsional degrees of freedom involving carboxylic acids, encountered when preparing vacuum samples for use in reweighting-based likelihood estimator
- diagnostic tICA projections of gas-phase simulations
- baseline of using Nelder-Mead simplex minimization rather than gradient-informed optimization or sampling (notebook)
- illustrative example of using RJMC to sample Gaussian mixture models (notebook)
- animated GIFs inspecting a long run of tree RJMC, plotting either raw RMSE, RMSE of running-median prediction, or uncertainty-weighted RMSE (notebook)
TODO in this section: describe freesolvbest-case-rmse.png, AIS results, AIS vs RJMC diagnostic tests