Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to preserve information that isn't stored in coordinate/topology files between nodes #112

Open
jmichel80 opened this issue Aug 13, 2019 · 7 comments
Assignees
Labels

Comments

@jmichel80
Copy link
Contributor

jmichel80 commented Aug 13, 2019

On the latest devel version BioSimSpace-2019.1.0+297.gd5cc5b7
the attached BSS script fails on this pair of input files with this error message


~/biosimspace.app/bin/python custom_prepareFEP.py --input1 input/THROM\:throm_lig6a/* --input2 input/THROM\:throm_lig6b/* --output 6a_6b

Traceback (most recent call last):
  File "/export/users/julien/biosimspace.app/lib/python3.7/site-packages/BioSimSpace-2019.1.0+297.gd5cc5b7-py3.7.egg/BioSimSpace/FreeEnergy/_binding.py", line 117, in __init__
UserWarning: Exception 'SireBase::missing_property' thrown by the thread 'master'.
None of the contained forcefields have a property called water_model. Available properties are [ fileformat, space, time ].

This could be because BSS.Solvent was not used in this script to solvate system1 as it was already loaded solvated.

Inserting

from Sire import Base as _SireBase
system1._sire_object.setProperty("water_model", _SireBase.wrap("tip3p") )

Just before instantiating freerng solves the issue

I have uploaded a zip file with script and inputs to reproduce the error.

This points to an issue with the API to BSS.FreeEnergy . The constructor expects a merged molecule, but if that merged molecule hasn’t been obtained by loading solvated prm7/rst7 files in the script (rather than using solvate within the script) the system will miss properties required for the initialisation of the free energy object.

This means we cannot currently modularise the binding free energy pipeline by chaining different BSS nodes.

One solution would be to preserve more setup information in the system object (perhaps by saving pickles?), or to be more clever at guessing missing information (thought I could be difficult to guess what exact solvent model was used), or a combination of that.

custFEP.zip

  • Operating System: Linux
  • Installation method: latest BSS devel binary
@lohedges
Copy link
Member

lohedges commented Aug 13, 2019

Thanks, @jmichel80. You are correct, we currently need to know the water topology in order to re-solvate the system for the free leg of a binding free energy simulation. I have code in SIre to detect water molecules, so it should be possible to guess the closest topology and use that. (Perhaps warning the user.)

However, the main cause of this limitation is a design flaw in SOMD, i.e. that it requires an AMBER water topology naming convention. This means that we need to know the water topology ahead of time in order to convert the water molecules in the system to the correct format required by SOMD. (For any free energy simulation, not just binding.) There is code in Sire to do this, but it doesn't auto-detect the topology.

I personally don't see the need for this restriction in SOMD since the code just seems to use the names in order to detect the waters. There is now code in Sire that can search for water molecules, e.g. system.search("water") so it should be an easy fix. It also would have just been easy to check the atoms by element type rather than name in the first place. I'll add a Sire issue for this to make sure it gets fixed.

@jmichel80
Copy link
Contributor Author

jmichel80 commented Aug 13, 2019 via email

@lohedges
Copy link
Member

I currently guess the water model topology if the information is missing in free_energy.py. For now, I'll add the same logic to the binding free energy code and warn the user, so that they are aware if the topology has changed.

I think a perturbable molecule reader would be a good goal too. This should be easy enough for the SOMD input, i.e. just reading the pert file and the lambda = 0 rst7/prm7. (Other than the naming/ordering not corresponding to the original system, so this may differ between nodes!). GROMACS is a little harder, but should be doable too.

Let me know if there are other things that you find are breaking the modularity. Most of this seems a special case for the free energy perturbation set-up/simulation, since additional information is required that is not stored the coordinate and topology files, e.g. the pert file itself and the name of the water topology. (We could always pass the water topology name to the FreeEnergy constructor to choose a water topology for the free leg.)

@lohedges
Copy link
Member

lohedges commented Aug 13, 2019

I have pushed a workaround for your issue which should work in most cases. The only thing I don't account for is detecting SPC/E rather than TIP3P. You should no longer need the following in your script:

from Sire import Base as _SireBase
system1._sire_object.setProperty("water_model", _SireBase.wrap("tip3p") )

@lohedges
Copy link
Member

lohedges commented Sep 6, 2019

Hi @jmichel80, is this workaround satisfactory for your use case? Assuming you'd like to solvate in one BioSimSpace node, then setup an FEP simulation in another, then BioSimSpace would correctly detect the water model that was used in the solvation stage, i.e. you wouldn't need to pass the name of the chosen water model through as an extra input argument.

As you suggest above, I think we should still have a discussion about what other information might need to be preserved between nodes and how we'd like to go about handling this. Ideally the node output should be human readable, so it might be easiest to have some kind of simple metadata file that's passed through. This could contain basic system properties such as the name of force field used to parameterise molecules, and the name of the water model, i.e. things that could be inferred from the information in the topology file, but isn't explicitly included there.

@jmichel80
Copy link
Contributor Author

jmichel80 commented Sep 6, 2019 via email

@lohedges lohedges changed the title API design flaw BSS.FreeEnergy.Binding How to preserve information that isn't stored in coordinate/topology files between nodes Oct 16, 2019
@lohedges
Copy link
Member

Yes, it could just be an additional record in the YAML file that is already created by a node.

skfegan added a commit to CCPBioSim/BioSimSpace that referenced this issue Nov 19, 2019
* Added documentation for Metadynamics package.

* Clearer error message.

* Added Bound and Grid types to simplify CollectiveVariable instantiation.

* Default to SOMD engine for free energy simulations.

* Fixed issue clearing grid.

* Added metadynamics protocol support.

* Expose the molecule number.

* Adding functionality to interface with PLUMED.

* Updated GitHub bug template.

* Typos.

* Fixed comparison bug.

* Added functionality for creating PLUMED configuration files.

* Locate PLUMED exe and check that version is supported.

* Updated list of input files and command-line arguments.

* Fixed atom number range in ENTITY record.

* Print to COLVAR file.

* Match frequency of GROMACS output to PLUMED.

* Switch to PLUMED interface class so that we can track units, etc.

* Let PLUMED interface class handle its own files.

* Added functionality for logging COLVAR and HILLS time-series records.

* Added functionality for getting collective variable record data.

* Added convenience function for initialising metadynamics processes.

* Added functionality for computing free energy estimates.

* Added functionality for generating contour plots.

* Added instructions for installing and configuring PLUMED.

* Fixed __str__ output.

* Explictly state name of PLUMED file.

* Revert print formatting for BioSimSpace types.

* Lower contour grid resolution.

* Move hill_width to the collective variables so we can handle units.

* Added metadynamics example notebook.

* Fixed test node.

* Increase contour resolution.

* Fixed angle string conversion.

* Excude non-unit based types from test.

* Fix initialisation of num_bins member data.

* Store wall and grid bounds in default units.

* Automatically estimate number of bins from hill width.

* Updated metadynamics documentation.

* Allow easier real-time update of free energy plots.

* Fix Sphinx docstrings.

* Fix documentation. ***NO_CI***

* Get time records from COVLAR file so that they are in sync.

* Rebuild index lookup dictionaries when molecules are added/removed.

* Move duplicated functionality into helper function.

* Added link to GitHub repository.

* Added note regarding Conda install/update issue.

* Grammar tweak.

* Dont update base environment.

* Added Dockerfile for building third-party applications. ***NO_CI***

* Updated Dockerfile with new third-party package links. ***NO_CI***

* Slight refactor. ***NO_CI***

* Fix PATH. ***NO_CI***

* Fix name of PLUMED archive. ***NO_CI***

* Fix WORKDIR for ROOT user. ***NO_CI***

* Install PLUMED dependencies. ***NO_CI***

* Fix WORKDIR for NB_USER. ***NO_CI***

* Suppress unnecessary warnings from RDKit import.

* Fix __str__ method.

* Allow indexing for lists of molecules, residues, and atoms.

* Trash Conda package since it doesn't play nicely with setup.py install.
***NO_CI***

* Fileupload is bundled with Conda package. ***NO_CI***

* Use internal copy functions.

* Add OpenBabel to Conda dependencies.

* Added formalCharge function and option to override net charge.

* Added missing module.

* Updated YAML template for notebook server. ***NO_CI***

* Fix license header formatting issues. ***NO_CI***

* Typo. ***NO_CI***

* Handle dimensionless collective variables.

* Typo. ***NO_CI***

* Restrict grid for periodic Torsion collective variable.

* Added basic support for restarting metadynamics simulations.

* Add net_charge option to GAFF2 protocol.

* Validate 'net_charge' in GAF22 protocol constructor.

* Added cyclohexane example molecule. ***NO_CI***

* Use RDKit for formal charge calculation.

* Added support for redirecting stdout/stderr.

* Suppress stderr when computing formal charge with RDKit.

* Typo.

* Store BioSimSpace System object to reduce conversions.

* Added support for perturbable molecules in all SOMD protocols.

* Typo.

* Formatting tweak in warning message. ***NO_CI***

* Added functionality for computing axis-aligned bounding boxes.

* Improved SOMD trajectory and snapshot handling.

* Use buffering to handle trajectories for short simulations.

* Add trajectory support for free energy protocols.

* Renamed Docker directory.

* Can now use RDKit from conda-forge channel.

* Fix exception.

* Use trjconv to get latest trajectory frame.

* Downgrade Conda to resolve RDKit environment issue.

* Unpin conda-build version. ***NO_CI***

* Use RDKit from conda-forge channel. ***NO_CI***

* Raise warning if trjconv fails to extract trajectory frame.

* Re-pin conda-build version.

* Copy updated space property into original system.

* Fix trajectory run time and buffering interval.

* Improved comment.

* Copy updated space property into original system.

* Adding missing blocking code to getSystem method.

* Expose documentation for stdout/stderr redirection contextmanagers.
***NO_CI***

* Update documentation for dimensionless Bound and Grid types.

* Added note about Conda 4.7 issue. ***NO_CI***

* Always copy coordinates back into original system.

* Improved ring breaking detection.

* Handle perturbations that change ring size.

* Updated ring size test input.

* Improved test for ring size change.

* Added option to detect and allow changes in ring size.

* Updated node with option to allow ring size changes during perturbation.

* Handle merged molecule with variable number of bonds.

* Simplify setup script.

* Conda 4.7.10 is now working.

* Try unpinning conda-build version.

* Fix pip install command.

* Only import NGLView just before first use.

* Fall back on Sire MCS if RDKit mapping doesn't include prematch.
[closes michellab#102]

* - fixes import failure

* Switch to relative import for consistency with metadynamics package.

* - added a missing parenthesis
- replaced os.system() calls with native Python calls

* Fix corrupt file and bring inline with prepareFEP.py script. ***NO_CI***

* Refactor import statements.

* Improve performance when adding System objects together.

* Remove lazy_import package.

* Typo.

* Fixed typos in Sphinx doc strings.

* Added wrapper around Sire.Mol.Molecules.

* Updated docstring.

* Allow charge calculate on Molecule container.

* Allow Molecule container to be translated.

* Expose getAxisAlignedBoundingBox method for Molecule container.

* Allow slicing of and iterating over SearchResult object.

* Update re-indexing test to new API.

* Catch "Incompatibility" error message from Sire.

* Update notebook with new SearchResult API. ***NO_CI***

* Fix __setitem__ method.

* Allow indexing, slicing, and iterating over System object.

* Added getMolecule method.

* Update nodes and demos to faster molecule indexing API. ***NO_CI***

* Update with new improved molecule indexing.

* Fixes due to API changes.

* Fixes to preserve correct molecule ordering.

* Use search functionality to speed up search for non-water molecules.

* More API updates to example notebooks. ***NO_CI***

* - get the unit tests to pass on Windows

* Further optimisations.

* Formatting tweak.

* Fixed docstring. ***NO_CI***

* Added test for multiple ways of indexing molecules in a system.

* Added test for object iterators.

* Updated name of test script.

* Formatting tweak.

* - BSS considers rings to be broken also when they are not

* - added test cases for growing a ring

* - enforce more rigorous checks for ring cleavage to avoid false positives

* Fix stderr redirection issue from within Jupyter notebook. [closes michellab#109]

* Formatting tweak.

* Added missing atom information in bond records.

* - if test_somd.py fails due to a "RuntimeError: Particle coordinate is nan" error, it should pass with a warning rather than failing

* Make Residue object indexable, sliceable, and iterable.

* Handle upper case water model names.

* Guess water topology if "water_model" property is absent. [Re: michellab#112]

* Formatting tweak.

* Also warn user when guessing water topology for SOMD conversion.
[Re: michellab#112]

* Update method name to extract underlying Sire object. ***NO_CI***

* Formatting tweak. ***NO_CI***

* Formatting tweaks. ***NO_CI***

* Added _getFrame method to aid modularity.

* Added functionality to sample configs by collective variables value.

* Show how to sample configurations from the free energy surface.

* Return None for both return values.

* Grammar tweak. ***NO_CI***

* Added warning about non-sampled phase space regions. ***NO_CI***

* Added hydration free energy demo. ***NO_CI***

* Typo. ***NO_CI***

* Make sure node library is repopulated after ugprade. ***NO_CI***

* Allow BioSimSpace notebook usage stats to be tracked. ***NO_CI***

* Avoid use of built-in function name. ***NO_CI***

* Make clear that latest version is dev package. ***NO_CI***

* Add Conda dependencies for migrated PyPi packages.

* Added info regarding changing IPython kernel. [closes michellab#113]

* Explain how to use sire_python with terminal IPython console.

* Remove macOS Python interpreter workaround.

* Add import test.

* Added image showing phi and psi dihedral angles. ***NO_CI***

* Set hill_width in constructor.

* Handle Molecules container as System argument.

* Formatting tweaks.

* Restrict number of threads on notebook server.

* Fix viewMolecules function for updated NGLView.

* Populate container with new/updated workshops. ***NO_CI***

* PLUMED requires a minimum of two threads.  ***NO_CI***

* Remove OMP_NUM_THREADS since it causes more problems than it solves!
***NO_CI***

* Added new workshops to update script. ***NO_CI***

* Add openblas Conda depdendency and GROMACS AVX_512 support. ***NO_CI***

* Patch BioSimSpace to restrict number of threads on notebook server.
***NO_CI***

* Use a patch file to reduce file size. ***NO_CI***

* Install patch via Conda. ***NO_CI***

* Fix patch command. ***NO_CI***

* Re-apply GROMACS patch during update. ***NO_CI***

* Fix update_biosimspace script. ***NO_CI***

* Formatting tweak in Exception message.

* Improved handling of string options, e.g. protocol names.

* Updated CHANGELOG for the 2019.2.0 release.

* Added download links for 2019.2.0 binaries.

* Fixed CHANGELOG entry.

* Added instructions for creating a BioSimSpace release. ***NO_CI***

* Added example showing how to load a custom GROMACS topology.

* Use Conda to install dependencies to ensure matching versions.

* Fix solvation of system already containing water molecules.

* Fix Sphinx documentation issues.

* Added media page to website.

* Fix path to to patch file.

* Make FKCOMBU download resilient to server down time.

* Expose metadynamics protocol to "protocols" function.

* Fix issue calling _is_notebook from thread in interactive mode.

* Added missing import.

* Handle xtc trajectory files and custom protocols. [closes michellab#126]

* Formatting tweak. ***NO_CI***

* Revert to conda-build 3.17 due to issues with version 3.18 on macOS.

* Improved trajectory handling for custom protocols.

* Pin conda-build version in macOS Azure Pipeline config file.

* Fix indentation of helper function.

* Whitespace trim.

* Allow user to set path to node directory.

* Added section on interoperable nodes to website.

* Added license badge. ***NO_CI***

* Added separate page with details regarding notebook server.

* Added "verbose" mode to enable access to full error trace. [ref michellab#128]

* Added "compatibility" section to website. [ref michellab#128]

* Whitespace trim.

* Added first draft of "protocols" section. [ref michellab#128]

* Add status badges to website.

* Standardise syntax and improve setting of member data. ***NO_CI***

* Fixed typos in docstrings. ***NO_CI***

* Remove confusing and redundant viewMolecules function.

* Allow view to be created from a list of molecular input files.

* Update list of protocols. ***NO_CI***

* Typos. ***NO_CI***

* Restrict size of ring to speed up search.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants