[TUTORIAL] Funnel metadynamics #194

lohedges · 2021-03-02T16:11:20Z

This is a thread to discuss the creation of a tutorial showing how to implement funnel metadynamics within BioSimSpace.

jmichel80 · 2021-03-03T21:48:41Z

Adding current diagram summarising the use case:

dlukauskis · 2021-03-22T13:02:18Z

I ran a fun-metaD sim that was set up for GROMACS and got a few errors.

1. Missing definition for the ligand center of mass, after the definition of the p1 and p2 points:
lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id
2. Missing sigma parameter for the extent CV, so instead of
metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...

it should be

metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
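As a rough illustration of the two corrected lines, here is a minimal sketch that builds them from Python. The helper names `plumed_ligand_com` and `plumed_metad` and the atom indices are hypothetical, chosen only for this example; this is not the BioSimSpace implementation.

```python
def plumed_ligand_com(first_atom, num_atoms):
    """Return a PLUMED COM line for a contiguous block of ligand atoms (1-indexed)."""
    last_atom = first_atom + num_atoms - 1
    return "lig: COM ATOMS=%d-%d" % (first_atom, last_atom)


def plumed_metad(args, sigmas):
    """Return the start of a METAD line with one SIGMA value per biased CV."""
    if len(args) != len(sigmas):
        raise ValueError("METAD needs one SIGMA per ARG")
    sigma = ",".join(str(s) for s in sigmas)
    return "metad: METAD ARG=%s SIGMA=%s" % (",".join(args), sigma)


# Hypothetical ligand occupying atoms 4000-4024.
print(plumed_ligand_com(4000, 25))
print(plumed_metad(["pp.proj", "pp.ext"], [0.025, 0.05]))
```

Checking that the number of widths matches the number of CVs up front catches the missing-sigma problem before PLUMED is ever launched.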

lohedges · 2021-03-22T13:48:28Z

Thanks for catching these. I'll be able to push a fix tomorrow.

dlukauskis · 2021-03-22T19:40:37Z

I've made some functions that draw the funnel for NGLview, here's the code. The idea behind using NGLview is to rapidly check that the funnel points the right way and includes/excludes the right bits of the protein. @lohedges, could you have a go at building my functions into the BSS code?

lohedges · 2021-03-23T10:31:54Z

Thanks, @dlukauskis. I've fixed the issues with the GROMACS PLUMED file and have added a check so that the ProjectionOnAxis.cpp file is only copied to the working directory when the PLUMED version is less than 2.7.

The plotting functionality looks great. I'll start implementing this now. Just to check... you define three vectors: origin, v1, and v2. Both origin and v1 are identical in this case. Which system did you use to define the values of v1 and v2? Are these just the COM of p0 and p1 used to define the funnel CV? (From reading the function doc strings, it would seem so.) I see you include a solvated.pdb file. Does this correspond to one of the systems from your original tutorial?

dlukauskis · 2021-03-23T12:35:42Z

Oh yeah, origin = v1 = p1 and v2 = p2. This was based on some older bit of code so I forgot to remove all of the redundancies like this. Solvated.pdb is 2WI3, which is almost identical to 2WI2 from the tutorial, except for a slightly different binding pose.

lohedges · 2021-03-23T14:34:35Z

I've now implemented the visualisation code natively in BioSimSpace. You can do the following to generate a BioSimSpace.Notebook.View object, e.g. using files from your tutorial:

import BioSimSpace as BSS

# Load the system.
system = BSS.IO.readMolecules(["input/2WI2.prmtop", "input/2WI2.rst7"])

# Create the funnel parameters.
p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(system)

# Define the collective variable.
cv = BSS.Metadynamics.CollectiveVariable.Funnel(p0, p1)

# Create a view.
view = BSS.Metadynamics.CollectiveVariable.viewFunnel(system, cv)

This can then be used to visualise the funnel.

View the entire solvated system and funnel:

view.system()

View the protein, ligand, and funnel (assuming the protein and ligand are the first two molecules):

view.molecules([0, 1, -1])

Just view the funnel:

view.molecule(-1)

Let me know if I've missed anything, or if you come across any issues.

dlukauskis · 2021-04-12T12:55:47Z

Yep, that visualisation code works really well! I noticed one issue with the PLUMED file still, where line 473 in _plumed.py
colvar_string = "lig: COM=%d-%d" % (idx+1, idx+num_atoms)
should be
colvar_string = "lig: COM ATOMS=%d-%d" % (idx+1, idx+num_atoms)

I'm also attaching a metadynamics.py for OpenMM 7.4.2 that allows printing of hill heights.
metadynamics.zip

lohedges · 2021-04-12T13:30:44Z

Thanks, I'll try to figure out how we can monkey-patch the version of OpenMM that we bundle (either in the binary or from conda). It looks like nothing has changed in the metadynamics code for OpenMM 7.5, so this should work for the conda-forge build of OpenMM too.

dlukauskis · 2021-04-13T13:47:22Z

This is a slightly better version of the metadynamics.py that I sent yesterday; I removed some lines that were in the older version for testing and debugging. Also, check out the changes I've made to the typical OpenMM run.py that make it write a HILLS file.
run.zip

lohedges · 2021-04-13T14:06:41Z

Fantastic, thanks for these. I'll try to incorporate the changes tomorrow morning and report on progress during the meeting. For simplicity I'll probably bundle your modified metadynamics script and copy it to the working directory at run-time, e.g. like we do for the additional PLUMED file.

lohedges · 2021-04-14T09:24:07Z

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

lohedges · 2021-04-14T09:25:59Z

We might also need to write the OpenMM colvar to a text file, but that should be easy enough to do, or check for the COLVAR.npy file and convert it within the PLUMED code (or just use the NumPy binary directly).

dlukauskis · 2021-04-14T12:16:41Z

> Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

Sure.

> When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height, in rows 0, 1, and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would. The HILLS file written by my code is formatted to be identical to a HILLS file made by PLUMED. I've tested getting 2D and 1D FES from my HILLS by feeding that file into PLUMED, and it produces FES that are identical to the ones made by OpenMM. Here's the zipped HILLS file.

lohedges · 2021-04-14T13:11:06Z

Thanks, I'm glad that the FES agree with OpenMM 👍 I'll run the HILLS file through our own sum_hills wrapper when I get a chance to make sure all is okay.

> The COLVAR.npy only contains the CV values, proj and ext, and the hill height in rows 0 and 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would.

That's okay. I only use the file for getting instantaneous and time series values of the collective variables, with some tricks to make sure that I extract the correct variables based on their names in the original PLUMED file. I should just be able to use the information in your file for the same purposes. I could just use the HILLS file too, although in general (using PLUMED) I write to the COLVAR more frequently than HILLS.

dlukauskis · 2021-04-16T12:24:08Z

I'm trying to use OpenMM to minimize and equilibrate as part of my fun-metaD tutorial. I've run into an issue with minimization using OpenMM, which just gives me a generic
simtk.openmm.OpenMMException: Particle coordinate is nan
Then I checked the RST7 and PRM7 files and they look really off when I visualise them using VMD. The protein and the ligand bonds are fine, but the solvent seems broken. This might be an issue with read/write of the files. Here are the inputs.

lohedges · 2021-04-16T13:17:10Z

Was the system completely prepared in BioSimSpace, or was this a fully solvated system that you loaded in initially? If the latter, could you upload those files too? We do some internal conversions to make sure that waters are formatted correctly for different MD engines, so something might have gone wrong here. (I've tested that I get the same system when doing round-trip conversions this way, though.)

dlukauskis · 2021-04-16T13:22:23Z

Here are all the inputs and the script that I used to set up and run the minimization. I started from a protein PDB with some water I discard and a ligand MOL2.

lohedges · 2021-04-16T13:32:51Z

Running the minimisation with GROMACS (using the solvated system directly) works, so something must be going wrong with the conversion to AMBER format, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

lohedges · 2021-04-16T13:45:57Z

Actually, we don't convert the water topology when running with OpenMM, only with AMBER or GROMACS. This means that OpenMM is using AMBER format files with a GROMACS water topology. If I explicitly convert the topology then it still fails, i.e. doing the following before creating the process.

solvated._set_water_topology("AMBER")

I'll try using the GROMACS files directly with OpenMM to see whether that works.

I'm not sure of the issue with this system, since this is the basic setup for pretty much all of our simulations and I haven't come across many blowups like this.

lohedges · 2021-04-16T13:59:12Z

I think it's an issue with the protein. If I just solvate the protein, I see the same blow up. If I just solvate the ligand, then it works.

lohedges · 2021-04-16T17:16:40Z

For reference, the minimisation crashes if you just use the protein in vacuum, so I don't think it's anything specifically to do with the water molecules in the system.

dlukauskis · 2021-05-14T14:27:17Z

report_log_every should be == to openmm.log write frequency. I suppose if the user changed the frequency between the restarts that would break things.

I don't think the binary checkpoint file tracked time or steps, but now that you mentioned it, I checked openmm.xml contents and it does tell you the write time!

<State openmmVersion="7.4.2" time="3201.9999999381066" type="State" version="1">

That makes things much easier, just look inside the XML file and figure out the elapsed number of steps.
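A minimal sketch of that idea: read the `time` attribute from the state XML and divide by the integration time step. The function name `elapsed_steps` and the 2 fs time step are assumptions made for this example, and the closing tag is added so the snippet parses on its own. For a real file, `ET.parse("openmm.xml").getroot()` would replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET


def elapsed_steps(state_xml, timestep_ps):
    """Estimate the number of completed steps from a serialized state.

    The 'time' attribute is taken to be in picoseconds.
    """
    root = ET.fromstring(state_xml)
    time_ps = float(root.attrib["time"])
    # Round rather than truncate, since the stored time carries float error.
    return int(round(time_ps / timestep_ps))


# The State element from above, closed so that it is valid XML.
example = ('<State openmmVersion="7.4.2" time="3201.9999999381066" '
           'type="State" version="1"></State>')

# Assumes a 2 fs (0.002 ps) time step.
print(elapsed_steps(example, 0.002))
```

Rounding matters here: the stored time sits just below 3202 ps, so truncating the quotient would land one step short.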

dlukauskis · 2021-05-14T14:29:53Z

If OpenMM does start from zero again this would mess up my reporting of time series data and I'd need to add some extra logic to fix the lists of steps and time values that would be returned to the user.

I don't know if you can tell openMM to restart from step X (as read from the state file), I think each time you create a simulation object, it just starts the step count from zero.

lohedges · 2021-05-14T17:38:35Z

Thanks for the info. Since we're working in Python land I imagine that it will be possible to set some attribute of the simulation object (probably private) to specify the starting step and time. If not, we should be able to just monkey-patch the state reporter so that it appends the correct values, i.e. offsetting them by the final step and time from the previous run. I'll play around on Monday.

lohedges · 2021-05-17T08:54:37Z

The context member of the simulation object has a setTime method. There is also setParameter, which takes a key-value pair, so could presumably be used to set the step too. I'll see if I can get something working. (I'm still surprised that these aren't loaded and set from the state, though.)

lohedges · 2021-05-17T12:13:29Z

It turns out that the simulation time is already set correctly, i.e. it continues from the previous simulation. It's also very easy to set the step:

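import os

# On restart, reload the saved state and recover the step count from the log.
if os.path.isfile('openmm.xml'):
    simulation.loadState('openmm.xml')
    # The first column of the last log line holds the final step of the previous run.
    with open('openmm.log') as f:
        last_line = f.readlines()[-1].split()
    simulation.currentStep = int(last_line[0])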

I think I'll try to monkey-patch the state reporter so that it doesn't write the header for repeats. Alternatively, I'll let it write the header, then make sure that it's consistent with the first one that was found in the log file. This will make sure that the information from the repeats is consistent.
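If repeated headers do end up in the log, the step lookup from the restart snippet needs to skip them. The following is a minimal sketch under two assumptions: columns are whitespace-separated with the step first, and header lines start with `#`. The function name `last_logged_step` is hypothetical.

```python
def last_logged_step(log_text):
    """Return the step on the last data line of a log, or None if there is none."""
    for line in reversed(log_text.splitlines()):
        fields = line.split()
        # Skip blank lines and header lines (assumed to start with '#').
        if not fields or fields[0].startswith("#"):
            continue
        return int(fields[0])
    return None


# A log from two runs, where the restart wrote a second header and then stopped.
log = '#"Step" "Time (ps)"\n500 1.0\n1000 2.0\n#"Step" "Time (ps)"\n'
print(last_logged_step(log))
```

Returning `None` for an empty or header-only log lets the caller fall back to starting from step zero instead of failing on an index error.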

dlukauskis · 2021-05-17T13:37:12Z

Looks awesome! You could open a PR on openmm's repo, they'd be interested in incorporating this properly. See #3071.

lohedges · 2021-05-17T14:58:37Z

Okay, I think I've almost got this working. A quick question from testing... Do you know why I always get the following error if the bias factor is set to 1?

Traceback (most recent call last):
  File "openmm.py", line 166, in <module>
    current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
  File "/home/lester/Downloads/fixed_restarts/metadynamics.py", line 197, in getHillHeight
    currentHillHeight = self.height*np.exp(-energy/(unit.MOLAR_GAS_CONSTANT_R*self._deltaT))
  File "/home/lester/sire.app/lib/python3.7/site-packages/simtk/unit/quantity.py", line 406, in __truediv__
    return (self/other._value) / other.unit
  File "/home/lester/sire.app/lib/python3.7/site-packages/simtk/unit/quantity.py", line 409, in __truediv__
    return self * pow(other, -1.0)
ZeroDivisionError: 0.0 cannot be raised to a negative power

It looks like I'll need to figure this out, or set a different default value.

lohedges · 2021-05-17T15:20:38Z

Setting it to anything above 1.0 works, i.e. 1.000001, so I'll just do that. Must be a rounding issue.
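The traceback points at a division by `self._deltaT`. Assuming the driver uses the usual well-tempered definition ΔT = T(γ − 1), with γ the bias factor (an assumption, since the metadynamics.py source is not shown in this thread), a bias factor of exactly 1 gives ΔT = 0, so the division fails exactly rather than through rounding. A minimal sketch of the height scaling with an explicit guard, using plain floats instead of unit-aware quantities:

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def scaled_hill_height(height, bias_energy, temperature, bias_factor):
    """Well-tempered hill height, assuming deltaT = T * (bias_factor - 1)."""
    delta_t = temperature * (bias_factor - 1.0)
    if delta_t <= 0.0:
        # A bias factor of 1 means deltaT = 0 and the exponent is undefined.
        raise ValueError("bias_factor must be greater than 1")
    return height * math.exp(-bias_energy / (R * delta_t))


# With no accumulated bias the hill keeps its initial height.
print(scaled_hill_height(1.5, 0.0, 300.0, 10.0))
```

Under this assumption a value such as 1.000001 avoids the error, but ΔT is then so small that the hill height drops to essentially zero as soon as any bias has accumulated, so it may be worth checking that this is the intended behaviour for an untempered run.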

lohedges · 2021-05-17T15:42:55Z

I've pushed an update that implements restarts for production and metadynamics protocols with OpenMM. I've done some basic testing, but could you check that it works as expected for your metadynamics runs?

I've also updated the way restarts are handled for the regular PLUMED implementation so that things are consistent. When I get time, I'll look at implementing something similar for the regular production (and possibly equilibration) protocols with the other engines. (Equilibration is trickier, since you might need to know and be able to set the current temperature.)

lohedges · 2021-05-17T16:57:40Z

Just realised that I need to fix the hardcoded check for the checkpoint frequency, i.e. make it work regardless of the integration time step, etc. I'll update that tomorrow.

dlukauskis · 2021-05-20T14:51:20Z

Hey Lester, I'm having some PBC-related issues with BSS funnel assignment. If I use a truncatedOctahedron box and run the setup simulations independently multiple times, sometimes BSS will fail to make a funnel, telling me that it can't find any nearby CA atoms. I've had a look at the structures and it's an issue with PBC, where equilibration will sometimes end up translating the ligand across the periodic boundary. Here's a ZIP with the input files and a NB. It's odd that MDAnalysis doesn't account for that.

lohedges · 2021-05-20T16:11:52Z

Hi Dom. I'll take a look when I'm back towards the end of next week. We actually use Sire's native search functionality rather than MDAnalysis since it's much faster. Quickly looking at the code, it appears to do the distance search in an infinite Cartesian space, so it isn't taking the periodic boundaries into account. You can pass through a different space though, so it should be able to handle periodic orthorhombic and triclinic systems too. Orthorhombic systems seem to work fine, but I'm not getting the same results for a cubic system represented as a triclinic space. I'll come back to this next week.

lohedges · 2021-05-23T19:48:50Z

I managed to quickly fix this. (The joys of yet another washout day and having finished all of the books that I brought with me.) The makeFunnel code should now work for orthorhombic and triclinic systems. I also found that Sire's built-in center-of-mass evaluator doesn't consider periodic boundaries either, so I've manually adjusted that too, i.e. for locating the binding site from the ligand CoM.
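For readers unfamiliar with the problem, here is a minimal sketch of a minimum-image distance for the orthorhombic case only. It is not Sire's implementation, the function name is hypothetical, and triclinic cells need a more careful treatment than per-axis wrapping.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    a, b: (x, y, z) coordinates; box: the three box edge lengths.
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        # Shift the separation by whole box lengths to its nearest image.
        d -= length * round(d / length)
        total += d * d
    return math.sqrt(total)


# Two points 48 apart along x in a 50-wide box are only 2 apart across the boundary.
print(minimum_image_distance((1.0, 1.0, 1.0), (49.0, 1.0, 1.0), (50.0, 50.0, 50.0)))
```

A search in infinite Cartesian space would report 48 for this pair, which is how a ligand that has been wrapped across the boundary can appear to have no nearby CA atoms.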

In adding support for periodic systems I also discovered a subtle issue with the Sire TriclinicBox object that could cause memory corruption on copy. This didn't affect reading or writing of triclinic systems, only the internal calculation of distances etc. using a copied space object, which is what happened to be required to solve this problem. As such, you'll need to update both Sire and BioSimSpace to access the new functionality. (Probably easiest to recreate your environment from scratch.)

dlukauskis · 2021-05-26T09:15:26Z

Thanks for that, Lester. I've noticed one other thing that I'd overlooked. When I do hydrogen mass repartitioning, use a 4 fs timestep and deposit hills in half the usual number of steps, the COLVAR and HILLS files still record the CVs and hill heights every 1000 steps, instead of every 500 steps. This basically leads to information loss, with half the hills missing from the record, as we deposit every 500 steps but record only every 1000. PLUMED wouldn't be able to reconstruct the resulting FES correctly.

My proposal is instead of

# Run the simulation.
total_steps = 2500000
total_cycles = 2500
remaining_steps = 2500000
steps_per_cycle = math.ceil(total_steps / total_cycles)
remaining_cycles = math.ceil(remaining_steps / steps_per_cycle)
start_cycles = total_cycles - remaining_cycles
checkpoint = 100

It could be

# Run the simulation.
total_steps = 2500000
steps_per_cycle = 500  # i.e. the hill deposition interval
total_cycles = math.ceil(total_steps/steps_per_cycle)
remaining_steps = 2500000
remaining_cycles = math.ceil(remaining_steps / steps_per_cycle)
start_cycles = total_cycles - remaining_cycles
checkpoint = 100
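To make the difference between the two snippets concrete, here is a small sketch of the arithmetic. It assumes that one COLVAR/HILLS record is written per cycle, as described above; the function name `cycle_plan` is hypothetical.

```python
import math


def cycle_plan(total_steps, hill_interval, total_cycles=None):
    """Return (steps_per_cycle, total_cycles, hills_deposited, hills_recorded).

    If total_cycles is given, the cycle length is derived from it (current
    behaviour). Otherwise the cycle length is the hill interval (proposal).
    One record per cycle is assumed.
    """
    if total_cycles is None:
        steps_per_cycle = hill_interval
        total_cycles = math.ceil(total_steps / steps_per_cycle)
    else:
        steps_per_cycle = math.ceil(total_steps / total_cycles)
    hills_deposited = total_steps // hill_interval
    return steps_per_cycle, total_cycles, hills_deposited, total_cycles


# Current script: 2500 fixed cycles, hills deposited every 500 steps.
print(cycle_plan(2500000, 500, total_cycles=2500))
# Proposal: cycle length tied to the hill deposition interval.
print(cycle_plan(2500000, 500))
```

With the fixed cycle count, 5000 hills are deposited but only 2500 are recorded; tying the cycle length to the deposition interval records all 5000.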

lohedges · 2021-05-26T19:54:21Z

Thanks for catching this. It will need a little more thought, since I need to be consistent with what I do for the other engines that support metadynamics where I decouple the frequency at which I report to the log file and deposit the hills. (PLUMED's reporting is independent of the engine to which it is coupled.) This would basically mean that I would need to remove the cycles part and just have a checkpoint system that writes to the log and hills at whatever frequency the user specifies, e.g.:

for x in range(start_step, total_steps):
    while x >= report_checkpoint:
        # Write to log file.
        report_checkpoint += report_interval
    while x >= hills_checkpoint:
        # Write to the hills file.
        hills_checkpoint += hills_interval
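The pseudocode above can be turned into a runnable sketch that simply returns the steps at which each file would be written. The function name `schedule` and the intervals used are assumptions for illustration, and records are taken to fall on exact multiples of each interval.

```python
def schedule(start_step, total_steps, report_interval, hills_interval):
    """Return the steps at which the log and HILLS file would be written."""
    report_steps, hills_steps = [], []
    # First checkpoint strictly after start_step for each writer.
    next_report = (start_step // report_interval + 1) * report_interval
    next_hills = (start_step // hills_interval + 1) * hills_interval
    for step in range(start_step + 1, total_steps + 1):
        if step == next_report:
            report_steps.append(step)  # Write to the log file here.
            next_report += report_interval
        if step == next_hills:
            hills_steps.append(step)  # Write to the HILLS file here.
            next_hills += hills_interval
    return report_steps, hills_steps


# Fresh run, then a restart from step 1200.
print(schedule(0, 2000, 1000, 500))
print(schedule(1200, 2000, 1000, 500))
```

Deriving the first checkpoint from `start_step` keeps the two writers aligned to their own intervals after a restart.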

lohedges · 2021-05-26T19:58:02Z

Actually, it's easier than I thought since the OpenMM state reporter is independent of the cycle logic. I'll just use the hill frequency from the protocol to determine the number of cycles.

lohedges · 2021-05-27T14:20:09Z

This is now fixed. The COLVAR and HILLS files are written at the hill deposition frequency, whereas the OpenMM state reporter uses the report interval from the protocol. This is consistent with what I do for the other metadynamics engines.

dlukauskis · 2021-07-19T06:16:55Z

Hi Lester, I've been working on a manuscript on a new fun-metaD variation. I had issues with convergence using projection/extent so I tried using RMSD of the ligand as a CV. It's much better at rebinding the ligand. I call this combination of CVs (proj/RMSD) fun-RMSD. Instead of realigning the protein to calculate the ligand RMSD, I used the p1 set of atoms as part of the indices to calculate the CV. The p1 atoms don't seem to be affected much by the bias, so I think this is a reasonable approach.

Could you implement fun-RMSD into BioSimSpace? Here are the files that show how fun-RMSD differs from fun-metaD. It's basically just 5 lines of code. The analysis part will look exactly the same as well: we integrate out the RMSD to construct the 1D FES along the projection CV. Let me know if you have any questions.
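For reference, integrating out the second CV amounts to F(s) = −kT ln ∫ exp(−F(s, r)/kT) dr. Below is a minimal sketch on a regular grid; it is not the PLUMED sum_hills implementation, and the function name, the grid layout, and the temperature behind `KT` are assumptions made for this example.

```python
import math

KT = 2.494  # kJ/mol, roughly 300 K (assumed temperature)


def integrate_out(fes, bin_width, kt=KT):
    """Collapse fes[i][j] = F(proj_i, rmsd_j) to a 1D profile along proj.

    bin_width is the spacing of the RMSD grid. The result is shifted so
    that its minimum is zero.
    """
    profile = []
    for row in fes:
        # Boltzmann-weighted sum over the RMSD bins (rectangle rule).
        z = sum(math.exp(-f / kt) for f in row) * bin_width
        profile.append(-kt * math.log(z))
    lowest = min(profile)
    return [f - lowest for f in profile]


# Toy surface: two projection bins, flat along RMSD, 5 kJ/mol apart.
toy = [[0.0, 0.0], [5.0, 5.0]]
print(integrate_out(toy, 0.1))
```

Because the toy surface is flat along RMSD, the 5 kJ/mol gap between the two projection bins survives the integration, which makes a quick sanity check for the routine.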

lohedges · 2021-07-19T08:50:30Z

Yes, no problem, this looks super easy to implement.

Just a quick question: Is it possible to do something similar with the regular PLUMED implementation? I guess you could add an RMSD collective variable in the same way, and we already have functionality to write the required PDB file (this was needed for the steered MD tutorial). If not, then I worry that we are providing two different implementations, and this might not be transparent to the user without printing some warnings or renaming some of the objects. (For example, we could have FunnelProjExt and FunnelProjRMSD CVs.) It might even be possible to do something by combining the existing Funnel and RMSD collective variable objects. (Originally I wanted to provide a way of building multi-dimensional CVs, but this is quite tricky in practice.)

lohedges · 2021-07-19T09:00:57Z

Looking at the existing code I think it would be easiest to have two collective variable objects. Currently, all the docstrings refer to extent as the second component so, if we went for a single object, this would need to be updated to extent or RMSD.

francoviscarra · 2022-01-27T18:21:12Z

Hi everyone, I'm trying to follow the tutorial here, but the script fails with AttributeError: module 'BioSimSpace.Metadynamics.CollectiveVariable' has no attribute 'makeFunnel'.

lohedges · 2022-01-28T09:22:31Z

Hello there,

Could you confirm how you installed BioSimSpace and what version of the code you are using? I imagine that you have an old package that doesn't have the funnel metadynamics functionality.

import BioSimSpace as BSS
print(BSS.__version__)

francoviscarra · 2022-01-28T12:46:11Z

I installed it from the binary install, as the conda install gets stuck while solving the environment. The version is 2020.1.0.

lohedges · 2022-01-28T12:51:44Z

Yes, you'll need to use a more recent version with the dev or workshop label. These changes were added in early 2021. Could you try the following, which installs for me:

conda create -n biosimspace -c conda-forge -c omnia -c michellab/label/workshop biosimspace
conda activate biosimspace

Cheers.

francoviscarra · 2022-01-28T14:54:55Z

I ended up using mamba (mamba create -n biosimspace -c conda-forge -c omnia -c michellab/label/workshop biosimspace) and it seems to work fine now. Thank you very much!

Backport fix from PR #193

lohedges self-assigned this Mar 2, 2021

lohedges added a commit that referenced this issue Mar 23, 2021

Fix definition of ligand COM. [ref #194]

9c48467

lohedges added a commit that referenced this issue Mar 23, 2021

Use independent hill width for each CV component. [ref #194]

13ee1dd

lohedges added a commit that referenced this issue Mar 23, 2021

Only copy aux file to workdir if PLUMED version < 2.7. [ref #194]

e510c89

lohedges added a commit that referenced this issue Mar 23, 2021

Added functionality for visualising funnels. [ref #194]

1c4b6b9

lohedges added a commit that referenced this issue Apr 12, 2021

Fix COM string. [ref #194]

05821a2

lohedges added a commit that referenced this issue Apr 14, 2021

Added implementation of PLUMED compatible HILLS file. [ref #194]

1ee8939

lohedges added a commit that referenced this issue Apr 14, 2021

Backup old HILLS file. [ref #194]

b828ef0

lohedges added a commit that referenced this issue May 17, 2021

Implement restarts in OpenMM. [ref #194]

b32c180

lohedges added a commit that referenced this issue May 17, 2021

Make checkpointing adaptive to integration time step. [ref #194]

c4513ef

lohedges added a commit that referenced this issue May 20, 2021

Make sure distance search queries work across periodic boundaries. [ref #194]

cbaac55

lohedges added a commit that referenced this issue May 26, 2021

Set COLVAR and HILLS reporting to hill deposition frequency. [ref #194]

e682046

annamherz pushed a commit that referenced this issue Mar 5, 2024

Merge pull request #194 from OpenBioSim/backport_193

1bd4e02

Backport fix from PR #193
