[TUTORIAL] Funnel metadynamics #194

Open
lohedges opened this issue Mar 2, 2021 · 96 comments

@lohedges
Member

lohedges commented Mar 2, 2021

This is a thread to discuss the creation of a tutorial showing how to implement funnel metadynamics within BioSimSpace.

@lohedges lohedges self-assigned this Mar 2, 2021
@jmichel80
Contributor

jmichel80 commented Mar 3, 2021

Adding current diagram summarising the use case:

[Image: funnel_md-white]

@dlukauskis

I've run a fun-metaD simulation that was set up for GROMACS and got a few errors.

  1. Missing definition for the ligand center of mass, after the definition of p1 and p2 points.
    lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id

  2. Missing sigma parameter for the extent CV, so instead of
    metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...

    it should be

    metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
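The two fixes above can be sketched as a small formatter. This is only an illustration: the helper names `com_line` and `metad_line` are hypothetical and are not BioSimSpace functions, the atom range is made up, and the remaining METAD keywords (the `...` in the lines above) are deliberately left out.

```python
def com_line(label, first_atom, last_atom):
    """Return a PLUMED COM definition over a contiguous, 1-based atom range."""
    return f"{label}: COM ATOMS={first_atom}-{last_atom}"


def metad_line(label, args, sigmas):
    """Return the start of a METAD line, requiring one SIGMA per biased CV."""
    if len(args) != len(sigmas):
        raise ValueError("METAD needs one SIGMA value per ARG entry")
    arg_str = ",".join(args)
    sigma_str = ",".join(str(s) for s in sigmas)
    return f"{label}: METAD ARG={arg_str} SIGMA={sigma_str}"


# Hypothetical ligand atom range, for illustration only.
print(com_line("lig", 4001, 4030))
print(metad_line("metad", ["pp.proj", "pp.ext"], [0.025, 0.05]))
```

Checking the number of SIGMA values against the number of arguments up front would catch the second error before PLUMED ever reads the file.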

@lohedges
Member Author

lohedges commented Mar 22, 2021 via email

@dlukauskis

I've made some functions that draw the funnel for NGLview, here's the code. The idea behind using NGLview is to rapidly check that the funnel points the right way and includes/excludes the right bits of the protein. @lohedges, could you have a go at building my functions into the BSS code?

@lohedges
Member Author

Thanks, @dlukauskis. I've fixed the issues with the GROMACS PLUMED file and have added a check so that the ProjectionOnAxis.cpp file is only copied to the working directory when the PLUMED version is less than 2.7.
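A version check of this kind might look like the sketch below. It is not the BioSimSpace implementation: the function names are hypothetical, and it assumes the PLUMED version is available as a purely numeric, dot-separated string.

```python
def parse_version(version):
    """Convert a string such as '2.6.1' into a tuple of integers (2, 6, 1)."""
    return tuple(int(part) for part in version.split("."))


def needs_projection_source(plumed_version):
    """Return True when the PLUMED version is older than 2.7."""
    return parse_version(plumed_version)[:2] < (2, 7)


for version in ("2.6.1", "2.7.0", "2.10.0"):
    print(version, needs_projection_source(version))
```

Comparing integer tuples rather than raw strings avoids treating a version such as 2.10 as older than 2.7.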

The plotting functionality looks great. I'll start implementing this now. Just to check... you define three vectors: origin, v1, and v2. Both origin and v1 are identical in this case. Which system did you use to define the values of v1 and v2? Are these just the COM of p0 and p1 used to define the funnel CV? (From reading the function doc strings, it would seem so.) I see you include a solvated.pdb file. Does this correspond to one of the systems from your original tutorial?

@dlukauskis

Oh yeah, origin = v1 = p1 and v2 = p2. This was based on some older bit of code so I forgot to remove all of the redundancies like this. Solvated.pdb is 2WI3, which is almost identical to 2WI2 from the tutorial, except for a slightly different binding pose.

@lohedges
Member Author

I've now implemented the visualisation code natively in BioSimSpace. You can do the following to generate a BioSimSpace.Notebook.View object, e.g. using files from your tutorial:

import BioSimSpace as BSS

# Load the system.
system = BSS.IO.readMolecules(["input/2WI2.prmtop", "input/2WI2.rst7"])

# Create the funnel parameters.
p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(system)

# Define the collective variable.
cv = BSS.Metadynamics.CollectiveVariable.Funnel(p0, p1)

# Create a view.
view = BSS.Metadynamics.CollectiveVariable.viewFunnel(system, cv)

This can then be used to visualise the funnel.

View the entire solvated system and funnel:

view.system()

View the protein, ligand, and funnel: (Assuming the protein and ligand are the first two molecules)

view.molecules([0, 1, -1])

Just view the funnel:

view.molecule(-1)

Let me know if I've missed anything, or if you come across any issues.

@dlukauskis

Yep, that visualisation code works really well! I noticed one issue with the PLUMED file still, where line 473 in _plumed.py
colvar_string = "lig: COM=%d-%d" % (idx+1, idx+num_atoms)
should be
colvar_string = "lig: COM ATOMS=%d-%d" % (idx+1, idx+num_atoms)

I'm also attaching a metadynamics.py for OpenMM 7.4.2 that allows hill heights to be printed.
metadynamics.zip

lohedges added a commit that referenced this issue Apr 12, 2021
@lohedges
Member Author

Thanks, I'll try to figure out how we can monkey-patch the version of OpenMM that we bundle (either in the binary or from conda). It looks like nothing has changed in the metadynamics code for OpenMM 7.5, so this should work for the conda-forge build of OpenMM too.

@dlukauskis

This is a slightly better version of the metadynamics.py that I sent yesterday; I've removed some lines that were in the older version for testing and debugging. Also, check out the changes I've made to the typical OpenMM run.py so that it writes a HILLS file.
run.zip

@lohedges
Member Author

Fantastic, thanks for these. I'll try to incorporate the changes tomorrow morning and report on progress during the meeting. For simplicity I'll probably bundle your modified metadynamics script and copy it to the working directory at run-time, e.g. like we do for the additional PLUMED file.

@lohedges
Member Author

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

@lohedges
Member Author

We might also need to write the OpenMM colvar to a text file, but that should be easy enough to do, or check for the COLVAR.npy file and convert it within the PLUMED code (or just use the NumPy binary directly).

@dlukauskis

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

Sure.

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height, in rows 0, 1, and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would. The HILLS file written by my code is formatted to be identical to a HILLS file made by PLUMED. I've tested getting 2D and 1D FES from my HILLS file by feeding it into PLUMED, and it produces FES that are identical to the ones made by OpenMM. Here's the zipped HILLS file.
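As a rough illustration of the layout described above, the sketch below turns a three-row record (proj, ext, hill height) into column-oriented text. The data are synthetic, the function name and the field labels are assumptions, and the output is not a complete PLUMED COLVAR file (there is no time column); only the `#! FIELDS` header style follows PLUMED's convention.

```python
def colvar_rows_to_text(rows, fields=("pp.proj", "pp.ext", "height")):
    """Format a record with one row per quantity as column-oriented text."""
    if len(rows) != len(fields):
        raise ValueError("expected one row per field")
    lines = ["#! FIELDS " + " ".join(fields)]
    # zip(*rows) transposes the record: each tuple is one recorded frame.
    for frame in zip(*rows):
        lines.append(" ".join(f"{value:.6f}" for value in frame))
    return "\n".join(lines) + "\n"


# Synthetic values: row 0 = proj, row 1 = ext, row 2 = hill height.
record = [[1.0, 1.5], [0.2, 0.3], [1.2, 1.1]]
print(colvar_rows_to_text(record), end="")
```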

@lohedges
Member Author

Thanks, I'm glad that the FES agree with OpenMM 👍 I'll run the HILLS file through our own sum_hills wrapper when I get a chance to make sure all is okay.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height, in rows 0, 1, and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would.

That's okay. I only use the file for getting instantaneous and time series values of the collective variables, with some tricks to make sure that I extract the correct variables based on their names in the original PLUMED file. I should just be able to use the information in your file for the same purposes. I could just use the HILLS file too, although in general (using PLUMED) I write to the COLVAR more frequently than HILLS.

lohedges added a commit that referenced this issue Apr 14, 2021
@dlukauskis

I'm trying to use OpenMM to minimize and equilibrate as part of my fun-metaD tutorial. I've run into an issue with minimization using OpenMM, which just gives me a generic
simtk.openmm.OpenMMException: Particle coordinate is nan
Then I checked the RST7 and PRM7 files and they look really off when I visualise them using VMD. The protein and the ligand bonds are fine, but the solvent seems broken. This might be an issue with read/write of the files. Here are the inputs.

@lohedges
Member Author

Was the system completely prepared in BioSimSpace, or was this a fully solvated system that you loaded in initially? If the latter, could you upload those files too? We do some internal conversions to make sure that waters are formatted correctly for different MD engines, so something might have gone wrong here. (I've tested that I get the same result doing round-trip conversions this way, though.)

@dlukauskis

Here are all the inputs and the script that I used to set up and run the minimization. I started from a protein PDB with some water I discard and a ligand MOL2.

@lohedges
Member Author

Running the minimisation with GROMACS (using the solvated system directly) works, so something must be going wrong with the conversion to AMBER, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

@lohedges
Member Author

Actually, we don't convert the water topology when running with OpenMM, only with AMBER or GROMACS. This means that OpenMM is using AMBER format files with a GROMACS water topology. If I explicitly convert the topology then it still fails, i.e. doing the following before creating the process.

solvated._set_water_topology("AMBER")

I'll try using the GROMACS files directly with OpenMM to see whether that works.

I'm not sure of the issue with this system, since this is the basic setup for pretty much all of our simulations and I haven't come across many blowups like this.

@lohedges
Member Author

I think it's an issue with the protein. If I just solvate the protein, I see the same blow up. If I just solvate the ligand, then it works.

@lohedges
Member Author

For reference, the minimisation crashes if you just use the protein in vacuum, so I don't think it's anything specifically to do with the water molecules in the system.

@dlukauskis

report_log_every should be equal to the openmm.log write frequency. I suppose that if the user changed the frequency between restarts, that would break things.

I don't think the binary checkpoint file tracked time or steps, but now that you mentioned it, I checked openmm.xml contents and it does tell you the write time!

<State openmmVersion="7.4.2" time="3201.9999999381066" type="State" version="1">

That makes things much easier, just look inside the XML file and figure out the elapsed number of steps.
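A minimal sketch of that idea, using only the standard library: read the `time` attribute from the root `State` element and divide by the integration time step. It assumes that the time is stored in picoseconds and that the time step was constant over the run; the function name and the 0.002 ps time step are illustrative choices, not values from the thread.

```python
import xml.etree.ElementTree as ET


def elapsed_steps(state_xml, timestep_ps):
    """Estimate the number of completed steps from the State 'time' attribute."""
    root = ET.fromstring(state_xml)
    time_ps = float(root.attrib["time"])
    # Round to absorb the floating-point drift visible in the stored time.
    return round(time_ps / timestep_ps)


# Only the root element is reproduced here; a real state file has child elements.
example = '<State openmmVersion="7.4.2" time="3201.9999999381066" type="State" version="1"></State>'
print(elapsed_steps(example, 0.002))
```

For a file on disk, `ET.parse("openmm.xml").getroot()` would replace the `ET.fromstring` call.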

@dlukauskis

If OpenMM does start from zero again this would mess up my reporting of time series data and I'd need to add some extra logic to fix the lists of steps and time values that would be returned to the user.

I don't know if you can tell OpenMM to restart from step X (as read from the state file). I think each time you create a simulation object, it just starts the step count from zero.

@lohedges
Member Author

Thanks for the info. Since we're working in Python land I imagine that it will be possible to set some attribute of the simulation object (probably private) to specify the starting step and time. If not, we should be able to just monkey-patch the state reporter so that it appends the correct values, i.e. offsetting them by the final step and time from the previous run. I'll play around on Monday.

@lohedges
Member Author

The context member of the simulation object has a setTime method. There is also setParameter, which takes a key-value pair, so could presumably be used to set the step too. I'll see if I can get something working. (I'm still surprised that these aren't loaded and set from the state, though.)

@lohedges
Member Author

It turns out that the simulation time is already set correctly, i.e. it continues from the previous simulation. It's also very easy to set the step:

if os.path.isfile('openmm.xml'):
    simulation.loadState('openmm.xml')
    with open('openmm.log') as f:
        lines = f.readlines()
        last_line = lines[-1].split()
        step = int(last_line[0])
        simulation.currentStep = step

I think I'll try to monkey-patch the state reporter so that it doesn't write the header for repeats. Alternatively, I'll let it write the header, then make sure that it's consistent with the first one that was found in the log file. This will make sure that the information from the repeats is consistent.
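If the header does end up being written again on each repeat, the step lookup needs to skip it. The sketch below is one way to do that, assuming (as in the snippet above) that data lines are whitespace-separated with the step in the first column and that header lines start with `#`. The function name and the log contents are made up for illustration.

```python
def last_step(log_lines):
    """Return the step from the last data line, ignoring '#' header lines."""
    for line in reversed(log_lines):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return int(stripped.split()[0])
    # No data lines yet, so the run starts from step zero.
    return 0


# Synthetic log in which a restart has just written a second header.
log = ['#"Step" "Time (ps)"', "1000 2.0", "2000 4.0", '#"Step" "Time (ps)"']
print(last_step(log))
```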

@dlukauskis

Looks awesome! You could open a PR on openmm's repo, they'd be interested in incorporating this properly. See #3071.

@lohedges
Member Author

Okay, I think I've almost got this working. A quick question from testing... Do you know why I always get the following error if the bias factor is set to 1?

Traceback (most recent call last):
  File "openmm.py", line 166, in <module>
    current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
  File "/home/lester/Downloads/fixed_restarts/metadynamics.py", line 197, in getHillHeight
    currentHillHeight = self.height*np.exp(-energy/(unit.MOLAR_GAS_CONSTANT_R*self._deltaT))
  File "/home/lester/sire.app/lib/python3.7/site-packages/simtk/unit/quantity.py", line 406, in __truediv__
    return (self/other._value) / other.unit
  File "/home/lester/sire.app/lib/python3.7/site-packages/simtk/unit/quantity.py", line 409, in __truediv__
    return self * pow(other, -1.0)
ZeroDivisionError: 0.0 cannot be raised to a negative power

It looks like I'll need to figure this out, or set a different default value.

@lohedges
Member Author

Setting it to anything above 1.0 works, e.g. 1.000001, so I'll just do that. Must be a rounding issue.
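The traceback is also consistent with an exact division by zero rather than rounding, assuming the well-tempered temperature offset is computed as deltaT = T * (biasFactor - 1), which is how `_deltaT` appears to be defined in OpenMM's metadynamics module. The sketch below reproduces the arithmetic in plain Python; the function name and the numbers are illustrative.

```python
import math

R = 8.31446261815324e-3  # molar gas constant in kJ/(mol K)


def hill_height(initial_height, bias_energy, temperature, bias_factor):
    """Well-tempered hill height, assuming deltaT = T * (bias_factor - 1)."""
    delta_t = temperature * (bias_factor - 1)
    return initial_height * math.exp(-bias_energy / (R * delta_t))


print(hill_height(1.0, 5.0, 300.0, 10.0))      # ordinary bias factor
print(hill_height(1.0, 5.0, 300.0, 1.000001))  # underflows to 0.0
try:
    hill_height(1.0, 5.0, 300.0, 1.0)
except ZeroDivisionError as error:
    print("bias factor 1.0:", error)
```

Under the same assumption, a bias factor only just above 1 avoids the exception but makes the hill height drop to essentially zero as soon as any bias has accumulated, so it may be worth checking that this is the intended default behaviour.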

@lohedges
Member Author

I've pushed an update that implements restarts for production and metadynamics protocols with OpenMM. I've done some basic testing, but could you check that it works as expected for your metadynamics runs?

I've also updated the way restarts are handled for the regular PLUMED implementation so that things are consistent. When I get time, I'll look at implementing something similar for the regular production (and possibly equilibration) protocols with the other engines. (Equilibration is trickier, since you might need to know and be able to set the current temperature.)

@lohedges
Member Author

Just realised that I need to fix the hardcoded check for the checkpoint frequency, i.e. make it work regardless of the integration time step, etc. I'll update that tomorrow.

@dlukauskis

Hey Lester, I'm having some PBC-related issues with BSS funnel assignment. If I use a truncatedOctahedron box and run the setup simulations independently multiple times, BSS will sometimes fail to make a funnel, telling me that it can't find any nearby CA atoms. I've had a look at the structures and it's an issue with PBC, where equilibration will sometimes end up translating the ligand across the periodic boundary. Here's a ZIP with the input files and a notebook. It's odd that MDAnalysis doesn't account for that.

@lohedges
Member Author

Hi Dom. I'll take a look when I'm back towards the end of next week. We actually use Sire's native search functionality rather than MDAnalysis since it's much faster. Quickly looking at the code, it appears to do the distance search in an infinite Cartesian space, so it isn't taking the periodic boundaries into account. You can pass through a different space though, so it should be able to handle periodic orthorhombic and triclinic systems too. Orthorhombic systems seem to work fine, but I'm not getting the same results for a cubic system represented as a triclinic space. I'll come back to this next week.

@lohedges
Member Author

I managed to quickly fix this. (The joys of yet another washout day and having finished all of the books that I brought with me.) The makeFunnel code should now work for orthorhombic and triclinic systems. I also found that Sire's built-in center-of-mass evaluator doesn't consider periodic boundaries either, so I've manually adjusted that too, i.e. for locating the binding site from the ligand CoM.
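For readers unfamiliar with the problem, the sketch below shows the generic minimum-image distance for an orthorhombic box, which is what a search in an infinite Cartesian space misses when the ligand has crossed a boundary. It is not Sire's implementation, the function name is hypothetical, and triclinic boxes need the full lattice vectors rather than three box lengths.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box."""
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        delta = bi - ai
        # Wrap the separation into the range [-length/2, length/2].
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


box = (10.0, 10.0, 10.0)
# Two points on opposite sides of the box are close through the boundary.
print(minimum_image_distance((0.5, 5.0, 5.0), (9.5, 5.0, 5.0), box))
```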

In adding support for periodic systems I also discovered a subtle issue with the Sire TriclinicBox object that could cause memory corruption on copy. This didn't affect reading or writing of triclinic systems, only the internal calculation of distances etc. using a copied space object, which is what happened to be required to solve this problem. As such, you'll need to update both Sire and BioSimSpace to access the new functionality. (Probably easiest to recreate your environment from scratch.)

@dlukauskis

Thanks for that, Lester. I've noticed one other thing that I'd overlooked. When I do hydrogen mass repartitioning, use a 4 fs timestep, and deposit hills in half the usual number of steps, the COLVAR and HILLS files still record the CVs and hill heights every 1000 steps, instead of every 500 steps. This basically leads to information loss, with half the hills missing from the record, as we deposit every 500 steps but record only every 1000. PLUMED wouldn't be able to reconstruct the resulting FES correctly.

My proposal is instead of

# Run the simulation.
total_steps = 2500000
total_cycles = 2500
remaining_steps = 2500000
steps_per_cycle = math.ceil(total_steps / total_cycles)
remaining_cycles = math.ceil(remaining_steps / steps_per_cycle)
start_cycles = total_cycles - remaining_cycles
checkpoint = 100

It could be

# Run the simulation.
total_steps = 2500000
steps_per_cycle = 500 # ie hill deposition rate
total_cycles = math.ceil(total_steps/steps_per_cycle)
remaining_steps = 2500000
remaining_cycles = math.ceil(remaining_steps / steps_per_cycle)
start_cycles = total_cycles - remaining_cycles
checkpoint = 100
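The arithmetic behind the proposal can be checked with the numbers quoted above. The variable names below are illustrative and are not taken from the generated script.

```python
import math

total_steps = 2500000
hill_interval = 500  # steps between hill depositions

# Original scheme: a fixed 2500 cycles, so each cycle is 1000 steps long.
steps_per_cycle_old = math.ceil(total_steps / 2500)
records_old = total_steps // steps_per_cycle_old

# Proposed scheme: one cycle per hill deposition.
records_new = math.ceil(total_steps / hill_interval)
hills_deposited = total_steps // hill_interval

print(steps_per_cycle_old, records_old)  # 1000 2500
print(hills_deposited, records_new)      # 5000 5000
```

With the fixed cycle count only 2500 of the 5000 deposited hills are recorded, whereas tying the cycle length to the deposition interval records all of them.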

@lohedges
Member Author

Thanks for catching this. It will need a little more thought, since I need to be consistent with what I do for the other engines that support metadynamics where I decouple the frequency at which I report to the log file and deposit the hills. (PLUMED's reporting is independent of the engine to which it is coupled.) This would basically mean that I would need to remove the cycles part and just have a checkpoint system that writes to the log and hills at whatever frequency the user specifies, e.g.:

for x in range(start_step, total_steps):
    while x >= report_checkpoint:
        # Write to log file.
        report_checkpoint += report_interval
    while x >= hills_checkpoint:
        # Write to the hills file.
        hills_checkpoint += hills_interval
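A runnable variant of the checkpoint idea above is sketched here. It only records the steps at which each file would be written; the function name is hypothetical, and a real driver would advance the simulation in blocks rather than one step at a time.

```python
def run_schedule(start_step, total_steps, report_interval, hills_interval):
    """Return the steps at which the log and the hills file would be written."""
    # The first checkpoint of each kind lies strictly after start_step.
    report_checkpoint = (start_step // report_interval + 1) * report_interval
    hills_checkpoint = (start_step // hills_interval + 1) * hills_interval
    log_steps, hills_steps = [], []
    for step in range(start_step + 1, total_steps + 1):
        # (A real driver would advance the simulation by one step here.)
        if step == report_checkpoint:
            log_steps.append(step)
            report_checkpoint += report_interval
        if step == hills_checkpoint:
            hills_steps.append(step)
            hills_checkpoint += hills_interval
    return log_steps, hills_steps


print(run_schedule(0, 2000, 1000, 500))
print(run_schedule(1000, 2000, 1000, 500))  # restart from step 1000
```

Keeping the two checkpoints independent is what allows the log and the hills file to be written at different frequencies.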

@lohedges
Member Author

Actually, it's easier than I thought since the OpenMM state reporter is independent of the cycle logic. I'll just use the hill frequency from the protocol to determine the number of cycles.

@lohedges
Member Author

This is now fixed. The COLVAR and HILLS files are written at the hill deposition frequency, whereas the OpenMM state reporter uses the report interval from the protocol. This is consistent with what I do for the other metadynamics engines.

@dlukauskis

Hi Lester, I've been working on a manuscript on a new fun-metaD variation. I had issues with convergence using projection/extent so I tried using RMSD of the ligand as a CV. It's much better at rebinding the ligand. I call this combination of CVs (proj/RMSD) fun-RMSD. Instead of realigning the protein to calculate ligand RMSD, I used p1 set of atoms as part of the indices to calculate the CV. The p1 atoms don't seem to be affected much by the bias, so I think this is a reasonable approach.

Could you implement fun-RMSD in BioSimSpace? Here are the files that show how fun-RMSD differs from fun-metaD. It's basically just 5 lines of code. The analysis part will look exactly the same as well: we integrate out the RMSD to construct the 1D FES along the projection CV. Let me know if you've got any questions.
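Integrating out the second CV is standard Boltzmann marginalisation, F(proj) = -kT ln ∫ exp(-F(proj, rmsd)/kT) d(rmsd). The sketch below applies it to a tiny synthetic grid; the function name, grid, bin spacing, and kT value (roughly 300 K in kJ/mol) are all illustrative assumptions, and in practice PLUMED's sum_hills would normally do this step.

```python
import math


def integrate_out_second_cv(fes_2d, spacing, kt):
    """Collapse F(proj, rmsd) to F(proj) by Boltzmann-integrating over rmsd."""
    fes_1d = []
    for row in fes_2d:  # one row per projection bin, one column per rmsd bin
        weight = sum(math.exp(-f / kt) for f in row) * spacing
        fes_1d.append(-kt * math.log(weight))
    # Shift so that the global minimum of the 1D profile is zero.
    lowest = min(fes_1d)
    return [f - lowest for f in fes_1d]


kt = 2.494  # kJ/mol, approximately 300 K
# Synthetic grid: the first projection bin has two accessible rmsd bins,
# the second has only one (the other is effectively forbidden).
fes = [[0.0, 0.0], [0.0, 1.0e6]]
print(integrate_out_second_cv(fes, 1.0, kt))
```

The second bin ends up kT ln 2 above the first, reflecting the loss of one accessible RMSD state.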

@lohedges
Member Author

Yes, no problem, this looks super easy to implement.

Just a quick question: Is it possible to do something similar with the regular PLUMED implementation? I guess you could add an RMSD collective variable in the same way, and we already have functionality to write the required PDB file (this was needed for the steered MD tutorial). If not, then I worry that we are providing two different implementations, and this might not be transparent to the user without printing some warnings or renaming some of the objects. (For example, we could have FunnelProjExt and FunnelProjRMSD CVs.) It might even be possible to do something by combining the existing Funnel and RMSD collective variable objects. (Originally I wanted to provide a way of building multi-dimensional CVs, but this is quite tricky in practice.)

@lohedges
Member Author

Looking at the existing code I think it would be easiest to have two collective variable objects. Currently, all the docstrings refer to extent as the second component so, if we went for a single object, this would need to be updated to extent or RMSD.

@francoviscarra

Hi everyone, I'm trying to follow the tutorial here, but the script fails with AttributeError: module 'BioSimSpace.Metadynamics.CollectiveVariable' has no attribute 'makeFunnel'.

@lohedges
Member Author

Hello there,

Could you confirm how you installed BioSimSpace and what version of the code you are using? I imagine that you have an old package that doesn't have the funnel metadynamics functionality.

import BioSimSpace as BSS
print(BSS.__version__)

@francoviscarra

I installed it from the binary install, as the conda install gets stuck while solving the environment. The version is 2020.1.0.

@lohedges
Member Author

Yes, you'll need to use a more recent version with the dev or workshop label. These changes were added in early 2021. Could you try the following, which installs for me:

conda create -n biosimspace -c conda-forge -c omnia -c michellab/label/workshop biosimspace
conda activate biosimspace

Cheers.

@francoviscarra

I ended up using mamba:

mamba create -n biosimspace -c conda-forge -c omnia -c michellab/label/workshop biosimspace

and it seems to work fine now. Thank you very much!

annamherz pushed a commit that referenced this issue Mar 5, 2024