[TUTORIAL] Steered MD #192
I have started testing GROMACS+PLUMED for steered MD here. I looked at I have also looked at creating a |
Following today's discussion, I'm linking the functions I use to prepare input PLUMED files for steered MD. The CV itself (RMSD here) and the |
Thanks, @AdeleLip, that's really useful. I'll see what's possible to re-implement within BioSimSpace. FYI: I've just pushed an update that adds a |
Yes, most functionality for doing this already exists in BioSimSpace so I'll cook up an example tomorrow. Do you have example input files for the system and target (RMSD reference) that you could share? (Perhaps they are already in the repository, but I missed them.) Would the system and target normally be separate files to begin with, or could the target be extracted from the system with the coordinates replaced with those of the desired target conformation? I'm just trying to figure out the logical workflow within BioSimSpace, i.e. would the user load two PDB files to start with, one for the system, one for the target, or could we actually create the target in BioSimSpace? (Apologies for any incorrect notation.) |
Yes I've adjusted the trajectory analysis notebook to use Here are some example input files. The system files are just results of equilibration, and the target is a crystal structure (with a bit of modification since I use it to prepare other systems). The way I prepare RMSD reference is to simply take the PDB file and replace the atom indices with the corresponding atom indices from the system itself. I wrote that code very early on (before I used BSS), but it simply loops over all of the residues of the system and the target protein, and within each residue finds atoms with the same names, then changes target protein indices with the ones from the system. This assumes that the proteins have the same sequence start (e.g. in my case the target has to have an ACE cap as well) but can handle things like hydrogens missing from the reference PDB (so you can get residues with indices like [6, 8, 10] where the hydrogens were just ignored). One of the things that it doesn't handle (which I think it really should) is to check that the sequences match, e.g. that the residues with the same index have the same name. Because of the nature of my system, the renumbering gets cut off when it reaches the NME cap, which in the target is a GLY, so only the N atom matches and gets carried over. It isn't used for RMSD calculations and doesn't affect the alignment for this particular system but this is a patchy solution. It could also be possible to replace the system coordinates with those extracted from the PDB, and then save the BioSimSpace molecule as a PDB for plumed to use. I get the indices directly from the whole system PDB, which can be useful especially if the protein isn't the first molecule, but I remember you mentioning you have ways to deal with that. |
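The renumbering described in the comment above can be sketched in plain Python. This is a minimal illustration and not the script used in the thread: the function names `parse_atoms` and `renumber_target` are hypothetical, the input is assumed to be fixed-column PDB ATOM/HETATM records, and atoms are matched on residue number plus atom name. It also includes the residue-name check that the comment says the original code lacks.

```python
def parse_atoms(pdb_lines):
    """Return (serial, atom_name, res_name, res_num, line) for each atom record."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])       # columns 7-11: atom serial number
            name = line[12:16].strip()     # columns 13-16: atom name
            res_name = line[17:20].strip() # columns 18-20: residue name
            res_num = int(line[22:26])     # columns 23-26: residue number
            atoms.append((serial, name, res_name, res_num, line))
    return atoms


def renumber_target(system_lines, target_lines, check_names=True):
    """Give each target atom the serial number of the matching system atom.

    Target atoms with no counterpart in the system are dropped. When
    check_names is True, atoms whose residue names differ are dropped too.
    """
    # Map (residue number, atom name) -> (serial, residue name) in the system.
    lookup = {}
    for serial, name, res_name, res_num, _ in parse_atoms(system_lines):
        lookup[(res_num, name)] = (serial, res_name)

    renumbered = []
    for _, name, res_name, res_num, line in parse_atoms(target_lines):
        match = lookup.get((res_num, name))
        if match is None:
            continue  # e.g. an atom that only exists in the target
        serial, system_res_name = match
        if check_names and system_res_name != res_name:
            continue  # e.g. an NME cap in the system versus GLY in the target
        renumbered.append(f"{line[:6]}{serial:5d}{line[11:]}")
    return renumbered
```

With `check_names=False` the sketch reproduces the behaviour described above, where the N atom of a mismatched cap residue is still carried over.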
Thanks for the detailed explanation. For the example files that you provided there are residues in the target PDB that aren't found in the system. Using the Sire/BioSimSpace search functionality I find 284 out of the 301 residues. Is this to be expected? If so, can I safely ignore any unmatched residues and only update the indices of the ones that I find. With regards to the PDB file required by PLUMED: Do you use the target PDB with the indices updated (and the beta and occupancy columns tweaked)? Would it also work if I wrote a PDB for the corresponding molecule in the system after updating the coordinates? (Perhaps only writing the residues that match.) |
In this case the target protein (WPD loop closed) sequence is a bit longer than the system (WPD loop open) since the C terminus of the WPD loop open is disordered and we are modelling this protein truncated. It is safe to ignore any unmatched residues, since PLUMED doesn't care about continuity, just the atom index. In some of their RMSD examples they only pass the few atoms they want to use for the calculation (we need the others for system alignment, but it illustrates the point). I use the target PDB with the indices updated and then the beta and occupancy columns tweaked. All PLUMED needs is that the indices are the same between the target and the system, and the two columns let it know which indices to use for alignment and which for RMSD calculation. So if it's easier in BSS to find the matching residues/atoms and update coordinates in some copy of the system that should work perfectly. |
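As a rough sketch of the column tweaking mentioned above, the function below rewrites the occupancy and beta columns of a renumbered reference PDB. The name `set_plumed_weights` is hypothetical. It assumes the convention discussed later in this thread (occupancy 0 and beta 1 for the RMSD atoms, occupancy 1 and beta 0 for the alignment atoms), which matches my reading of the PLUMED RMSD documentation where occupancy holds the alignment weights and beta the displacement weights.

```python
def set_plumed_weights(pdb_lines, rmsd_serials):
    """Mark atoms for alignment or RMSD via the occupancy and beta columns.

    Atoms whose serial number is in rmsd_serials get occupancy 0.00 and
    beta 1.00 (used for the RMSD); all other atoms get occupancy 1.00 and
    beta 0.00 (used for the alignment).
    """
    rmsd_serials = set(rmsd_serials)
    updated = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            updated.append(line)
            continue
        # Pad short records so the weight columns (55-66) always exist.
        padded = line.rstrip("\n").ljust(66)
        in_rmsd = int(padded[6:11]) in rmsd_serials
        occupancy, beta = (0.0, 1.0) if in_rmsd else (1.0, 0.0)
        updated.append(f"{padded[:54]}{occupancy:6.2f}{beta:6.2f}{padded[66:]}")
    return updated
```

Passing the serial numbers of the heavy atoms in the steered loop as `rmsd_serials` would leave the rest of the protein for alignment.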
I have now added some more tutorial background in the setup-sMD notebook. Some of it is subject to change depending on what you decide to implement in BSS, but it might also give more background on how I prepare the simulation at the moment. |
Okay, here is an archive containing a hokey script that uses Sire/BioSimSpace to create a PDB file for PLUMED. Just run The logic is as follows:
The code would be simplified massively if we could be sure that the target PDB file contained all residues from the matching molecule in the system, since we could use the I've not extensively tested this, so let me know if there is something obviously wrong with the logic. For example, I'm not sure if I'm correctly setting the occupancy and beta factors for atoms that aren't matched, or whether I'm including too many atoms in the PDB. (It contains the entire molecule from the system.) It should also work for cases where the matched molecule isn't the first in the system (which it is in this example). The PDB writer already uses the atom number, which accumulates across all molecules, rather than the index, which counts from 0 for each molecule. It should be easy for me to revise this to get something that works a little more robustly. Cheers. |
Note that if we don't need additional info in the PDB file, e.g. |
I think the problem at the moment is the beta and occupancy columns - with how they are now, PLUMED will use the atoms that do not exist in the target structure to align the system and all the atoms that matched to calculate the RMSD. I can see you are copying the molecule and updating coordinates. In that context the logic once you've created the atom mapping I think should be:
At the moment the alignment will be carried out to the first frame of the system's H atoms and some left over residues, which wouldn't work to correctly align to the target structure, and the RMSD would be calculated to the entire protein, which also wouldn't make steering possible. I'm attaching example renumbered and modified target files to illustrate this (apologies, I should have included them before). About simplifying the code, in this case it would be easy to make the target PDB contain only what it contained in the system, since I could just also truncate it and leave out the N terminus cap, that would only require deleting some lines. But in the cases where it is swapped around, i.e. the system has the loop closed (and 301 residues) and the target has the loop open (and 284 residues) that would involve adding those residues to the target structure, and I'm not sure if the additional residues would throw off alignments. This is what I meant by saying RMSD is annoying to work with... I see the script does a lot of things to determine which molecule the target structure is for - could we just ask the users to specify this? From my own usage of the scripts, I wouldn't mind that at all. Then there could be two assumptions to make:
I feel like those are reasonable expectations and would only need some modification when preparing target structure. This would be able to handle mismatched sequence length (assuming they are numbered accordingly) and also gaps in the system (full residues and one off atoms). The only issue for the user would be to make sure the residues are numbered correctly if there is an offset (e.g. sequence start is longer in the system or the target) or if there are any gaps. What do you think? |
Ah brilliant, thanks for the clarification, I thought I was missing something obvious ;-) I had also assumed that all atoms in the target PDB that appear in the system were used for the RMSD, rather than using a subset of the atoms using the "RMSD residue range". (Looking again at your code, this is obvious.) I think both of the assumptions that you suggest are perfectly sensible and would massively simplify the problem. If we are clear with the expectations for the target PDB file, then it will likely save a lot of headaches. The only reason for trying to detect the target molecule was in case the system had been re-ordered since it was originally loaded. I am imagining a situation where the user might load a single molecule, then solvate it, do some other operations, delete things, recombine things, etc. If we make the assumption that all residues in the target PDB are in the system, then this would simplify the search too. (I imagine we could just take the molecule with the closest number of residues, which would likely work in the majority of cases.) We could provide an option to pass the index, which always takes precedence. (For example, this is what I do with the funnel metadynamics code when detecting the ligand.) With regards to the "RMSD residue range": Is this always contiguous, or are there cases where you might want multiple ranges, or the ability to pass specific indices rather than a range, or a combination of indices and ranges? This would complicate the atom matching a little, but it could be broken up into stages for each contiguous chunk. I'll rework things on Monday. |
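The molecule selection rule proposed above (closest residue count, with a user-supplied index always taking precedence) might look something like the following. This is only a sketch under stated assumptions: `choose_molecule` is a hypothetical helper, and the system is represented simply as a list of per-molecule residue counts rather than BioSimSpace objects.

```python
def choose_molecule(residue_counts, n_target_residues, index=None):
    """Pick the molecule in the system that the target most likely matches.

    residue_counts: number of residues in each molecule of the system.
    n_target_residues: number of residues in the target structure.
    index: optional user-supplied molecule index, which always wins.
    """
    if index is not None:
        if not 0 <= index < len(residue_counts):
            raise IndexError(f"Molecule index {index} is out of range.")
        return index
    if not residue_counts:
        raise ValueError("The system contains no molecules.")
    # Closest residue count wins; ties resolve to the first such molecule.
    return min(
        range(len(residue_counts)),
        key=lambda i: abs(residue_counts[i] - n_target_residues),
    )
```

For the example in this thread (a 284-residue protein in water and a 301-residue target) the protein would be selected, since every water molecule has a single residue.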
In my experience the RMSD residues have always been continuous, but I'm not sure if that would always be the case. Most other libraries I have used let you give specific indices and/or have some sort of selection language. I think anywhere where the topology is easily accessible, passing indices can give a lot of freedom and makes your work easier. That would also allow for modifications such as using backbone or heavy atoms only, which is common in RMSD calculations. |
Sorry for the slow update on this. I realised the steered-MD is generic to any collective variable supported by PLUMED, with targeted MD (using RMSD as the target) being a special case. I'm just working on making things general so that we can run steered-MD simulations with any of the collective variables that we currently support (besides the funnel). Just looking at your example files above (from
Has something gone wrong here, or am I missing something? (I know that the two columns just represent weights, so can be any number, but your code and the description above suggests that they will always be 0 or 1.) |
You are looking at |
Ah, perfect. I actually thought I had opened the reference file but my terminal autocomplete must have filled in renumbered instead when I hit tab. Bah 🤦 |
I've pushed some changes to the feature_steered_md branch. Here I've implemented an RMSD collective variable that can be used for metadynamics simulations as well as steered MD, although I've yet to write the driver code for steered MD. You can pass in a system and reference molecule to the RMSD constructor and it will do the work of generating the reference PDB file for you, e.g.
import BioSimSpace as BSS
# Load a (boring) example system.
system = BSS.IO.readMolecules("amber/ala/*")
# Extract a molecule and translate it for fun.
reference = system[0]
reference.translate(3*[BSS.Units.Length.angstrom])
# Create a RMSD collective variable. Since we don't pass an index for the reference molecule the code
# chooses the molecule in the system with the closest number of residues. Here we've not specified any
# indices for the atoms involved in the RMSD calculation, so all are used. If the user passes a list of
# 'rmsd_indices' then any matching atoms that aren't in this list are used for alignment. We require that
# all of the atoms in the reference are present in the system and that the ordering of residues is the
# same.
rmsd = BSS.Metadynamics.CollectiveVariable.RMSD(system, reference)
# Print the list of PDB strings. These are written to 'reference.pdb' in the working directory when setting up
# a process for metadynamics or steered MD.
print(rmsd.getReferencePDB())
I'm now working on implementing the required moving restraint code for PLUMED. For now I'll probably limit this to a subset of the collective variables that we currently support. (Probably distance and RMSD.) |
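For orientation, the kind of PLUMED input that a moving restraint on an RMSD collective variable needs can be generated with a few lines of Python. This is a sketch and not the BioSimSpace implementation: `steering_input` is a hypothetical helper, and the schedule values (steps, target RMSD in nm, force constants) are illustrative assumptions only. The `RMSD`, `MOVINGRESTRAINT` and `PRINT` actions and their keywords are standard PLUMED.

```python
def steering_input(reference="reference.pdb", schedule=None, stride=100):
    """Build a PLUMED input that steers an RMSD CV with a moving restraint.

    schedule is a sequence of (step, target_value, force_constant) tuples.
    The default values below are placeholders chosen for illustration.
    """
    if schedule is None:
        schedule = [
            (0, 0.32, 0.0),         # start at the initial RMSD, no force
            (1000, 0.32, 3500.0),   # switch the restraint on
            (100000, 0.0, 3500.0),  # steer towards the reference
        ]
    lines = [f"rmsd: RMSD REFERENCE={reference} TYPE=OPTIMAL"]
    restraint = ["restraint: MOVINGRESTRAINT ARG=rmsd"]
    for i, (step, target, kappa) in enumerate(schedule):
        restraint.append(f"STEP{i}={step} AT{i}={target} KAPPA{i}={kappa}")
    lines.append(" ".join(restraint))
    # Write the CV and the bias so they can be monitored during the run.
    lines.append(f"PRINT ARG=rmsd,restraint.bias STRIDE={stride} FILE=COLVAR")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(steering_input(), end="")
```

The reference file name matches the `reference.pdb` mentioned in the code comments above; everything else would normally come from the steering protocol.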
Let me know if the constraints are too stringent, e.g. if we want to be able to handle reference molecules that have atoms that aren't in the system. (This is the case for your original example.) It would just be nice to have a way of knowing that we've obtained a good match without the user needing to look at the PDB file, especially for a large molecule. Perhaps we could add a flag to allow a mismatch in the atom number if the user knows that this is the case, e.g. when modelling a truncated version of a molecule. |
I'm also not sure whether it's best to specify the atoms that are involved in the RMSD calculation by index, or to use the residue index as you have done. The atom index seems more flexible, but possibly more work for the user. The |
Also, do we need any special handling for the case where all atoms in the reference are used for the RMSD, e.g. should both the occupancy and beta values be set to 1? Your examples always match a subset of the atoms, where we have 0/1 for the RMSD atoms, and 1/0 for the others. (The example in the PLUMED docs just uses 1 for both sets of weights.) |
About specifying atoms vs whole residues - I think atom indices are the way to go. In my case I use all heavy atoms and the reference PDB does not have Hs so it's not as apparent, but I believe it's common to not use all atoms in a residue to calculate RMSD (e.g. only backbone atoms). To handle the cases where all atoms are used for RMSD (and possibly allow more flexibility) the user could pass two lists, one with occupancy and one with beta values? It is possible users may want to both use atoms for alignment and for RMSD calculation, such as when they're using a large chunk of the protein for RMSD and the leftover atoms don't give good alignment. I'm not sure how common that would be though, so there could of course be an option to use I don't think having to have all reference atoms in the system is that constraining or difficult, but I guess allowing 'unmatched' atoms to be ignored would give it more flexibility? |
I've now created a |
For simplicity I've decided to leave the RMSD CV as is for now. It should work for the purposes of the steered MD (assuming that inputs are edited so that the reference doesn't contain any atoms that aren't in the system). I'll think about making it more flexible after the workshop, i.e. allowing the user to customise the RMSD and alignment weights on a per-atom basis. |
With regards to output from PLUMED when running steered MD: It looks like you can just print exactly the same information as you would when running metadynamics, i.e. the time series values of the collective variables and the associated bias. If so, I should be able to re-use the existing code for getting this information dynamically from a running process. |
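Reading the time series back from the file written by PLUMED's `PRINT` action is straightforward. The sketch below is not the existing BioSimSpace code; `parse_colvar` is a hypothetical helper that relies only on the `#! FIELDS` header PLUMED writes, and it skips incomplete rows so that it can be called while the simulation is still running.

```python
def parse_colvar(lines):
    """Parse PLUMED PRINT output into a dict of column name -> list of floats."""
    fields = []
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#!"):
            tokens = line.split()
            # Header of the form: "#! FIELDS time rmsd restraint.bias"
            if len(tokens) > 2 and tokens[1] == "FIELDS":
                fields = tokens[2:]
                for name in fields:
                    data.setdefault(name, [])
            continue
        if line.startswith("#"):
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # row with missing columns, e.g. still being written
        for name, value in zip(fields, values):
            data[name].append(float(value))
    return data
```

In practice the lines would come from something like `open("COLVAR")`, where the file name is whatever was passed to `FILE=` in the PLUMED input.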
I've now got steered MD working with GROMACS in the feature_steered_md branch. I'll test it with your example input (I'll tweak the reference file to only include the atoms in the system) then post an example that you can try locally once I've merged. |
Also looking at the |
Yes, you're right! I'll put that later once I've worked out the partial molecule match. I'm still surprised that it's failing though, since the indices exist in the molecule. I'll update the code and report back. |
Okay, moving it later gives a different error. I'll need to think about this since it looks like the indexing for the (I tested this RMSD evaluation on about 10 molecule pairs on Friday, bah!) |
Alright, I've just commented out the lines in Another thing that I noticed is that when I create a GROMACS process, the arguments do not include PLUMED:
> type(protocol)
BioSimSpace.Protocol._steering.Steering
> process = BSS.Process.Gromacs(system, protocol)
> process.getArgs()
OrderedDict([('mdrun', True), ('-v', True), ('-deffnm', 'gromacs')])
I believe it should be:
OrderedDict([('mdrun', True),
             ('-v', True),
             ('-deffnm', 'gromacs'),
             ('-plumed', 'plumed.dat')])
as is for |
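To make the expected behaviour concrete, here is a small stand-alone sketch of adding the missing flag and rendering the resulting command line. It does not use BioSimSpace: `with_plumed` and `to_command` are hypothetical helpers, and the `gmx` executable name is an assumption about the local GROMACS installation.

```python
from collections import OrderedDict


def with_plumed(args, plumed_file="plumed.dat"):
    """Return a copy of the arguments that includes the -plumed flag."""
    updated = OrderedDict(args)
    # Keep any value that is already set rather than overwriting it.
    updated.setdefault("-plumed", plumed_file)
    return updated


def to_command(args, exe="gmx"):
    """Render an argument dictionary as a command-line string."""
    parts = [exe]
    for key, value in args.items():
        if value is True:
            parts.append(key)  # bare flag or sub-command, e.g. 'mdrun'
        elif value is not False and value is not None:
            parts.extend([key, str(value)])
    return " ".join(parts)


if __name__ == "__main__":
    args = OrderedDict([("mdrun", True), ("-v", True), ("-deffnm", "gromacs")])
    print(to_command(with_plumed(args)))
```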
Thanks for catching, I'll update now. I've tested the RMSD evaluator and it works if I do it on a per atom basis, rather than passing the entire dictionary of matches in one go. I'm still not sure why that's not working, since it does work for other molecules and for subsets of the matches with this pair. I've checked that it agrees with the distance computed in the space, so I'll just do it per atom and take the average. |
I've got something working just looping over each matching atom pair and computing the RMSD myself using the system |
I think the example value may be "close enough". I get 0.32 nm with PLUMED. RMSD values can differ slightly depending on the alignment algorithm, I think. |
Whoops, I didn't realise that it had performed an RMSD alignment using the alignment indices prior to computing the RMSD. If that's the case, then I get the same value:
cv.getInitialValue().nanometers()
0.3236 nm |
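For reference, the quantity being compared is the root of the mean squared displacement over the matched atom pairs, which is not the same as the plain average of the per-atom distances. The sketch below is a generic pure-Python version with no alignment step, so it is not the BioSimSpace or PLUMED implementation; it would only agree with PLUMED's optimally aligned value if the coordinates had already been superimposed using the alignment atoms.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    Each argument is a sequence of (x, y, z) tuples in the same units.
    No superposition is performed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("Coordinate lists must be non-empty and the same length.")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

For two atoms displaced by 3 and 4 length units the RMSD is sqrt(12.5), roughly 3.54, whereas the average displacement is 3.5.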
The updated CV calculation has now been pushed. |
How's it working? I'll merge across to devel if things look okay and continue working on improving the restraint handling. I just had a quick look through your docs and it all looks good. One thing that I hadn't realised is that it uses the system from this issue, i.e. with the strange unfolding issue despite single-point energies agreeing with ParmEd output, etc. It would be good to revisit this before the workshop since I don't think it's a good look to start a tutorial with input that we know is problematic, at least without being able to provide some kind of reason for the discrepancy. I chatted with @chryswoods about this a while ago (since he wrote the AmberPrm parser) and he suggested looking at snapshots from your trajectory and comparing single point energies along it. If we agree, then we must be sampling the same energy surface. |
I've run some short tests and it's all working great. I'm also hoping to do full-length benchmarking on the cluster. I'm sorry I didn't make it clearer that it's the same system! I thought I'd mentioned it at one of the initial meetings. What would be a good way to share the data with you? The trajectory file would be too big, but I could pull a few frames from it? |
Fantastic, glad to hear that it's working. No worries, I think it's good to show how BioSimSpace is being used in an actual and active research project, rather than just using canonical examples that aren't that interesting (any more) and are known to work. Yes, just pull some frames at regular intervals. I imagine 10 should be enough to see if there's a problem. We can choose to get more fine grained if it turns out that something goes wrong after some period of time. |
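Choosing frames at regular intervals only needs the total frame count. The helper below is a hypothetical, tool-agnostic sketch: `evenly_spaced_frames` returns zero-based frame indices including the first and last frame, and extracting those frames from the trajectory is left to whichever trajectory library is in use.

```python
def evenly_spaced_frames(n_frames, n_samples=10):
    """Return n_samples zero-based frame indices spread evenly over a trajectory."""
    if n_frames < 1 or n_samples < 1:
        raise ValueError("n_frames and n_samples must both be at least 1.")
    # Never ask for more samples than there are frames.
    n_samples = min(n_samples, n_frames)
    if n_samples == 1:
        return [0]
    step = (n_frames - 1) / (n_samples - 1)
    return [round(i * step) for i in range(n_samples)]
```

Increasing `n_samples` gives the finer-grained comparison mentioned above if a discrepancy only appears after some period of time.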
I was also thinking, if AMBER supports steering in exactly the same way as GROMACS then I could just expose the PLUMED options to AMBER too. If it's the same for metadynamics too, then I could also add this in. I guess the issue is that it only works with |
Okay, I've merged across into devel and will make further fixes there. |
PLUMED works with sander as well, the config files are the same (at least from what I remember from when I started working with PLUMED and tested sander). |
Oh great, that makes things very easy then. I'm not sure why we only supported GROMACS by default then. I guess the metadynamics folks we were working with only used GROMACS and I was led to believe that there was something more complicated with AMBER. |
Direct metadynamics and steered metadynamics support with AMBER is now available in devel. Let me know if it works for you. |
Just a heads up that you can use |
That's a good point! The manual zip is a relic from before native Amber sMD, I will change it. |
For working out the indices of the atoms in the RMSD residues you could use the built in search functionality. Not sure it's too much clearer than what you have, but avoids the need of a string comparison that a user might not know how to do without digging into the code a little deeper. For example:
# Here is a list of the residues of interest as strings so we can use them in our search.
resnums = [str(x) for x in range(174, 185)]
# Perform the search.
atoms = system.search(f"atoms in resnum {','.join(resnums)} and not element H")
# Now work out the absolute indices of the atoms in the system.
idxs = []
for atom in atoms:
idxs.append(system.getIndex(atom)) |
I've updated the code so that
rmsd_cv = BSS.Metadynamics.CollectiveVariable.RMSD(system, reference, rmsd_indices, 0)
This won't be an issue for the workshop runs, since they'll be using the old build, but we'll want to update things for the paper. (The paper will hopefully coincide with a new release of BioSimSpace so that people can use the exact same version when following it.) |
I'm reviving this discussion since it's a continuation of the work that was done before to implement sMD into BSS (feel free to move it to a new issue). As I have been using BSS to run a variety of sMD in the past year, I have some small fixes that I now made on
Let me know if you are happy with them, or if you want to redo them in a way that is more consistent with how BSS is written. |
Thanks, @AdeleHardie. Feel free to open a pull request when you are happy with things and I can review. At a quick glance everything looks fine to me. Are you planning on adding any further edits, or is this it for the time being? Cheers. |
I would also like to add an additional CV for evaluating a custom expression from other CVs (PLUMED documentation). I was going to do a feature request issue once I finish with the example code. This is a bigger change and isn't necessary to have a working tutorial for the BSS workshop. Would you prefer for me to just open a pull request for these simple changes for now, and set up a separate branch/issue for the new CV? |
Yes, that makes sense. The fixes relate to existing functionality whereas the additional CV will be a new feature, so would be best implemented on a new branch with separate PR. Cheers. |
This is a thread to discuss the creation of a tutorial showing how to implement steered molecular dynamics in BioSimSpace.