Many clashes when aligning the decoupled structures of ligands in solvent into the apo state pocket #1

Open
xiaotianzhou1982 opened this issue Mar 31, 2022 · 14 comments

Comments

@xiaotianzhou1982

Dear Khalak,

I found many clashes when aligning the decoupled structures of ligands in solvent into the apo state pocket. Some of them cannot be resolved by a minimization alone, so the subsequent calculation cannot continue. Could you please tell me how you handle this problem? For example, of my 165 transition runs, only 60 succeeded. I ran an example for cdk2 with ligand 30, but I got a very large dG (-69.09 kJ/mol) compared to experiment (-42.3 kJ/mol).
CDK2_lig30

@zetadin
Contributor

zetadin commented Mar 31, 2022

Hello!

From the W in solvent graph on the left, it seems the coupled ligand had significant structural changes during the equilibrium simulation, hence the second peak and poor overlap between work distributions. The work values themselves also look off. Were you maybe using couple-intramol = yes?
It is possible the system wasn't stable during equilibrium simulations already. We often needed to do energy minimization of original structures in double precision (not single) to ensure stability, and then run 20 ps NVT before running the production NPT simulations for the equilibrium trajectories.

Would you mind sharing the input files and maybe the equilibrium trajectories that gave rise to these results? Also, which version of gromacs are you using?

For comparison here are my results and input files for cdk2 with ligand 30:
cdk2_lig_30_debug.tar.gz
Work in solvent:
wplot
Work in complex:
wplot

Cheers,
Yuriy.

@xiaotianzhou1982
Author

Dear Yuriy,

Thank you very much for your reply! I used the same mdp files as downloaded from your GitHub, so I used couple-intramol = no. The only difference is that I changed "couple-moltype=MOL" to "couple-moltype = ligand" to match the format of my force field. All my runs use the single-precision GPU version of gromacs2019.06. I did energy minimization, NVT and NPT following the methods section of your paper. I noticed there is a large discrepancy between my result and yours. I also did a second run for the ligand in solvent and, surprisingly, this result also differs from my first run. Please download all my input files and the procedure I used for every step from this link. Thanks again for your help! Yours sincerely, Fred.
https://www.jianguoyun.com/p/DYbDxkUQkfWECRiHobYEIAA

wplot

@zetadin
Contributor

zetadin commented Apr 1, 2022

Hi, Fred!
This looks like gromacs bug 3403 present in versions before 2021. When couple-intramol = no, the ligand gets decoupled from the rest of the system, but interactions within the ligand itself must remain. This leads to a smaller perturbation between the end states and hence a better overlap between work distributions (which is why we use it). To achieve this, intramolecular interactions in the ligand are added to the exclusion list, so they aren't handled by short range kernels, and instead handled by the free energy kernel. Unfortunately, the free energy kernel did not handle these excluded interactions correctly when they are beyond the electrostatic cutoff. As soon as an excluded pair of atoms gets beyond rcoulomb, we get an abrupt jump in F, U, and dHdl.
Ligand 30 can easily go beyond rcoulomb=1.1 nm.
lig30_long

I just uploaded the fix for this bug in gmx 2019.4 that I used for the simulations in the paper. If the goal is to reproduce my results, then use this fix.

A more efficient and safer fix is included in gmx 2021 and newer. Safer because my 2019.4 fix does not catch the case where the excluded pair is separated by more than rlist. Actual rlist is larger than the 1.1 nm requested in the mdp file, because mdrun instead calculates it from verlet-buffer-tolerance and nstlist. When run with GPUs, the remaining error in free energy from this is smaller than uncertainty from repeating the measurement (tested on tyk2).

2021 versions also added a check & a crash for exclusion pairs beyond rlist, so even marginally incorrect results don't accidentally occur. To avoid this crash with 2021, one needs to make sure rlist is larger than the maximum distance within the ligand during the simulation, but still smaller than half the simulation box. Mdrun automatically tunes rlist at start, but if you choose to use gmx 2021, you can force a value with these mdp settings:

verlet-buffer-tolerance  = -1; disables automatic determination of rlist at start of mdrun
rlist                    = 2.0 ; (nm) set this to a bit above expected max size of ligand. This may also need a larger simulation box than in paper and will make the simulation slower.
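
To pick a value for rlist, one could first measure the largest atom-atom distance the ligand reaches in an equilibrium trajectory. The following is a minimal sketch in plain Python, assuming the ligand coordinates (in nm) have already been loaded as one list of (x, y, z) tuples per frame and that the ligand has been made whole across periodic boundaries. The function names, the 0.2 nm margin, and the toy coordinates are illustrative assumptions, not part of Gromacs or pmx.

```python
import math
from itertools import combinations

def max_intraligand_distance(frames):
    """Largest atom-atom distance (nm) within the ligand over all frames.

    frames: iterable of frames, each a list of (x, y, z) tuples in nm,
    with the ligand made whole (not split across periodic boundaries).
    """
    return max(
        math.dist(a, b)
        for frame in frames
        for a, b in combinations(frame, 2)
    )

def suggest_rlist(frames, shortest_box_vector, margin=0.2):
    """Return an rlist (nm) slightly above the ligand's maximum extent.

    margin is an assumed safety buffer. Raises ValueError if the result
    does not stay below half the shortest box vector, in which case a
    larger simulation box would be needed.
    """
    rlist = max_intraligand_distance(frames) + margin
    if rlist >= 0.5 * shortest_box_vector:
        raise ValueError("rlist %.2f nm needs a larger simulation box" % rlist)
    return rlist

# Toy example: a 3-atom "ligand" in two frames (made-up coordinates).
frames = [
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
print(suggest_rlist(frames, shortest_box_vector=6.0))
```

The pairwise loop scales quadratically with the number of ligand atoms, which is acceptable for a drug-sized ligand and a few hundred frames.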

A much easier alternative is to use couple-intramol = yes (with any recent gmx version) instead and accept the larger perturbation between the end states. The work distributions in solvent and complex will be different from what I shared, but the overall free energy will be close.

I hope this helps. If not, please let me know.
Yuriy.

@xiaotianzhou1982
Author

Dear Yuriy,
Thanks a lot for your fix. Now I get a similar result for the ligand-in-water leg.
However, for the decoupled ligand in the apo pocket there are still some problems. The procedure I used for creating the decoupled ligand in the apo pocket is as follows:

  1. Get the apo protein structure (165 structures in 10ns npt run). (i.e. apo0.pdb)
  2. Get the holo complex structure. (165 structures in 10ns npt run). (i.e. holo0.pdb)
  3. Get the decoupled ligand structures in solvent. (165 structures in 10ns npt run). (i.e. MOL0.pdb)
  4. Align the holo complex structure onto the apo protein structure using Cα only (holo_aligned.pdb obtained).
    gmx confrms -f1 apo0.pdb -f2 holo0.pdb -n1 index.ndx -n2 index.ndx -o holo_aligned.pdb -one
  5. Extract ligand coordinates from holo_aligned.pdb, obtained holo_MOL0.pdb
  6. Align the decoupled ligand structure onto holo_MOL0.pdb using ligand heavy atom only (MOL0_aligned.pdb obtained)
    gmx confrms -f1 holo_MOL0.pdb -f2 MOL0.pdb -n1 index.ndx -n2 index.ndx -o MOL0_aligned.pdb -one
  7. Combine the apo protein structure and the aligned decoupled ligand structure using the cat command (conf.pdb obtained).
  8. Make a box for conf.pdb, gmx editconf -f conf.pdb -o box.gro -d 1.5 -bt dodecahedron
  9. Solvate the box, gmx solvate -cp box.gro -cs spc216.gro -o wat.gro -p top_protein_0.top
  10. Add ions to the box, got ions.gro
  11. Add restraints to top_protein_0.top
  12. Run em_posre_l1_normal.mdp with double precision because there are clashes between ligand and protein in conf.pdb
    gmx_d mdrun -deffnm em_0
  13. Run eq_nvt_posre_l1_20ps.mdp; the job crashed even though the em step succeeded.
    The link is the mdp and all the structure I used for these steps.
    https://www.jianguoyun.com/p/DY-Gdq4QkfWECRie_rYEIAA
    By the way, the structure optimization with double-precision gromacs is very slow because it cannot be compiled with GPU support. One run takes me 16 hours.

@zetadin
Contributor

zetadin commented Apr 5, 2022

Re procedure:
In contrast to the procedure you described, in the main procedure of the paper, the aligned structures were not explicitly equilibrated. Non-equilibrium simulations were started directly from the aligned structures instead and corrections were later applied to fix the effect on the free energy:
1-3. as you did
4. Wrap molecules from 1-3 into the periodic box and center protein trajectories on the first chain. If multiple chains are present (like in cdk2), in some frames they can be wrapped to different sides of the box and mess up the alignment if centering isn't done.
5. Align apo trajectory onto the holo trajectory by C-alpha, retaining all apo atoms (water, ions, protein)
6. Align decoupled ligand in solvent trajectory onto the holo protein's ligand trajectory via all atoms (only heavy atoms would probably have been better).
7. Add the new decoupled ligand coordinates to the aligned apo trajectory for each frame.
8. Split this new trajectory into individual frames.
9. Start non-equilibrium coupling simulations (ti_l1.mdp) from these frames. No em/nvt/npt necessary. This is a non-physical distribution of states, but we correct for that with post hoc decorrelation in the paper.
10. Post hoc decorrelation: find the protein-ligand restraining potential that would have produced the aligned distribution of restrained degrees of freedom. Compute the free energy difference between that potential and the potential applied by the actual restraints. Add the difference to the W integrated from dHdl files from 9.
11. Combine with W from non-equilibrium decoupling dHdl files and compute the final dG.
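
The alignment in steps 5 to 7 amounts to a least-squares superposition on a subset of atoms, followed by applying the same rigid transform to every atom that should move along. Below is a minimal sketch using the Kabsch algorithm with NumPy. It is not the alignment script mentioned in this thread and not pmx code; the function names and the toy coordinates are assumptions for illustration, and reading or writing trajectory files is left out.

```python
import numpy as np

def kabsch_fit(mobile_ref, target_ref):
    """Rotation R and translation t that best superimpose mobile_ref onto
    target_ref (both N x 3 arrays of matching atoms) in the least-squares sense."""
    mob_center = mobile_ref.mean(axis=0)
    tgt_center = target_ref.mean(axis=0)
    P = mobile_ref - mob_center          # centered mobile reference atoms
    Q = target_ref - tgt_center          # centered target reference atoms
    U, S, Vt = np.linalg.svd(P.T @ Q)    # SVD of the 3 x 3 covariance matrix
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_center - R @ mob_center
    return R, t

def apply_fit(coords, R, t):
    """Apply the rigid transform to any set of atoms (M x 3 array)."""
    return coords @ R.T + t

# Toy data: "holo" C-alpha atoms and a rotated, shifted "apo" copy of them.
rng = np.random.default_rng(0)
holo_ca = rng.normal(size=(10, 3))
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
apo_ca = holo_ca @ Rz.T + np.array([1.0, -2.0, 0.5])

# Fit on the reference subset, then move the atoms of interest.
R, t = kabsch_fit(apo_ca, holo_ca)
apo_aligned = apply_fit(apo_ca, R, t)
print(np.abs(apo_aligned - holo_ca).max())
```

In a real frame, the fit would be computed on C-alpha atoms only and `apply_fit` would then be called on all apo atoms (protein, water, ions). The same two calls would place the decoupled ligand onto the holo ligand before the coordinates are concatenated.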

I have written my own python script for the alignment procedure. It may be helpful to illustrate those steps. It assumes particular file names though, so you'd need to edit it if you want to use it with your current folder structure. Plus it relies on a somewhat modified version of pmx (git clone --branch abs_dG_workflow https://github.com/deGrootLab/pmx.git).
The same branch also has a complete automated workflow that was used for the main results of the paper, though it is lacking in documentation at the moment. If you are interested, I can show you how to use that.

Meanwhile, the procedure you described relaxes the aligned structures, bringing them closer to the equilibrium distribution of the apo protein + the decoupled ligand + restraints, and (probably) starts the non-equilibrium simulations from the relaxed structures. That would be similar to what was done for the central column of Figure S15 in the SI of the paper. Except there I retained water and ions from the apo trajectory as well as the protein and didn't bother with em/nvt or position restraints, just a short npt simulation to bring the decoupled ligand closer to its equilibrium as driven by the protein-ligand restraints.

If you are solvating and equilibrating the apo protein + decoupled ligand system anyway, then you might as well only align the starting structures (e.g. prot_30.pdb and prot_apo.pdb, not each frame of previous npt) to get the ligand in the right spot in the starting apo structure, run em/nvt_soft/npt, and sample the 165 frames for starting the coupling non-equilibrium simulations from that. This would correspond to the procedure for Figure S14 in the SI. No need to resample the same equilibrium distribution from different starting points. This would also have shorter em than what you are currently doing, as position restraints for em/nvt could be targeted towards the crystal structure, instead of the 298 Kelvin aligned structure.

Re stability:
I can see one issue in your files: em_posre_l1_normal.mdp needs the free energy section as in nvt (fixed mdp), because, as it stands the em simulation treats the ligand as coupled to the rest of the system. In your procedure em receives coordinates for a decoupled ligand from alignment and then spends a lot of effort to make it coupled. Nvt likely crashes afterwards because the coordinates it receives from em are optimized for a coupled ligand, but in nvt the ligand is immediately decoupled. Though em still takes a suspiciously long time even with a decoupled ligand (this may be due to position restraining to a room temp protein structure). After single precision em with this fix nvt is stable for me.

One more thing that can help with stability is using gmx genion from Gromacs 2020 and newer. Older versions sometimes inserted ions too close to the protein or even into the middle of one (if there was water it could replace there). When that happened, protein would swell with water the moment position restraints were removed. From 2020 and on genion has a -rmin parameter (default 0.6 nm) that prevents ion insertion too close to the solute.

Finally, non-equilibrium simulations have been unstable at times for me. Increasing lincs-order and lincs-iter can help, but for the simulations that made it into the paper I just reran the crashing non-equilibrium simulations with a different seed until enough succeeded. It may have been caused by gmx bug #4321 when partially decoupled ligands cross PBC, but we are still investigating how to fix that one.

Cheers,
Yuriy.

Edit: wording

@xiaotianzhou1982
Author

Dear Yuriy,
Thank you very much for your explanation. I have tried to rerun my simulation as described in your steps 4 to 9. As a first try (test1), I aligned the apo and holo trajectories and then aligned the decoupled ligands onto the aligned holo trajectory using the gromacs commands gmx trjconv and gmx confrms. In test1 I got ΔG = 166.79 kJ/mol, which is 15 kJ/mol lower than your result of 181.50 kJ/mol. The forward work is almost the same, i.e. about 230 kJ/mol, but the reverse work is ~100 kJ/mol for my run and 120 kJ/mol for yours. I thought there might be a mistake in my alignment, because the ligand aligned into the apo trajectory (protein+water+ions) has many clashes with the water in the pocket and the side chains of the pocket residues. Therefore I used your alignment script (with some modifications) for a second test (test2). However, this time I got almost the same result as in my first test, i.e. 163.64 kJ/mol (forward work ~230 kJ/mol, reverse work ~100 kJ/mol). I am not sure whether I am doing what you described correctly. You can find the relevant files of my test2 run at the following link; it's about 1 GB. Thanks again! Fred.
https://www.jianguoyun.com/p/DRQWcWMQkfWECRjhjrgEIAA

test1
test2

@zetadin
Contributor

zetadin commented Apr 11, 2022

Dear Fred,
Your result seems to be affected by the outliers in the backward work distribution that have very low or even negative work values. They will shift the overall dG lower, but I don't know if it's enough to explain the difference.
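
The sensitivity to low-work outliers can be illustrated with a one-directional exponential (Jarzynski) average, dG = -kT ln <exp(-W/kT)>, which is dominated by the smallest work values. This is only a sketch of the effect: the bidirectional estimator behind the numbers in this thread is not reproduced here, and the work values below are made up.

```python
import math

KB = 0.0083144626  # molar gas constant in kJ/(mol K)

def jarzynski(work, temperature=298.0):
    """Exponential-average free energy estimate (kJ/mol) from work values (kJ/mol)."""
    kT = KB * temperature
    w_min = min(work)
    # Shift by the smallest work value so the exponentials cannot overflow.
    s = sum(math.exp(-(w - w_min) / kT) for w in work)
    return w_min - kT * math.log(s / len(work))

clean = [100.0, 105.0, 110.0, 115.0, 120.0]   # made-up work values, kJ/mol
with_outlier = clean + [60.0]                 # one additional low-work transition

print(jarzynski(clean))
print(jarzynski(with_outlier))
```

A single low-work transition out of six pulls the estimate most of the way toward that value, which is why a handful of poorly aligned starting frames can move the final dG noticeably.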

I think they come from the flip of one of the ligand dihedrals in the decoupled ligand in solvent.
This flip causes a poor alignment of the ligand.
cis_vs_trans_marked_up
This picture is of frame0, where you are getting very small W. As soon as the simulations start, the protein-ligand restraints will push on the purine in the center and likely cause the benzene and sulfur to clash with the beta-sheet.

I haven't seen this flip in any of my repeats of simulations of the decoupled ligand in solvent for this system. Same for coupled ligand in protein (but the flip did appear in the coupled ligand in solvent, which isn't used for alignment). Did you perhaps re-use the equilibrium simulations made with gromacs 2019.6 without the bug fix? Both end state equilibria of the ligand are affected by the bug when using couple-intramol=no.

If that is not the issue, then the only other difference I can see is selection of protein-ligand restraint anchor atoms and force constants. You seem to have picked yours manually, while we optimized ours (independently for each repeat) to keep the distributions of restrained degrees of freedom as Gaussian as possible in the aligned trajectory. This makes correcting the dG for the restraints more accurate later. The force from the restraints will shift both forward and reverse work distributions by an equal amount, but that amount will be different depending on the restraints. I have seen swings of ~10 kJ/mol from this. But they will get smoothed out once you add the correction for the restraints (the Boresch et al. one, not the post hoc one we added) for that particular repeat.

@xiaotianzhou1982
Author

Dear Yuriy,
Thanks a lot for your answer. First, I don't think it is a restraint anchor atom issue. I did not pick the anchor atoms manually; they were generated automatically from the holo equilibrium trajectory using a script called MDRestraintsGenerator.py from IAlibay's GitHub. Further evidence is that my forward work is the same as yours and only the reverse work differs. As you said, the force from the restraints will shift both forward and reverse work distributions by an equal amount. If it were a restraint issue, my forward work should differ from yours too.
I think it is still the clash problem you mentioned, because I also noticed many clashes between the ligand and the water and side chains of the pocket residues. If the bug fix is an issue, should I run the simulations using another version of Gromacs such as 2021 and use Gromacs.2019.04-bug-fix only for the transition part?
Best regards
Fred
restraint

@zetadin
Contributor

zetadin commented Apr 12, 2022

Dear Fred,
You are right about the restraints.

Gromacs 2021 already has the bug-fix included in it, so you can use it or newer versions for everything. The only issue is that it will tell you to make the rlist larger to include the whole decoupled ligand in the neighborlist so the bug-fix in it can correctly handle ALL the excluded intramolecular interactions in it. See my post from April 1 for how to set rlist and not have Gromacs override it. The 2019.4-bug-fix didn't have a check for this, allowing for too short rlists and still slightly wrong results for larger ligands.

The point I was trying to make yesterday is to use any version with the bug-fix for both equilibria and transitions.

In principle small clashes shouldn't be an issue while the ligand is decoupled. Water will move out of the way of the ligand as it gets coupled and the ligand will rotate/deform to get out of clashes with the protein unless something prevents it. This involves a lot of energy being dissipated and is responsible for the poor overlap between work distributions, but that isn't going to go away without prohibitively long transitions or manipulation of the potential in the mid-ranges of lambda. The systematic issue here may be the "prevents" bit. The dihedral might not be able to reliably flip back into its bound orientation during the coupling transition, leaving a portion of the transitions in a high energy state, leading to lower backward work values. That's why I'm hoping for the simple solution of the dihedral flip just being an artifact of the bug in the equilibrium simulation of the decoupled ligand in solvent.

@xiaotianzhou1982
Author

Dear Yuriy,

Both equilibrium and transitions were performed using 2019.4-bug-fix. To check whether clashes are the cause, I did a test run using the ligand conformations taken directly from the holo equilibrium after aligning the holo trajectory onto the apo trajectory (files are at the following link). That should give a better alignment in the apo pocket. The reverse dG improved by 7 kJ/mol, but is still 10 kJ/mol lower than your result.
Another concern is that I don't understand why you use the decoupled ligand in solvent to align into the apo pocket. I think the reason is that you want to include more conformations of the free decoupled ligand in solvent. There are two problems. First, if the structure of the decoupled ligand changes too much compared to its coupled structure, you will definitely get a poor alignment. Second, if the structure of the decoupled ligand does not change and stays similar to that of the coupled state, the alignment will be good. But in that case, why not just use the coupled ligands to fit into the apo pocket?
https://www.jianguoyun.com/p/DaUp_FQQkfWECRjX-rgEIAA
Best regards
Fred
wplot

@vgapsys
Member

vgapsys commented Apr 19, 2022

Hey Fred,

I see several issues here: the ensemble that needs to be superimposed into the active site of the protein must come from the decoupled ligand simulations, because it effectively replaces the actual decoupled+restrained ligand simulations in the protein.

Another issue, you cannot just take arbitrary restraints for such a superimposed ensemble. The restraints need to be carefully selected such that an ensemble generated with the chosen restraints would not be distinguishable from the one that you have generated by fitting. For the restraint generation I suggest using our scripts: Yuriy (@zetadin) could you give a brief example how to run the script to create the restraints?

Vytas

@zetadin
Contributor

zetadin commented Apr 19, 2022

Sure. The script is here: postHoc_restraining_python3.py

usage:
python postHoc_restraining_python3.py -f frame*.gro -n index.ndx -oii ii.itp -odg dg.dat -T 298 -alpha 0.05

  • f: aligned trajectory as separate files for each frame. The alignment script outputs them.
  • n: index file with candidate anchor atoms. Needs separate groups for ligand and protein.
  • oii: output restraints file
  • odg: output file with the Boresch-Karplus correction for uncorrelated harmonic restraints.
  • T: temperature in K
  • alpha: p-value for determining if distributions of restrained degrees of freedom are not Gaussian. If they aren't, they can't be accurately recreated with harmonic B-K restraints and the script will look for other anchors.
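
Why Gaussian distributions matter can be sketched as follows: a harmonic restraint U = 0.5 k (x - x0)^2 at temperature T produces a Gaussian distribution of x with variance kB T / k, so an equivalent force constant can be read off the observed spread as k = kB T / variance. The snippet below is a minimal illustration of that relation only. It ignores Jacobian factors for distances and angles, it is not the logic of postHoc_restraining_python3.py, and the function name and sample values are hypothetical.

```python
import statistics

KB = 0.0083144626  # molar gas constant in kJ/(mol K)

def harmonic_from_samples(values, temperature=298.0):
    """Equilibrium value x0 and force constant k of the harmonic potential
    whose Boltzmann distribution matches the mean and variance of `values`.

    k is returned in kJ/mol per squared unit of `values`
    (e.g. kJ/mol/rad^2 if the samples are angles in radians).
    """
    x0 = statistics.fmean(values)
    var = statistics.variance(values)   # sample variance
    k = KB * temperature / var
    return x0, k

# Made-up samples of one restrained angle (radians) from aligned frames.
angle_samples = [0.9, 1.0, 1.1, 1.0, 1.0]
x0, k = harmonic_from_samples(angle_samples)
print(x0, k)
```

If the observed distribution is far from Gaussian, no single harmonic k reproduces it, which is consistent with the script searching for other anchors in that case.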

I also checked on the dihedral flip in the decoupled ligand that is causing poor alignment. I never saw it in 5 repeats with 2019.4-bug-fix and only see it in 1 of 5 repeats with gmx 2019.6, suggesting it is just a rare event. Even with optimal restraints, this dihedral would not be able to reliably flip back into the correct orientation during transitions, and so would be lowering the backwards work values. I would suggest doing multiple repeats of the protocol (we used 5 in the paper) and using the mean as the final dG estimate. This helps averaging stochastic errors both from not fully sampling the equilibria and from the poor overlap in work distributions during transitions.
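
Combining repeats can be as simple as the sketch below. Using the mean follows the suggestion above; reporting the standard error of the mean alongside it is an added assumption rather than something prescribed in this thread, and the dG values are placeholders, not results.

```python
import math
import statistics

def combine_repeats(dg_values):
    """Mean dG over independent repeats and its standard error (same units)."""
    mean = statistics.fmean(dg_values)
    sem = statistics.stdev(dg_values) / math.sqrt(len(dg_values))
    return mean, sem

# Placeholder dG values (kJ/mol) from five hypothetical repeats.
repeats = [-40.0, -44.0, -42.0, -41.0, -43.0]
mean_dg, sem_dg = combine_repeats(repeats)
print("dG = %.2f +/- %.2f kJ/mol" % (mean_dg, sem_dg))
```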

Cheers,
Yuriy.

@xiaotianzhou1982
Author

xiaotianzhou1982 commented Apr 22, 2022 via email

@xiaotianzhou1982
Author

xiaotianzhou1982 commented Apr 22, 2022 via email
