Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDBFixer - in some cases added residues have amides with non-trans OCNH stereochemistry #307

Open
ivanicj opened this issue Sep 30, 2024 · 14 comments · Fixed by #308
Open
Labels

Comments

@ivanicj
Copy link

ivanicj commented Sep 30, 2024

Hi There,

I first want to say that for my cases PDBFixer is superior to other methods. For the case of 7zaw.pdb - Human GPC3 having missing atoms and residues (many) - PDBFixer is able to add the missing residues (that can be very long) extremely well, while other methods just aren't anywhere near as good. I noticed in some cases that linked residues can have amides with non-trans OCNH stereochemistry, and then MM minimization can actually make cis OCNH structures.

The test example I provide is 7zaw.pdb (RCSB) where PDBFixer produces the structure in 7zaw_PDBFixer. Examples of residue links having non-trans OCNH structures include:

3(G)-4(D)
5(A)-6(T)
21(L)-22(K)
28(P)-29(V)
161(D)-162(S)
162(S)-163(A)
284(R)-285(I)

I was wondering if there is a way to add residues whereby trans OCNH stereochemistry is enforced. I understand that the residues may have unfavorable positions but I'm hoping that a MM minimization will 'repair' them, while also keeping amide trans OCNH structures.

It's possible I'm not using PDBFixer correctly and thus I end up with non-trans OCNH amides, if so please let me know as this would be the easiest solution.

Thank you for any help you can offer, I very much appreciate it.

Joe

P.S. Note that I had to add ".txt" to end of pdb files so that github would upload them.

7zaw.pdb.txt
7zaw_PDBFixer.pdb.txt

@peastman
Copy link
Member

I think I see the problem. To add missing residues, it first puts them in unrealistic positions, then energy minimizes with a soft-core force field to bring them into more realistic ones. The soft-core force field was created based on Amber ff99SB-ILDN. The problem is that the script used to create it didn't consider wildcards. It's missing any torsion term whose definition involves a wildcard. That includes the one to restrict rotation of the peptide bond.

That should be an easy fix.

@peastman peastman added the bug label Sep 30, 2024
@ivanicj
Copy link
Author

ivanicj commented Sep 30, 2024

If this is an easy fix I'll be very, very happy. Thank you very much again for looking at this!

@peastman
Copy link
Member

The fix is in #308. Can you try it out and see if it works for you?

I tried to write a test case that would verify all added peptide bonds were in the right conformation, but it still found some cis ones, just not as many as before. I suppose that's expected: cis peptide bonds do happen, and anyway, we're just doing a local energy minimization, not a full equilibration. Do you have any suggestions on other ways to test it?

@ivanicj
Copy link
Author

ivanicj commented Oct 1, 2024

Great! Thanks! What is the best way to install version #308? Originally I installed OpenMM with conda and then installed PDBFixer also with conda - conda install -c conda-forge pdbfixer.

Should I download source #308, activate environment OpenMM, and then 'python setup.py install'? Will this replace the original PDBFixer? Sorry for the dumb questions, I just want to be sure to install #308 correctly.

@peastman
Copy link
Member

peastman commented Oct 1, 2024

Since this is prerelease code, it's best to test it in a new environment. The current PDBFixer code also requires recent changes to OpenMM, so you should install the 8.2 beta. You can do it like this:

mamba create --name testenv
conda activate testenv
mamba install -c conda-forge/label/openmm_rc -c conda-forge openmm

Next download the source for the PDBFixer branch containing the change, cd to the root directory, and type

pip install .

@ivanicj
Copy link
Author

ivanicj commented Oct 1, 2024

Thank you very much for the help with installing the test version. I think I did the right thing (git clone https://github.com/peastman/pdbfixer.git). I tested it on the 7zaw.pdb and the resultant pdb is attached (.txt added at end again so github could upload it). The missing residues were added at:

1 - 6
12 - 30
159 - 163
283 - 286
322 - 354
407 - 427
450 - 464

I noticed some cis/90 degree orientations at:

1(E) - 2(T)
4(D) - 5(A)

163(A) - 164(L) 90 degrees
283(Y) - 284(R) 90

322(A) - 323(H) 90
323(H) - 324(S)
324(S) - 325(Q) 90
326(Q) - 327(R) ~90
328(Q) - 329(Y)
330(R) - 331(F) 90
331(F) - 332(A) 90
336(E) - 337(D)
343(K) - 344(V)

415(N) - 416(Q)
418(N) - 419(L)
423(K) - 424(M)

458(K) - 459(H)
463(H) - 464(H)

I understand that this is a very tough case. As far as other ways to test it I can't think of any besides the current test.
Am I right in understanding that there is a local energy minimization of the added residues?

7zaw.pdb.txt
7zaw_PDBFixer.pdb.txt

@peastman
Copy link
Member

peastman commented Oct 2, 2024

It seems the torsion energies aren't enough to make the energy minimization fix them.

I tried implementing a different approach. When it first adds the residues, before minimizing them, it tries to rotate them to get the correct orientations for the peptide bonds. That helps more, though it still doesn't completely fix the problem. In 7ZAW, it brings the number of cis bonds down to only 7. I'll see if I can improve it any further then create a new PR for you to try out.

@peastman
Copy link
Member

peastman commented Oct 3, 2024

Try the version in #309.

@ivanicj
Copy link
Author

ivanicj commented Oct 3, 2024

I very much appreciate you working on this. I know that this is a very tough case (7zaw).
The latest results are in 7zaw_PDBFixer.pdb(.txt).

The missing residues were added at:

1 - 6
12 - 30
159 - 163
283 - 286
322 - 354
407 - 427
450 - 464

I took a look and there seems to be less cis HNCO amide links, two example of cis are:

20(G)-21(L)
21(L)-22(K)

I also notice some very distorted residue structures which is why I was wondering about the local energy minimizations.

Again, I very much appreciate you looking at this and recognize that this is a very, very difficult case.

7zaw_PDBFixer.pdb.txt

@peastman
Copy link
Member

peastman commented Oct 3, 2024

The energy minimization in PDBFixer uses a very approximate force field that just tries to put things into roughly the right positions. Before simulating, you should energy minimize with a real force field. Does that fix the distorted structures?

Overall would you say the results are an improvement from before, or is it better in some ways and worse in others?

@ivanicj
Copy link
Author

ivanicj commented Oct 4, 2024

Hi Peter,

I'm sorry for not replying sooner, I was working on testing the new code for 7zaw.

Here are my steps:

  1. Use PDBFixer, on 7zaw.pdb, to add missing residues and atoms to residues-having-missing-atoms.
  2. Take the resulting structure and cap the ends with ACE & NME.
  3. Optimize the structure (MM) with Amber where the only residues that were active are (note that new numbering accounts for addition of NME cap):

a) added missing residues

(ACE)1 - 7
13 - 31
160 - 164
284 - 287
323 - 355
408 - 428
451 - 466(NME)

b) residues that were repaired (atoms added)

12*
53*
165*,166*
194*
220*
309*
320*
362*
404*

The complete input file for the Amber MM minimization is:


Minimize the system
&cntrl
imin=1, maxcyc=500000, ncyc=2000,
ntwx=25000,
cut=16, ntb=0, igb=0,
ntr=1,restraint_wt=200.0,
restraintmask=':8-11,32-52,54-159,167-193,195-219,221-283,288-308,310-319,321-322,356-361,363-403,405-407,429-450,467-468 & !@h=',
&end

I attach the resultant optimized structure (7zaw-min1-restrain.mdcrd-last.pdb.txt) - note that I added .txt suffix so github would load it.

7zaw-min1-restrain.mdcrd-last.pdb.txt

I went through every added residue and found 21 cis HNCO amides listed below:

21(G)-22(L)
22(L)-23(K)

323(A)-324(H)
324(H)-325(S)
326(Q)-327(Q)
329(Q)-330(Y)
330(Y)-331(R)
331(R)-332(F)
332(F)-333(A)
334(Y)-335(Y)
336(P)-337(E)
338(D)-339(L)
342(D)-343(K)
344(K)-345(V)
345(V)-346(L)
346(L)-347(K)

412(N)-413(G)
414(M)-415(K)
416(N)-417(Q)
417(Q)-418(F)
421(H)-422(E)

Originally, after running PDBFixer on 7zaw, capping and doing a similar MM minimization, I found 23 cis HNCO amides listed below:

3(T)-4(G)

13(S)-14(F)
16(Q)-17(R)
18(L)-19(Q)

323(A)-324(H)
326(Q)-327(Q)
329(Q)-330(Y)
332(F)-333(A)
334(Y)-335(Y)
344(K)-345(V)
346(L)-347(K)
348(V)-349(A)
352(E)-353(H)
353(H)-354(E)

408(K)-409(A)
411(R)-412(N)
412(N)-413(G)
413(G)-414(M)
414(M)-415(K)
417(Q)-418(F)
418(F)-419(N)
422(E)-423(L)
424(K)-425(M)

So, I guess by sheer count the new version is slightly better - 21 vs. 23.

@peastman
Copy link
Member

peastman commented Oct 5, 2024

Here's how I tested it. I first had PDBFixer process the structure with

pdbfixer --pdbid=7ZAW --keep-heterogens=none --add-residues --output=output.pdb

Then I ran this script to check it. It first counts the number of cis peptide bonds in the structure produced by PDBFixer, then energy minimizes it and checks again. I repeated three times with each version of the code, since PDBFixer is not deterministic and results can vary between runs.

from openmm import *
from openmm.app import *
from openmm.unit import *
import numpy as np

def checkTorsions(topology, positions):
    positions = positions.value_in_unit(nanometers)
    residues = list(topology.residues())
    count = 0
    for i in range(len(residues)-1):
        res1 = residues[i]
        res2 = residues[i+1]
        atoms1 = {atom.name: atom for atom in res1.atoms()}
        atoms2 = {atom.name: atom for atom in res2.atoms()}
        if 'CA' in atoms1 and 'C' in atoms1 and 'N' in atoms2 and 'CA' in atoms2:
            atoms = [atoms1['CA'].index, atoms1['C'].index, atoms2['N'].index, atoms2['CA'].index]
            pos1 = positions[atoms[0]]
            pos2 = positions[atoms[1]]
            pos3 = positions[atoms[2]]
            pos4 = positions[atoms[3]]
            v0 = pos1-pos2
            v1 = pos3-pos2
            v2 = pos3-pos4
            cp0 = np.cross(v0, v1)
            cp1 = np.cross(v1, v2)
            angle = np.arccos(np.dot(cp0/norm(cp0), cp1/norm(cp1)))
            if angle < np.pi/2:
                count += 1
    print(count, 'cis peptide bonds')

pdb = PDBFile('output.pdb')
checkTorsions(pdb.topology, pdb.positions)
ff = ForceField('amber14-all.xml')
system = ff.createSystem(pdb.topology)
integrator = VerletIntegrator(0.001)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()
checkTorsions(pdb.topology, simulation.context.getState(positions=True).getPositions())

Here are the counts of cis bonds for the three repetitions with the current code.

Before minimization: 30, 33, 30
After minimization: 25, 25, 28

And here are the counts for the code in #309.

Before minimization: 15, 10, 11
After minimization: 11, 10, 16

There's some variation between runs, but not that much. The new version is always better than the old version, usually a lot better. Energy minimization usually decreases the number of cis bonds, but there was one case where it actually increased it.

@ivanicj
Copy link
Author

ivanicj commented Oct 10, 2024

Yes, the new version is far better than the old version. I very much appreciate your work on this. I know it's a very, very tough case.

@peastman
Copy link
Member

Thanks! I merged it. There may still be more we can do to improve it, but at least it's a start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants