Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to read molecules from AMBER files #137

Open
SofiaBariami opened this issue Dec 2, 2019 · 8 comments
Open

Failed to read molecules from AMBER files #137

SofiaBariami opened this issue Dec 2, 2019 · 8 comments

Comments

@SofiaBariami
Copy link
Collaborator

Hello,
I am trying to use combine.py to a system of a protein and a ligand, but I am having this error:

OSError: Failed to read molecules from: ['QUBE_pro.prm7', 'QUBE_pro.rst7']

I got the prm, rst files by using the followin script, that reads pdb/xml files and prints prm/rst ones:

from simtk.openmm.app import * 
from simtk.openmm import * 
from simtk.unit import * 
from sys import stdout 
import parmed as pmd
from parmed.openmm import XmlFile

pdbfile = PDBFile('QUBE_pro.pdb')
ff = ForceField("QUBE_pro.xml")
system = ff.createSystem( pdbfile.topology)
#Swap in positions
integrator = openmm.VerletIntegrator(1.0 )
context = openmm.Context(system, integrator)
context.setPositions(pdbfile.positions)
# Create parmed Structure
structure = parmed.openmm.topsystem.load_topology( pdbfile.topology, system, pdbfile.positions)
# Write AMBER parameter/crd
structure.save('system.prmtop', overwrite=True)
structure.save('system.crd', format='rst7', overwrite=True)

I couldn't upload the files on GitHub, so you can find them through onedrive with the following link:
https://uoe-my.sharepoint.com/:f:/g/personal/s1786388_ed_ac_uk/EubzxglqUZVFvlioJKhJD3UBYhy0428TuMRRBiqcVsAcjQ?e=E8ysnb

Have you seen this problem before, or do you have any ideas on how to solve it?
Thanks a lot!

@lohedges
Copy link
Member

lohedges commented Dec 2, 2019

Hi @SofiaBariami, I was online and happened to have a spare few minutes so took a quick look at this. For reference, this is how you can see the full Sire error trace in BioSimSpace:

import BioSimSpace as BSS

# Turn on verbose error messages.
BSS.setVerbose(True)

# Try to read the molecules.
s = BSS.IO.readMolecules(["Qube_pro.rst7", "Qube_pro.prm7"])

This gives the following error:

UserWarning: Exception 'SireIO::parse_error' thrown by the thread 'master'.
Atoms in residue 1 are not listed as part of the expected molecule 1. Should be atoms 1 to 6107, but molecule has [ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515 ].
Thrown from FILE: /Users/runner/runners/2.160.0/work/1/s/corelib/src/libs/SireIO/amberprm.cpp, LINE: 3783, FUNCTION: SireMol::MolStructureEditor SireIO::AmberPrm::getMolStructure(int, const SireBase::PropertyName &) const
__Backtrace__
(  0) libSireError.2019.2.1.dylib (0x000000010a0f515b +59)
  -- SireError::getBackTrace()

It looks like there is an inconsistency with the information in the files. Can Parmed read the files back in successfully?

@SofiaBariami
Copy link
Collaborator Author

Hi Lester, thanks for looking at it. By taking a closer look to the protein files, I saw that all the residues of the protein are called "QUP", so probably this is what is wrong. Both the pdb and the prm files are like that. Is there a way to overcome this problem using the property map or I should find the file with the correct residues?

@SofiaBariami
Copy link
Collaborator Author

For the record, parmed is able to load back the prm7 files as a system made of a single residue

In [1]: import parmed as pmd                                                                                                                                            
In [2]: pmd.load_file("QUBE_pro.prm7", "QUBE_pro.rst7")                                                                                                                 
Out[2]: <AmberParm 6107 atoms; 1 residues; 6192 bonds; parametrized>
In [3]:  

@jmichel80
Copy link
Contributor

A quick update. The issue was that the input pdb actually contains several molecules, so the Sire parser presumably bails out because the connectivity appears incomplete. parmed doesn't care about these sanity checks and proceeds. @SofiaBariami is testing a workaround.

@lohedges
Copy link
Member

lohedges commented Dec 3, 2019 via email

@jmichel80
Copy link
Contributor

Good point, @SofiaBariami another workaround may be to split the pdb file with TER statements, that may give you a prm7 file that combine.py can read.

@SofiaBariami
Copy link
Collaborator Author

SofiaBariami commented Dec 4, 2019

I did that but I get this Value error:
ValueError: No template found for residue 1 (QUP). This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field
This is probably because it is assumed that QUP residue has 6107 atoms, but there are only 515 before the first TER

@SofiaBariami
Copy link
Collaborator Author

Hello, an update on this issue:

  • By including TER records between the protein fragments, the residues (all named QUP) were not recognised: ValueError: No template found for residue 1 (QUP). This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field
  • By including TER records between the protein fragments, changing the names of the residues , and also splitting the xml file into residues with different name that correspond to the pdb ones, I am getting a list index out of range error.
~/sire_test.app/lib/python3.7/site-packages/simtk/openmm/app/forcefield.py in addBond(self, atom1, atom2)
    608             """Add a bond between two atoms in a template given their indices in the template."""
    609             self.bonds.append((atom1, atom2))
--> 610             self.atoms[atom1].bondedTo.append(atom2)
    611             self.atoms[atom2].bondedTo.append(atom1)

I tried to debug the simtk code that is found at forcefield.py, and I found that the program doesn't read all the bond entries of the second fragment, even if the atom names and types are all read successfully. More specifically, the output that I am getting is the following:

...
atomname =  H5024
atomtype =  QUBE_5023
atomname =  H5025
atomtype =  QUBE_5024
atomname =  H5026
atomtype =  QUBE_5025
from, to = 519 516
from, to = 519 520
from, to = 516 515
from, to = 516 517
from, to = 516 518
.
.
.
from, to = 4483 4485
from, to = 4477 4476
from, to = 4489 4488
from, to = 4490 4488
from, to = 4491 4493

It stops at the bond between atoms 4491 and 4493. The next bond-pair is <Bond from="4513" to="4495"/> . I checked the pdb file (with VMD as well), and the specific connection exists. Also, at the XML file, we can find parameters about this bond: <Bond class1="C4513" class2="C4495" k="166657.088" length=".1523"/>, so there isn't a typo. Any ideas on that?
Here you can find the pdb and xml files. Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants