Fragalysis notebook breaks with newest structure content #12

Open
AndreaVolkamer opened this issue Jul 2, 2021 · 6 comments

@AndreaVolkamer

AndreaVolkamer commented Jul 2, 2021

The fragalysis.ipynb notebook breaks when processing the newest version of the MPro data.

Two observations

  • Data set size (cells 11 and 12)
    • before: 404 -> 296 -> 295 = 109 structures filtered out
    • now: 493 -> 35 -> 21 = 472 structures filtered out
    • main reason for filtering now: 2 binding sites were detected; before, it was mostly because none (0) was detected.
  • Interaction fingerprint calculation fails (cell 14)
    • ValueError: Length mismatch: Expected axis has 305 elements, new values have 306 elements
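
pandas raises this `Length mismatch` error when the labels assigned to an axis (for example via `df.columns = labels`) do not have the same number of entries as the axis itself. One way to narrow it down is to compare the two label lists just before the failing assignment. The following is a minimal sketch under that assumption; `diff_labels` and the residue labels are hypothetical and not part of the notebook or plipify.

```python
from collections import Counter


def diff_labels(existing, new):
    """Report labels that occur more often in one list than in the other."""
    existing_counts = Counter(existing)
    new_counts = Counter(new)
    return {
        # Counter subtraction keeps only positive counts.
        "only_in_existing": sorted((existing_counts - new_counts).elements()),
        "only_in_new": sorted((new_counts - existing_counts).elements()),
    }


# Hypothetical labels: the new list carries one extra residue.
existing = ["THR25", "HIS41", "CYS145"]
new = ["THR25", "HIS41", "CYS145", "GLN306"]
report = diff_labels(existing, new)
print(len(existing), len(new), report)
```

Because the comparison uses counts rather than sets, a label that is duplicated in only one of the lists also shows up in the report.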

@jaimergp can you please have a look? [adding @glass-w to this issue, because he might be working with the new data]

@AndreaVolkamer

AndreaVolkamer commented Jul 2, 2021

Note: I changed the pdb_list in cell 6 to read only the '_bound.pdb' structures, because '_apo.pdb' files also exist, but that did not change the error.

  • before: pdbs = list((DATA / "aligned").glob("**/*.pdb"))
  • now: pdbs = list((DATA / "aligned").glob("**/*_bound.pdb"))
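
To illustrate the difference between the two patterns, here is a small self-contained sketch that builds a throwaway directory and globs it both ways. The directory layout follows the path quoted later in this thread; the `_apo.pdb` file name is an assumption made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one structure folder with a bound and an apo file.
    folder = Path(tmp) / "aligned" / "Mpro-P0008_0A"
    folder.mkdir(parents=True)
    for name in ("Mpro-P0008_0A_bound.pdb", "Mpro-P0008_0A_apo.pdb"):
        (folder / name).touch()

    root = Path(tmp) / "aligned"
    # "**/*.pdb" picks up every PDB file, including the apo structures.
    all_pdbs = sorted(p.name for p in root.glob("**/*.pdb"))
    # "**/*_bound.pdb" keeps only the ligand-bound structures.
    bound_pdbs = sorted(p.name for p in root.glob("**/*_bound.pdb"))

print(all_pdbs)
print(bound_pdbs)
```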

@AndreaVolkamer

Besides, most structures are still omitted because they have 2 binding sites ...

@AndreaVolkamer

AndreaVolkamer commented Jul 2, 2021

Checking the first file that is omitted:
data/Mpro/aligned/Mpro-P0008_0A/Mpro-P0008_0A_bound.pdb contains 2 binding sites and we want exactly one.
This revealed that the file actually contains two chains with the same ligand (LIG) in each chain. There is also DMS in the structure, but it is not considered because we already restrict to ligand_name="LIG".
@jaimergp I feel like we had tackled this issue before, but I am not sure how. [Is there a way in plipify to select only a specific chain from the structure?]
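
A quick way to check other files for the same situation, without going through plipify, is to scan the HETATM records directly. This is only a sketch: it relies on the fixed-width PDB columns (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26), and the helper names, residue numbers, and example records are made up for illustration.

```python
from collections import defaultdict


def chains_with_ligand(pdb_lines, ligand_name="LIG"):
    """Map chain ID -> residue numbers of HETATM records named ligand_name."""
    found = defaultdict(set)
    for line in pdb_lines:
        # Fixed-width PDB columns: resName 18-20, chainID 22, resSeq 23-26.
        if line.startswith("HETATM") and line[17:20].strip() == ligand_name:
            found[line[21]].add(int(line[22:26]))
    return dict(found)


def hetatm(serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a HETATM record (enough for this check)."""
    return f"HETATM{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


# Hypothetical records: LIG in chains A and B, plus DMS in chain A.
lines = [
    hetatm(1, "C1", "LIG", "A", 404),
    hetatm(2, "C2", "LIG", "A", 404),
    hetatm(3, "S", "DMS", "A", 405),
    hetatm(4, "C1", "LIG", "B", 404),
]
ligand_chains = chains_with_ligand(lines)
print(ligand_chains)
```

With real files, the lines would come from `open(path)`; a file whose result has more than one chain is a candidate for the "2 binding sites" message.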

@AndreaVolkamer

AndreaVolkamer commented Jul 7, 2021

@jaimergp and @glass-w: I've included a temporary fix that splits the files by chain and saves them again with a '_chain[x]' suffix.

```python
from Bio.PDB import PDBIO, PDBParser

parser = PDBParser(PERMISSIVE=True, QUIET=True)

for filepath in pdbs:
    # e.g. "Mpro-P0008_0A_bound" -> chain "A" (last character of the second "_" field)
    pdb_id = filepath.stem
    chain_id = pdb_id.split("_")[1][-1]
    new_filename = filepath.with_name(f"{pdb_id}_chain{chain_id}.pdb")

    # Read the PDB file and take the first model
    model = parser.get_structure(pdb_id, str(filepath))[0]

    # Save only the selected chain to a new file
    io = PDBIO()
    io.set_structure(model[chain_id])
    io.save(str(new_filename))

# Reassign pdbs
pdbs = list((DATA / "aligned").glob("**/*_bound_chain*.pdb"))
```

This fixes the '2 binding sites' issue, but the ValueError: Length mismatch: Expected axis has 296 elements, new values have 305 elements remains.

@AndreaVolkamer

The respective update can be found in PR #11.

@AndreaVolkamer

@jaimergp note that we have a PR open (#13) regarding the visualization that contains the newest structure preprocessing. We have a workaround, but there is still some weird behavior regarding sequence length and residues when generating the fingerprints.
