Residue_Contacts() generates contacts beyond the cutoff and plot() and most_common() contacts do not match #130

Open
sam-mahdi opened this issue Sep 2, 2023 · 0 comments

I'm a bit confused about the behavior I'm seeing. I am using chain A of PDB entries 1mmi and 1jql.

Here is how I compute and plot the contact map difference:

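import mdtraj as md
from contact_map import ContactFrequency
from contact_map import AtomMismatchedContactDifference
import matplotlib.pyplot as plt

# Files assumed to contain only chain A of each entry (extended and bent conformations)
extended_monomer = md.load('1mmi.pdb')
bent_monomer = md.load('1jql.pdb')

# Single-frame contact maps with default settings
extended_map = ContactFrequency(extended_monomer)
bent_map = ContactFrequency(bent_monomer)

# The two structures do not have identical atoms, so use the mismatched-atom difference
difference = AtomMismatchedContactDifference(extended_map, bent_map)
difference.residue_contacts.plot()
plt.show()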

The plot shows a contact between 136 and 353. In the PDB files these residues are not even remotely close (about 20 Å apart). The same is true of the contacts 140/356 and 140/358.

What's strange is what happens when I try to look at the contacts for 353:

difference.residue_contacts.most_common(difference.topology.residue(353))

First, it does not return the residue I asked for:

[([SER354, ALA357], 1.0), ([ARG137, SER354], 1.0), ([SER354, GLU350], 0.0), ([SER354, ASP351], 0.0)]
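The call above passes 353 to topology.residue(). mdtraj documents Topology.residue() as taking a zero-based index into the topology, not the PDB residue number (resSeq). If that applies here and the chain is numbered contiguously from 1, index 353 would be the residue printed as SER354. The sketch below mimics that lookup with a hypothetical Residue stand-in and filler names (only ALA353 and SER354 come from this issue); it does not use mdtraj itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """Hypothetical stand-in for an mdtraj Residue, keeping only the fields used here."""
    index: int   # zero-based position in the topology
    name: str
    resSeq: int  # residue number as written in the PDB file

    def __str__(self) -> str:
        # Printed as name + PDB number, e.g. "SER354"
        return f"{self.name}{self.resSeq}"


def residue_by_index(residues, index):
    """Look up a residue by zero-based position (the documented behavior of Topology.residue)."""
    return residues[index]


# Hypothetical chain numbered contiguously from resSeq 1; "GLY" is filler
names = ["GLY"] * 400
names[352] = "ALA"  # ALA353 in PDB numbering
names[353] = "SER"  # SER354 in PDB numbering
residues = [Residue(i, name, i + 1) for i, name in enumerate(names)]

print(residue_by_index(residues, 353))  # SER354
print(residue_by_index(residues, 352))  # ALA353
```

Under that assumption, passing 352 rather than 353 would select ALA353.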

I don't know why it returns the residue after the one I requested (354 instead of 353). Second, even if we look at SER354, none of these matches line up with the contact map plot. The same is true of ALA353: it shows contacts in the plot, but none in most_common(). This is because most_common() and the plot disagree. If I look at the pair_list for SER354:

[354, 349] 0.0
[336, 354] -1.0

Compare that with the most_common() output for SER354:

[([SER354, ALA357], 1.0), ([ARG137, SER354], 1.0), ([SER354, GLU350], 0.0), ([SER354, ASP351], 0.0)]

They are completely different. The same is true for ALA353, where most_common() showed no contacts, but the pair_list being plotted contains:

[353, 356] 1.0
[353, 349] 0.0
[353, 350] 0.0
[136, 353] 1.0
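If the plotted pairs are zero-based residue indices (an assumption; how pair_list was obtained is not shown) and the chain is numbered contiguously from 1, each index corresponds to PDB number index + 1. The hypothetical helper to_resseq below applies that shift to the plotted pairs for 353.

```python
def to_resseq(pairs, first_resseq=1):
    """Shift zero-based residue-index pairs to PDB residue numbers.

    Assumes contiguous numbering starting at first_resseq (hypothetical
    parameter); chains with gaps or insertion codes need a real lookup table.
    """
    return [((i + first_resseq, j + first_resseq), value) for (i, j), value in pairs]


# Pairs and values copied from the plotted list for 353
plotted = [((353, 356), 1.0), ((353, 349), 0.0), ((353, 350), 0.0), ((136, 353), 1.0)]

for pair, value in to_resseq(plotted):
    print(pair, value)
```

If that assumption holds, the shifted pairs (354, 357), (354, 350), (354, 351) and (137, 354) carry exactly the values that most_common() lists for SER354 ([SER354, ALA357], [SER354, GLU350], [SER354, ASP351], [ARG137, SER354]), so the two outputs may be labeling the same pairs differently. The distance question for 137/354 is separate.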

We can see there are actually two contacts, one of them being the nonsensical 136/353. The same goes for SER354, which has a nonsensical contact, ([ARG137, SER354], 1.0). These residues are much farther apart than the cutoff, so these contacts make no sense. Setting the cutoff manually does not fix this.
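One way to sanity-check a reported residue contact is to compute the smallest distance between any atom of one residue and any atom of the other and compare it with the cutoff. The sketch uses made-up coordinates in nanometres (mdtraj's unit) and a 0.45 nm cutoff, assumed here to match contact_map's default. With mdtraj, the per-residue coordinates could be taken from traj.xyz[0, [atom.index for atom in residue.atoms]]. Whether contact_map counts a distance exactly equal to the cutoff is not assumed; the sketch simply uses a strict inequality.

```python
import math
from itertools import product


def min_distance(coords_a, coords_b):
    """Smallest Euclidean distance between any atom of residue A and any atom of residue B."""
    return min(math.dist(a, b) for a, b in product(coords_a, coords_b))


def in_contact(coords_a, coords_b, cutoff=0.45):
    """Treat two residues as in contact if any atom pair is closer than cutoff (nm)."""
    return min_distance(coords_a, coords_b) < cutoff


# Hypothetical atom coordinates in nm (not taken from 1mmi or 1jql)
residue_a = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0)]
residue_b = [(2.0, 0.0, 0.0), (2.1, 0.0, 0.0)]  # roughly 18.5 Å from residue_a
residue_c = [(0.4, 0.0, 0.0), (0.5, 0.0, 0.0)]  # 0.25 nm from residue_a

print(round(min_distance(residue_a, residue_b), 2), in_contact(residue_a, residue_b))  # 1.85 False
print(round(min_distance(residue_a, residue_c), 2), in_contact(residue_a, residue_c))  # 0.25 True
```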

Interestingly, this appears to be an issue isolated to residue_contacts(). If I modify the PDB files so that their atoms match perfectly, the contacts make sense (SER368 is the same residue as SER354, just numbered differently):

([SER368-N, ASP365-OD2], 0.0)
([SER368-N, ASP365-O], 0.0)
([SER368-N, ASP365-CB], 0.0)
([SER368-N, ASP365-OD1], 0.0)
([SER368-N, ASP365-CG], 0.0)
([SER368-N, ASP365-C], -1.0)

However, the contact map plot for the identical-atom PDBs doesn't match the contacts. If we look at the pairs being plotted:

[368, 189] 0.0
[368, 358] 0.0
[368, 359] 0.0
[368, 191] 0.0
[360, 368] 0.0
[368, 190] 0.0

First, it differs from the most_common() contacts:

[([GLU364, SER368], 0.0), ([ASP365, SER368], 0.0), ([SER368, ARG151], -1.0), ([SER368, ALA371], -1.0)]

Second, it again gives nonsensical results (368 and 190 are 20 Å apart).
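For comparison, a common way to build residue contacts from atom contacts is the any-atom rule: a residue pair counts as one contact if at least one of its atom pairs is in contact. The sketch below applies that rule to two single-frame contact sets and takes their difference. It is an assumed illustration, not contact_map's implementation, and the atom indices and the SER368-CA/ASP365-O pair are hypothetical. It shows how a residue pair can come out as 0 while atom pairs inside it are +1 or -1.

```python
def residue_contacts(atom_contacts, atom_to_residue):
    """Collapse atom-pair contacts into residue-pair contacts (any-atom rule)."""
    pairs = set()
    for i, j in atom_contacts:
        res_i, res_j = atom_to_residue[i], atom_to_residue[j]
        if res_i != res_j:  # skip contacts within one residue
            pairs.add(frozenset((res_i, res_j)))
    return pairs


def contact_difference(first, second):
    """Single-frame difference: +1 only in first, -1 only in second, 0 in both."""
    return {pair: int(pair in first) - int(pair in second) for pair in first | second}


# Hypothetical atoms: 0 = SER368-N, 1 = SER368-CA, 2 = ASP365-C, 3 = ASP365-O
atom_to_residue = {0: "SER368", 1: "SER368", 2: "ASP365", 3: "ASP365"}
atoms_first = {(1, 3)}   # atom pairs written in a fixed order for simplicity
atoms_second = {(0, 2)}

atom_diff = contact_difference(atoms_first, atoms_second)
residue_diff = contact_difference(residue_contacts(atoms_first, atom_to_residue),
                                  residue_contacts(atoms_second, atom_to_residue))
print(atom_diff[(1, 3)], atom_diff[(0, 2)])           # 1 -1
print(residue_diff[frozenset(("SER368", "ASP365"))])  # 0
```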

Furthermore, the command

difference.residue_contacts.most_common(difference.topology.residue(364))

is offset by 4 in this scenario for some reason (i.e. residue 364 returns the matches for SER368). Note that this only happens when I use this command. If I loop through the contacts themselves:

for items in difference.residue_contacts.most_common():
	if str(items[0][0]) == 'SER368' or str(items[0][1]) == 'SER368':
		print(items)

This works without any offset.
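Since looping by name works, another option is to select the residue by its PDB number rather than its position. In mdtraj each Residue carries both index (zero-based position) and resSeq (PDB number), so a filter such as [r for r in topology.residues if r.resSeq == 368] would return a residue object that could then be passed to most_common(), as in the calls above. The sketch reproduces that filter on a hypothetical stand-in topology whose numbering starts at 4 (resSeq = index + 4), one numbering consistent with the offset described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """Hypothetical stand-in for an mdtraj Residue."""
    index: int   # zero-based position in the topology
    name: str
    resSeq: int  # residue number as written in the PDB file


def find_by_resseq(residues, resseq):
    """Return all residues with the given PDB number (several chains may share a number)."""
    return [r for r in residues if r.resSeq == resseq]


# Hypothetical single chain whose PDB numbering starts at 4, so resSeq = index + 4
residues = [Residue(i, "SER" if i == 364 else "GLY", i + 4) for i in range(400)]

matches = find_by_resseq(residues, 368)
print(len(matches), matches[0].index, matches[0].name)  # 1 364 SER
```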

So in total, there appear to be three issues:

  1. The plots do not match the contacts in most_common(), whether the atoms are identical or AtomMismatchedContactDifference is used. Either way, the contact map plot does not correlate with the contacts themselves. I do not know if this is intentional.
  2. residue_contacts() generates nonsensical contacts far beyond the cutoff, contacts that do not exist when using residue_atoms (which gives contacts that do make sense).
  3. difference.residue_contacts.most_common(difference.topology.residue(res)) has an indexing issue. Looping through difference.residue_contacts.most_common() shows that nothing is offset there, so the issue is with topology.residue itself.

I have tested all of the above while varying the number of neighbors ignored, the query, and the cutoff. None of these changes fixed any of the problems. All tests were run in a virtual environment.

I don't quite know where these problems arise.
