DNA assembly scoring problems #161

Open
josemduarte opened this issue Jan 22, 2017 · 7 comments
@josemduarte
Contributor

In DNA structures we can't use any of our indicators for contacts between nucleotide chains; even geometry doesn't apply to them. For interface scoring we solved this by assigning nopreds to contacts between nucleotide chains. But for assembly scoring we have no solution at the moment: e.g. an assembly of 2 chains of double-stranded DNA is called XTAL. Can we do better than that?

A good example is 2rt8 (NMR), now on the dev server: it has only 1 interface, between the 2 strands of DNA, which we call nopred. But then the assembly that we call bio is the single strand, and the double strand is called xtal. Any ideas for a way to treat this? Nopreds everywhere? A warning?

@josemduarte josemduarte added this to the 3.0 milestone Jan 22, 2017
@lafita
Member

lafita commented Jan 22, 2017

As the name EPPIC stands for protein-protein interface classifier, I would say that it is not that bad if we do not provide a call for nucleotide assemblies (I mean NOPREDs everywhere).

We can probably identify double-stranded DNA helices easily and produce a BIO call for them, but making a call for other, less clear cases would need an extension of the algorithm together with some benchmarking.

@lafita
Member

lafita commented Jan 22, 2017

What happens for nucleotide-protein interfaces?

@josemduarte
Contributor Author

Yes, I'd definitely go for NOPREDs as the best solution. I wouldn't even bother calling the DNA double strands bio; the nopred offers a more honest assessment.

Note that for nucleotide-protein interfaces we are able to score them based on the protein side only.

In any case for assemblies we'd need to catch those that are made of exclusively nucleotide chains and assign the NOPREDs to them. For assemblies with mixed protein/nucleotide chains, in principle we can score them based at least on some of the interfaces.
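The check described above could look roughly like the sketch below. It is written in Python for brevity and is not EPPIC code: `Call`, `Chain` and `assembly_call` are hypothetical names, and the sketch assumes each chain already carries a flag saying whether it is a protein.

```python
from dataclasses import dataclass
from enum import Enum


class Call(Enum):
    BIO = "bio"
    XTAL = "xtal"
    NOPRED = "nopred"


@dataclass(frozen=True)
class Chain:
    # Hypothetical minimal chain record; not an EPPIC or BioJava class.
    chain_id: str
    is_protein: bool


def assembly_call(chains, scored_call):
    """Return NOPRED for assemblies made only of nucleotide chains.

    Assemblies with at least one protein chain keep the call that the
    regular scoring produced (passed in as ``scored_call``).
    """
    if chains and not any(c.is_protein for c in chains):
        return Call.NOPRED
    return scored_call
```

For a double-stranded DNA assembly such as the one in 2rt8, `assembly_call([Chain("A", False), Chain("B", False)], Call.XTAL)` would give `Call.NOPRED`, while a mixed protein/nucleotide assembly keeps its scored call.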

@sbliven
Member

sbliven commented Jan 30, 2017

I'd go so far as to say that we should not generate assemblies that include an all-NOPRED interface. Then it's like we ignore nucleotide-nucleotide interfaces completely.

lafita added a commit to lafita/eppic that referenced this issue Jan 30, 2017
Because it is a probability, the NOPRED score should be the most
uncertain value.
lafita added a commit to lafita/eppic that referenced this issue Jan 30, 2017
General solution for DNA helix described in eppic-team#161
@lafita
Member

lafita commented Jan 30, 2017

I agree with @sbliven. It would be a good solution to ignore them, since we cannot score them properly.

This issue has a related one: what do we do when more than one high-scoring assembly is present in the crystal? Do we call all of them BIO? I have implemented a solution where I give all the tied top-scoring assemblies the same call (NOPRED, but it can be changed) and call the other, lower-scoring ones XTAL.

@sbliven also proposed choosing the assembly with the lowest stoichiometry as BIO and the others as XTAL. I will create a pull request with my solution, but it can be changed to whatever we agree on.

The example that made me think about it, although the tie is due to the DNA interface, is 2rt8:

# Topologically valid assemblies in 2rt8
 id   Interf cluster ids       Size   Stoichiometry        Symmetry      Score    Predicted by
  1                   {}        1,1             A,A           C1,C1       0.50                
  2                  {1}          2             A B              C1       0.50            pdb1

These cases should be very rare though.
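To make the two tie-breaking options concrete, here is a minimal Python sketch of both. It is not the code from the pull request: `Assembly`, `call_with_tie_as` and `call_smallest_on_tie` are hypothetical names, and "lowest stoichiometry" is assumed here to mean the assembly with the fewest chains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    # Hypothetical record holding only what the tie-breaking needs.
    assembly_id: int
    size: int      # number of chains in the assembly (assumed meaning)
    score: float   # assembly probability score


def _top_scoring(assemblies, eps):
    best = max(a.score for a in assemblies)
    return [a for a in assemblies if abs(a.score - best) <= eps]


def call_with_tie_as(assemblies, tie_call="nopred", eps=1e-9):
    """Option 1: all tied top-scoring assemblies get the same call."""
    if not assemblies:
        return {}
    calls = {a.assembly_id: "xtal" for a in assemblies}
    top = _top_scoring(assemblies, eps)
    if len(top) == 1:
        calls[top[0].assembly_id] = "bio"
    else:
        for a in top:
            calls[a.assembly_id] = tie_call
    return calls


def call_smallest_on_tie(assemblies, eps=1e-9):
    """Option 2: among tied top scorers, the smallest assembly is BIO."""
    if not assemblies:
        return {}
    calls = {a.assembly_id: "xtal" for a in assemblies}
    top = _top_scoring(assemblies, eps)
    # Ties in size are broken by the lower id, an arbitrary choice.
    winner = min(top, key=lambda a: (a.size, a.assembly_id))
    calls[winner.assembly_id] = "bio"
    return calls
```

With the two 2rt8 assemblies from the table (ids 1 and 2, sizes 1 and 2, both scoring 0.50), the first option calls both NOPRED, while the second calls assembly 1 BIO and assembly 2 XTAL. The second option always yields exactly one BIO assembly per structure.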

josemduarte added a commit that referenced this issue Jan 30, 2017
Assembly scoring: fix #158 and solve ties in assembly scoring #161
lafita added a commit to lafita/eppic that referenced this issue Feb 9, 2017
This ensures we always have a BIO assembly, and only one, for each
structure.
@lafita
Member

lafita commented Feb 9, 2017

I think that with the current solution (see PR #166) we can make the release, so I will assign the further discussion of this issue to the 3.1 milestone.

@lafita lafita modified the milestones: 3.1, 3.0 Feb 9, 2017
@lafita lafita added the question label Feb 9, 2017
josemduarte added a commit that referenced this issue Feb 9, 2017
Call reason message for assemblies and address #161
@lafita lafita modified the milestones: 3.1, 3.0.2, 3.0.3 Sep 14, 2017
@lafita
Member

lafita commented Sep 14, 2017

We said we could implement a very naive scoring for nucleotide-only interfaces. We could count the number of base pairs between the two chains (using the distances between H-bonding atoms) and then assign probability one if there are more than, say, two bp, or probability zero if there are fewer.
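A rough Python sketch of that idea follows. Everything in it is an assumption made for illustration: residues are `(base, atoms)` tuples with one-letter base names and a dict of atom coordinates in Å, only Watson-Crick pairs are considered, the donor-acceptor cutoff is 3.5 Å, and at least two H-bonds are required per base pair. The function names are hypothetical, not EPPIC or BioJava API.

```python
import math

# Watson-Crick H-bonding atom pairs (atom on first base, atom on second base).
WC_HBONDS = {
    ("A", "T"): [("N1", "N3"), ("N6", "O4")],
    ("A", "U"): [("N1", "N3"), ("N6", "O4")],
    ("G", "C"): [("N1", "N3"), ("O6", "N4"), ("N2", "O2")],
}


def _hbond_count(res1, res2, cutoff):
    """Count Watson-Crick H-bonds between two residues within the cutoff."""
    name1, atoms1 = res1
    name2, atoms2 = res2
    pairs = WC_HBONDS.get((name1, name2))
    if pairs is None:
        # Try the reverse order, e.g. (T, A) is looked up as (A, T).
        pairs = WC_HBONDS.get((name2, name1))
        if pairs is None:
            return 0
        atoms1, atoms2 = atoms2, atoms1
    count = 0
    for a1, a2 in pairs:
        if a1 in atoms1 and a2 in atoms2:
            if math.dist(atoms1[a1], atoms2[a2]) <= cutoff:
                count += 1
    return count


def count_base_pairs(chain1, chain2, cutoff=3.5, min_hbonds=2):
    """Number of residue pairs across the two chains that form a base pair."""
    return sum(
        1
        for r1 in chain1
        for r2 in chain2
        if _hbond_count(r1, r2, cutoff) >= min_hbonds
    )


def nucleotide_interface_probability(chain1, chain2, min_bp=2):
    """Naive score: 1.0 if more than ``min_bp`` base pairs, else 0.0."""
    return 1.0 if count_base_pairs(chain1, chain2) > min_bp else 0.0
```

The sketch gives an interface with exactly two base pairs probability zero, which is one reading of the proposal. Real PDB entries name DNA residues DA, DT, DG and DC, so a mapping to one-letter names would be needed before calling these functions.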

@josemduarte josemduarte modified the milestones: 3.0.3, 3.0.4 Oct 19, 2017
@josemduarte josemduarte modified the milestones: 3.0.4, 3.0.5 Jan 23, 2018
@josemduarte josemduarte modified the milestones: 3.0.5, 3.0.6 May 1, 2018
3 participants