DNA assembly scoring problems #161

josemduarte · 2017-01-22T20:48:11Z

In DNA structures we can't use any of our indicators for contacts between nucleotide chains; even geometry doesn't apply to them. For interface scoring we solved this by assigning NOPREDs to contacts between nucleotide chains. But for assembly scoring we have no solution at the moment: e.g. an assembly made of the 2 strands of a double-stranded DNA is called XTAL. Can we do better than that?

A good example is 2rt8 (NMR), now on the dev server: it has only 1 interface, between the 2 strands of DNA, which we call NOPRED. But then the assembly that we call BIO is the single strand, and the double strand is called XTAL. Any ideas for a way to treat this? NOPREDs everywhere? A warning?

lafita · 2017-01-22T21:01:24Z

As the name EPPIC stands for protein-protein interface classifier, I would say that it is not that bad if we do not provide a call for nucleotide assemblies (I mean NOPREDs everywhere).

Probably we can easily identify double stranded DNA helices, and produce a BIO call for them, but an extension of the algorithm would be needed together with some benchmarking to make up a call for other less clear cases.

lafita · 2017-01-22T21:03:04Z

What happens for nucleotide-protein interfaces?

josemduarte · 2017-01-22T21:11:17Z

Yes, I'd definitely go for NOPREDs as the best solution. I wouldn't even bother calling the DNA double strands BIO; the NOPRED offers a more honest assessment.

Note that for nucleotide-protein interfaces we are able to score them based on the protein side only.

In any case, for assemblies we'd need to catch those that are made exclusively of nucleotide chains and assign NOPREDs to them. For assemblies with mixed protein/nucleotide chains, in principle we can score them based on at least some of the interfaces.
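
A minimal sketch of that check, in Python rather than EPPIC's Java for brevity. The `ChainType` enum and the functions `call_for_assembly` and `scorable_interfaces` are hypothetical names invented for illustration, not part of the EPPIC code base; the only logic taken from the discussion is "all-nucleotide assembly gets NOPRED, mixed assemblies are scored on the interfaces that have a protein side".

```python
from enum import Enum
from typing import List, Tuple


class ChainType(Enum):
    PROTEIN = "protein"
    NUCLEOTIDE = "nucleotide"


def call_for_assembly(chain_types: List[ChainType]) -> str:
    """Return 'NOPRED' for assemblies made exclusively of nucleotide chains.

    Any assembly with at least one protein chain is passed on to normal
    scoring ('SCORE'). Both labels are hypothetical placeholders.
    """
    only_nucleotides = bool(chain_types) and all(
        t is ChainType.NUCLEOTIDE for t in chain_types
    )
    return "NOPRED" if only_nucleotides else "SCORE"


def scorable_interfaces(
    interfaces: List[Tuple[ChainType, ChainType]]
) -> List[Tuple[ChainType, ChainType]]:
    """Keep interfaces with at least one protein side.

    Nucleotide-protein interfaces stay (scored on the protein side only);
    nucleotide-nucleotide interfaces are dropped.
    """
    return [pair for pair in interfaces if ChainType.PROTEIN in pair]
```

For a DNA duplex such as 2rt8, `call_for_assembly([ChainType.NUCLEOTIDE, ChainType.NUCLEOTIDE])` would give `"NOPRED"`, while a protein-DNA complex would still go through scoring with its nucleotide-nucleotide interfaces filtered out.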

sbliven · 2017-01-30T15:31:31Z

I'd go so far as to say that we should not generate assemblies that include an all-NOPRED interface. Then it's like we ignore nucleotide-nucleotide interfaces completely.

lafita · 2017-01-30T15:58:07Z

I agree with @sbliven. It would be a good solution to ignore them, since we cannot score them properly.

This issue has a related one: what do we do when more than one high-scoring assembly is present in the crystal? Do we call all of them BIO? I have implemented a solution where I give all the high-scoring assemblies the same call (NOPRED, but it can be changed) and call the other, lower-scoring ones XTAL.

@sbliven also proposed choosing the assembly with the lowest stoichiometry as BIO and the others as XTAL. I will create a pull request with my solution, but it can be changed to whatever we agree on.

The example that made me think about it, although the tie is due to the DNA interface, is 2rt8:

# Topologically valid assemblies in 2rt8
 id   Interf cluster ids       Size   Stoichiometry        Symmetry      Score    Predicted by
  1                   {}        1,1             A,A           C1,C1       0.50                
  2                  {1}          2             A B              C1       0.50            pdb1

These cases should be very rare though.
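
The lowest-stoichiometry tie-break proposed above could look roughly like the following. This is a sketch under assumptions, not the code in the pull request: the `Assembly` class and `resolve_calls` function are hypothetical, "lowest stoichiometry" is interpreted as the smallest number of chains, and remaining ties fall back to the lowest assembly id.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assembly:
    id: int
    size: int      # number of chains; used here as a proxy for stoichiometry
    score: float   # probability of being the biological assembly
    call: str = ""


def resolve_calls(assemblies: List[Assembly], tol: float = 1e-9) -> List[Assembly]:
    """Assign exactly one BIO call; every other assembly becomes XTAL.

    Among the assemblies tied for the highest score, the one with the
    smallest size wins (assumption: size stands in for stoichiometry).
    """
    if not assemblies:
        return assemblies
    best = max(a.score for a in assemblies)
    tied = [a for a in assemblies if abs(a.score - best) <= tol]
    winner = min(tied, key=lambda a: (a.size, a.id))
    for a in assemblies:
        a.call = "BIO" if a is winner else "XTAL"
    return assemblies


# The two assemblies of 2rt8 from the table above, both scoring 0.50.
calls = [a.call for a in resolve_calls([Assembly(1, 1, 0.5), Assembly(2, 2, 0.5)])]
```

With the 2rt8 values this picks the single strand (assembly 1) as BIO and the double strand as XTAL, which matches the behaviour described at the top of this issue and is exactly why a nucleotide-specific treatment is still needed.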

lafita · 2017-02-09T12:11:35Z

I think that with the current solution (see PR #166) we can make the release, so I will assign the further discussion of this issue to the 3.1 milestone.

lafita · 2017-09-14T16:07:14Z

We said we could implement a very naive scoring for nucleotide-only interfaces. We could calculate the number of base pairs between the two chains (using the distances between H-bonding atoms) and then give probability one if there are more than, say, two bp, or probability zero if there are fewer.
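
A rough sketch of that idea, again in Python for brevity. Everything concrete here is an assumption made for illustration: the 3.5 Å distance cutoff, the requirement of at least two close atom pairs per base pair, the greedy one-to-one pairing, the choice to score exactly two bp as zero, and the input format (each residue given as a list of coordinates of its H-bonding base atoms). None of it is existing EPPIC code.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float]
Residue = List[Atom]  # coordinates of the H-bonding base atoms only


def hbond_count(res_a: Residue, res_b: Residue, cutoff: float) -> int:
    """Number of atom pairs between two residues within the distance cutoff."""
    return sum(1 for a in res_a for b in res_b if math.dist(a, b) <= cutoff)


def count_base_pairs(
    chain_a: List[Residue],
    chain_b: List[Residue],
    cutoff: float = 3.5,   # assumed H-bond distance cutoff in angstroms
    min_hbonds: int = 2,   # assumed minimum H-bonds for a base pair
) -> int:
    """Greedily pair residues across chains; each residue pairs at most once."""
    paired_b = set()
    n_pairs = 0
    for res_a in chain_a:
        for j, res_b in enumerate(chain_b):
            if j not in paired_b and hbond_count(res_a, res_b, cutoff) >= min_hbonds:
                paired_b.add(j)
                n_pairs += 1
                break
    return n_pairs


def naive_nucleotide_score(
    chain_a: List[Residue], chain_b: List[Residue], min_bp: int = 2
) -> float:
    """Probability 1.0 if there are more than min_bp base pairs, else 0.0."""
    return 1.0 if count_base_pairs(chain_a, chain_b) > min_bp else 0.0
```

A real implementation would need to restrict the atoms to actual donor/acceptor pairs and would need benchmarking, as noted earlier in the thread; the sketch only shows the shape of the threshold rule.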

josemduarte added this to the 3.0 milestone Jan 22, 2017

josemduarte assigned lafita Jan 22, 2017

lafita added a commit to lafita/eppic that referenced this issue Jan 30, 2017

Set NOPRED score for CombinedPredictor to 0.5 eppic-team#161

7082dc3

Because it is a probability, the NOPRED score should be the most uncertain value.

lafita added a commit to lafita/eppic that referenced this issue Jan 30, 2017

Solve scoring ties with NOPRED

85600e8

General solution for DNA helix described in eppic-team#161

lafita mentioned this issue Jan 30, 2017

Assembly scoring: fix #158 and solve ties in assembly scoring #161 #163

Merged

josemduarte added a commit that referenced this issue Jan 30, 2017

Merge pull request #163 from lafita/assemblyScore

b8246a6

Assembly scoring: fix #158 and solve ties in assembly scoring #161

lafita added a commit to lafita/eppic that referenced this issue Feb 9, 2017

Solve ties by the lowest stoichiometry assembly eppic-team#161

41592d3

This ensures we always have a BIO assembly, and only one, for each structure.

lafita modified the milestones: 3.1, 3.0 Feb 9, 2017

lafita added the question label Feb 9, 2017

josemduarte added a commit that referenced this issue Feb 9, 2017

Merge pull request #166 from lafita/master

797db52

Call reason message for assemblies and address #161

josemduarte mentioned this issue Sep 14, 2017

Assembly scoring ties, how to treat them? #193

Open

lafita modified the milestones: 3.1, 3.0.2, 3.0.3 Sep 14, 2017

josemduarte mentioned this issue Sep 14, 2017

Problem in nucleotide-protein interface scoring #194

Open

josemduarte modified the milestones: 3.0.3, 3.0.4 Oct 19, 2017

josemduarte modified the milestones: 3.0.4, 3.0.5 Jan 23, 2018

josemduarte modified the milestones: 3.0.5, 3.0.6 May 1, 2018
