Disjoint assemblies: how to treat them? #59

josemduarte · 2015-08-07T15:10:59Z

Following our assembly rules, at the moment we consider disjoint assemblies as valid assemblies (this only affects heteromeric protein crystals).

That introduces some issues in how to handle them:

there are several subassemblies per assembly
each subassembly needs to be scored separately
combining those scores into a final score for the assembly is also not straightforward
displaying the different subassemblies is complicated, see Displaying PDB assembly predictions, EPPIC disjoint assemblies and invalid assemblies #54
they add too many combinations to the final list of valid assemblies, see also Combinatorial explosion in heteromeric assemblies with many entities #48
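The growth in the list comes from pairing: when an assembly splits into independent components, every option for one component can be combined with every option for another, so the number of full assemblies is the product of the per-component option counts. A minimal Python sketch, using hypothetical labels that mirror the A6/B6 example later in this thread (not EPPIC's actual enumeration code):

```python
from itertools import product

def enumerate_full_assemblies(options_per_component):
    """Pick one option from each independent component to form a full assembly.

    options_per_component: one list of sub-assembly labels per disjoint
    component (hypothetical input format, for illustration only).
    """
    return list(product(*options_per_component))

# Hypothetical case: A forms 1-, 2-, 3- or 6-mers, B likewise, no A-B interfaces.
a_options = ["A", "A2", "A3", "A6"]
b_options = ["B", "B2", "B3", "B6"]

combos = enumerate_full_assemblies([a_options, b_options])
print(len(combos))                     # 16 full assemblies (4 * 4)
print(len(a_options) + len(b_options)) # 8 rows if each component were listed on its own
```

With k independent components of n options each, the full list grows as n^k, while listing components separately grows only as k * n.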

We introduced them because we know of some examples where co-crystallization seems plausible, i.e. cases where a disjoint prediction would be desirable. See for instance 2xqw and 1bui.

lafita · 2016-04-12T10:52:52Z

What exactly do you mean by disjoint assemblies? From the meaning of the word and the description, I understand that a disjoint assembly is a crystal containing two or more different assemblies that do not share any biological interface.

An example would be a C3 assembly with A3 stoichiometry and a C2 assembly with B2 stoichiometry in the AU.

In that case, I would find it uninteresting to list them together and also display all the combinations (A3+B, A2+B2), because we really need to consider them as different assemblies (even though they happen to be in the same crystal). I would rather treat them separately and show one line per assembly (one line for the C3-A3 assembly and another for the C2-B2 assembly).

This could solve the combinatorial explosion in displaying the results (and could be used as a heuristic for computation), because the graphs of disjoint heteromeric assemblies become independent homomeric graphs. It could also make the results easier to interpret.
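Splitting the problem this way amounts to taking the connected components of the interface graph: chains linked by biological interfaces end up in the same component, and each component can then be enumerated on its own. A minimal sketch, assuming chains are plain string IDs and interfaces are given as chain pairs (hypothetical representations, not EPPIC's data structures):

```python
from collections import defaultdict

def connected_components(chains, interfaces):
    """Group chains that are connected through (biological) interfaces.

    chains: iterable of chain IDs (hypothetical string labels)
    interfaces: iterable of (chain1, chain2) pairs considered biological
    Returns a list of components, each a sorted list of chain IDs.
    """
    adjacency = defaultdict(set)
    for a, b in interfaces:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in chains:
        if start in seen:
            continue
        # Iterative depth-first search from a chain not yet assigned to a component
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components

# Hypothetical AU: a C3 trimer of A (A1..A3) and a C2 dimer of B (B1, B2)
chains = ["A1", "A2", "A3", "B1", "B2"]
interfaces = [("A1", "A2"), ("A2", "A3"), ("A3", "A1"), ("B1", "B2")]
print(connected_components(chains, interfaces))
# [['A1', 'A2', 'A3'], ['B1', 'B2']]
```

More than one component means the assembly is disjoint, and each component is then a smaller graph that can be handled independently.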

josemduarte · 2016-04-13T18:18:13Z

By disjoint assemblies we are talking about assemblies not sharing interfaces, e.g. stoichiometry A3C3 + stoichiometry B2. This we only allow in heteromeric cases. In homomeric cases we don't allow them because they would violate the isomorphism rule.
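One way to read the restriction to heteromeric cases is as a check on entities: a disjoint split is only acceptable if no entity appears in more than one component, since otherwise copies of that entity would sit in non-equivalent environments. This generalization of the isomorphism rule is an assumption made for illustration; the sketch below is not EPPIC's implementation:

```python
def allowed_as_disjoint(components):
    """Return True if splitting an assembly into these components is acceptable.

    components: one list per disjoint component, holding the entity label of
    each chain, e.g. [["A", "A", "A", "C", "C", "C"], ["B", "B"]] for A3C3 + B2.

    Assumed rule: reject the split as soon as an entity occurs in more than
    one component.
    """
    if len(components) <= 1:
        return True  # a single component is not disjoint at all
    seen = set()
    for component in components:
        entities = set(component)
        if entities & seen:  # entity already used by an earlier component
            return False
        seen |= entities
    return True

print(allowed_as_disjoint([["A", "A", "A", "C", "C", "C"], ["B", "B"]]))  # True: A3C3 + B2
print(allowed_as_disjoint([["A", "A"], ["A"]]))  # False: A2 + A, homomeric
```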

The idea of using one line for each of them in the WUI display is good. One problem with it is that so far each line of the assembly results page corresponds to a fully covering assembly (i.e. an assembly that covers all components in the crystal). Breaking that would require a few changes in data structures.
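The "fully covering" property can be phrased as a partition check: every chain of the crystal's asymmetric unit belongs to exactly one complex of the assembly. A minimal sketch under that assumption, using hypothetical chain-ID lists rather than the actual EPPIC classes:

```python
from collections import Counter

def is_full_covering(crystal_chains, components):
    """Check that the components use every chain of the crystal exactly once.

    crystal_chains: chain IDs in the asymmetric unit (hypothetical input)
    components: list of lists of chain IDs, one per complex in the assembly
    """
    used = Counter(chain for component in components for chain in component)
    # No chain missing, none used twice, and no chains that are not in the crystal
    return used == Counter(crystal_chains)

au = ["A1", "A2", "B1"]
print(is_full_covering(au, [["A1", "A2"], ["B1"]]))  # True
print(is_full_covering(au, [["A1", "A2"]]))          # False: B1 left out
```

Showing one line per complex would mean some lines no longer pass this check on their own, which is why the data structures would need to change.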

sbliven · 2016-04-14T07:38:58Z

I think it is important that the assembly diagram display a complete covering of the unit cell. This is also required for the latticeGraph to be consistent.

I think we do want to be able to handle such cases in our scoring function, because we would like to list author annotations like 1e94 (eppic-science#63) that are co-crystals. These would presumably get a penalty.

With regard to the combinatorial explosion issue, I would suggest that we restrict the main assembly generation procedure to non-co-crystals, or at most 2 disjoint complexes. Then we rely on the heuristic generation procedure to supply common co-crystals (like "all monomers").

BTW, I think that co-crystals will not be particularly uncommon, because of crystallization aids like nanobodies and DARPins, whose interfaces would often be classified as xtal interfaces. So it may be worth including a restriction like "one complex plus some monomers".
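Both restrictions suggested here ("at most 2 disjoint complexes" and "one complex plus some monomers") can be written as a filter on the sizes of an assembly's disjoint components. A minimal sketch; the function and parameter names are hypothetical:

```python
def passes_restriction(component_sizes, max_complexes=2, allow_extra_monomers=True):
    """Decide whether a candidate assembly is kept by the main enumeration.

    component_sizes: number of chains in each disjoint component of the assembly
    max_complexes: maximum number of multi-chain complexes (hypothetical knob)
    allow_extra_monomers: if True, any number of single-chain components may be
        added on top of the complexes ("one complex plus some monomers")
    """
    complexes = [s for s in component_sizes if s > 1]
    monomers = [s for s in component_sizes if s == 1]
    if len(complexes) > max_complexes:
        return False
    if not allow_extra_monomers and len(complexes) + len(monomers) > max_complexes:
        return False
    return True

# "At most 2 disjoint complexes", monomers counted like any other component
print(passes_restriction([6, 2], max_complexes=2, allow_extra_monomers=False))     # True
print(passes_restriction([6, 2, 3], max_complexes=2, allow_extra_monomers=False))  # False
# "One complex plus some monomers"
print(passes_restriction([6, 1, 1], max_complexes=1))  # True
print(passes_restriction([6, 2], max_complexes=1))     # False
```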

lafita · 2016-04-14T14:55:30Z

Thank you for the explanations. Now that I understand the problem better, I have been thinking more about how to display the results. I agree that assemblies should cover the full unit cell, and that our data structures are designed for that, but the display should focus on the biological significance of the assemblies: disjoint assemblies mean that the complexes are independent (co-crystals, like joining multiple independent crystals into one).

Maybe we can find a way to keep the internal representation (data structures) the same but adapt the display. One idea I came up with is to use the ID column to hold multiple values when the assembly is disjoint. That way we could express all the combinations of disjoint assemblies with very few rows. Currently we display multiple values in the macromolecular size, stoichiometry and symmetry columns instead.

As an example, the combinations for an A6(D3)+B6(D3) disjoint assembly are currently represented as:

ID	Macromolecular Size	Stoichiometry	Symmetry
1	1,1	A,B	C1,C1
2	1,2	A,B2	C1,C2
3	1,3	A,B3	C1,C3
4	1,6	A,B6	C1,D3
5	2,1	A2,B	C2,C1
6	2,2	A2,B2	C2,C2
7	2,3	A2,B3	C2,C3
8	2,6	A2,B6	C2,D3
9	3,1	A3,B	C3,C1
10	3,2	A3,B2	C3,C2
11	3,3	A3,B3	C3,C3
12	3,6	A3,B6	C3,D3
13	6,1	A6,B	D3,C1
14	6,2	A6,B2	D3,C2
15	6,3	A6,B3	D3,C3
16	6,6	A6,B6	D3,D3

The new representation would be:

ID	Macromolecular Size	Stoichiometry	Symmetry
1,2,3,4	1	A	C1
5,6,7,8	2	A2	C2
9,10,11,12	3	A3	C3
13,14,15,16	6	A6	D3
1,5,9,13	1	B	C1
2,6,10,14	2	B2	C2
3,7,11,15	3	B3	C3
4,8,12,16	6	B6	D3
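The new table can be derived mechanically from the current one: enumerate the full-covering combinations with their IDs, then invert the mapping so that each sub-assembly lists the IDs it appears in. A minimal Python sketch that reproduces both tables for this example; the tuple layout is a hypothetical stand-in for EPPIC's data structures:

```python
from collections import defaultdict
from itertools import product

# Sub-assemblies available to each disjoint component as (stoichiometry, size, symmetry),
# taken from the A6(D3)+B6(D3) example.
a_options = [("A", 1, "C1"), ("A2", 2, "C2"), ("A3", 3, "C3"), ("A6", 6, "D3")]
b_options = [("B", 1, "C1"), ("B2", 2, "C2"), ("B3", 3, "C3"), ("B6", 6, "D3")]

# Current representation: one ID per full-covering combination, numbered from 1.
full_assemblies = {i: combo for i, combo in enumerate(product(a_options, b_options), start=1)}

# Proposed representation: one row per sub-assembly, listing every ID it takes part in.
ids_by_subassembly = defaultdict(list)
for assembly_id, combo in full_assemblies.items():
    for sub in combo:
        ids_by_subassembly[sub].append(assembly_id)

# Print rows in the same order as the proposed table: all A rows, then all B rows.
for sub in a_options + b_options:
    stoichiometry, size, symmetry = sub
    ids = ",".join(str(i) for i in ids_by_subassembly[sub])
    print(ids, size, stoichiometry, symmetry, sep="\t")
```

The printed rows match the proposed table, so the compressed view needs no new data, only a different grouping at display time.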

It is just an idea, so if the implementation is very difficult and the number of cases is very small (or these assemblies always have low scores), it will probably not be worth implementing. Another issue that might arise is how to handle the 3D lattice graph and the assembly diagram.

lafita · 2016-04-14T14:58:26Z

This issue overlaps a bit with #101, which is more focused on the WUI aspect of disjoint assemblies.

sbliven · 2016-04-15T11:17:40Z

Interesting idea. Perhaps we should distinguish between an assembly (full covering of the unit cell, formerly sometimes called the superassembly) and a complex (unique connected component of an assembly). This could in general be a many-to-many relationship (depending on what properties we assign to each concept).

One problem I see with an interface like this is that it's not clear which complexes are compatible. For instance, your table above doesn't include combined entries like (AB) or (AB)6. So how would we express situations in the WUI like "A6 requires one of the B* complexes but is incompatible with (AB)*"? I like the idea of reducing visual redundancy, but I worry that it would require much more sophisticated users.

lafita · 2016-04-15T13:04:40Z

I did not include combined entries because I only wanted to show the differences for disjoint assemblies; the idea is that if assemblies are not disjoint, the display stays the same as it is now.

Assuming that in the case above both A and B are C6 instead of D3 and that they have interfaces between them, the table would continue as follows:

ID	Macromolecular Size	Stoichiometry	Symmetry
17	2	AB	C1
18	4	A2B2	C2
19	6	A3B3	C3
20	12	A6B6	D3

With this, all possible combinations would be covered: assemblies 1 to 16 are disjoint (with the reduced display) and assemblies 17 to 20 are combined AB.

The situation you described is expressed by the assembly ID. Because A6 does not have any ID in the range 17-20, it means that it is incompatible with any of the (AB)* complexes.
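The rule described here (two complexes are compatible exactly when they share at least one assembly ID) is a set-intersection test. A minimal sketch using ID lists from the tables in this thread; the dictionary layout is illustrative only:

```python
def compatible(ids_a, ids_b):
    """Two complexes can co-occur iff some full assembly ID lists both of them."""
    return bool(set(ids_a) & set(ids_b))

# ID lists from the tables: A6 and B6 from the disjoint rows, AB and A6B6 from the combined rows
ids = {
    "A6": [13, 14, 15, 16],
    "B6": [4, 8, 12, 16],
    "AB": [17],
    "A6B6": [20],
}

print(compatible(ids["A6"], ids["B6"]))  # True: both appear in assembly 16
print(compatible(ids["A6"], ids["AB"]))  # False: no shared ID
```

The check itself is trivial; the concern raised above is whether users would read the ID column this way without help from the WUI.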

sbliven · 2016-11-08T11:52:26Z

This would require introducing another layer to the assembly hierarchy, so it's unrealistic for a 3.0 release. For now we need to just display lots of redundant assemblies.

josemduarte added the question label Aug 7, 2015

josemduarte mentioned this issue Apr 7, 2016 in Displaying disjoint assemblies #101 (Closed)

sbliven added this to the 3.1 milestone Nov 8, 2016

josemduarte modified the milestones: 3.2, 3.3 Feb 3, 2019
