Disjoint assemblies: how to treat them? #59

Open
josemduarte opened this issue Aug 7, 2015 · 8 comments
@josemduarte
Contributor

Following our assembly rules, at the moment we consider disjoint assemblies as valid assemblies (this only affects heteromeric protein crystals).

That introduces some issues in how to handle them:

We introduced them because we know of some examples where co-crystallization seems plausible, and where a disjoint prediction would therefore be desirable. See for instance: 2xqw, 1bui

@lafita
Member

lafita commented Apr 12, 2016

What exactly do you mean by disjoint assemblies? From the meaning of the word and the description, I understand that a disjoint assembly is a crystal containing two or more different assemblies that do not share any biological interface.

An example would be a C3 assembly with A3 stoichiometry and a C2 assembly with B2 stoichiometry in the AU.

In that case, I would find it uninteresting to list them together and also display all the combinations (A3+B, A2+B2), because we really need to consider them as different assemblies, even though they happen to be in the same crystal. I would rather treat them separately and show one line for each assembly (one line for the C3-A3 assembly and another for the C2-B2 assembly).

This could solve the combinatorial explosion in displaying the results (and could be used as a heuristic for computation), because the graphs of disjoint heteromeric assemblies become independent homomeric graphs. It could also make the results easier to interpret.

@josemduarte
Contributor Author

By disjoint assemblies we are talking about assemblies not sharing interfaces, e.g. stoichiometry A3C3 + stoichiometry B2. This we only allow in heteromeric cases. In homomeric cases we don't allow them because they would violate the isomorphism rule.

The idea of using one line for each of them in the WUI display is good. One problem with it is that so far each line of the assembly results page corresponds to a fully covering assembly (i.e. an assembly that covers all components in the crystal). Breaking that would require a few changes in data structures.

@sbliven
Member

sbliven commented Apr 14, 2016

I think it is important that the assembly diagram display a complete covering of the unit cell. This is also required for the latticeGraph to be consistent.

I think we do want to be able to handle such cases in our scoring function, because we would like to list author annotations like 1e94 (eppic-science#63) that are co-crystals. These would presumably get a penalty.

With regard to the combinatorial explosion issue, I would suggest that we restrict the main assembly generation procedure to non-co-crystals, or at the most 2 disjoint complexes. Then we rely on the heuristic generation procedure to supply common co-crystals (like "all monomers").

BTW, I think that co-crystals will not turn out to be particularly uncommon, because of crystallization aids like nanobodies and DARPins, whose interfaces would often be classified as xtal interfaces. So it may be worth including a restriction like "one complex plus some monomers".
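The two restrictions suggested above can be sketched as a simple filter on candidate assemblies. This is only an illustration: the function name `is_allowed`, the `max_complexes` parameter, and the representation of an assembly as a list of complex sizes are assumptions made for the example, not part of the EPPIC code.

```python
def is_allowed(complex_sizes, max_complexes=2):
    """Decide whether a candidate assembly passes the proposed restrictions.

    complex_sizes: number of chains in each disjoint complex of the assembly
    (hypothetical representation, e.g. [6, 1, 1] for one hexamer plus two monomers).
    """
    # Restriction 1: at most `max_complexes` disjoint complexes.
    if len(complex_sizes) <= max_complexes:
        return True
    # Restriction 2: "one complex plus some monomers" (this also admits
    # the "all monomers" case, since zero complexes are larger than 1).
    larger_than_monomer = sum(1 for size in complex_sizes if size > 1)
    return larger_than_monomer <= 1


candidates = [[6], [3, 2], [3, 2, 2], [6, 1, 1, 1], [1, 1, 1]]
kept = [c for c in candidates if is_allowed(c)]
```

With these assumptions, only `[3, 2, 2]` (three disjoint complexes, two of them oligomeric) is rejected.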

@lafita
Member

lafita commented Apr 14, 2016

Thank you for the explanations. Now that I understand the problem better, I have been thinking more about how to display the results. I agree that assemblies should cover the full unit cell, and that our data structures are designed for that, but the display should focus on the biological significance of the assemblies, and disjoint assemblies are by definition independent (co-crystals, like joining multiple independent crystals in one).

Maybe we can think of a way to keep the internal representation (the data structures) the same, but adapt the display. One idea I came up with is to let the ID column hold multiple values if the assembly is disjoint. That way we could specify all the combinations of disjoint assemblies with very few rows. Currently we instead display multiple values in the macromolecular size, stoichiometry and symmetry columns.

As an example, the combinations for an A6(D3)+B6(D3) disjoint assembly are currently represented as:

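| ID | Macromolecular Size | Stoichiometry | Symmetry |
|----|---------------------|---------------|----------|
| 1  | 1,1 | A,B   | C1,C1 |
| 2  | 1,2 | A,B2  | C1,C2 |
| 3  | 1,3 | A,B3  | C1,C3 |
| 4  | 1,6 | A,B6  | C1,D3 |
| 5  | 2,1 | A2,B  | C2,C1 |
| 6  | 2,2 | A2,B2 | C2,C2 |
| 7  | 2,3 | A2,B3 | C2,C3 |
| 8  | 2,6 | A2,B6 | C2,D3 |
| 9  | 3,1 | A3,B  | C3,C1 |
| 10 | 3,2 | A3,B2 | C3,C2 |
| 11 | 3,3 | A3,B3 | C3,C3 |
| 12 | 3,6 | A3,B6 | C3,D3 |
| 13 | 6,1 | A6,B  | D3,C1 |
| 14 | 6,2 | A6,B2 | D3,C2 |
| 15 | 6,3 | A6,B3 | D3,C3 |
| 16 | 6,6 | A6,B6 | D3,D3 |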

The new representation would be:

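| ID | Macromolecular Size | Stoichiometry | Symmetry |
|----|---------------------|---------------|----------|
| 1,2,3,4     | 1 | A  | C1 |
| 5,6,7,8     | 2 | A2 | C2 |
| 9,10,11,12  | 3 | A3 | C3 |
| 13,14,15,16 | 6 | A6 | D3 |
| 1,5,9,13    | 1 | B  | C1 |
| 2,6,10,14   | 2 | B2 | C2 |
| 3,7,11,15   | 3 | B3 | C3 |
| 4,8,12,16   | 6 | B6 | D3 |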
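The step from the 16-row table to the 8-row table can be sketched as follows. This is a minimal illustration, assuming each entity contributes the four sub-complexes listed above; the names `SUBCOMPLEXES`, `full_table` and `compact_table` are hypothetical and do not correspond to existing EPPIC code.

```python
from collections import defaultdict
from itertools import product

# (size, symmetry) of the sub-complexes of each entity, as in the tables above.
SUBCOMPLEXES = [(1, "C1"), (2, "C2"), (3, "C3"), (6, "D3")]


def stoichiometry(entity, size):
    """Format a homomeric stoichiometry, e.g. ("A", 1) -> "A", ("B", 6) -> "B6"."""
    return entity if size == 1 else f"{entity}{size}"


def full_table():
    """Current display: one row (one ID) per combination of an A and a B complex."""
    rows = []
    for assembly_id, ((size_a, sym_a), (size_b, sym_b)) in enumerate(
        product(SUBCOMPLEXES, SUBCOMPLEXES), start=1
    ):
        complex_a = (stoichiometry("A", size_a), size_a, sym_a)
        complex_b = (stoichiometry("B", size_b), size_b, sym_b)
        rows.append((assembly_id, complex_a, complex_b))
    return rows


def compact_table(rows):
    """Proposed display: one row per complex, listing the assembly IDs it occurs in."""
    ids_by_complex = defaultdict(list)
    for assembly_id, *complexes in rows:
        for cx in complexes:
            ids_by_complex[cx].append(assembly_id)
    return dict(ids_by_complex)


compact = compact_table(full_table())
for (stoich, size, symmetry), ids in compact.items():
    print(",".join(map(str, ids)), size, stoich, symmetry)
```

The internal list of 16 fully covering assemblies is left untouched here; only the grouping used for display changes, which is the point of the proposal.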

It is just an idea, so if the implementation is very difficult and there are very few such cases (or these assemblies always score low), it will probably not be worth implementing. Another issue that might arise is how to handle the 3D lattice graph and the assembly diagram.

@lafita
Member

lafita commented Apr 14, 2016

This issue overlaps a bit with #101, which focuses more on the WUI aspect of disjoint assemblies.

@sbliven
Member

sbliven commented Apr 15, 2016

Interesting idea. Perhaps we should distinguish between an assembly (full covering of the unit cell, formerly sometimes called the superassembly) and a complex (unique connected component of an assembly). This could in general be a many-to-many relationship (depending on what properties we assign to each concept).
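The assembly/complex distinction can be illustrated with a small graph sketch. It assumes a simplified model in which an assembly is given as a list of chain labels plus the pairs of chains joined by an engaged interface; the real lattice graph is periodic and more involved, and the function name `complexes` is hypothetical.

```python
def complexes(chains, engaged_interfaces):
    """Split an assembly into complexes, i.e. connected components of the
    graph whose nodes are chains and whose edges are engaged interfaces."""
    adjacency = {chain: set() for chain in chains}
    for a, b in engaged_interfaces:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    result = []
    for start in chains:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# An A3 trimer and a B2 dimer that share no interface: a disjoint assembly.
chains = ["A1", "A2", "A3", "B1", "B2"]
interfaces = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"), ("B1", "B2")]
parts = complexes(chains, interfaces)
is_disjoint = len(parts) > 1
```

In this model an assembly is disjoint exactly when it has more than one complex, and the same complex can appear in several assemblies, which gives the many-to-many relationship mentioned above.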

One problem I see with an interface like this is that it's not clear which complexes are compatible. For instance, your table above doesn't include combination entries like (AB) or (AB)6. So how would we express in the WUI situations like "A6 requires one of the B* complexes but is incompatible with (AB)*"? I like the idea of reducing visual redundancy, but I worry that it would require much more sophisticated users.

@lafita
Member

lafita commented Apr 15, 2016

I did not include combination entries because I only wanted to show the differences in disjoint assemblies, but the idea is that if they are not disjoint the display is the same as it is now.

Assuming that in the case above both A and B are C6 instead of D3 and that they have interfaces between them, the table would continue as follows:

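| ID | Macromolecular Size | Stoichiometry | Symmetry |
|----|---------------------|---------------|----------|
| 17 | 2  | AB   | C1 |
| 18 | 4  | A2B2 | C2 |
| 19 | 6  | A3B3 | C3 |
| 20 | 12 | A6B6 | D3 |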

With this, all possible combinations would be covered: assemblies 1 to 16 are disjoint (their display has been reduced) and assemblies 17 to 20 are combined AB.

The situation you described is expressed by the assembly ID. Because A6 does not have any ID in the range 17-20, it means that it is incompatible with any of the (AB)* complexes.
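Reading compatibility off the ID column amounts to a set intersection: two complexes can coexist only if they share at least one assembly ID. A minimal sketch, with the ID sets copied from the tables above and the names `ASSEMBLY_IDS` and `compatible` chosen purely for illustration:

```python
# Assembly IDs in which each complex occurs (subset of the rows shown above).
ASSEMBLY_IDS = {
    "A6": {13, 14, 15, 16},
    "B": {1, 5, 9, 13},
    "B2": {2, 6, 10, 14},
    "B6": {4, 8, 12, 16},
    "AB": {17},
    "A6B6": {20},
}


def compatible(complex_x, complex_y):
    """Two complexes are compatible if some assembly contains both of them."""
    return bool(ASSEMBLY_IDS[complex_x] & ASSEMBLY_IDS[complex_y])


print(compatible("A6", "B2"))  # share assembly 14
print(compatible("A6", "AB"))  # no shared assembly ID
```

This shows that the information is recoverable from the table, although a user would still have to do the intersection by eye.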

@sbliven
Member

sbliven commented Nov 8, 2016

This would require introducing another layer to the assembly hierarchy, so it's unrealistic for a 3.0 release. For now we need to just display lots of redundant assemblies.

@sbliven sbliven added this to the 3.1 milestone Nov 8, 2016
@josemduarte josemduarte modified the milestones: 3.2, 3.3 Feb 3, 2019