-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for undertermined topology #13
Comments
I traced the first issue to a problem with the configuration resolution algorithm where instead of failing because of redundant parents, the error was because the parent position was Commit 1b67dc3 (master) has the fixed resolution algorithm. As for the feature request, the algorithm is relatively straight-forward when unknown positions aren't present. import itertools
def iterconfigurations_glycan(glycan):
ambiguous_links = []
# find all ambiguous links
for i, link in glycan.iterlinks():
if link.is_ambiguous():
ambiguous_links.append(link)
# produce cross-product of all possible configurations
combos = itertools.product(*[link.iterconfiguration() for link in ambiguous_links])
for configs in combos:
# configurations where the same parent id and position appears multiple
# times should be skipped
if len(set([(c[0], c[2]) for c in configs])) < len(configs):
continue
# yield the tuples of pairings of (link, configuration)
yield tuple(zip(ambiguous_links, configs)) Dealing with unknown positions is going to be more difficult since we need to track the available sites for each putative parent, which I'll need to think more about. Did you want this "iterconfiguration for glycans" function to be a method on the |
By eye, I think the issue with not checking multiple potential parents is still likely to be an issue, but I guess I'll have to find another example to demonstrate that. :-) Yes, on the surface, merely doing the product of all the link iterconfigurations is sufficient - the meat of the problem lies in correctly handling their interactions. Your if statement looks pretty good, except for the wildcard parent positions. Perhaps there is a case to be made for turning missing positions into an enumerated list of all possible ring positions - then the special cases for -1 could be dropped. Yes, I imagined this in the Glycan class - yielding each possible configuration of the entire structure. Note that the enumeration can be very large. |
I can see what you mean. The |
Its seems like the two pieces of code Glycan.iterconfiguration and AmbigousLink.find_open_position are kinda implementing the same logic, though in different ways. I didn't like the ambiguous position implementation or the links stored in a position key'ed dictionary initially, but I can see its advantages in the attempt to figure out the interactions between AmbigousLinks. |
I've pushed a draft implementation of Glycan.iterconfiguration. The theoretical organization works for variable parents. It doesn't discriminate between redundant ambiguous configurations yet. I'm concerned there's more state that needs to be tracked, but I don't have a collection of examples to work on. |
Thank you for embarking on this scary and uncertain path!
I took a look at the GlycoCT parsing implementation and how the resulting ambiguous parent links are represented in the Glycan/Monosaccharide/Link classes. Pretty similar to how I've prototyped this previously. I like how ambiguity in the the parent node and position are handled consistently.
First, a bug: AmbiguousLink.find_open_position fails when there are two undetermined sections with the same list of possible parents, but a single parent_position specified for each. The method only checks the parent data member (first parent from the list of possibilities), so the call to find_open_position for the second AmbiguousLink fails. This happens at GlycoCT parse time.
The GlyTouCan accession that showed me this issue is: G41857GD
Stack trace:
Second, a feature request - implementation of an AmbiguousLink.iterconfiguration type method for Glycans, which does the combinatorics for all AmbiguousLink configurations (which interact with each other, as per the issue above). Perhaps the find_open_position logic will be sufficient, but I'm not sure.
Thanks...
The text was updated successfully, but these errors were encountered: