Add groups feature #543

cdiener · 2017-07-10T18:51:12Z

Feature description

Allow cobrapy objects to be annotated with groups like the legacy subsystem. This is supposed to increase compatibility with SBML groups.

Possible API

Add a groups attribute to metabolites, reactions and genes that can be used to store arbitrary grouping attributes. Could be a lightweight object or list. Example:

r = model.reactions[0]
r.groups
# Output = ["TCA cycle", "mitochondrial metabolism"]

The SBML interface would read and write the groups.

or to support the kind attribute

r.groups
# Output = [Group(kind="classification", name="TCA cycle"), Group(kind="classification", name="mitochondrial metabolism"]

Problems

SBML groups allows nested groups like (group1, species1, species2). This is not compatible with the current cobrapy API that assigns groups as properties onto metabolites, reactions or genes. However, building a full fledged group system like model.groups would be hard to maintain since one has to sync those with the context manager and model modifications which is not necessary in the proposed API. However, to use it like that we would have to ignore all nested groups since SBML specifically prohibits flattening of the group hierarchy:

If a Member references a Group, it is the Group itself that is considered to be a member of the parent Group, not 33
the corresponding referenced element (or elements).

The text was updated successfully, but these errors were encountered:

pstjohn · 2017-07-12T10:55:14Z

I don't think a full-fledged solution would be that hard to code up. a Group would just be a List-like object with __add__ and __sub__ methods to handle contexts.

A while back I played around with a metabolite group summary. Essentially ignore flows within the group, just look at how fluxes enter/exit. So I could certainly revive that with a group.summary().

I would expect the harder problems would be syncing it across the pickle, json, and sbml saved file formats while respecting older formats that would be missing them.

Midnighter · 2017-07-12T13:43:16Z

In my view of the hierarchy I would do two three things:

Add an attribute groups to Model which is a dict of sets, i.e., Group -> Member where a member can be anything including another Group.
Add an attribute groups to Reaction, Metabolite, Gene which is a set.
Model.remove_ will have to delete the above components from the relevant groups.

Group itself becomes a new class if necessary so it can manage its resources better.

cdiener · 2017-07-12T16:15:04Z

Okay, you convinced me 😄

@Midnighter Your suggestion looks good to me. Maybe the groups attribute on metabolites etc. should just be ids (strings) to not create unwanted references to Model.

@pstjohn Seems like a good idea to me that the groups feature should be its own object since you could add useful methods to it like to_sbml, as_dataframe (to be joinable with solution.fluxes) etc. Should the list of groups be the objective (dict of sets) or just the individual groups (sets)?

cdiener · 2017-07-12T16:16:20Z

@Midnighter the pickle format is not backwards compatible as it is for now. You can not unpickle cobra 0.5 models in 0.6 for instance since they have different attributes and methods. JSON and SBMl should be straight forward.

matthiaskoenig · 2018-03-23T11:40:54Z

The SBML import export of groups (subsystems) is straight-forward. As soon as someone implements the groups on Model, Reaction, Species and Gene as proposed by @Midnighter I would implement the groups support.

silviamorins · 2018-06-18T08:56:42Z

Hi,
I am dealing with this same issue (not being able to directly export COBRApy subsystems ad SBML groups): this feature is up to now not yet implemented, right?

Midnighter · 2018-06-18T09:09:49Z

Unfortunately, not yet. With a better SBML parser in the works, compartment classes imminent, I can definitely see these coming soon, though. We will prioritize cobrapy development in our group for all of August so latest then, unless you want to take a stab at it? :)

gregmedlock · 2018-06-20T15:54:50Z

If Silvia is not interested in implementing it, I would be happy to try. To restate the to-dos that have been discussed so far, just for my own understanding:

Create a new class, Group, defined in cobra.core.group.py, which will be a subclass of Object
Create a groups attribute in Model, which is a dictionary with keys of type Group and values that are sets of type Reaction, Metabolite, Gene, or Group.
Create a groups attribute in Reaction, Metabolite, and Gene that is a set containing references to objects of type Group

Based on this summary, I have a few questions:

For the groups attribute in Model, would it be more consistent with the reactions, metabolites, and genes attributes to use a DictList rather than a dictionary? In this case, the DictList would have Group ids as keys, Groups as values, and each Group would have a members attribute, which is a set of Reaction, Metabolite, Gene, or Group.
Similarly, would we prefer avoiding new references to Model in the groups attribute for Reaction, Metabolite, and Gene, as @cdiener mentioned? I favor using a set as in the original suggestions by @Midnighter so that the structure is similar to the rest of the codebase (e.g. Reactions have an attribute _genes that is a set containing references to Genes) but I don't think I fully understand potential consequences of adding new references.
Since any Species subclass should be able to belong to a Group, should we give Species the group attribute rather than defining it in both Metabolite and Gene?

Also, considering the kind attribute--from the Groups package specification linked in the top post, there are three values for kind a group can take: classification, partonomy, or collection. I favor implementing kind as an attribute of Group rather than just picking "collection" by default when parsing/writing, since it would be nice to be able to label particular metabolites with a "classification" (e.g. "highly polar", "low molecular weight") or with subsystems, which would fall under "partonomy" IMO.

Thoughts?

silviamorins · 2018-06-21T11:45:38Z

sorry for my MIA,
I'm actually coming from the biology side and just approached systems biology at the beginning of this year, so I would say it's better for the community if I keep myself away from implementing attempts... Thank you again for the help!

cdiener · 2018-06-29T17:05:54Z

Having spent some time now with the new API I would retract my concerns about the references to model. Groups are always defined in the context of a model so it makes sense they have a reference to it. We did have a lot of bugs/problems recently related to some properties similar to that of groups that were mostly caused by having the same information in several places (for instance with reactions and compartments). So it would rather argue for having a "single source of truth" now. This means that the assignment into groups should only happen once, for instance in Model.groups and all other objects should generate this information dynamically. So for instance Reaction.group would just look up in Model.groups and get the info dynamically from there. This might be a bit lower but is much more maintainable with the context manager etc.

Midnighter · 2018-06-29T18:30:03Z

I very much agree with a single source of truth. Also, SBML defines groups as having a list of members whereas the group information is not defined at all on the reaction.

This brings up the argument again of do we need Reaction.groups as an attribute/property at all? Having all those multiple ways of accessing the same information is actually the source of our troubles, I believe. We might decide that for the sake of convenience that we do want that or we might go a different way. Rather than Reaction.groups we might add a method to the model Model.get_associated_groups, for example.

def get_associated_groups(self, element):
   return [g for g in self.groups if element in g]

gregmedlock · 2018-07-10T15:41:02Z

So it sounds like the optimal solution is a new Groups attribute for Model that stores objects of the new class Group, and a getter function that searches through Groups to lookup group membership for individual species. I agree the lookup might be slow but I don't think too many users would be concerned about speed here (e.g. you could just create your own dict from Model.Groups to map all elements to their groups if you needed to repetitively determine group membership).

Let me know if I missed anything, otherwise I will get to work on a solution.

Midnighter · 2018-07-10T18:13:38Z

Sounds about right. Initially I wanted to recommend making the class Group inherit from set but groups can be elements of other groups and sets are unhashable so they can not be members of other sets. I guess an attribute members which is a set would do the trick. Other than that I would try to keep the Group class rather sparse and only implement the minimum interface to satisfy the SBML requirements.

matthiaskoenig · 2018-07-11T07:13:04Z

I will write the SBML groups import and export for this. Just let me know when there is a semi-stable version of this feature, i.e. the Group class.

…

On Tue, Jul 10, 2018 at 8:13 PM Moritz E. Beber ***@***.***> wrote: Sounds about right. Initially I wanted to recommend making the class Group inherit from set but groups can be elements of other groups and sets are unhashable so they can not be members of other sets. I guess an attribute members which is a set would do the trick. Other than that I would try to keep the Group class rather sparse and only implement the minimum interface to satisfy the SBML requirements. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#543 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA29umeni5dKFc6yhCu7W-n3pKsPzUY6ks5uFO7TgaJpZM4OTQYC> .

-- Matthias König, PhD. Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt Universität zu Berlin, Institute of Biology, Institute for Theoretical Biology https://livermetabolism.com [email protected] https://twitter.com/konigmatt https://github.com/matthiaskoenig Tel: +49 30 2093 98435

cdiener · 2018-07-13T19:27:33Z

@gregmedlock's proposal looks good to me. I would like some utility methods to get groups across a species type and group name as a pandas DataFrame. For instance getting the subsystem for all reactions which was one of the initial motivations to implement this feature.

ChristianLieven · 2018-08-13T14:33:36Z

@gregmedlock How's your progress on this so far?

I'd like to help this along however I can in the coming weeks, but I also don't want to step on your toes if you've already started with it. Could you give me a quick update on where you're at?

gregmedlock · 2018-08-13T15:24:55Z

My plan is to spend next week on it, start to finish, although I see @Midnighter has assigned it to @AlbaLopez. If next week is too late I am happy to let anyone else take the reins.

Midnighter · 2018-08-15T19:34:11Z

Hi Greg,
Sorry, for just "taking away" the issue from you. I didn't see any activity in a while and @AlbaLopez was in favour of taking on this issue. I should have asked you about it beforehand, though. As things stand, though, Alba won't get around to working on this soon, so if you're still willing why don't you work on it next week?

gregmedlock · 2018-08-16T13:45:07Z

No worries at all, a month is a long time to leave the issue hanging with no update! Will make the PR as soon as it's ready.

gregmedlock · 2018-08-21T15:07:53Z

@cdiener for the utility function, would returning a dataframe of booleans with element id's as rownames (genes, reactions, metabolites) and group id's as column names meet your desired use cases? The function would just take any list of metabolites, reactions, and/or genes, e.g.

reaction_group_associations = model.get_group_associations(model.reactions)

Second question... would Model start to get a bit crowded if we include such a utility function in it? solver is the only class with a similar function. I could throw it in cobra.util.util for now, or create cobra.util.groups if we anticipated need more group-related utility functions for SBML io.

Midnighter · 2018-08-21T17:12:32Z

I'm definitely in favor of becoming a little more function-oriented and not to tack on every method to the model class.

matthiaskoenig · 2019-03-18T09:21:21Z

The new libsbml parser reads and writes groups (and subsystem information).
@Midnighter implemented the groups feature.
Remaining issues with groups will be part of #760

cdiener added the SBML Related to reading and writing SBML models. label Jul 10, 2017

cdiener mentioned this issue Jul 10, 2017

cobra.io.write_sbml_model does not write reaction.subsystem to xml file #473

Closed

cdiener mentioned this issue Jul 14, 2017

Non-integer charge #534

Closed

Midnighter added this to the COBRApy 1.0 milestone Jul 29, 2018

Midnighter assigned AlbaLopez Aug 6, 2018

kvikshaug unassigned AlbaLopez Aug 20, 2018

This was referenced Aug 21, 2018

re-pickle the pickled test models upon running tests #750

Closed

Feature/implement groups #754

Closed

Midnighter self-assigned this Sep 27, 2018

Midnighter modified the milestones: COBRApy 1.0, Sprint 39 Sep 27, 2018

kvikshaug removed this from the Sprint 39 milestone Oct 29, 2018

matthiaskoenig self-assigned this Feb 27, 2019

matthiaskoenig mentioned this issue Mar 6, 2019

libsbml based SBML parser #685

Merged

matthiaskoenig closed this as completed Mar 18, 2019

matthiaskoenig mentioned this issue Apr 8, 2019

Make a decision on the behavior of Compartments #760

Open

akaviaLab mentioned this issue Feb 26, 2020

Reactions are unaware of groups (and subsystems) #938

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add groups feature #543

Add groups feature #543

cdiener commented Jul 10, 2017 •

edited

Loading

pstjohn commented Jul 12, 2017

Midnighter commented Jul 12, 2017

cdiener commented Jul 12, 2017

cdiener commented Jul 12, 2017

matthiaskoenig commented Mar 23, 2018

silviamorins commented Jun 18, 2018

Midnighter commented Jun 18, 2018

gregmedlock commented Jun 20, 2018

silviamorins commented Jun 21, 2018 •

edited

Loading

cdiener commented Jun 29, 2018

Midnighter commented Jun 29, 2018

gregmedlock commented Jul 10, 2018

Midnighter commented Jul 10, 2018

matthiaskoenig commented Jul 11, 2018 via email

cdiener commented Jul 13, 2018

ChristianLieven commented Aug 13, 2018

gregmedlock commented Aug 13, 2018

Midnighter commented Aug 15, 2018

gregmedlock commented Aug 16, 2018

gregmedlock commented Aug 21, 2018

Midnighter commented Aug 21, 2018

matthiaskoenig commented Mar 18, 2019

Add groups feature #543

Add groups feature #543

Comments

cdiener commented Jul 10, 2017 • edited Loading

Feature description

Possible API

Problems

pstjohn commented Jul 12, 2017

Midnighter commented Jul 12, 2017

cdiener commented Jul 12, 2017

cdiener commented Jul 12, 2017

matthiaskoenig commented Mar 23, 2018

silviamorins commented Jun 18, 2018

Midnighter commented Jun 18, 2018

gregmedlock commented Jun 20, 2018

silviamorins commented Jun 21, 2018 • edited Loading

cdiener commented Jun 29, 2018

Midnighter commented Jun 29, 2018

gregmedlock commented Jul 10, 2018

Midnighter commented Jul 10, 2018

matthiaskoenig commented Jul 11, 2018 via email

cdiener commented Jul 13, 2018

ChristianLieven commented Aug 13, 2018

gregmedlock commented Aug 13, 2018

Midnighter commented Aug 15, 2018

gregmedlock commented Aug 16, 2018

gregmedlock commented Aug 21, 2018

Midnighter commented Aug 21, 2018

matthiaskoenig commented Mar 18, 2019

cdiener commented Jul 10, 2017 •

edited

Loading

silviamorins commented Jun 21, 2018 •

edited

Loading