Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add groups feature #543

Closed
cdiener opened this issue Jul 10, 2017 · 22 comments
Closed

Add groups feature #543

cdiener opened this issue Jul 10, 2017 · 22 comments
Assignees
Labels
SBML Related to reading and writing SBML models.

Comments

@cdiener
Copy link
Member

cdiener commented Jul 10, 2017

Feature description

Allow cobrapy objects to be annotated with groups like the legacy subsystem. This is supposed to increase compatibility with SBML groups.

Possible API

Add a groups attribute to metabolites, reactions and genes that can be used to store arbitrary grouping attributes. Could be a lightweight object or list. Example:

r = model.reactions[0]
r.groups
# Output = ["TCA cycle", "mitochondrial metabolism"]

The SBML interface would read and write the groups.

or to support the kind attribute

r.groups
# Output = [Group(kind="classification", name="TCA cycle"), Group(kind="classification", name="mitochondrial metabolism"]

Problems

SBML groups allows nested groups like (group1, species1, species2). This is not compatible with the current cobrapy API that assigns groups as properties onto metabolites, reactions or genes. However, building a full fledged group system like model.groups would be hard to maintain since one has to sync those with the context manager and model modifications which is not necessary in the proposed API. However, to use it like that we would have to ignore all nested groups since SBML specifically prohibits flattening of the group hierarchy:

If a Member references a Group, it is the Group itself that is considered to be a member of the parent Group, not 33
the corresponding referenced element (or elements).

@cdiener cdiener added the SBML Related to reading and writing SBML models. label Jul 10, 2017
@pstjohn
Copy link
Contributor

pstjohn commented Jul 12, 2017

I don't think a full-fledged solution would be that hard to code up. a Group would just be a List-like object with __add__ and __sub__ methods to handle contexts.

A while back I played around with a metabolite group summary. Essentially ignore flows within the group, just look at how fluxes enter/exit. So I could certainly revive that with a group.summary().

I would expect the harder problems would be syncing it across the pickle, json, and sbml saved file formats while respecting older formats that would be missing them.

@Midnighter
Copy link
Member

In my view of the hierarchy I would do two three things:

  1. Add an attribute groups to Model which is a dict of sets, i.e., Group -> Member where a member can be anything including another Group.
  2. Add an attribute groups to Reaction, Metabolite, Gene which is a set.
  3. Model.remove_ will have to delete the above components from the relevant groups.

Group itself becomes a new class if necessary so it can manage its resources better.

@cdiener
Copy link
Member Author

cdiener commented Jul 12, 2017

Okay, you convinced me 😄

@Midnighter Your suggestion looks good to me. Maybe the groups attribute on metabolites etc. should just be ids (strings) to not create unwanted references to Model.

@pstjohn Seems like a good idea to me that the groups feature should be its own object since you could add useful methods to it like to_sbml, as_dataframe (to be joinable with solution.fluxes) etc. Should the list of groups be the objective (dict of sets) or just the individual groups (sets)?

@cdiener
Copy link
Member Author

cdiener commented Jul 12, 2017

@Midnighter the pickle format is not backwards compatible as it is for now. You can not unpickle cobra 0.5 models in 0.6 for instance since they have different attributes and methods. JSON and SBMl should be straight forward.

@matthiaskoenig
Copy link
Contributor

The SBML import export of groups (subsystems) is straight-forward. As soon as someone implements the groups on Model, Reaction, Species and Gene as proposed by @Midnighter I would implement the groups support.

@silviamorins
Copy link

Hi,
I am dealing with this same issue (not being able to directly export COBRApy subsystems ad SBML groups): this feature is up to now not yet implemented, right?

@Midnighter
Copy link
Member

Unfortunately, not yet. With a better SBML parser in the works, compartment classes imminent, I can definitely see these coming soon, though. We will prioritize cobrapy development in our group for all of August so latest then, unless you want to take a stab at it? :)

@gregmedlock
Copy link
Member

If Silvia is not interested in implementing it, I would be happy to try. To restate the to-dos that have been discussed so far, just for my own understanding:

  • Create a new class, Group, defined in cobra.core.group.py, which will be a subclass of Object
  • Create a groups attribute in Model, which is a dictionary with keys of type Group and values that are sets of type Reaction, Metabolite, Gene, or Group.
  • Create a groups attribute in Reaction, Metabolite, and Gene that is a set containing references to objects of type Group

Based on this summary, I have a few questions:

  1. For the groups attribute in Model, would it be more consistent with the reactions, metabolites, and genes attributes to use a DictList rather than a dictionary? In this case, the DictList would have Group ids as keys, Groups as values, and each Group would have a members attribute, which is a set of Reaction, Metabolite, Gene, or Group.
  2. Similarly, would we prefer avoiding new references to Model in the groups attribute for Reaction, Metabolite, and Gene, as @cdiener mentioned? I favor using a set as in the original suggestions by @Midnighter so that the structure is similar to the rest of the codebase (e.g. Reactions have an attribute _genes that is a set containing references to Genes) but I don't think I fully understand potential consequences of adding new references.
  3. Since any Species subclass should be able to belong to a Group, should we give Species the group attribute rather than defining it in both Metabolite and Gene?

Also, considering the kind attribute--from the Groups package specification linked in the top post, there are three values for kind a group can take: classification, partonomy, or collection. I favor implementing kind as an attribute of Group rather than just picking "collection" by default when parsing/writing, since it would be nice to be able to label particular metabolites with a "classification" (e.g. "highly polar", "low molecular weight") or with subsystems, which would fall under "partonomy" IMO.

Thoughts?

@silviamorins
Copy link

silviamorins commented Jun 21, 2018

sorry for my MIA,
I'm actually coming from the biology side and just approached systems biology at the beginning of this year, so I would say it's better for the community if I keep myself away from implementing attempts... Thank you again for the help!

@cdiener
Copy link
Member Author

cdiener commented Jun 29, 2018

Having spent some time now with the new API I would retract my concerns about the references to model. Groups are always defined in the context of a model so it makes sense they have a reference to it. We did have a lot of bugs/problems recently related to some properties similar to that of groups that were mostly caused by having the same information in several places (for instance with reactions and compartments). So it would rather argue for having a "single source of truth" now. This means that the assignment into groups should only happen once, for instance in Model.groups and all other objects should generate this information dynamically. So for instance Reaction.group would just look up in Model.groups and get the info dynamically from there. This might be a bit lower but is much more maintainable with the context manager etc.

@Midnighter
Copy link
Member

I very much agree with a single source of truth. Also, SBML defines groups as having a list of members whereas the group information is not defined at all on the reaction.

This brings up the argument again of do we need Reaction.groups as an attribute/property at all? Having all those multiple ways of accessing the same information is actually the source of our troubles, I believe. We might decide that for the sake of convenience that we do want that or we might go a different way. Rather than Reaction.groups we might add a method to the model Model.get_associated_groups, for example.

def get_associated_groups(self, element):
   return [g for g in self.groups if element in g]

@gregmedlock
Copy link
Member

So it sounds like the optimal solution is a new Groups attribute for Model that stores objects of the new class Group, and a getter function that searches through Groups to lookup group membership for individual species. I agree the lookup might be slow but I don't think too many users would be concerned about speed here (e.g. you could just create your own dict from Model.Groups to map all elements to their groups if you needed to repetitively determine group membership).

Let me know if I missed anything, otherwise I will get to work on a solution.

@Midnighter
Copy link
Member

Sounds about right. Initially I wanted to recommend making the class Group inherit from set but groups can be elements of other groups and sets are unhashable so they can not be members of other sets. I guess an attribute members which is a set would do the trick. Other than that I would try to keep the Group class rather sparse and only implement the minimum interface to satisfy the SBML requirements.

@matthiaskoenig
Copy link
Contributor

matthiaskoenig commented Jul 11, 2018 via email

@cdiener
Copy link
Member Author

cdiener commented Jul 13, 2018

@gregmedlock's proposal looks good to me. I would like some utility methods to get groups across a species type and group name as a pandas DataFrame. For instance getting the subsystem for all reactions which was one of the initial motivations to implement this feature.

@Midnighter Midnighter added this to the COBRApy 1.0 milestone Jul 29, 2018
@ChristianLieven
Copy link
Contributor

@gregmedlock How's your progress on this so far?

I'd like to help this along however I can in the coming weeks, but I also don't want to step on your toes if you've already started with it. Could you give me a quick update on where you're at?

@gregmedlock
Copy link
Member

My plan is to spend next week on it, start to finish, although I see @Midnighter has assigned it to @AlbaLopez. If next week is too late I am happy to let anyone else take the reins.

@Midnighter
Copy link
Member

Hi Greg,
Sorry, for just "taking away" the issue from you. I didn't see any activity in a while and @AlbaLopez was in favour of taking on this issue. I should have asked you about it beforehand, though. As things stand, though, Alba won't get around to working on this soon, so if you're still willing why don't you work on it next week?

@gregmedlock
Copy link
Member

No worries at all, a month is a long time to leave the issue hanging with no update! Will make the PR as soon as it's ready.

@gregmedlock
Copy link
Member

@cdiener for the utility function, would returning a dataframe of booleans with element id's as rownames (genes, reactions, metabolites) and group id's as column names meet your desired use cases? The function would just take any list of metabolites, reactions, and/or genes, e.g.

reaction_group_associations = model.get_group_associations(model.reactions)

Second question... would Model start to get a bit crowded if we include such a utility function in it? solver is the only class with a similar function. I could throw it in cobra.util.util for now, or create cobra.util.groups if we anticipated need more group-related utility functions for SBML io.

@Midnighter
Copy link
Member

I'm definitely in favor of becoming a little more function-oriented and not to tack on every method to the model class.

@Midnighter Midnighter modified the milestones: COBRApy 1.0, Sprint 39 Sep 27, 2018
@kvikshaug kvikshaug removed this from the Sprint 39 milestone Oct 29, 2018
@matthiaskoenig matthiaskoenig self-assigned this Feb 27, 2019
@matthiaskoenig
Copy link
Contributor

The new libsbml parser reads and writes groups (and subsystem information).
@Midnighter implemented the groups feature.
Remaining issues with groups will be part of #760

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SBML Related to reading and writing SBML models.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants