EnzymeInfo structure with complex information #174

edkerk · 2022-09-27T12:25:01Z

edkerk
Sep 27, 2022
Maintainer

As written by @johan-gson on email:

In GECKO3, we want to use the EC field in the model, and not calculate it automatically as was previously done in GECKO2. This saves computation time, allows for manual curation, etc.
The EC field sometimes contains more than one EC Code. An example in Human-GEM is the reaction MAR08341. The reaction is: H+[c] + L-arabinose[c] + NADPH[c] => L-arabitol[c] + NADP+[c]. EC code: 1.1.1.21;1.1.1.2. GPR ENSG00000053371 or ENSG00000085662 or ENSG00000117448. What I suspect is that some genes belong to one EC code and some to another. How do we know which?
We don’t have information on the number of subunits.

What I propose instead is to add another structure to the model, called EnzymeInfo or something like that, that is a list of enzyme complexes (just ids, maybe EC numbers, I’m not sure). Then we would add a separate table called Enzyme Complexes or similar, which would contain:

A list of subunits (could be just one for simple enzymes)
The number of each subunit in the complex.

We would then add a table of subunits, which would contain gene id, protein id, and potentially some other info. Alternatively, this table could be skipped, and we could just use gene directly in the enzyme complexes above. But it depends on what we want to add. Ideally, I think the subunits table should contain things like MW.
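As an illustration of the proposal above, here is a minimal sketch of what such a structure might look like. It is written in Python purely for readability and is not taken from GECKO. All class and field names (`Subunit`, `EnzymeComplex`, `EnzymeInfo`, `mw_kda`) are hypothetical, and the gene IDs, protein IDs and molecular weights are placeholders, not curated data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subunit:
    gene_id: str
    protein_id: str
    mw_kda: float  # placeholder molecular weight of one subunit copy


@dataclass
class EnzymeComplex:
    complex_id: str
    stoichiometry: Dict[str, int]  # gene_id -> number of copies in the complex
    ec_codes: List[str] = field(default_factory=list)


@dataclass
class EnzymeInfo:
    subunits: Dict[str, Subunit] = field(default_factory=dict)
    complexes: Dict[str, EnzymeComplex] = field(default_factory=dict)

    def complexes_with_gene(self, gene_id: str) -> List[str]:
        """Return the ids of all complexes that contain the given gene."""
        return [cid for cid, c in self.complexes.items()
                if gene_id in c.stoichiometry]


# Hypothetical example: one heteromeric complex and one single-subunit enzyme.
info = EnzymeInfo()
info.subunits["geneA"] = Subunit("geneA", "protA", 40.0)
info.subunits["geneB"] = Subunit("geneB", "protB", 25.0)
info.complexes["cplx1"] = EnzymeComplex("cplx1", {"geneA": 2, "geneB": 1}, ["1.1.1.21"])
info.complexes["cplx2"] = EnzymeComplex("cplx2", {"geneB": 1}, ["1.1.1.2"])

print(info.complexes_with_gene("geneB"))
```

A simple enzyme is just a complex with a single subunit, so both cases share one representation.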

I would then think that Gecko 3 could use different data depending on what is available. If this new structure is there in a model, that is what we will use. Otherwise, we can provide code to build it from GPRs and/or the old eccodes field.

If you want, I could generate this type of structure for Human-GEM, perhaps as part of the Human2 project? If you agree, that is. The question is exactly what should be put in this structure, and what should be retrieved from other databases. When we work with it, the GPRs and ECCodes fields could then be generated from the new structure, so we don’t store things in multiple ways.
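A rough sketch of how the GPR and EC-code strings could be derived from such a structure, so that they are never stored independently. The input layout (a list of dicts with `genes` and `ec_codes` keys) and both function names are assumptions made for this example. The `;` separator follows the Human-GEM example quoted earlier in this post.

```python
def gpr_from_complexes(complexes):
    """Build an '(A and B) or C' style GPR from a list of complexes.

    Each complex is a dict with 'genes' (gene id -> copies) and 'ec_codes'.
    """
    terms = []
    for cplx in complexes:
        genes = sorted(cplx["genes"])
        term = " and ".join(genes)
        if len(genes) > 1:
            term = "(" + term + ")"
        terms.append(term)
    return " or ".join(terms)


def eccodes_from_complexes(complexes, sep=";"):
    """Collect the unique EC codes of all complexes, keeping first-seen order."""
    seen = []
    for cplx in complexes:
        for ec in cplx["ec_codes"]:
            if ec not in seen:
                seen.append(ec)
    return sep.join(seen)


# Hypothetical complexes for a single reaction.
complexes = [
    {"genes": {"geneA": 2, "geneB": 1}, "ec_codes": ["1.1.1.21"]},
    {"genes": {"geneC": 1}, "ec_codes": ["1.1.1.21", "1.1.1.2"]},
]
print(gpr_from_complexes(complexes))
print(eccodes_from_complexes(complexes))
```

Note that the subunit copy numbers are lost in the generated GPR, which is one reason to treat the structure, not the GPR, as the primary source.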

johan-gson · 2022-09-27T12:32:09Z

johan-gson
Sep 27, 2022
Collaborator

Another potential benefit is that some isozymes could be skipped by ftINIT perhaps. Not sure how much it matters though. This could be a post-processing step at the end of ftINIT, to just remove the isozymes with a score < 0 if there are others with a score > 0.
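The post-processing rule described here could be sketched as follows. This is only an illustration in Python, not ftINIT code. The function name and the per-reaction dict of scores are assumptions, and the scores are made up.

```python
def prune_isozymes(scores):
    """Drop isozymes with a score < 0, but only if another one scores > 0.

    scores: dict mapping isozyme id -> score, for a single reaction.
    """
    if any(s > 0 for s in scores.values()):
        return {iso: s for iso, s in scores.items() if s >= 0}
    # No positively scored alternative: keep everything as it is.
    return dict(scores)


print(prune_isozymes({"iso1": 2.0, "iso2": -1.0, "iso3": 0.0}))
print(prune_isozymes({"iso1": -1.0, "iso2": -2.0}))
```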

1 reply

edkerk Sep 27, 2022
Maintainer Author

I suppose this is related to GECKO light? Where there are no separate reactions for each isozyme? Otherwise this could just be done by ignoring/filtering out relevant reactions. Indeed, in GECKO light the rxnEnzMat structure would contain all alternative complexes for a particular reaction, not just one isozymic complex.

Actually, perhaps this whole suggestion of EnzymeInfo is most valuable for GECKO light? Due to absence of a usable rxnEnzMat structure.

Not sure what "isozymes could be skipped by ftINIT" exactly means. Does this assume that a generic ec-model is used as input for ftINIT? Could this be directly parsed from EnzymeInfo?

edkerk · 2022-09-27T12:39:41Z

edkerk
Sep 27, 2022
Maintainer Author

It is worth noting that the new model.ec structure already has the following relevant fields:

model.ec.rxnEnzMat, where integers indicate the number of subunits that make up a complex. This is done on a per-reaction basis, and considering that makeEcModel splits reactions by alternative isozymes, each ec-reaction only contains one complex. (In this context, all enzymes are then a complex, where most have just 1 subunit).
model.ec.mw contains the enzyme-specific molecular weights. Combining it with model.ec.rxnEnzMat would give you the MW of the whole complex.
model.ec.eccodes contains reaction-specific EC codes, and considering the splitting by alternative isozymes, this also means complex-specific EC codes.
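To make the MW point concrete, here is a small Python sketch of combining the subunit counts with the per-enzyme molecular weights. It assumes that rows of the matrix are ec-reactions and columns are enzymes, in the same order as the MW vector. The numbers are placeholders.

```python
def complex_mw(rxn_enz_mat, mw):
    """Return the MW of the complex of each ec-reaction.

    rxn_enz_mat: list of rows, one per ec-reaction, holding subunit counts.
    mw: list of molecular weights, one per enzyme (same order as the columns).
    """
    return [sum(n * m for n, m in zip(row, mw)) for row in rxn_enz_mat]


rxn_enz_mat = [
    [2, 1, 0],  # heteromeric complex: 2 copies of enzyme 1, 1 copy of enzyme 2
    [0, 0, 1],  # single-subunit enzyme
]
mw = [40.0, 25.0, 60.0]

print(complex_mw(rxn_enz_mat, mw))
```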

The most important part of the proposed EnzymeInfo structure would be to specify each (multi-subunit) complex in a more easily readable format, potentially combined with some further info. As the proposed info is already in the model (and it is easiest to have e.g. applyKcatConstraints always use the same fields to define the kcat values), it would seem that this EnzymeInfo structure is something that could be used to populate model.ec with complex information.

Note that there should also be some function that can parse e.g. Complex Portal output (loadComplexData). One idea would be to have loadComplexData make an EnzymeInfo structure, which with an additional function can be used to fill the relevant model.ec fields. One can then also use different ways to specify the EnzymeInfo structure.

In that strategy, EnzymeInfo would be an intermediate structure, but not a direct part of the model itself, because all information would already be present in the model.ec fields.
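A sketch of that intermediate step, in Python for illustration only: turning a list of complexes into rows of subunit counts, with a count of 1 wherever the stoichiometry is unknown. The function name, the input layout and the use of `None` for "unknown" are all assumptions of this example, as is the row/column orientation (ec-reactions by enzymes).

```python
def build_rxn_enz_mat(rxn_complexes, enzymes, default_copies=1):
    """Build one row of subunit counts per ec-reaction.

    rxn_complexes: list of dicts (gene id -> copies, or None if unknown),
                   one dict per ec-reaction.
    enzymes: ordered list of gene ids that defines the columns.
    """
    col = {gene: i for i, gene in enumerate(enzymes)}
    mat = []
    for cplx in rxn_complexes:
        row = [0] * len(enzymes)
        for gene, copies in cplx.items():
            row[col[gene]] = default_copies if copies is None else copies
        mat.append(row)
    return mat


enzymes = ["geneA", "geneB", "geneC"]
rxn_complexes = [
    {"geneA": 2, "geneB": None},  # copy number of geneB is not known
    {"geneC": None},
]
print(build_rxn_enz_mat(rxn_complexes, enzymes))
```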

Regarding the EC codes, to me this is a separate issue: how to annotate EC numbers, potentially extracted from proteins that are annotated with EC numbers (e.g. from UniProt).

1 reply

edkerk Sep 27, 2022
Maintainer Author

Another aspect related to not duplicating information is the idea of exporting/importing ec-models in a YAML format (where the current GEM YAML structure is extended with a part describing model.ec). If another EnzymeInfo field is defined, it would duplicate information in the exported file.

johan-gson · 2022-09-27T13:04:55Z

johan-gson
Sep 27, 2022
Collaborator

So, the main point is that the EnzymeInfo would be the primary storage of this information, instead of GPRs and eccodes, which are a bit confusing as I pointed out, so it would replace them. GPRs and eccodes will probably have to be kept for legacy reasons, but would be generated from the EnzymeInfo. What we do inside GECKO is less a concern I'm thinking - I think this will be different for light vs full. But the problem is the same for both. I also think this is a more clear structure - the GPRs can also be defined in other forms, i.e., in more complex setups than just (ANDs) OR (ANDs), which I think is bad - it makes it much more difficult to work with them. It is like that in Human-GEM to save space.
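To show why the (ANDs) OR (ANDs) form is easy to work with, here is a minimal Python sketch that splits such a GPR into isozymes and their subunits. It assumes the GPR is already in that form and rejects anything nested instead of trying to convert it. The function name is hypothetical.

```python
import re


def parse_dnf_gpr(gpr):
    """Split a GPR written as '(A and B) or (C and D)' into a list of complexes.

    Each complex is a list of gene ids. Nested expressions raise ValueError.
    """
    if not gpr.strip():
        return []
    complexes = []
    for term in re.split(r"\s+or\s+", gpr.strip(), flags=re.IGNORECASE):
        term = term.strip()
        if term.startswith("(") and term.endswith(")"):
            term = term[1:-1]
        if "(" in term or ")" in term:
            raise ValueError("GPR is not in (ANDs) OR (ANDs) form: " + gpr)
        genes = [g.strip() for g in re.split(r"\s+and\s+", term, flags=re.IGNORECASE)]
        complexes.append(genes)
    return complexes


# GPR of MAR08341 as quoted at the top of this thread: three isozymes.
print(parse_dnf_gpr("ENSG00000053371 or ENSG00000085662 or ENSG00000117448"))
```

A GPR such as `(A or B) and C` is rejected by this sketch. That is the kind of expression that would need to be expanded first.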

So, this is not to replace the model.ec fields, but to replace the GPR and ECCodes fields.

About the ftINIT - let's leave it for now, I'm not sure it is that valuable to identify which enzyme complexes are available for a reaction.

12 replies

johan-gson Sep 29, 2022
Collaborator

So, I still don't fully understand this. If you have a GPR with ORs, can some of those isozymes belong to one EC number, and some to another? If not, I agree it is better to stick with as it is. The multiple EC numbers would then be a way to find kcats if there is no kcat for the "primary" one, is that how this should be viewed? In that case we don't have a problem.

edkerk Sep 29, 2022
Maintainer Author

The EC number describes the chemical reaction that is catalyzed by an enzyme. If isozymes all catalyze the same reaction (they should, otherwise they should not be associated to the same reaction), they all have the same EC.

Complication can be if it is a multi-step reaction with multiple subunits (e.g. pyruvate dehydrogenase), but this is not what you refer to. And indeed, the multiple EC numbers can be used for additional kcat matching (which is probably a bit better than just wildcarding EC digits).
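One way to picture that matching order, as a Python sketch: try every listed EC number in full first, and only then fall back to progressively wildcarded versions. This is not GECKO's actual kcat matching. The function names are hypothetical, the lookup table is a plain dict keyed by the query string, and the kcat value in it is made up.

```python
def candidate_ec_queries(eccodes, sep=";"):
    """List EC queries from most to least specific.

    All full EC numbers come first, then versions with trailing digits
    replaced by '-' (e.g. '1.1.1.-').
    """
    codes = [c.strip() for c in eccodes.split(sep) if c.strip()]
    queries = []
    for level in range(4, 0, -1):  # keep 4 digits, then 3, 2, 1
        for code in codes:
            digits = code.split(".")
            query = ".".join(digits[:level] + ["-"] * (4 - level))
            if query not in queries:
                queries.append(query)
    return queries


def find_kcat(eccodes, kcat_table):
    """Return (query, kcat) for the first query found in the table, else None."""
    for query in candidate_ec_queries(eccodes):
        if query in kcat_table:
            return query, kcat_table[query]
    return None


kcat_table = {"1.1.1.2": 12.5}  # made-up value, for illustration only
print(candidate_ec_queries("1.1.1.21;1.1.1.2"))
print(find_kcat("1.1.1.21;1.1.1.2", kcat_table))
```

With this ordering, the second EC number is used before any digit of the first one is wildcarded.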

johan-gson Sep 29, 2022
Collaborator

Ok, so the best thing to do may still be to keep a list of comma-separated EC numbers per reaction in the EC structure coming out of makeEcModel, correct? Then all is clear. For now :)

edkerk Sep 29, 2022
Maintainer Author

Yes, I'm partially retracting my point of "one-EC-per-reaction" :). It might still be useful to have multiple EC numbers, as long as they make sense :).

johan-gson Sep 29, 2022
Collaborator

The best part with that is that I don't need to decide which one to keep :) In the long run, it sounds like it may be valuable to describe the multistep reactions such as pyruvate dehydrogenase in some kind of structure so that can be accounted for. But let's just ignore those problems for now.

Yu-sysbio · 2022-09-29T10:41:03Z

Yu-sysbio
Sep 29, 2022
Collaborator

I handled the enzyme info issue in the CofactorYeast project. It might not be the most efficient way, but it could offer some inspiration here. I store the info in the file enzymedata.mat, which contains several enzyme-related fields (e.g., subunit IDs, EC code, stoichiometry, kcat and kcat confidence...), each with a length equal to the number of split enzymatic reactions.

1 reply

johan-gson Sep 29, 2022
Collaborator

Nice Yu, sounds very similar to what I proposed. So, I will be a bit practical regarding this since I'm running out of time. If I don't need to do this, I won't :). But I think that in the long run, it would be good to replace the GPRs and EC numbers in models with a structure that more clearly defines isozymes and subunits - the GPRs lack some information (subunit stoichiometry) and have more degrees of freedom than they should - it is possible to declare any type of boolean expression, which we then have problems to handle. We just assume that they are written in the "correct" way here (which is for example not the case for Human-GEM to save space, it needs to be converted first).

But for now, I will not venture into this if it is not needed :)

mihai-sysbio · 2022-09-29T13:21:33Z

mihai-sysbio
Sep 29, 2022
Maintainer

After the model.ec structure is decided, the problem of providing the subunit stoichiometry remains. This means that, for any model that is to be used with GECKO 3, it would fall onto GECKO to provide the subunit stoichiometry (in line with the defined structure). This is something the model doesn't provide, does it? And this also needs versioning, explanations, curation etc. More details on this are listed under #160.
My question is: is this a must for GECKO 3 - does it have to be implemented in this release and it cannot wait for a future one?

3 replies

johan-gson Sep 29, 2022
Collaborator

My thought on this is that it is a bonus, not a requirement (it just defaults to 1), but that it would be good to have. It would also be good if this was part of the model somehow, so GECKO will not need to find it every time it is run, and that would allow for manual curation etc. Not sure what you others think?

feiranl Sep 29, 2022
Collaborator

I also think that in the future, this subunit stoichiometry must be included. If the model does not supply such info, then we assume it is 1. So it is good to include the design now.

edkerk Sep 29, 2022
Maintainer Author

The model indeed does not provide it, #160 suggests one way to provide this information, and indeed the model.ec.rxnEnzMat and applyKcatConstraints are already designed to consider it, but it just takes 1-as-default for now. Not very high priority to resolve #160, there are also only a few organisms included.
