How to run enrichment analysis of metabolites using clusterProfiler #731

Krithika-Bhuvan · 2024-10-15T22:44:15Z

I have a list of metabolites, both as metabolite names and as KEGG IDs. How can I run enrichment analysis on this data? Please advise.

Example data:

name keggid
Pyruvate  C00022
Acetyl-CoA  C00024
2OG  C00026
Glycine  C00037
Succinate  C00042
Aspartate  C00049

guidohooiveld · 2024-10-22T22:54:21Z

A pointer is given in, for example, the protocol recently published in Nature Protocols; see section 1.9 on page 16.
https://doi.org/10.1038/s41596-024-01020-z

In essence, you can use the enrichKEGG function (since you have a list of KEGG IDs) with organism = "cpd" (because the input consists of 'compounds').

> library(clusterProfiler)
> 
> input <- data.frame("name"=c("Pyruvate","Acetyl-CoA","2OG","Glycine","Succinate","Aspartate"),
+                     "keggid"=c("C00022","C00024","C00026","C00037","C00042","C00049") )
> 
> ## run over-representation analysis.
> ## note: no cutoffs are applied for significance or minimum number of compounds.
> cpd_enrich_result <- enrichKEGG(input[,"keggid"],
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:6] "C00022" "C00024" "C00026" "C00037" "C00042" "C00049"
#...pvalues adjusted by 'BH' with cutoff <1 
#...103 enriched terms found
'data.frame':   103 obs. of  14 variables:
 $ category      : chr  "Human Diseases" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Cancer: overview" "Global and overview maps" "Chemical structure transformation maps" "Carbohydrate metabolism" ...
 $ ID            : chr  "map05230" "map01200" "map01060" "map00630" ...
 $ Description   : chr  "Central carbon metabolism in cancer" "Carbon metabolism" "Biosynthesis of plant secondary metabolites" "Glyoxylate and dicarboxylate metabolism" ...
 $ GeneRatio     : chr  "6/6" "6/6" "6/6" "5/6" ...
 $ BgRatio       : chr  "37/6500" "112/6500" "141/6500" "64/6500" ...
 $ RichFactor    : num  0.1622 0.0536 0.0426 0.0781 0.0746 ...
 $ FoldEnrichment: num  175.7 58 46.1 84.6 80.8 ...
 $ zScore        : num  32.4 18.5 16.5 20.4 20 ...
 $ pvalue        : num  2.22e-14 2.29e-11 9.38e-11 4.70e-10 5.95e-10 ...
 $ p.adjust      : num  2.29e-12 1.18e-09 3.22e-09 1.10e-08 1.10e-08 ...
 $ qvalue        : num  2.11e-13 1.08e-10 2.96e-10 1.01e-09 1.01e-09 ...
 $ geneID        : chr  "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042" ...
 $ Count         : int  6 6 6 5 5 5 4 4 4 4 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z 

> 
> as.data.frame(cpd_enrich_result)[1:3, ]
               category                            subcategory       ID
map05230 Human Diseases                       Cancer: overview map05230
map01200     Metabolism               Global and overview maps map01200
map01060     Metabolism Chemical structure transformation maps map01060
                                         Description GeneRatio  BgRatio
map05230         Central carbon metabolism in cancer       6/6  37/6500
map01200                           Carbon metabolism       6/6 112/6500
map01060 Biosynthesis of plant secondary metabolites       6/6 141/6500
         RichFactor FoldEnrichment   zScore       pvalue     p.adjust
map05230 0.16216216      175.67568 32.38612 2.224526e-14 2.291262e-12
map01200 0.05357143       58.03571 18.50615 2.289233e-11 1.178955e-09
map01060 0.04255319       46.09929 16.45613 9.376114e-11 3.219132e-09
               qvalue                                    geneID Count
map05230 2.107446e-13 C00022/C00024/C00026/C00037/C00042/C00049     6
map01200 1.084374e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
map01060 2.960878e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
> 
> dotplot(cpd_enrich_result)
>
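To help read the table above, here is a minimal base-R sketch of how the statistics in the first row (map05230) relate to its GeneRatio and BgRatio columns. It assumes the over-representation test is the usual one-sided hypergeometric test, and that GeneRatio and BgRatio mean "hits / annotated query compounds" and "pathway size / annotated background compounds". The helper `parse_ratio` is hypothetical and only for illustration; it is not part of clusterProfiler.

```r
# Values copied from the first row of the result above (map05230).
gene_ratio <- "6/6"      # query compounds in pathway / annotated query compounds
bg_ratio   <- "37/6500"  # compounds in pathway / annotated background compounds

# Hypothetical helper: split a ratio string "a/b" into c(a, b).
parse_ratio <- function(x) as.numeric(strsplit(x, "/", fixed = TRUE)[[1]])

gr <- parse_ratio(gene_ratio)
bg <- parse_ratio(bg_ratio)
k <- gr[1]  # hits in the query
n <- gr[2]  # query size
M <- bg[1]  # pathway size
N <- bg[2]  # background size

# One-sided hypergeometric test: probability of k or more hits by chance.
pvalue <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Effect-size columns derived from the same four counts.
fold_enrichment <- (k / n) / (M / N)
rich_factor     <- k / M

print(c(pvalue = pvalue,
        FoldEnrichment = fold_enrichment,
        RichFactor = rich_factor))
```

With these counts the three values should agree with the pvalue, FoldEnrichment and RichFactor columns reported for map05230. Note that with only six input compounds and no cutoffs, very small p-values are expected for any pathway that contains most of the input.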

guidohooiveld mentioned this issue Oct 23, 2024

KEGG analysis of metabolites #712

Open
