Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to run enrichment analysis of metabolites using clusterProfier #731

Open
Krithika-Bhuvan opened this issue Oct 15, 2024 · 1 comment

Comments

@Krithika-Bhuvan
Copy link

I have a list of metabolites, In the form of metabolite names, and also in the form of KEGG Ids. How can i run enrichment analysis on this data ? Please advise

Example data:

name keggid
Pyruvate  C00022
Acetyl-CoA  C00024
2OG  C00026
Glycine  C00037
Succinate  C00042
Aspartate  C00049
@guidohooiveld
Copy link

A pointer is given in, for example, the protocol recently published in Nature Methods, See section 1.9 on page 16.
https://doi.org/10.1038/s41596-024-01020-z

In essence you can use the enrichKEGG function (since you have a list of KEGG ids), and use it with organism = "cpd" (because the input are 'compounds'.

> library(clusterProfiler)
> 
> input <- data.frame("name"=c("Pyruvate","Acetyl-CoA","2OG","Glycine","Succinate","Aspartate"),
+                     "keggid"=c("C00022","C00024","C00026","C00037","C00042","C00049") )
> 
> ## run over-representation analysis.
> ## note not cutoffs are applied for significanc and minimum number of compunds.
> cpd_enrich_result <- enrichKEGG(input[,"keggid"],
+                                 organism = "cpd",
+                                 minGSSize=1,
+                                 pvalueCutoff=1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/cpd/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway"...
> cpd_enrich_result
#
# over-representation test
#
#...@organism    KEGG Compound 
#...@ontology    KEGG 
#...@keytype     kegg_compound 
#...@gene        chr [1:6] "C00022" "C00024" "C00026" "C00037" "C00042" "C00049"
#...pvalues adjusted by 'BH' with cutoff <1 
#...103 enriched terms found
'data.frame':   103 obs. of  14 variables:
 $ category      : chr  "Human Diseases" "Metabolism" "Metabolism" "Metabolism" ...
 $ subcategory   : chr  "Cancer: overview" "Global and overview maps" "Chemical structure transformation maps" "Carbohydrate metabolism" ...
 $ ID            : chr  "map05230" "map01200" "map01060" "map00630" ...
 $ Description   : chr  "Central carbon metabolism in cancer" "Carbon metabolism" "Biosynthesis of plant secondary metabolites" "Glyoxylate and dicarboxylate metabolism" ...
 $ GeneRatio     : chr  "6/6" "6/6" "6/6" "5/6" ...
 $ BgRatio       : chr  "37/6500" "112/6500" "141/6500" "64/6500" ...
 $ RichFactor    : num  0.1622 0.0536 0.0426 0.0781 0.0746 ...
 $ FoldEnrichment: num  175.7 58 46.1 84.6 80.8 ...
 $ zScore        : num  32.4 18.5 16.5 20.4 20 ...
 $ pvalue        : num  2.22e-14 2.29e-11 9.38e-11 4.70e-10 5.95e-10 ...
 $ p.adjust      : num  2.29e-12 1.18e-09 3.22e-09 1.10e-08 1.10e-08 ...
 $ qvalue        : num  2.11e-13 1.08e-10 2.96e-10 1.01e-09 1.01e-09 ...
 $ geneID        : chr  "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042/C00049" "C00022/C00024/C00026/C00037/C00042" ...
 $ Count         : int  6 6 6 5 5 5 4 4 4 4 ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z 

> 
> as.data.frame(cpd_enrich_result)[1:3, ]
               category                            subcategory       ID
map05230 Human Diseases                       Cancer: overview map05230
map01200     Metabolism               Global and overview maps map01200
map01060     Metabolism Chemical structure transformation maps map01060
                                         Description GeneRatio  BgRatio
map05230         Central carbon metabolism in cancer       6/6  37/6500
map01200                           Carbon metabolism       6/6 112/6500
map01060 Biosynthesis of plant secondary metabolites       6/6 141/6500
         RichFactor FoldEnrichment   zScore       pvalue     p.adjust
map05230 0.16216216      175.67568 32.38612 2.224526e-14 2.291262e-12
map01200 0.05357143       58.03571 18.50615 2.289233e-11 1.178955e-09
map01060 0.04255319       46.09929 16.45613 9.376114e-11 3.219132e-09
               qvalue                                    geneID Count
map05230 2.107446e-13 C00022/C00024/C00026/C00037/C00042/C00049     6
map01200 1.084374e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
map01060 2.960878e-10 C00022/C00024/C00026/C00037/C00042/C00049     6
> 
> dotplot(cpd_enrich_result)
> 

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants