Request for Microbial Counts Data #308

jiarui1120 · 2025-01-26T11:55:09Z

Hi!

I am writing to express my sincere appreciation for your remarkable work on the curatedMetagenomicData package, which has been of great value to the research community.

I am currently conducting research that involves metagenomic data analysis. While the relative abundance data provided by your package is extremely useful, I was wondering if it would be possible for you to also provide the microbial counts data. The reason is that counts data can be easily transformed into relative abundance, but the reverse is not true. For researchers like me who plan to perform certain downstream analyses, having the original counts data would be much more convenient and accurate. This would enable us to explore a wider range of analytical methods and draw more comprehensive conclusions.
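To illustrate why the reverse transformation is not possible, here is a minimal sketch with made-up numbers (the species names and counts are hypothetical, not taken from the package). Two samples with very different total counts collapse to the same relative abundances, so the totals cannot be recovered from the proportions alone.

```python
def to_relative_abundance(counts):
    """Convert per-species counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


# Hypothetical counts for two samples sequenced at different depths.
shallow = {"species_1": 30, "species_2": 10}
deep = {"species_1": 3000, "species_2": 1000}

rel_shallow = to_relative_abundance(shallow)
rel_deep = to_relative_abundance(deep)

# Both samples yield identical proportions, so the original totals are lost.
same_profile = rel_shallow == rel_deep
```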

I fully understand that providing additional data might involve extra work on your part, and I truly appreciate any consideration you can give to this request. If there are any challenges or limitations, please do not hesitate to let me know.

Thank you very much for your time and attention. I look forward to your reply.

lwaldron · 2025-01-27T13:23:30Z

Hi @jiarui1120,
My understanding of the MetaPhlAn taxonomic profiling pipeline is that unnormalized counts mapped to each species are not part of its default output because they are not particularly informative: the number, length, and copy number of marker genes vary between species in the database. The marker_abundance values in cMD give you something closer to the raw data, since they have not been normalized by the number and length of marker genes per species. If you multiplied these values by sequencing depth (number_reads in the colData), they would no longer be normalized by sequencing depth either, but I believe they would still be normalized by copy number.
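The depth rescaling described here could be sketched roughly as follows. This is only an illustration under assumptions: the sample names, marker names, and values are made up, the helper `scale_by_depth` is hypothetical and not part of cMD, and the exact scale of the marker_abundance values is not stated in this thread. The result is a depth-scaled estimate, not a true read count.

```python
def scale_by_depth(values, number_reads):
    """Multiply each sample's per-feature values by that sample's read depth.

    values:       {sample: {feature: value}}, e.g. marker_abundance per sample
    number_reads: {sample: total reads}, e.g. number_reads from the colData
    """
    scaled = {}
    for sample, features in values.items():
        depth = number_reads[sample]
        scaled[sample] = {name: v * depth for name, v in features.items()}
    return scaled


# Toy inputs (assumed values, for illustration only).
marker_abundance = {
    "sample_A": {"marker_1": 0.5, "marker_2": 0.25},
    "sample_B": {"marker_1": 0.125, "marker_2": 0.0},
}
number_reads = {"sample_A": 1000, "sample_B": 4000}

estimates = scale_by_depth(marker_abundance, number_reads)
```

Features with a value of zero stay at zero after scaling, so anything the profiler did not report remains absent from the estimate.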

@fasnicar can you add anything to this description? Are there additional intermediate MetaPhlAn results or flags that could be useful to users looking for unnormalized data?

fasnicar · 2025-01-30T16:03:15Z

Hi @jiarui1120,

I agree with Levi’s explanation, and I’d like to add that the transformation described above would only apply to species that MetaPhlAn successfully profiled. Very-low-abundance species that MetaPhlAn did not detect because of insufficient marker coverage would be excluded from the conversion, so the conversion itself is only an estimate.
Nonetheless, I think this is the best approach if you want to estimate counts from metagenomes.

I hope this helps!

jiarui1120 added the Feature Request label Jan 26, 2025

jiarui1120 assigned schifferl Jan 26, 2025

lwaldron unassigned schifferl Jan 27, 2025

lwaldron closed this as completed Feb 4, 2025
