Request for Microbial Counts Data #308

Closed
jiarui1120 opened this issue Jan 26, 2025 · 2 comments

Comments

@jiarui1120

Hi!

I am writing to express my sincere appreciation for your remarkable work on the curatedMetagenomicData package, which has been of great value to the research community.

I am currently conducting research that involves metagenomic data analysis. While the relative abundance data provided by your package is extremely useful, I was wondering if it would be possible for you to also provide the microbial counts data. The reason is that counts data can be easily transformed into relative abundance, but the reverse is not true. For researchers like me who plan to perform certain downstream analyses, having the original counts data would be much more convenient and accurate. This would enable us to explore a wider range of analytical methods and draw more comprehensive conclusions.

I fully understand that providing additional data might involve extra work on your part, and I truly appreciate any consideration you can give to this request. If there are any challenges or limitations, please do not hesitate to let me know.

Thank you very much for your time and attention. I look forward to your reply.

@lwaldron
Member

Hi @jiarui1120,
my understanding of the MetaPhlAn taxonomic profiling pipeline is that unnormalized counts mapped to each species are not part of its default output because they are not particularly informative: the number, length, and copy number of marker genes vary between species in the database. The marker_abundance values in cMD give you something closer to the raw data, since they have not been normalized by the number and length of marker genes per species. If you multiplied these values by sequencing depth (number_reads in the colData), they would no longer be normalized by sequencing depth either, but I believe they would still be normalized by copy number.

@fasnicar can you add anything to this description? Are there additional intermediate MetaPhlAn results or flags that could be useful to users looking for unnormalized data?

@fasnicar
Collaborator

Hi @jiarui1120,

I agree with Levi’s explanation, and I’d like to add that the transformation described above would only apply to species that MetaPhlAn successfully profiled. Very low-abundance species that MetaPhlAn did not detect because of insufficient marker coverage would be excluded from the conversion, so the result will be an estimate.
Nonetheless, I think this is the best approach if you want to estimate counts from metagenomes.

I hope this helps!
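
For readers who want to see the arithmetic, here is a minimal sketch of the rescaling discussed in this thread: multiply each abundance value by the sample's sequencing depth. It is written in plain Python and is not the cMD or MetaPhlAn API. The function name `estimate_counts`, the `scale` parameter, and the toy values are all hypothetical. In particular, the sketch assumes the abundance values are already on a scale where multiplying by `number_reads` is meaningful; if your values are percentages, you would pass `scale=100`.

```python
def estimate_counts(marker_abundance, number_reads, scale=1.0):
    """Rescale per-feature abundance values by sequencing depth.

    marker_abundance: dict mapping a feature name to its abundance value
                      for ONE sample (hypothetical toy input, not cMD output).
    number_reads:     sequencing depth of that sample.
    scale:            divisor for the abundance values (assumption: 1.0 for
                      fractions, 100 for percentages).

    Features with a value of zero are dropped, mirroring the caveat that
    undetected species cannot be recovered by this conversion.
    """
    if number_reads < 0:
        raise ValueError("number_reads must be non-negative")
    if scale <= 0:
        raise ValueError("scale must be positive")
    return {
        feature: value * number_reads / scale
        for feature, value in marker_abundance.items()
        if value > 0
    }


# Toy example: three features in one sample sequenced to 1000 reads.
sample = {"marker_A": 0.5, "marker_B": 0.25, "marker_C": 0.0}
estimates = estimate_counts(sample, number_reads=1000)
print(estimates)
```

The results are estimates rather than true read counts: as noted above, they would still carry any copy-number normalization, and features that were not detected are absent from the output.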

@lwaldron lwaldron closed this as completed Feb 4, 2025