Significant differences between gtdbtk de_novo_wf (GTDB-Tk v2.3.2) and gtdbtk classify_wf (GTDB-Tk v2.1.0) #613

Open
473021677 opened this issue Nov 26, 2024 · 2 comments

Comments

@473021677

Hi,
I used the "gtdbtk de_novo_wf" command (GTDB-Tk v2.3.2, GTDB r214.0 database) to produce GTDB-Tk result files for 160 MAGs. Of these, 6 were assigned to g__32-111 and 22 to g__Caldipriscus; the other 132 MAGs could not be assigned to a genus. Previously, I had classified the same 160 MAGs with GTDB-Tk v2.1.0 (gtdbtk classify_wf) against the GTDB r207 database, and 100 of the 160 MAGs were assigned to 16 genera, including g__32-111 and g__Caldipriscus. I am not sure whether I can ignore the significant differences between gtdbtk de_novo_wf (GTDB-Tk v2.3.2, GTDB r214.0) and gtdbtk classify_wf (GTDB-Tk v2.1.0, GTDB r207). Differences this large should not be due to the different GTDB-Tk versions or the different GTDB database releases. I need your help. I have attached the result files. Thanks.

Best regards,

Yang Yuan
GTDB-Tk_v2.1.0_classify_wf_for_our_genomes.txt
GTDB-Tk v2.3.2_de_novo_wf_for_our_genomes.txt

@pchaumeil
Collaborator

Hello,
These workflows are currently fundamentally different. The classify workflow makes a best effort to determine the correct taxonomic assignment, taking into account the placement of a genome and RED (relative evolutionary divergence) values. The de novo workflow infers a de novo tree and then decorates that tree using essentially the approach implemented in PhyloRank.
For now, you should treat the taxonomic assignments from the de novo workflow as a guide, not as final classifications. In particular, the de novo workflow does not use RED to determine whether a new user genome is the most basal lineage of a group, so it will systematically under-classify genomes compared to the classify workflow.
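
To see where the two runs disagree, a genus-level comparison can be tabulated per genome. The following is only a minimal sketch: it assumes you have already loaded each run's assignments into a dictionary mapping genome ID to a GTDB-style taxonomy string (semicolon-separated ranks with prefixes such as `g__`). The function names, the tally labels, the genome IDs and the abbreviated taxonomy strings are all hypothetical and chosen for illustration; nothing here is part of GTDB-Tk or reads its output files.

```python
from collections import Counter


def genus_of(taxonomy):
    """Return the genus from a GTDB-style taxonomy string.

    Returns None when the genus rank is missing or empty (e.g. "g__").
    """
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if rank.startswith("g__"):
            return rank[3:] or None
    return None


def compare_genus_calls(classify_calls, de_novo_calls):
    """Tally genus-level agreement for genomes present in both runs.

    Both arguments are dicts of genome ID -> taxonomy string.
    The tally keys are illustrative labels, not GTDB-Tk terminology.
    """
    tally = Counter()
    for genome in sorted(set(classify_calls) & set(de_novo_calls)):
        from_classify = genus_of(classify_calls[genome])
        from_de_novo = genus_of(de_novo_calls[genome])
        if from_classify is None and from_de_novo is None:
            tally["unassigned_in_both"] += 1
        elif from_de_novo is None:
            tally["genus_only_in_classify"] += 1
        elif from_classify is None:
            tally["genus_only_in_de_novo"] += 1
        elif from_classify == from_de_novo:
            tally["same_genus"] += 1
        else:
            tally["different_genus"] += 1
    return dict(tally)


# Toy input: hypothetical genome IDs and abbreviated taxonomy strings.
classify_calls = {
    "MAG_001": "d__Bacteria;g__Caldipriscus;s__",
    "MAG_002": "d__Bacteria;g__32-111;s__",
    "MAG_003": "d__Bacteria;g__;s__",
}
de_novo_calls = {
    "MAG_001": "d__Bacteria;g__Caldipriscus;s__",
    "MAG_002": "d__Bacteria;g__;s__",
    "MAG_003": "d__Bacteria;g__;s__",
}

summary = compare_genus_calls(classify_calls, de_novo_calls)
print(summary)
```

A large "genus_only_in_classify" count relative to "different_genus" would be consistent with under-classification by the de novo workflow rather than with conflicting placements.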

@473021677
Author

Thank you very much for the explanation. I will combine the GTDB-Tk analysis with phylogenetic analyses based on other marker gene sets to obtain accurate taxonomic assignments.
