Custom DB and custom taxonomy (GTDB or similar) #884
Comments
Hello, We are in the process of adding GTDB support to the |
I'm also interested in building a custom database with GTDB taxonomy! When can we expect to have this function available? |
I pushed a change to the k2 wrapper today. Here's how you can build a GTDB database with the changes:

    k2 build --special gtdb --gtdb-files gtdb_genomes_reps.tar.gz --threads 6 --db gtdb_reps

The You can find the list of file names here: https://data.gtdb.ecogenomic.org/releases/latest/genomic_files_reps/

The command line will also offer a list of files if you specify the wrong file name:

    k2 build --special gtdb --gtdb-files foo.tar.gz --db gtdb_reps

[ERROR - 2024-11-05 17:57:48,567]: Could not find any files matching foo.tar.gz
[ERROR - 2024-11-05 17:57:48,567]: Here are a list of candidates:
bac120_msa_marker_genes_all.tar.gz
ar53_msa_marker_genes_all.tar.gz
ssu_all.fna.gz
bac120_marker_genes_all.tar.gz
ar53_marker_genes_all.tar.gz
bac120_ssu_reps.fna.gz
ar53_msa_marker_genes_reps.tar.gz
gtdb_proteins_aa_reps.tar.gz
bac120_msa_marker_genes_reps.tar.gz
bac120_msa_reps.faa.gz
bac120_marker_genes_reps.tar.gz
ar53_msa_reps.faa.gz
ar53_marker_genes_reps.tar.gz
gtdb_proteins_nt_reps.tar.gz
gtdb_genomes_reps.tar.gz
ar53_ssu_reps.fna.gz

We welcome your testing in making sure that this feature works as expected. |
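For readers curious how a "list the candidates on a wrong file name" check like the one above can work, here is a minimal Python sketch. It is not the k2 wrapper's actual code: the function name `resolve_gtdb_file` and the hard-coded list (a subset of the file names in the log output above) are assumptions made purely for illustration.

```python
# Hypothetical helper, NOT part of k2: validate a requested GTDB file name
# against a known list and report the candidates when nothing matches.

# A subset of the file names shown in the log output above.
CANDIDATES = [
    "gtdb_genomes_reps.tar.gz",
    "gtdb_proteins_aa_reps.tar.gz",
    "gtdb_proteins_nt_reps.tar.gz",
    "bac120_ssu_reps.fna.gz",
    "ar53_ssu_reps.fna.gz",
]


def resolve_gtdb_file(requested, candidates=CANDIDATES):
    """Return `requested` if it is a known file name.

    Otherwise raise ValueError with a message that lists every candidate,
    so the caller can show the user what would have been accepted.
    """
    if requested in candidates:
        return requested
    listing = "\n".join(candidates)
    raise ValueError(
        f"Could not find any files matching {requested}\n"
        f"Candidates:\n{listing}"
    )


if __name__ == "__main__":
    print(resolve_gtdb_file("gtdb_genomes_reps.tar.gz"))
    try:
        resolve_gtdb_file("foo.tar.gz")
    except ValueError as err:
        print(err)
```

In practice the candidate list would presumably come from the GTDB release directory linked above rather than being hard-coded.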
Traceback (most recent call last):
Hello, I tried to build a GTDB database with the k2 you provided, but it resulted in this error. |
I pushed a fix for this issue a few days ago and have also pushed a fix for potential crashes while masking the genomes. Can you try pulling these changes and trying again? |
hi, is there any update to this issue? thanks :) |
I was able to run the command with no errors after the push. |
We will soon be publishing an index for the GTDB representative genomes, stay tuned. |
That would be really great, thank you. |
The database is now available, see: https://benlangmead.github.io/aws-indexes/k2 |
thank you |
Has anyone tried k2 classify with the pre-built GTDB database? I tried to run it with one sample, but it has been stuck at the "loading database information" step for about 16 hours now. I don't know what might have gone wrong because no error is reported. I was able to run kraken2 on the same sample with the standard database. |
I realized that too. Try adding the |
I get the following error when building the GTDB database, is there a solution?
[INFO - 2024-12-30 10:21:37,302]: Creating sequence ID to taxonomy ID map |
Is there a way to create a completely new database and non-NCBI taxonomy for that database? I am primarily interested in using something like GTDB, which has a genome download and taxonomy tsv file available.
I understand that I can add the contigs/genomes to a custom database using "kraken2-build --add-to-library", but the creation of a new taxonomy doesn't seem straightforward.
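One possible way to approach the custom-taxonomy part of this question is to convert the GTDB taxonomy TSV into NCBI-style `nodes.dmp` and `names.dmp` files, which is the taxonomy format Kraken 2 reads. The Python sketch below illustrates the idea only. It assumes each input line is `accession<TAB>d__...;p__...;c__...;o__...;f__...;g__...;s__...`; the function names, the sequentially assigned taxids, and the rank label `superkingdom` for `d__` are assumptions, and the output has not been validated against `kraken2-build`.

```python
# Sketch: turn GTDB lineage strings into a parent/child taxonomy table and
# emit it in the NCBI taxdump layout (fields separated by "\t|\t").
# Taxids are assigned sequentially and are arbitrary, NOT NCBI taxids.

# GTDB rank prefix -> NCBI-style rank name (labels are an assumption).
RANKS = {"d": "superkingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def build_taxdump(taxonomy_lines):
    """Return (nodes, names, acc2taxid).

    nodes: taxid -> (parent_taxid, rank); names: taxid -> taxon name;
    acc2taxid: genome accession -> taxid of its deepest assigned taxon.
    """
    nodes = {1: (1, "no rank")}  # the root is its own parent
    names = {1: "root"}
    taxid_of_path = {}           # cumulative lineage string -> taxid
    acc2taxid = {}
    next_id = 2
    for line in taxonomy_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, lineage = line.split("\t")
        parent, path = 1, ""
        for taxon in lineage.split(";"):
            prefix, _, name = taxon.partition("__")
            if not name:         # stop at an unassigned rank such as "s__"
                break
            path += ";" + taxon
            if path not in taxid_of_path:
                taxid_of_path[path] = next_id
                nodes[next_id] = (parent, RANKS[prefix])
                names[next_id] = name
                next_id += 1
            parent = taxid_of_path[path]
        acc2taxid[accession] = parent
    return nodes, names, acc2taxid


def format_nodes(nodes):
    """Render nodes as nodes.dmp text (first three columns only)."""
    return "".join(f"{t}\t|\t{p}\t|\t{r}\t|\n"
                   for t, (p, r) in sorted(nodes.items()))


def format_names(names):
    """Render names as names.dmp text with 'scientific name' entries."""
    return "".join(f"{t}\t|\t{n}\t|\t\t|\tscientific name\t|\n"
                   for t, n in sorted(names.items()))
```

With files like these placed in the database's `taxonomy/` directory, each genome's FASTA headers would still need to carry the matching taxid (the Kraken 2 manual describes a `kraken:taxid|N` header convention) before the sequences are added with `kraken2-build --add-to-library`. Whether the trimmed three-column `nodes.dmp` is sufficient should be checked against the Kraken 2 version in use.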