[TheiaProk - GAMBIT and GAMBIT_Query] Update to latest db v2.0.0-20240628 (#539)

* update gambit db to v2.0.0

* remove candidate sub-speciation from merlin_tag

* add missing import

* update CI

* remove sub-speciation from gambit_predicted_taxon

* update md5sum

---------

Co-authored-by: Sage Wright <[email protected]>
cimendes and sage-wright authored Aug 16, 2024
1 parent af2f76b commit 1508bb5
Showing 3 changed files with 16 additions and 7 deletions.
15 changes: 12 additions & 3 deletions tasks/taxon_id/task_gambit.wdl
@@ -5,8 +5,8 @@ task gambit {
File assembly
String samplename
String docker = "us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0"
-File gambit_db_genomes = "gs://gambit-databases-rp/1.3.0/gambit-metadata-1.3-231016.gdb"
-File gambit_db_signatures = "gs://gambit-databases-rp/1.3.0/gambit-signatures-1.3-231016.gs"
+File gambit_db_genomes = "gs://gambit-databases-rp/2.0.0/gambit-metadata-2.0.0-20240628.gdb"
+File gambit_db_signatures = "gs://gambit-databases-rp/2.0.0/gambit-signatures-2.0.0-20240628.gs"
Int disk_size = 20
Int memory = 2
Int cpu = 1
@@ -40,6 +40,7 @@ task gambit {
python3 <<EOF
import json
import csv
+import re
def fmt_dist(d): return format(d, '.4f')
@@ -65,7 +66,13 @@ task gambit {
if str(empty_value) == str(fmt_dist(0)):
f.write(fmt_dist(search_item[column]))
else:
-f.write(search_item[column])
+# remove candidate sub-speciation from taxon name
+if column == 'name':
+gambit_name = search_item[column]
+gambit_name = re.sub(r'_[A-Za-z]+', '', gambit_name) # This line is added to remove _X where X is any letter
+f.write(gambit_name)
+else:
+f.write(search_item[column])
# Predicted taxon
write_output('PREDICTED_TAXON', predicted, 'name', 'NA')
@@ -121,6 +128,8 @@ task gambit {
try:
merlin_tag = predicted['name']
+# remove candidate sub-speciation from merlin_tag
+merlin_tag = re.sub(r'_[A-Za-z]+', '', merlin_tag) # This line is added to remove _X where X is any letter
except:
merlin_tag = "NA"
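
A minimal standalone sketch of what the `re.sub` call added in this file does to a taxon name. Only the pattern `r'_[A-Za-z]+'` comes from the diff; the helper name `strip_candidate_suffix` and the sample names are assumptions made up for illustration and are not taken from the GAMBIT database.

```python
import re

# Pattern copied from the diff: an underscore followed by one or more ASCII letters.
CANDIDATE_SUFFIX = re.compile(r'_[A-Za-z]+')


def strip_candidate_suffix(name: str) -> str:
    """Hypothetical helper: drop every '_<letters>' run from a taxon name."""
    return CANDIDATE_SUFFIX.sub('', name)


if __name__ == "__main__":
    # Sample names are illustrative only.
    for sample in ("Escherichia coli_A", "Pseudomonas_E fluorescens", "Escherichia coli"):
        print(repr(sample), "->", repr(strip_candidate_suffix(sample)))
```

Because the pattern uses `+`, it matches any run of letters after an underscore, not just a single letter, so a name such as `foo_bar` would be reduced to `foo`. A suffix that starts with a digit (for example `name_1`) is left unchanged.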
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_pe.yml
@@ -220,7 +220,7 @@
- path: miniwdl_run/call-clean_check_reads/work/_miniwdl_inputs/0/test_1.clean.fastq.gz
- path: miniwdl_run/call-clean_check_reads/work/_miniwdl_inputs/0/test_2.clean.fastq.gz
- path: miniwdl_run/call-gambit/command
-md5sum: 6f2537d9aa54eac508dc393d27772b76
+md5sum: 8694687ae578a88817bab0865cceb652
- path: miniwdl_run/call-gambit/inputs.json
contains: ["assembly", "fasta", "samplename", "test"]
- path: miniwdl_run/call-gambit/outputs.json
@@ -623,7 +623,7 @@
- path: miniwdl_run/wdl/tasks/task_versioning.wdl
md5sum: c20b66eea46148f4618abc038d3877b7
- path: miniwdl_run/wdl/tasks/taxon_id/task_gambit.wdl
-md5sum: f4826d61709d0d44b921a821d0d5706f
+md5sum: 2aa70eab24868920f6c28843dd3b5613
- path: miniwdl_run/wdl/tasks/taxon_id/contamination/task_kraken2.wdl
md5sum: 0ea83681884800bda1e3c4e116f2b19d
- path: miniwdl_run/wdl/tasks/taxon_id/contamination/task_midas.wdl
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_se.yml
@@ -212,7 +212,7 @@
md5sum: 32c0be4fb7f3030bf9c74c0a836d4f2e
- path: miniwdl_run/call-clean_check_reads/work/_miniwdl_inputs/0/test_1.clean.fastq.gz
- path: miniwdl_run/call-gambit/command
-md5sum: 6f2537d9aa54eac508dc393d27772b76
+md5sum: 8694687ae578a88817bab0865cceb652
- path: miniwdl_run/call-gambit/inputs.json
contains: ["assembly", "fasta", "samplename", "test"]
- path: miniwdl_run/call-gambit/outputs.json
@@ -586,7 +586,7 @@
- path: miniwdl_run/wdl/tasks/task_versioning.wdl
md5sum: c20b66eea46148f4618abc038d3877b7
- path: miniwdl_run/wdl/tasks/taxon_id/task_gambit.wdl
-md5sum: f4826d61709d0d44b921a821d0d5706f
+md5sum: 2aa70eab24868920f6c28843dd3b5613
- path: miniwdl_run/wdl/tasks/taxon_id/contamination/task_kraken2.wdl
md5sum: 0ea83681884800bda1e3c4e116f2b19d
- path: miniwdl_run/wdl/tasks/taxon_id/contamination/task_midas.wdl