update ncbi-scrub task (version 2.2.1) #202

Closed
wants to merge 8 commits into from
45 changes: 11 additions & 34 deletions tasks/quality_control/task_ncbi_scrub.wdl
@@ -5,51 +5,29 @@ task ncbi_scrub_pe {
File read1
File read2
String samplename
String docker = "us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:1.0.2021-05-05"
String docker = "us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1"
Int disk_size = 100
}
String r1_filename = basename(read1)
String r2_filename = basename(read2)
command <<<
# date and version control
date | tee DATE

# unzip fwd file as scrub tool does not take in .gz fastq files
if [[ "~{read1}" == *.gz ]]
then
gunzip -c ~{read1} > r1.fastq
read1_unzip=r1.fastq
else
read1_unzip=~{read1}
fi

# dehost reads
/opt/scrubber/scripts/scrub.sh -n ${read1_unzip} |& tail -n1 | awk -F" " '{print $1}' > FWD_SPOTS_REMOVED

# gzip dehosted reads
gzip ${read1_unzip}.clean -c > ~{samplename}_R1_dehosted.fastq.gz

# do the same on read
# unzip file if necessary
if [[ "~{read2}" == *.gz ]]
then
gunzip -c ~{read2} > r2.fastq
read2_unzip=r2.fastq
else
read2_unzip=~{read2}
fi
# unzip read files as scrub tool does not take in .gz fastq files, and interleave them
paste <(zcat ~{read1} | paste - - - -) <(zcat ~{read2} | paste - - - -) | tr '\t' '\n' > interleaved.fastq

Contributor

Will this work if there are uneven reads in ~{read1} and ~{read2}?
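A minimal sketch of one possible guard for this case: compare the record counts of R1 and R2 before interleaving and stop on a mismatch. It is hypothetical and not part of this PR. `count_records` and `check_pair_counts` are illustrative names, the sketch assumes plain 4-line FASTQ records, and it uses `zcat -f` so that both gzipped and uncompressed inputs are read.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of this PR: refuse to interleave
# when R1 and R2 hold different numbers of FASTQ records.

# Print the number of records in a FASTQ file, assuming 4 lines per record.
# `zcat -f` decompresses .gz input and copies plain-text input unchanged.
count_records() {
  local lines
  lines=$(zcat -f "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Return 0 when both files contain the same number of records; otherwise
# print a message to stderr and return 1.
check_pair_counts() {
  local n1 n2
  n1=$(count_records "$1")
  n2=$(count_records "$2")
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $1 has $n1 records, $2 has $n2" >&2
    return 1
  fi
  return 0
}
```

Inside the task's `command` block it could be called before the `paste` step, for example `check_pair_counts ~{read1} ~{read2} || exit 1`.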


# dehost reads
/opt/scrubber/scripts/scrub.sh -n ${read2_unzip} |& tail -n1 | awk -F" " '{print $1}' > REV_SPOTS_REMOVED
/opt/scrubber/scripts/scrub.sh -i interleaved.fastq |& tail -n1 | awk -F" " '{print $1}' > SPOTS_REMOVED

# gzip dehosted reads
gzip ${read2_unzip}.clean -c > ~{samplename}_R2_dehosted.fastq.gz
# split interleaved reads and compress files
paste - - - - - - - - < interleaved.fastq.clean \
| tee >(cut -f 1-4 | tr '\t' '\n' | gzip > ~{samplename}_R1_dehosted.fastq.gz) \
| cut -f 5-8 | tr '\t' '\n' | gzip > ~{samplename}_R2_dehosted.fastq.gz

>>>
output {
File read1_dehosted = "~{samplename}_R1_dehosted.fastq.gz"
File read2_dehosted = "~{samplename}_R2_dehosted.fastq.gz"
Int read1_human_spots_removed = read_int("FWD_SPOTS_REMOVED")
Int read2_human_spots_removed = read_int("REV_SPOTS_REMOVED")
Int human_spots_removed = read_int("SPOTS_REMOVED")
String ncbi_scrub_docker = docker
}
runtime {
@@ -67,10 +45,9 @@ task ncbi_scrub_se {
input {
File read1
String samplename
String docker = "us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:1.0.2021-05-05"
String docker = "us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1"
Int disk_size = 100
}
String r1_filename = basename(read1)
command <<<
# date and version control
date | tee DATE
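To illustrate the interleave and split steps in the new `ncbi_scrub_pe` command block, the sketch below wraps them in two hypothetical helpers (`interleave`, `deinterleave`) so they can be round-tripped on a small example; the scrubbing step itself is not shown. It differs from the diff in two deliberate ways. It calls `zcat -f`, because GNU `zcat` without `-f` rejects input that is not gzip-compressed, whereas the removed code also accepted uncompressed FASTQ. It also splits in two passes over the interleaved file instead of using `tee >(...)`, because a process substitution runs asynchronously and may still be writing its output when the pipeline returns.

```shell
#!/usr/bin/env bash
# Sketch of the interleave / split logic from the ncbi_scrub_pe diff,
# wrapped in hypothetical helper functions (not part of this PR).

# Write R1 ($1) and R2 ($2) to stdout as interleaved FASTQ: record i of R1
# followed by record i of R2. `paste - - - -` folds each 4-line record onto
# one tab-separated line; the outer paste joins R1 and R2 side by side, and
# `tr` turns the tabs back into newlines.
interleave() {
  paste <(zcat -f "$1" | paste - - - -) <(zcat -f "$2" | paste - - - -) \
    | tr '\t' '\n'
}

# Split an interleaved FASTQ file ($1) into gzipped R1 ($2) and R2 ($3).
# Eight dashes put each 8-line pair on one line: fields 1-4 hold the R1
# record and fields 5-8 the R2 record. Two passes keep both writes
# synchronous, so both files are complete when the function returns.
deinterleave() {
  paste - - - - - - - - < "$1" | cut -f 1-4 | tr '\t' '\n' | gzip > "$2"
  paste - - - - - - - - < "$1" | cut -f 5-8 | tr '\t' '\n' | gzip > "$3"
}
```

With unequal record counts, `paste` fills the shorter side with empty fields, so the interleaved stream is no longer a sequence of whole 8-line pairs and the split step would misassign lines between R1 and R2.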
3 changes: 1 addition & 2 deletions workflows/utilities/wf_read_QC_trim_pe.wdl
@@ -118,8 +118,7 @@ workflow read_QC_trim_pe {
# NCBI scrubber
File? read1_dehosted = ncbi_scrub_pe.read1_dehosted
File? read2_dehosted = ncbi_scrub_pe.read2_dehosted
Int? read1_human_spots_removed = ncbi_scrub_pe.read1_human_spots_removed
Int? read2_human_spots_removed = ncbi_scrub_pe.read2_human_spots_removed
Int? ncbi_scrub_human_spots_removed = ncbi_scrub_pe.human_spots_removed
String? ncbi_scrub_docker = ncbi_scrub_pe.ncbi_scrub_docker

# bbduk