Depends on megahit from sbx_assembly and remove host_decontam_Q (run filter empty)
Ulthran committed Jan 3, 2024
1 parent 5983c37 commit 89da98c
Showing 5 changed files with 7 additions and 92 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -8,6 +8,8 @@

 sbx_virus_id is a [sunbeam](https://github.com/sunbeam-labs/sunbeam) extension for identifying viruses in samples. This pipeline uses [MEGAHIT](https://github.com/voutcn/megahit) or [SPAdes](https://github.com/ablab/spades) for assembly of contigs and [Cenote-Taker2](https://github.com/mtisza1/Cenote-Taker2) or [Virsorter2](https://github.com/jiarong/VirSorter2) for viral identification.

+N.B. If using Megahit for assembly, this extension requires also having sbx_assembly installed.
+
 ### Installation

```
@@ -51,7 +53,6 @@ sunbeam run --profile /path/to/project/ all_virus_id
 - bowtie2_build_threads: number of threads for running bowtie2-build (default: 4)
 - cenote_taker2_db: path to cenote-taker2 db (default: "")
 - virsorter_db: path to virsorter2 db (default: "")
-- host_decontam: Whether to run host decontamination (default: False)
 - include_phages: Whether to include phages in the output (default: False)
 - use_spades: Whether to use SPAdes instead of MEGAHIT (default: False)
 - use_virsorter: Whether to use Virsorter2 instead of Cenote-Taker2 (default: False)
1 change: 0 additions & 1 deletion config.yml
@@ -4,7 +4,6 @@ sbx_virus_id:
   bowtie2_build_threads: 4
   cenote_taker2_db: ''
   virsorter_db: ''
-  host_decontam: True
   include_phages: False
   use_spades: False # Default: Megahit
   use_virsorter: False # Default: Cenote-Taker2
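
Because `host_decontam` is dropped from the extension's config section, a project config generated before this commit may still carry the key. The following is a minimal sketch of how such a leftover could be spotted. `REMOVED_KEYS`, `find_stale_keys`, and the sample `old_section` dictionary are hypothetical names introduced for illustration; none of them exist in the repository, and the diff does not show how Sunbeam itself treats unknown keys.

```python
# Key dropped from the sbx_virus_id config section by this commit.
# (Hypothetical constant, for illustration only.)
REMOVED_KEYS = {"host_decontam"}


def find_stale_keys(section: dict) -> list:
    """Return removed keys still present in a config section, sorted by name."""
    return sorted(REMOVED_KEYS & section.keys())


# Hypothetical section as it might appear in an older project config.
old_section = {
    "bowtie2_build_threads": 4,
    "host_decontam": True,
    "include_phages": False,
}

print(find_stale_keys(old_section))
```

An empty list means the section already matches the post-commit layout.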
30 changes: 0 additions & 30 deletions envs/megahit_env.linux-64.pin.txt

This file was deleted.

5 changes: 0 additions & 5 deletions envs/megahit_env.yml

This file was deleted.

60 changes: 5 additions & 55 deletions sbx_virus_id.smk
@@ -24,18 +24,11 @@ def get_virus_ext_path() -> Path:
)


-def host_decontam_Q() -> str:
-    if Cfg["sbx_virus_id"]["host_decontam"]:
-        return "decontam"
-    else:
-        return "cleaned"
-
-
 def virus_sorter_input() -> Path:
     if Cfg["sbx_virus_id"]["use_spades"]:
         return ASSEMBLY_FP / "virus_id_spades" / "{sample}" / "scaffolds.fasta"
     else:
-        return ASSEMBLY_FP / "virus_id_megahit" / "{sample}_asm" / "final.contigs.fa"
+        return ASSEMBLY_FP / "megahit" / "{sample}_asm" / "final.contigs.fa"


 def virus_sorter_output() -> Path:
@@ -51,53 +44,10 @@ rule all_virus_id:
         VIRUS_FP / "summary" / "all_align_summary.txt",


-rule virus_id_megahit_paired:
-    input:
-        r1=QC_FP / host_decontam_Q() / "{sample}_1.fastq.gz",
-        r2=QC_FP / host_decontam_Q() / "{sample}_2.fastq.gz",
-    output:
-        ASSEMBLY_FP / "virus_id_megahit" / "{sample}_asm" / "final.contigs.fa",
-    benchmark:
-        BENCHMARK_FP / "virus_id_megahit_paired_{sample}.tsv"
-    log:
-        LOG_FP / "virus_id_megahit_paired_{sample}.log",
-    params:
-        out_fp=str(ASSEMBLY_FP / "virus_id_megahit" / "{sample}_asm"),
-    threads: 4
-    conda:
-        "envs/megahit_env.yml"
-    resources:
-        mem_mb=20000,
-        runtime=720,
-    shell:
-        """
-        ## turn off bash strict mode
-        set +o pipefail
-        ## sometimes the error is due to lack of memory
-        exitcode=0
-        if [ -d {params.out_fp} ]
-        then
-            echo "Clearing previous megahit directory..." > {log}
-            rm -rf {params.out_fp}
-        fi
-        megahit -t {threads} -1 {input.r1} -2 {input.r2} -o {params.out_fp} --continue 2>&1 {log} || exitcode=$?
-        if [ $exitcode -eq 255 ]
-        then
-            touch {output}
-            echo "Empty contigs" 2>&1 | tee {log}
-        elif [ $exitcode -gt 1 ]
-        then
-            echo "Check your memory" 2>&1 | tee {log}
-        fi
-        """
-
-
 rule virus_id_spades_paired:
     input:
-        r1=QC_FP / host_decontam_Q() / "{sample}_1.fastq.gz",
-        r2=QC_FP / host_decontam_Q() / "{sample}_2.fastq.gz",
+        r1=QC_FP / "decontam" / "{sample}_1.fastq.gz",
+        r2=QC_FP / "decontam" / "{sample}_2.fastq.gz",
     output:
         ASSEMBLY_FP / "virus_id_spades" / "{sample}" / "scaffolds.fasta",
     benchmark:
@@ -274,8 +224,8 @@ rule build_virus_index:

 rule align_virus_reads:
     input:
-        r1=QC_FP / host_decontam_Q() / "{sample}_1.fastq.gz",
-        r2=QC_FP / host_decontam_Q() / "{sample}_2.fastq.gz",
+        r1=QC_FP / "decontam" / "{sample}_1.fastq.gz",
+        r2=QC_FP / "decontam" / "{sample}_2.fastq.gz",
         index=str(virus_sorter_output()) + ".1.bt2",  # Don't use f-string, broken with python 3.12
     output:
         temp(VIRUS_FP / "alignments" / "{sample}.sam"),
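
The net effect of the sbx_virus_id.smk changes on path resolution can be sketched in standalone Python. This is a minimal sketch under stated assumptions, not the extension's code: `Cfg`, `ASSEMBLY_FP`, and `QC_FP` are normally supplied by Sunbeam, so the values below are hypothetical stand-ins, and `read_inputs` is an illustrative helper that does not exist in the repository.

```python
from pathlib import Path

# Hypothetical stand-ins for objects that Sunbeam injects into the Snakefile.
ASSEMBLY_FP = Path("sunbeam_output/assembly")
QC_FP = Path("sunbeam_output/qc")
Cfg = {"sbx_virus_id": {"use_spades": False}}


def virus_sorter_input() -> Path:
    """Mirror the post-commit logic: SPAdes scaffolds, or the MEGAHIT
    contigs written under the shared "megahit" assembly directory."""
    if Cfg["sbx_virus_id"]["use_spades"]:
        return ASSEMBLY_FP / "virus_id_spades" / "{sample}" / "scaffolds.fasta"
    return ASSEMBLY_FP / "megahit" / "{sample}_asm" / "final.contigs.fa"


def read_inputs() -> dict:
    """Illustrative helper: reads are now always taken from qc/decontam."""
    return {
        "r1": QC_FP / "decontam" / "{sample}_1.fastq.gz",
        "r2": QC_FP / "decontam" / "{sample}_2.fastq.gz",
    }


print(virus_sorter_input().as_posix())
print(read_inputs()["r1"].as_posix())
```

With `use_spades` left at `False`, the contig path points at the `megahit` directory rather than the extension-private `virus_id_megahit` directory, which is consistent with the README note that sbx_assembly must be installed when MEGAHIT is used.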
