Hi Folks!
First things first. Our mantra: this repository is not a tutorial. It exists only to reproduce my work. However, you are more than welcome to use this workflow. And if you find any error, you are welcome to complain as well (but not too much). I'll be glad to fix it.
The workflow performs the following steps:
- Quality control of Illumina MiSeq paired-end (PE) reads - FastQC.
- Trimming of the raw reads - Trimmomatic.
- De novo assembly of the quality-filtered paired-end reads - SPAdes.
- Quality assessment of the assembled genome - QUAST.
- Detection of chimerism or contamination - GUNC.
- Gene prediction and annotation - Prokka.

Each output folder has the same name as the tool that produced it.
Important points:
Create a folder named reads/ and transfer your "fastq.gz" files to it. Then rename them to {dadada}_1.fastq.gz and {dadada}_2.fastq.gz, where {dadada} stands for your sample name.
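The renaming step can be scripted. Below is a minimal sketch, not part of the workflow itself. It assumes your raw files follow the common Illumina naming pattern (for example `Sample1_S1_L001_R1_001.fastq.gz`); that pattern, and the helper names `target_name` and `rename_reads`, are assumptions for illustration only. Adjust the regular expression if your files are named differently.

```python
import re
from pathlib import Path

# Assumed input pattern (not from this repository): Illumina-style names
# such as "Sample1_S1_L001_R1_001.fastq.gz".
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+?)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def target_name(filename):
    """Return '<sample>_1.fastq.gz' or '<sample>_2.fastq.gz', or None if no match."""
    match = ILLUMINA_NAME.match(filename)
    if match is None:
        return None
    return f"{match['sample']}_{match['read']}.fastq.gz"


def rename_reads(reads_dir="reads", dry_run=True):
    """Rename matching files in reads_dir and return the (old, new) name pairs.

    With dry_run=True nothing is touched; the planned renames are only returned.
    """
    planned = []
    for path in sorted(Path(reads_dir).glob("*.fastq.gz")):
        new_name = target_name(path.name)
        if new_name is None or new_name == path.name:
            continue  # already in the expected form, or an unknown pattern
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return planned


if __name__ == "__main__":
    # Preview only; call rename_reads(dry_run=False) to actually rename.
    for old, new in rename_reads(dry_run=True):
        print(f"{old} -> {new}")
```

Running it first with `dry_run=True` lets you check the planned names before any file is changed.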
From my repository, download two files to your working area: the file with all PE sequence adapters in the Adapter folder (needed for the trimming step), and the GenomeAnalysis.yaml file in the env folder, which recreates the same environment that I use to process my data:
$ conda env create -n snake -f GenomeAnalysis.yaml
$ conda activate snake
Now, everything is ready to run the workflow.
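Before launching, it can help to confirm that every sample in reads/ has both mates. The following is a small pre-flight sketch under the naming convention described above (`_1.fastq.gz` and `_2.fastq.gz`). The function name `find_samples` is hypothetical, and the workflow itself may discover samples differently.

```python
from pathlib import Path

SUFFIX = ".fastq.gz"


def find_samples(reads_dir="reads"):
    """Return (paired, unpaired) sample names found in reads_dir.

    A sample is 'paired' only if both <sample>_1.fastq.gz and
    <sample>_2.fastq.gz are present.
    """
    mates = {}
    for path in Path(reads_dir).glob("*_[12]" + SUFFIX):
        # "X_1.fastq.gz" -> sample "X", read "1"
        sample, read = path.name[: -len(SUFFIX)].rsplit("_", 1)
        mates.setdefault(sample, set()).add(read)
    paired = sorted(s for s, reads in mates.items() if reads == {"1", "2"})
    unpaired = sorted(s for s, reads in mates.items() if reads != {"1", "2"})
    return paired, unpaired


if __name__ == "__main__":
    paired, unpaired = find_samples("reads")
    print("Complete pairs:", ", ".join(paired) or "none")
    if unpaired:
        print("Missing a mate:", ", ".join(unpaired))
```

Any sample listed as missing a mate should be fixed before running the workflow, since the trimming and assembly steps expect paired-end input.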
**Additional information:**
SPAdes - In my view, it is still the best assembler for bacterial genomes (assuming you are using PE reads). That's why you won't find another assembler as a second option. However, if you still want to try another assembler, it is very easy to add a new rule or replace the current one in the workflow (but you'll be in charge of doing it).
QUAST - Gives an idea of how good your assembly is. However, QUAST was not set up here for comparing genome assemblies. I guess you can easily get a better comparison by going directly to the NCBI genome.
GUNC - This is a new tool for detecting and quantifying chimerism. In my opinion, it is better than CheckM. Don't forget to specify the path to the GUNC database installed on your computer/server.
All the best for us.