Bacterial genome assembly and prediction - snakemake

Geize/BactGenomeAnalysis

Latest commit: f2dba99 · Jan 15, 2023

Hi Folks! 😀

First things first. Our mantra 🕉️: This repository is not a tutorial. It is just for reproducing my work. However, you are more than welcome to use this workflow. And if you find any error, you are welcome to complain as well (but not too much 😊). I'll be glad to fix it.

The workflow performs the following steps:

  1. Quality control of Illumina MiSeq paired-end (PE) reads - FastQC.

  2. Trimming of the raw reads - Trimmomatic.

  3. De novo assembly of the quality-filtered paired-end reads - SPAdes.

  4. Quality assessment of the assembled genome - QUAST.

  5. Detection of chimerism or contamination - GUNC.

  6. Gene prediction and annotation - Prokka.

  7. Each output folder has the same name as the tool that produced it.

Important points:

Create a folder named reads/ and move your "fastq.gz" files into it. Then rename them to {dadada}_1.fastq.gz and {dadada}_2.fastq.gz, where {dadada} is a placeholder for the sample name.
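Before running anything, it can help to confirm that every sample in reads/ really has both mate files. The following is only a minimal sketch of such a check, not part of the workflow itself; the function name `find_paired_samples` is made up for illustration, and it assumes the `_1.fastq.gz` / `_2.fastq.gz` naming described above.

```python
from pathlib import Path


def find_paired_samples(reads_dir="reads"):
    """Return (paired, unpaired) sample names found in reads_dir.

    A sample counts as paired when both <sample>_1.fastq.gz and
    <sample>_2.fastq.gz exist. Hypothetical helper, for illustration only.
    """
    reads = Path(reads_dir)
    # Strip the suffix to recover the sample name from each file name.
    forward = {p.name[: -len("_1.fastq.gz")] for p in reads.glob("*_1.fastq.gz")}
    reverse = {p.name[: -len("_2.fastq.gz")] for p in reads.glob("*_2.fastq.gz")}
    paired = sorted(forward & reverse)    # both mates present
    unpaired = sorted(forward ^ reverse)  # exactly one mate present
    return paired, unpaired


if __name__ == "__main__":
    paired, unpaired = find_paired_samples("reads")
    print("Paired samples:", paired)
    if unpaired:
        print("Missing a mate file for:", unpaired)
```

Any sample listed as missing a mate file most likely still needs to be renamed or copied into reads/.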

From my repository, download two files to your working area: the file with all PE sequence adapters (in the Adapter folder), which is needed for the trimming step, and the GenomeAnalysis.yaml file (in the env folder), which recreates the same environment that I use to process my data.

$ conda env create -n snake -f GenomeAnalysis.yaml

$ conda activate snake

Now, everything is ready to run the workflow.
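If you want to double-check that the activated environment exposes the tools listed above, a small script like the one below can do it. This is a sketch under assumptions: the executable names in `EXPECTED_TOOLS` are the ones commonly installed by the corresponding conda packages, not names taken from this repository, so adjust them to match your environment.

```python
import shutil

# Executable names as commonly installed by conda packages. These are
# assumptions, not taken from the workflow; edit them if yours differ.
EXPECTED_TOOLS = [
    "snakemake",
    "fastqc",
    "trimmomatic",
    "spades.py",
    "quast.py",
    "gunc",
    "prokka",
]


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

Run it after `conda activate snake`; an empty list of missing tools suggests the environment was created correctly.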

**Additional information:** 🔥

SPAdes is still the best assembler for bacterial genome assembly (assuming you are using PE reads). That's why you won't find another assembler as a second option. However, if you still want to try another assembler, it is very easy to add a new rule or replace the current one in the workflow (but you'll be in charge of doing it 😄).

QUAST - Gives an idea of how good your assembly is. However, QUAST was not set up here for comparing genome assemblies. I guess you can easily get a better comparison by going directly to the NCBI genome.

GUNC - This is a new tool for detecting and quantifying chimerism. In my opinion, it is better than CheckM. Don't forget to specify the path to the GUNC database installed on your computer/server.
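Since a wrong database path only shows up once the GUNC step starts, a quick pre-flight check can save a failed run. The sketch below is illustrative only: `resolve_gunc_db` is a hypothetical helper, and it assumes the path is given either explicitly or through an environment variable named `GUNC_DB`. How the workflow itself receives the path is not shown here.

```python
import os
from pathlib import Path


def resolve_gunc_db(db_path=None, env=os.environ):
    """Return the GUNC database path, or raise if it cannot be found.

    An explicit db_path wins; otherwise fall back to the GUNC_DB
    environment variable (an assumption made for this sketch).
    """
    candidate = db_path or env.get("GUNC_DB")
    if not candidate:
        raise FileNotFoundError("No GUNC database path given and GUNC_DB is not set.")
    path = Path(candidate).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"GUNC database not found: {path}")
    return path
```

Calling `resolve_gunc_db("/path/to/your/gunc_db_file")` before launching the workflow raises a clear error if the file is missing, instead of failing partway through the run.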

All the best for us.
