Ahuiting/DNAPipeline

Paired-end DNA pipeline

Description

This Snakemake pipeline is designed for paired-end NGS DNA sequencing data.

Inputs

  • Path to the directory of FASTQ files
  • Name of the output folder
  • Reference genome

Outputs

  • multiqc_data/ - Directory containing the summary results of all the tools, including multiqc.html
  • logs/ - Directory of log files for each job; check here first if you run into errors
  • working/ - Directory containing intermediate files for each job

Workflow

  1. QC--fastqc
  2. Trimming--trim galore
  3. QC--fastqc
  4. Align--bwa
  5. Sort--samtools
  6. Deduplicate--picard
  7. Summary--multiqc
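
For orientation, the seven steps above roughly correspond to the commands printed by the sketch below. It is not taken from the repository's rules: the sample name, the `_R1`/`_R2` file naming, the directory layout, and every option shown are assumptions made for illustration, and the script only prints the commands rather than running them. `bwa mem` also expects the reference to have been indexed with `bwa index` beforehand, and whether the pipeline removes duplicates or only flags them is not stated here.

```shell
#!/bin/sh
# Prints one illustrative command per workflow step for a single sample.
# Nothing is executed. Sample name, file naming, directory layout, and all
# options are assumptions, not the repository's actual rules.
print_steps() (
  sample=$1; fastq_dir=$2; out=$3; ref=$4
  r1="$fastq_dir/${sample}_R1.fastq.gz"
  r2="$fastq_dir/${sample}_R2.fastq.gz"
  work="$out/working"
  # Trim Galore names paired outputs <input>_val_1.fq.gz and <input>_val_2.fq.gz
  t1="$work/trimmed/${sample}_R1_val_1.fq.gz"
  t2="$work/trimmed/${sample}_R2_val_2.fq.gz"

  # 1. QC on the raw reads
  printf '%s\n' "fastqc -o $work/fastqc_raw $r1 $r2"
  # 2. Adapter and quality trimming
  printf '%s\n' "trim_galore --paired -o $work/trimmed $r1 $r2"
  # 3. QC on the trimmed reads
  printf '%s\n' "fastqc -o $work/fastqc_trimmed $t1 $t2"
  # 4. Alignment against the (pre-indexed) reference
  printf '%s\n' "bwa mem $ref $t1 $t2 > $work/${sample}.sam"
  # 5. Coordinate sorting
  printf '%s\n' "samtools sort -o $work/${sample}.sorted.bam $work/${sample}.sam"
  # 6. Duplicate marking
  printf '%s\n' "picard MarkDuplicates -I $work/${sample}.sorted.bam -O $work/${sample}.dedup.bam -M $work/${sample}.dup_metrics.txt"
  # 7. Aggregate all reports
  printf '%s\n' "multiqc $out -o $out"
)

# Example call with hypothetical names
print_steps sampleA data/fastq results ref/genome.fa
```

In the actual pipeline these steps are driven by Snakemake rules, so you would not normally run them by hand; the sketch is only meant to show what each stage consumes and produces.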

Setup environment

  1. Install conda

  2. Clone workflow into working directory

    git clone <repo> <dir>
    cd <dir>
  3. Create a new environment

    conda env create -n <project_name> --file environment.yaml
  4. Activate the environment

    conda activate <project_name>
  5. Enable the Bioconda and conda-forge channels

    conda config --add channels bioconda
    conda config --add channels conda-forge
    
  6. Install snakemake

    conda install snakemake

Run workflow

  1. Edit configuration files

    Set the paths for fastq_dir, output_dir, and reference_genome in "config.yaml".

  2. Execute the workflow.

  • The first time you execute this Snakemake pipeline, run it locally; a dry run (--dry-run) is enough. Once that first run has finished, you can switch to running it on the cluster. Replace N with the number of cores to use.
    snakemake --configfile "config.yaml" --use-conda --cores N
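
As a rough illustration of step 1, the snippet below writes an example configuration containing the three keys named above. The flat key layout, the file name `config.example.yaml`, and all paths are placeholders assumed for this sketch. The repository's own "config.yaml" is the authoritative template, and whether `reference_genome` expects a FASTA file or a directory is not stated here.

```shell
# Write an example configuration without touching the real config.yaml.
# Key names come from this README; the flat layout and the paths are
# placeholders chosen for illustration.
cat > config.example.yaml <<'EOF'
fastq_dir: /path/to/fastq
output_dir: /path/to/output
reference_genome: /path/to/reference.fa
EOF
```

After adjusting the paths, copy the values into "config.yaml" so the workflow command above picks them up.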
