Author: Samuel Perini
With this assignment, I aim to demonstrate my exploratory analysis skills and methodology in three main subjects:
1. Read coverage.
2. Variant calling analysis.
3. Analytical performance.
This repository contains the scripts needed to run the pipeline, but it does not include the input data provided for the assignment. The input data should be downloaded into a directory called `Files_needed_for_task`, and this directory should sit in the same folder as the repository folder (not inside the repository). For example, both `Files_needed_for_task` and `Bioinformatics_pipeline_development_task` (the repository folder name) can be placed inside a folder called `Assignment_Pipeline_Development`:
Assignment_Pipeline_Development/
├── Bioinformatics_pipeline_development_task
└── Files_needed_for_task
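For reference, this layout can be set up with the shell commands below. This is only a sketch: the clone URL is a placeholder, and the assignment input data still have to be copied into `Files_needed_for_task` by hand.

```bash
# Create the parent folder and move into it
mkdir Assignment_Pipeline_Development && cd Assignment_Pipeline_Development

# Clone the pipeline repository (the URL is a placeholder)
git clone <repository-url> Bioinformatics_pipeline_development_task

# Create the data directory next to (not inside) the repository,
# then copy the provided input files into it
mkdir Files_needed_for_task
```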
The pipeline was built with Snakemake, a Python-based workflow management system. After cloning this repository inside the `Assignment_Pipeline_Development` directory, the same bioinformatics steps can be reproduced with the following commands:
- Create an environment called `pipeline_development` with the required software listed in the YAML file, and activate it:
cd Bioinformatics_pipeline_development_task
conda env create --name pipeline_development --file ./envs/environment.yaml
conda activate pipeline_development
- Create the additional environments with the up-to-date software listed in separate YAML files. The packages in these environments are activated inside the pipeline itself to avoid conflicts with the main environment, so run the commands below to create them but do not activate them manually (see the sketch after this list for how they can be invoked).
conda env create --prefix ./envs/r-envir --file ./envs/r-environment.yaml
conda env create --prefix ./envs/var_call_v1 --file ./envs/var-call-env1.yaml
conda env create --prefix ./envs/var_call_v2 --file ./envs/var-call-env2.yaml
- Execute the workflow locally, printing the shell commands (`-p`) and using 1 core (change the number of cores as you wish):
snakemake -pn # Dry run: show the commands without generating output files
snakemake --dag | dot -Tsvg > dag.svg # Create Figure 1 of the HTML report
snakemake -p --cores 1 # Actually execute the pipeline
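As an illustration of how the pre-built prefix environments are meant to be used without activating them in your shell, a tool can be invoked through `conda run` with the environment prefix. This is only a sketch of the mechanism, not a command taken from the repository's Snakefile, and the variant-calling tool name is a placeholder:

```bash
# Run R from the dedicated R environment without activating it globally
conda run --prefix ./envs/r-envir Rscript --version

# The variant-calling environments can be queried the same way
# (<variant-caller> is a placeholder for the tool installed in that environment)
conda run --prefix ./envs/var_call_v1 <variant-caller> --version
```

The pipeline handles this internally, so the only environment you need to activate yourself is `pipeline_development`.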
The main output is the MultiQC HTML file `qc/chr19_multiqc.html`, which contains summary statistics and interactive plots that help with interpreting the HTML report `Report_pipeline_development_Samuel.html`.