Initial import.
johanneskoester committed Oct 6, 2017
0 parents commit 959df16
Showing 15 changed files with 284 additions and 0 deletions.
19 changes: 19 additions & 0 deletions .gitignore
@@ -0,0 +1,19 @@
*
!scripts
!scripts/*
!scripts/common
!scripts/common/*
scripts/.snakemake*
!Snakefile
!config.yaml
!samples.tsv
!resources
!resources/*
!envs
!envs/*
!environment.yaml
!LICENSE
!README.md
!rules
!rules/*
!.gitignore
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017, Johannes Köster

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
44 changes: 44 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Snakemake workflow: rna-seq-star-deseq2

[![Snakemake](https://img.shields.io/badge/snakemake-≥4.1.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Build Status](https://travis-ci.org/snakemake-workflows/rna-seq-spew.svg?branch=master)](https://travis-ci.org/snakemake-workflows/rna-seq-spew)

This workflow performs a differential expression analysis with STAR and DESeq2.
It is currently under development. No stable release is available yet.

## Authors

* Johannes Köster (@johanneskoester)

## Usage

### Step 1: Install workflow

If you simply want to use this workflow, download and extract the [latest release](https://github.com/snakemake-workflows/rna-seq-spew/releases).
If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, if available, its DOI.

### Step 2: Configure workflow

Configure the workflow according to your needs by editing the file `config.yaml`.
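
Before running, it can help to check that every contrast in `config.yaml` names two conditions that actually occur in the `condition` column of `samples.tsv`. The following is only a minimal sketch of such a check: `check_contrasts` is a hypothetical helper that is not part of this workflow, and the sample rows are made-up illustrative data.

```python
import csv
import io


def check_contrasts(config, sample_rows):
    """Return the contrasts that do not name exactly two known conditions,
    each mapped to the list of conditions missing from the sample sheet."""
    known = {row["condition"] for row in sample_rows}
    problems = {}
    for name, levels in config["diffexp"]["contrasts"].items():
        missing = [level for level in levels if level not in known]
        if len(levels) != 2 or missing:
            problems[name] = missing
    return problems


# Illustrative data only; real values come from config.yaml and samples.tsv.
config = {"diffexp": {"contrasts": {"treated-vs-untreated": ["treated", "untreated"]}}}
sheet = "sample\tcondition\tfq1\tfq2\nA\ttreated\ta.fastq.gz\t\nB\tuntreated\tb.fastq.gz\t\n"
sample_rows = list(csv.DictReader(io.StringIO(sheet), delimiter="\t"))

print(check_contrasts(config, sample_rows))
```

An empty dictionary means that every contrast refers only to conditions present in the sample sheet.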

### Step 3: Execute workflow

Test your configuration by performing a dry-run via

    snakemake -n

Execute the workflow locally via

    snakemake --cores $N

using `$N` cores or run it in a cluster environment via

    snakemake --cluster qsub --jobs 100

or

    snakemake --drmaa --jobs 100

See the [Snakemake documentation](https://snakemake.readthedocs.io) for further details.
17 changes: 17 additions & 0 deletions Snakefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
import pandas as pd


configfile: "config.yaml"
samples = pd.read_table("samples.tsv", index_col=0)


rule all:
    input:
        expand("results/diffexp/{contrast}.diffexp.tsv",
               contrast=config["diffexp"]["contrasts"]),
        "results/pca.pdf"


include: "rules/trim.smk"
include: "rules/align.smk"
include: "rules/diffexp.smk"
18 changes: 18 additions & 0 deletions config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
# the sequencing adapter
adapter: ACGGATCGATCGATCGATCGAT

star:
  # the STAR index
  index: "path/to/star/index"

pca:
  labels:
    # columns of sample sheet to use for PCA
    - condition

diffexp:
  # contrasts for the deseq2 results method
  contrasts:
    treated-vs-untreated:
      - treated
      - untreated
6 changes: 6 additions & 0 deletions envs/deseq2.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bioconductor-deseq2 =1.16.1
26 changes: 26 additions & 0 deletions rules/align.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
def get_trimmed(wildcards):
    if not pd.isnull(samples.loc[wildcards.sample, "fq2"]):
        # paired-end sample
        return expand("trimmed/{sample}.{group}.fastq.gz",
                      sample=wildcards.sample, group=[1, 2])
    # single end sample
    return "trimmed/{sample}.fastq.gz".format(sample=wildcards.sample)


rule align:
    input:
        sample=get_trimmed
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.bam",
        "star/{sample}/ReadsPerGene.out.tab"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index=config["star"]["index"],
        # optional parameters
        extra="--quantMode GeneCounts"
    threads: 8
    wrapper:
        "0.17.4/bio/star/align"
52 changes: 52 additions & 0 deletions rules/diffexp.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
rule count_matrix:
    input:
        expand("star/{sample}/ReadsPerGene.out.tab", sample=samples.index)
    output:
        "counts/all.tsv"
    params:
        samples=samples.index
    script:
        "../scripts/count-matrix.py"


rule deseq2_init:
    input:
        counts="counts/all.tsv",
        samples="samples.tsv"
    output:
        "deseq2/all.RData"
    conda:
        "../envs/deseq2.yaml"
    script:
        "../scripts/deseq2-init.R"


rule pca:
    input:
        "deseq2/all.RData"
    output:
        "results/pca.pdf"
    params:
        pca_labels=config["pca"]["labels"]
    conda:
        "../envs/deseq2.yaml"
    script:
        "../scripts/pca.R"


def get_contrast(wildcards):
    return config["diffexp"]["contrasts"][wildcards.contrast]


rule deseq2:
    input:
        "deseq2/all.RData"
    output:
        table="results/diffexp/{contrast}.diffexp.tsv",
        ma_plot="results/diffexp/{contrast}.ma-plot.pdf",
    params:
        contrast=get_contrast
    conda:
        "../envs/deseq2.yaml"
    script:
        "../scripts/deseq2.R"
31 changes: 31 additions & 0 deletions rules/trim.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
def get_fastq(wildcards):
    return samples.loc[wildcards.sample, ["fq1", "fq2"]].dropna()


rule cutadapt_pe:
    input:
        get_fastq
    output:
        fastq1="trimmed/{sample}.1.fastq.gz",
        fastq2="trimmed/{sample}.2.fastq.gz",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a {}".format(config["adapter"])
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.17.4/bio/cutadapt/pe"


rule cutadapt:
    input:
        get_fastq
    output:
        fastq="trimmed/{sample}.fastq.gz",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a {}".format(config["adapter"])
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.17.4/bio/cutadapt/se"
1 change: 1 addition & 0 deletions samples.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
sample	condition	fq1	fq2
1 change: 1 addition & 0 deletions scripts/common/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Any Python script in the scripts folder will be able to import from this module and beyond.
5 changes: 5 additions & 0 deletions scripts/count-matrix.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
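import pandas as pd

# ReadsPerGene.out.tab has no header: 4 summary rows, then gene counts (column 1: unstranded)
counts = [pd.read_table(f, index_col=0, header=None, skiprows=4)[1] for f in snakemake.input]
pd.concat(counts, axis=1, keys=snakemake.params.samples).to_csv(snakemake.output[0], sep="\t")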
14 changes: 14 additions & 0 deletions scripts/deseq2-init.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
library("DESeq2")

# colData and countData must have the same sample order, but this is ensured
# by the way we create the count matrix
cts <- read.table(snakemake@input[["counts"]], header=TRUE, sep="\t", row.names=1, check.names=FALSE)
coldata <- read.table(snakemake@input[["samples"]], header=TRUE, sep="\t", row.names="sample")
dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~ condition)

# remove uninformative rows (genes with at most one read in total)
dds <- dds[ rowSums(counts(dds)) > 1, ]
# TODO optionally allow to collapse technical replicates
dds <- DESeq(dds)

save(dds, file=snakemake@output[[1]])
19 changes: 19 additions & 0 deletions scripts/deseq2.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
library("DESeq2")

load(snakemake@input[[1]])  # restores the saved `dds` object; load() itself only returns object names

contrast <- c("condition", snakemake@params[["contrast"]])
res <- results(dds, contrast=contrast)
# shrink fold changes for lowly expressed genes
res <- lfcShrink(dds, contrast=contrast, res=res)
# sort by adjusted p-value
res <- res[order(res$padj),]
# TODO explore IHW usage


# store results
pdf(snakemake@output[["ma_plot"]])
plotMA(res, ylim=c(-2,2))
dev.off()

write.table(as.data.frame(res), file=snakemake@output[["table"]], sep="\t")
10 changes: 10 additions & 0 deletions scripts/pca.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
library("DESeq2")

# load deseq2 data
load(snakemake@input[[1]])  # restores the saved `dds` object; load() itself only returns object names

# obtain normalized counts
ntd <- normTransform(dds)
pdf(snakemake@output[[1]])
plotPCA(ntd, intgroup=snakemake@params[["pca_labels"]])
dev.off()
