Description

A Snakemake pipeline for the analysis of data produced by Butt-Seq library preps. It processes paired-end Butt-Seq fastq files and returns several files that may be of interest to the investigator, including:

- Deduplicated BAM files, with and without small RNAs computationally removed
- Single-nucleotide resolution BAM files
- StringTie assemblies and counts files for potential transcript discovery
- A featureCounts table with raw counts
- Normalized full-read and single-nucleotide resolution BigWig files

Not every investigator will need all of these analyses and outputs, but some may find them useful.

Input files

This pipeline requires three fastq files per sample, named as follows:

- (sample name)_R1_001.fastq.gz: Read 1
- (sample name)_R2_001.fastq.gz: UMI read
- (sample name)_R3_001.fastq.gz: Read 2
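Before launching the pipeline, it can help to confirm that every sample has all three files. The snippet below is a minimal Python sketch that assumes file names follow exactly the scheme above; `group_fastqs` is a hypothetical helper, not part of this repository.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme, taken from the list above:
#   <sample>_R1_001.fastq.gz -> Read 1
#   <sample>_R2_001.fastq.gz -> UMI read
#   <sample>_R3_001.fastq.gz -> Read 2
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>R[123])_001\.fastq\.gz$")
ROLES = {"R1": "read1", "R2": "umi", "R3": "read2"}


def group_fastqs(filenames):
    """Group fastq names by sample; also report which reads each sample lacks."""
    samples = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a fastq file under this naming scheme
        samples[match["sample"]][ROLES[match["read"]]] = name
    incomplete = {
        sample: sorted(set(ROLES.values()) - set(files))
        for sample, files in samples.items()
        if len(files) < len(ROLES)
    }
    return dict(samples), incomplete


if __name__ == "__main__":
    files = [
        "ctrl_R1_001.fastq.gz", "ctrl_R2_001.fastq.gz", "ctrl_R3_001.fastq.gz",
        "treated_R1_001.fastq.gz", "treated_R3_001.fastq.gz",
    ]
    grouped, missing = group_fastqs(files)
    print(missing)  # {'treated': ['umi']}
```

A sample reported as missing `umi` usually means the UMI read was not written out during demultiplexing.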

To produce 3 read files from a sequencing run, edit this line in RunInfo.xml:

to

and demultiplex using bcl2fastq as usual.

Output Files

- Deduplicated BAM files, with and without small RNAs computationally removed: found in dedup/ and subSno/, respectively
- Single-nucleotide resolution BAM files: found in SNR/
- StringTie assemblies and counts files for potential transcript discovery: found in stringtie/
- A featureCounts table with raw counts: found in results/counts.featureCounts
- Normalized full-read and single-nucleotide resolution BigWig files, separated by strand: found in bws/paired/ and bws/SNR/, respectively

Contents

'beds/': Two BED files for the Drosophila melanogaster assembly dm6, one describing the 3' ends of small RNAs and one describing the 3' ends of exons.
'scripts/': Scripts used to process the data, including:
- removeclipping.py: A script from NGSUtils (Breese et al., 2013), slightly modified to suit this pipeline. It removes soft-clipped bases from reads in the BAM file before the reads are converted to single-nucleotide reads. Without this step, get_SNR_bam.py would erroneously assign a soft-clipped base as the 3'-most end of the read.
- get_SNR_bam.py: A script by Tomás Gomes (Nojima et al., 2015), slightly modified to suit this pipeline. It converts a BAM file so that it contains only the first base of Read 2.
- deseq2_normalization.R: An R script that loads the featureCounts table into DESeq2 and outputs normalization factors, which bamCoverage uses as scaling factors when converting BAM files to BigWig files for visualization and/or metagene plotting.
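The coordinate logic behind removeclipping.py and get_SNR_bam.py can be illustrated on plain SAM fields. This is a minimal pure-Python sketch, not the code of either script: the helper names are hypothetical, and it assumes SAM conventions (1-based POS pointing at the first aligned base, and reverse-strand reads stored reverse-complemented in SEQ). It shows why soft clips must be trimmed first: for a reverse-strand read, the first sequenced base of Read 2 is the last base of SEQ, which may be a soft-clipped base.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '3S10M2S' into [(3, 'S'), (10, 'M'), (2, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def _leading_soft_clip(ops):
    """Length of the soft clip at the start of ops; hard clips may precede it."""
    for n, op in ops:
        if op != "H":
            return n if op == "S" else 0
    return 0


def strip_soft_clips(seq, cigar):
    """Remove soft-clipped bases from SEQ and drop the S operations from CIGAR.

    POS is unaffected: in SAM it already points at the first aligned base.
    """
    ops = parse_cigar(cigar)
    left = _leading_soft_clip(ops)
    right = _leading_soft_clip(reversed(ops))
    kept = "".join(f"{n}{op}" for n, op in ops if op != "S")
    return seq[left:len(seq) - right], kept


def reference_span(cigar):
    """Reference bases covered by the alignment (M, D, N, = and X consume reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def read2_first_base(seq, pos, cigar, is_reverse):
    """(1-based coordinate, base) of the first sequenced base of Read 2.

    For reverse-strand reads this is the rightmost aligned base, i.e. the last
    base of SEQ, reported as stored in SEQ (on the reference strand).
    Assumes the alignment does not begin or end with an insertion.
    """
    seq, cigar = strip_soft_clips(seq, cigar)
    if is_reverse:
        return pos + reference_span(cigar) - 1, seq[-1]
    return pos, seq[0]


if __name__ == "__main__":
    # Made-up reverse-strand read with soft clips on both ends.
    print(read2_first_base("GGGACGTACGTACTT", 100, "3S10M2S", True))  # (109, 'C')
```

Without the clipping step, taking the last base of SEQ for this read would return the soft-clipped `T` rather than the aligned `C`.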
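DESeq2's default normalization is the median-of-ratios method. The Python sketch below reproduces that estimator on a small counts table to show what the normalization factors represent; it is an illustration, not deseq2_normalization.R. DESeq2 normalized counts are raw counts divided by the size factor, while bamCoverage's `--scaleFactor` multiplies coverage, so the reciprocal of each size factor is one natural choice of scaling factor. Whether deseq2_normalization.R writes size factors or their reciprocals is an assumption to check in the script itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, as in DESeq2's default estimator.

    counts maps each sample name to a list of raw counts; every list must
    cover the same genes in the same order (e.g. columns of a featureCounts table).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean across samples. Genes with a zero count in
    # any sample have an undefined log mean and are skipped.
    reference = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            reference.append((g, sum(math.log(v) for v in values) / len(values)))
    if not reference:
        raise ValueError("every gene has a zero count in at least one sample")

    # Each sample's size factor is the median ratio of its counts to the reference.
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_mean for g, log_mean in reference]
        factors[s] = math.exp(median(log_ratios))
    return factors


def coverage_scale_factors(factors):
    """Reciprocal size factors, usable as multiplicative coverage scaling."""
    return {s: 1.0 / f for s, f in factors.items()}


if __name__ == "__main__":
    # Made-up counts: "treated" has exactly twice the depth of "ctrl".
    table = {"ctrl": [10, 20, 30, 0], "treated": [20, 40, 60, 5]}
    print(coverage_scale_factors(size_factors(table)))  # ctrl ~1.414, treated ~0.707
```

Using the median of per-gene ratios, rather than total counts, keeps a handful of very highly expressed genes from dominating the factor.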

Installation and usage

Installation

If you are new to conda and/or Snakemake, first install the appropriate version of Miniconda3 for your operating system.

Clone or download this GitHub repository into a directory containing your RNA-Seq files.
In a terminal, create a conda environment with the necessary dependencies by running the following command:

conda env create --name butt --file envs/environment.yaml

Activate the new conda environment with the following command:

conda activate butt

Set up config.yaml as described below.
Run the Snakemake pipeline with the following command (adjust the number of cores as desired):

snakemake --cores 8 --configfile config.yaml

Configuration

Edit config.yaml to set the following parameters:

samples:
  (SampleName_1): Path/To/Sample1
  (SampleName_2): Path/To/Sample2
  etc.
annotation: Path/To/GTF/Annotation
index: Path/To/STAR/Index
smallRNAs: Path/To/Undesirable/SmallRNAs
exonends: Path/To/ExonEnds
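Snakemake loads the file passed with `--configfile` into a Python dict named `config`, so a missing or empty key may only surface when a rule first needs it. A minimal early check, assuming the keys listed above; `check_config` is a hypothetical helper that is not part of this repository.

```python
from collections.abc import Mapping
from pathlib import Path

# Path-valued keys described in the Configuration section.
REQUIRED_PATHS = ("annotation", "index", "smallRNAs", "exonends")


def check_config(config, check_exists=False):
    """Return a list of problems in a parsed config.yaml; an empty list means OK."""
    problems = []
    samples = config.get("samples")
    if not isinstance(samples, Mapping) or not samples:
        problems.append("'samples' must map at least one sample name to a path")
    for key in REQUIRED_PATHS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be set to a path")
        elif check_exists and not Path(value).exists():
            problems.append(f"'{key}' points to a path that does not exist: {value}")
    return problems


if __name__ == "__main__":
    example = {
        "samples": {"ctrl": "data/ctrl"},
        "annotation": "ref/dm6.gtf",
        "index": "ref/star_index",
        "smallRNAs": "beds/smallRNAs.bed",
        # "exonends" deliberately omitted
    }
    for problem in check_config(example):
        print(problem)  # 'exonends' must be set to a path
```

In a Snakefile, one could call `check_config(config)` near the top and raise an error if the returned list is non-empty. Sample paths are not checked for existence because their exact form (file prefix or directory) is not specified here.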

The small RNAs listed in the smallRNAs file are chromatin-associated RNAs that are not thought to be products of active transcription and are typically not analytes of interest, so reads from them are computationally removed.

3' end ligation techniques, including Butt-Seq, often capture splicing intermediates, which cannot be distinguished from true polymerase pause sites. Reads mapping precisely to the 3' ends of exons (listed in the exonends file) are therefore removed.
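The filtering idea can be sketched as a set lookup on single-nucleotide positions. This is a minimal Python sketch under several assumptions: the BED files hold tab-separated BED6 records (strand in column 6), single-nucleotide sites are 0-based (chrom, position, strand) tuples, and their strand already matches the transcript strand (if the library reports the opposite strand, flip it first). It is not necessarily how the Snakefile performs this step, and `load_bed_positions` and `drop_masked` are hypothetical names. The same lookup applies to either BED file in beds/.

```python
def load_bed_positions(lines):
    """Collect every 0-based position covered by BED records as (chrom, pos, strand)."""
    positions = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "."
        # BED intervals are half-open: [start, end).
        for pos in range(start, end):
            positions.add((chrom, pos, strand))
    return positions


def drop_masked(snr_sites, masked):
    """Keep single-nucleotide sites that do not coincide with a masked position.

    A BED record without a strand ('.') masks the position on both strands.
    """
    kept = []
    for chrom, pos, strand in snr_sites:
        if (chrom, pos, strand) in masked or (chrom, pos, ".") in masked:
            continue
        kept.append((chrom, pos, strand))
    return kept


if __name__ == "__main__":
    # Made-up exon-end records and read 3' ends.
    bed = [
        "chr2L\t99\t100\texonA_end\t0\t+",
        "chr2L\t499\t500\texonB_end\t0\t-",
    ]
    sites = [("chr2L", 99, "+"), ("chr2L", 99, "-"), ("chr2L", 499, "-")]
    print(drop_masked(sites, load_bed_positions(bed)))  # [('chr2L', 99, '-')]
```

Matching on strand matters: a read ending at an exon end on the opposite strand is not a splicing intermediate of that exon and is kept.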

albertdyu/BuTTSeq

Folders and files

Latest commit

History

Repository files navigation

Description

Input files

Output Files

Contents

Installation and usage

Installation

Configuration

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages