GitHub

###INTRODUCTION rrnaseq provides a suite of programs to generate basic plots as well as QC-filtering of RNA-seq data. The programs are written in R and are executable from the command-line. It also provides a script that can run the whole suite of programs, called rqc. All programs can be found in the 'bin' sub-directory.

Currently the starting input is a tab-separated file with RPKM values and raw read counts output by rpkmforgenes.py. Most programs also require a file with meta-information about the samples, which can be generated by running 'make_summary_starlog.sh', see the "HOW TO RUN" section below.

###INSTALLATION

If you don't want version control you find releases at https://github.com/edajeda/rrnaseq/releases

Install R dependencies with install.packages or via biocLite. In R:

 pkgs = c('DESeq2', 'genefilter', 'statmod', 'gplots',
 'RColorBrewer', 'impute', 'moduleColor', 'graphics', 'getopt')
 source('http://www.bioconductor.org/biocLite.R')
 biocLite(pkgs)

Add the directory with binaries to your shell path (to for example .profile on OS X or .bashrc in Linux):

export PATH="/home/user/prg/rrnaseq/bin:$PATH"

###HOW TO RUN Below you find an example of how to run the three first criticial steps (make_summary_starlog.pl, get_meta, get_expr). 'get_expr' and 'get_meta', take the output from the programs 'rpkmforgenes.py' and 'make_summary_starlog.sh' as input and generates data matrices with expression values and a matrix with meta-information about the samples, respectively, and store these as separate files in rds-format. All other programs rely on the output from get_meta and get_expr and are straightforward to use, see for example the 'mapstats' call below.

#Define input and output dirs and files
#IN
projectdir='/path/to/your/PROJECT'
stardir=${projectdir}'/star_hg19'
rpkmforgenes_file=${projectdir}/rpkmforgenes_star_hg19/refseq_rpkms.txt

#OUT
datadir=${projectdir}/'rqc/refseq/data'
pdfdir=${projectdir}/'rqc/refseq/pdf'
brenneckedir=${projectdir}'/rqc/diffexp/brennecke'

#Create and change dir
mkdir -p $datadir
cd $datadir

#Get mapping statistics from STAR logs
make_summary_starlog.pl ${stardir} >mapstats..tab

#Get sample meta-information
get_meta -i mapstats.tab -o meta.rds

#Get expression values
get_expr -i $rpkmforgenes_file -o $datadir

#Plot mapping stats
mapstats -m meta.rds -c counts.rds -o ${pdfdir}/'seq_mapping' -q qc.rds

#Dry-run the program 'rqc' to generate a shell script with possible commands to execute
rqc -m mapstats.tab -e $rpkmforgenes_file -d $datadir -p $pdfdir -b $brenneckedir -y
cat rqc.sh

#####Further examples Above, the program 'rqc' was dry-run to generate a shell script (rqc.sh) with possible commands to execute. Look in rqc.sh and change or add input arguments as you wish.

You can also see test/rqc.sh for a complete list of available programs and example program calls, but there the directories are set according to the test directory.

###TEST AND EXAMPLE OUTPUT Example output you find in the 'test/rqc' subdirectory. The file 'run.rqc.sh' in the 'test' subdirectory provides an example of how to run the script 'rqc' that with the dry-run flag will generate a file (rqc.sh) with commands that calls all of the available programs in the rrnaseq suite. See 'run.rqc.sh' and the generated 'rqc.sh' file for a test example:

cd test
sh run.rqc.sh 
cat rqc.sh

###QC-filter To filter genes use the program 'gene_filter'. To filter samples use the program 'sample_filter'. This program relies on an input file (default: qc.rds), which contains a data matrix with all samples as rows and different qc-metrics as columns. Elements in this qc-matrix is set to 1 if a sample failed QC for a particular QC-metric. The QC-metric columns of the qc-matrix is added when running the corresponding program, for example, if you want to add a QC-column relating to the number of expressed genes per sample, run the program 'sample2ngenes_expr'. To then apply the filter run 'sample_filter' with the column-name of that QC-metric as an argument. See test/rqc.sh for an example.

###GETTING HELP Each program have several input arguments that should be considered. For a list of all available arguments for a program use the -h flag, for example:

pca -h

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
annot		annot
bin		bin
doc		doc
lib		lib
test		test
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

eyay/rrnaseq

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages