RiboSeqPy

Transferred from the tmargus GitHub page.

A collection of routines for processing Ribo-Seq data, joined into a Python pipeline. Our main aim is to analyze changes in translational dynamics between two different conditions in yeast. We estimate differences in codon occupancy between the two conditions and look for the sequence context that is likely related to higher or lower codon occupancy.

The first part of the pipeline starts with FASTQ preprocessing, continues with aligning reads to the genome and mapping ribosome positions (uncorrected), and ends with producing metagene plots around start and stop codons.
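A metagene profile sums read counts at each position relative to every annotated start (or stop) codon. Below is a minimal sketch of that summation, assuming read ends are already stored as per-position counts keyed by chromosome and strand. The function name `metagene_profile` and the data layout are hypothetical, not taken from Pipeline_part_1.py, and the sketch does no normalization.

```python
from collections import Counter


def metagene_profile(read_ends, start_codons, window=(-20, 50)):
    """Sum read counts at each position relative to annotated start codons.

    read_ends: dict mapping (chrom, strand) -> Counter {genomic position: read count}
    start_codons: iterable of (chrom, strand, pos); pos is the genomic coordinate of
        the first nucleotide of the codon (the highest coordinate on the '-' strand)
    window: inclusive (first, last) offsets relative to the start codon
    """
    first, last = window
    profile = {offset: 0 for offset in range(first, last + 1)}
    for chrom, strand, start in start_codons:
        counts = read_ends.get((chrom, strand), Counter())
        for offset in profile:
            # Transcript coordinates run toward lower genomic positions on the '-' strand
            gpos = start + offset if strand == "+" else start - offset
            profile[offset] += counts.get(gpos, 0)
    return profile


# Hypothetical toy data: one gene per strand on chromosome I
read_ends = {("I", "+"): Counter({100: 5, 112: 3}), ("I", "-"): Counter({500: 2})}
starts = [("I", "+", 100), ("I", "-", 510)]
profile = metagene_profile(read_ends, starts, window=(-5, 15))
print(profile[0], profile[10], profile[12])  # 5 2 3
```

The same function gives a stop-codon profile when it is passed stop codon coordinates instead.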

The second part of the pipeline shifts the mapped RPF positions to the 5' position of the P-site codon (P-site assignment), using the per-read-length offsets given in readlength_offsets_5-End.txt. It calculates codon relative rpm and codon relative fold difference (FD), and adds sequence information: the codons in the E-, P- and A-sites, and the nucleotide sequence and its translation extending from the A-site into the exit tunnel (11 amino acids by default). The FD calculation assumes you have Ribo-Seq data for two conditions (A: the test condition; B: wild type). An important limitation is that the FD calculation cannot handle multi-exon genes. This is as far as the pipeline goes at the moment.
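P-site assignment amounts to moving each read's 5' end downstream by an offset that depends on read length. The sketch below assumes a hypothetical two-column offset file (read length, offset) and positive offsets counted downstream of the 5' end; check the actual readlength_offsets files for their real layout and sign convention. `load_offsets` and `p_site` are illustrative names, not functions from the pipeline.

```python
def load_offsets(lines):
    """Parse 'read_length offset' lines into {read_length: offset}.

    ASSUMED format: whitespace-separated integers, lines starting with '#' are comments.
    The real readlength_offsets_*.txt files may be laid out differently.
    """
    offsets = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        length, offset = line.split()[:2]
        offsets[int(length)] = int(offset)
    return offsets


def p_site(five_prime_end, read_length, strand, offsets):
    """Shift a read's 5' end (genomic coordinate) to the first nucleotide of its P-site codon.

    ASSUMED convention: offsets are positive distances downstream of the 5' end.
    Returns None when no offset is defined for this read length (this sketch skips such reads).
    """
    offset = offsets.get(read_length)
    if offset is None:
        return None
    # Downstream means higher coordinates on '+' and lower coordinates on '-'
    return five_prime_end + offset if strand == "+" else five_prime_end - offset


offsets = load_offsets(["# length offset", "28 12", "29 12", "30 13"])
print(p_site(1000, 28, "+", offsets))  # 1012
print(p_site(1000, 30, "-", offsets))  # 987
print(p_site(1000, 35, "+", offsets))  # None
```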

Prerequisites

Python:

It's OK to use the system Python, but having your own local installation gives more flexibility. I used Anaconda Python 3.5 from Continuum. It comes with a bunch of libraries and a convenient package manager, conda. Before conda can install bioinformatics libraries and programs, you have to add the bioconda channel:

    conda config --add channels bioconda

wget - a detailed installation guide for OS X and precompiled binaries are available from the Rudix site.
cutadapt

    conda install cutadapt

pigz - optional; if it is not installed, cutadapt falls back to single-core mode
HISAT2

Version 2.0.5 or higher. By default, HISAT2 soft-clips (trims) low-quality read ends, which leads to incorrect mapping of ribosome locations. From version 2.0.5 onward there is an option to turn this behavior off.
bowtie2
samtools

    conda install samtools

pysam

    conda install pysam
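Why soft clipping matters: in SAM, POS is the leftmost aligned reference base and already excludes soft-clipped bases, so a clipped 5' end moves the inferred read end and with it the ribosome position. HISAT2 documents a `--no-softclip` option for disabling this. As a quick sanity check on aligned reads, the helpers below (hypothetical, plain Python rather than pysam) compute a read's aligned 5' end from POS, CIGAR and strand, and count soft-clipped bases at the 5' end.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM specification)
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar):
    """Reference bases covered by the alignment (M, D, N, = and X consume the reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def aligned_five_prime_end(pos, cigar, is_reverse):
    """1-based coordinate of the read's aligned 5' end.

    pos is the SAM POS field: the leftmost aligned reference base, which already
    excludes soft-clipped bases. Reverse-strand reads start at the rightmost base.
    """
    return pos + reference_span(cigar) - 1 if is_reverse else pos


def five_prime_soft_clip(cigar, is_reverse):
    """Number of soft-clipped bases at the read's 5' end (hard clips are ignored)."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0
    n, op = ops[-1] if is_reverse else ops[0]
    return n if op == "S" else 0


print(aligned_five_prime_end(101, "28M", True))        # 128
print(aligned_five_prime_end(101, "20M100N8M", True))  # 228 (N is a skipped intron)
print(five_prime_soft_clip("2S26M", False))            # 2
```

For a forward read with CIGAR 2S26M, the reported 5' end sits 2 nt downstream of where the first sequenced base would align, and the inferred ribosome position shifts by the same amount. With pysam, `AlignedSegment.cigarstring`, `reference_start + 1` (pysam coordinates are 0-based) and `is_reverse` supply the same inputs.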

Additional data files

Genome.fa - genome sequence in FastA format
ncRNA.fa - non coding RNA in FastA format
Genome.gtf - genome annotation in GTF (gff2) format

Other data files are derived from these three; the commands for this are in the file build_index.sh. The Saccharomyces cerevisiae genome, annotation, ncRNA and indexes are located in the folder 0-References/. A dummy dataset of 1 million reads for testing purposes is located in the folder 1-Raw/.
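Start and stop codon coordinates come from the start_codon and stop_codon features of Genome.gtf. Below is a minimal, hypothetical parser for those features, assuming Ensembl-style attributes that include a `gene_id` entry. A codon split by an intron appears as more than one record, so each feature keeps a list.

```python
def codon_features(gtf_lines, features=("start_codon", "stop_codon")):
    """Collect start/stop codon records per gene from GTF lines.

    Returns {gene_id: {feature: [(chrom, start, end, strand), ...]}} with 1-based,
    inclusive GTF coordinates.
    """
    result = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in features:
            continue
        chrom, _, feature, start, end, _, strand, _, attributes = fields[:9]
        gene_id = None
        for attr in attributes.split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
                break
        if gene_id is None:
            continue
        per_gene = result.setdefault(gene_id, {})
        per_gene.setdefault(feature, []).append((chrom, int(start), int(end), strand))
    return result


# Hypothetical toy records for one gene
toy = [
    "\t".join(["I", "src", "start_codon", "100", "102", ".", "+", "0", 'gene_id "GENE1"; transcript_id "GENE1";']),
    "\t".join(["I", "src", "CDS", "100", "405", ".", "+", "0", 'gene_id "GENE1";']),
    "\t".join(["I", "src", "stop_codon", "406", "408", ".", "+", "0", 'gene_id "GENE1";']),
]
print(codon_features(toy))
```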

Usage

The pipeline is split into two parts and controlled by parameters in the file Param.in. Part 1 runs up to mapping read ends to the genome and producing metagene plots (steps 1-8 in Param.in). Part 2 performs the corrected P-site assignment and computes codon relative fold differences (FD) between two conditions. In Param.in you can specify the steps you want to run, the read length range, the mapped read end (5' or 3'), etc. Run part 1 with:

    python  Pipeline_part_1.py

Edit the offset file (readlength_offsets.txt) with the offset for each read length, then run part 2:

    python  Pipeline_part_2.py

Log files are generated for each step and stored in the Reports/ folder.
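Part 2 annotates each P-site codon with its E-, P- and A-site codons and the sequence reaching from the A-site into the exit tunnel. A sketch of that lookup on a single CDS string follows. It assumes an uppercase DNA sequence that starts at the start codon, 0-based codon indices, the standard genetic code, and that the A-site codon counts toward the 11 tunnel codons; the pipeline's exact window may differ, and `site_context` is a hypothetical name.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def site_context(cds, p_index, tunnel_len=11):
    """E/P/A codons around codon p_index plus the tunnel window ending at the A-site.

    Returns None when the P-site has no E- or A-site neighbour inside the CDS.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not 1 <= p_index < len(codons) - 1:
        return None
    a_index = p_index + 1
    # ASSUMPTION: the window counts tunnel_len codons, A-site included
    first = max(0, a_index - tunnel_len + 1)
    tunnel_nt = "".join(codons[first:a_index + 1])
    return {
        "E": codons[p_index - 1],
        "P": codons[p_index],
        "A": codons[a_index],
        "tunnel_nt": tunnel_nt,
        "tunnel_aa": translate(tunnel_nt),
    }


print(site_context("ATGAAACCCGGGTTTTAA", 2))
```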

Limitations

An important limitation is that the calculation of codon relative fold differences cannot currently handle multi-exon genes. This restricts its use to organisms whose genes are mostly single-exon, such as bacteria and yeast.
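The single-exon restriction follows from how positions are converted to codons: the codon index is the distance from the CDS start divided by 3, which only holds when the CDS is contiguous in the genome. The sketch below bins P-site counts into codons and computes a per-codon log2 fold difference between conditions A and B. It assumes that "codon relative rpm" means a codon's rpm divided by the gene's mean rpm per codon; under that assumption the library-size (rpm) scaling cancels, so raw counts are used. The function names and the log2 scale are illustrative choices, not taken from Pipeline_part_2.py.

```python
import math


def codon_counts_single_exon(psite_counts, cds_start, cds_end, strand):
    """Bin P-site counts {genomic position: count} into codons of a single-exon CDS.

    cds_start/cds_end are 1-based inclusive genomic coordinates. The index arithmetic
    assumes the CDS is contiguous in the genome, which breaks for multi-exon genes.
    """
    n_codons = (cds_end - cds_start + 1) // 3
    counts = [0] * n_codons
    for pos, count in psite_counts.items():
        if not cds_start <= pos <= cds_end:
            continue
        # Distance from the first CDS nucleotide along the transcript
        offset = pos - cds_start if strand == "+" else cds_end - pos
        index = offset // 3
        if index < n_codons:
            counts[index] += count
    return counts


def relative_occupancy(codon_counts):
    """Codon counts divided by the gene's mean count per codon (None if the gene has no reads)."""
    if not codon_counts:
        return None
    mean = sum(codon_counts) / len(codon_counts)
    if mean == 0:
        return None
    return [c / mean for c in codon_counts]


def fold_difference(counts_a, counts_b):
    """log2(relative occupancy in A / relative occupancy in B) per codon; None where undefined."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both conditions must cover the same codons")
    rel_a, rel_b = relative_occupancy(counts_a), relative_occupancy(counts_b)
    if rel_a is None or rel_b is None:
        return None
    return [math.log2(a / b) if a > 0 and b > 0 else None for a, b in zip(rel_a, rel_b)]


print(codon_counts_single_exon({100: 4, 101: 1, 103: 2, 108: 6}, 100, 108, "+"))  # [5, 2, 6]
print(fold_difference([4, 2, 0, 2], [2, 2, 2, 2]))  # [1.0, 0.0, None, 0.0]
```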

Hard coded variables

Some variables are hard-coded into the Python scripts:
(a) The adapter sequence for cutadapt is CTGTAGGCACCATCAAT. It is set in the function cutAdapt() in Pipeline_part_1.py.
(b) The chromosome names (I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, Mito) must be compatible with Genome.fa and Genome.gtf. They are set in the function yeastChr() in Pipeline_part_1.py and Pipeline_part_2.py.
(c) The GTF file is expected to come from Ensembl, i.e. it must contain the features start_codon and stop_codon.
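To catch a chromosome naming mismatch before running the pipeline, you can compare the hard-coded list with the sequence names actually used in Genome.fa and Genome.gtf. A hypothetical check follows; the list is copied from the README text above, not imported from yeastChr().

```python
# Copied from this README; the pipeline's own list lives in yeastChr()
EXPECTED_CHROMOSOMES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
                        "XI", "XII", "XIII", "XIV", "XV", "XVI", "Mito"]


def fasta_names(lines):
    """Sequence names from FASTA headers: the first word after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_names(lines):
    """Sequence names (first column) used in a GTF file, skipping comments and blank lines."""
    return {line.split("\t", 1)[0] for line in lines if line.strip() and not line.startswith("#")}


def missing_chromosomes(expected, fasta_lines, gtf_lines):
    """Expected names that do not occur in the FASTA headers or in the GTF first column."""
    fa, gtf = fasta_names(fasta_lines), gtf_names(gtf_lines)
    return {"fasta": [c for c in expected if c not in fa],
            "gtf": [c for c in expected if c not in gtf]}


# Toy input: FASTA uses "chrII" instead of "II", and the GTF only mentions "I"
fa_lines = [">I some description\n", "ACGT\n", ">chrII\n", "ACGT\n"]
gtf_lines = ["#comment\n", 'I\tsrc\tgene\t1\t9\t.\t+\t.\tgene_id "G";\n']
print(missing_chromosomes(["I", "II"], fa_lines, gtf_lines))  # {'fasta': ['II'], 'gtf': ['II']}
```

On the real reference files this could be called as `with open("Genome.fa") as fa, open("Genome.gtf") as gtf: print(missing_chromosomes(EXPECTED_CHROMOSOMES, fa, gtf))`, adjusting the paths to where the files live. Empty lists mean every expected name was found.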

References

The first part of the code and the pipeline backbone are based on code used in (1) Radhakrishnan, A., et al. Cell (2016), https://github.com/greenlabjhmi/2016-Cell-Dhh1. The second part, for calculating codon relative fold differences, is a Python 3 adaptation similar to (2) Kannan, K., et al. PNAS (2014).

Repository: GCA-VH-lab/RiboSeqPy
