This work builds on the study by Yang et al. (2016), who investigated the epithelial-mesenchymal transition (EMT). In their study, EMT was induced by ectopic expression of Zeb1 in a lung cancer cell line (H358). The authors analysed RNA-seq data collected over 7 days, starting from uninduced cells.
The original data are available on the NCBI site. To reduce computation time, we use only 0.5% of the total RNA-seq data, available at the following address: http://rssf.i2bc.paris-saclay.fr/X-fer/AtelierNGS/TPrnaseq.tar.gz
The pipeline runs in bash. Some packages are required to run commands such as fastqc, trimmomatic and featureCounts.
sudo apt-get install -y fastqc # For using fastqc
conda install -c bioconda trimmomatic # For using trimmomatic
sudo apt-get install -y subread # For using featureCounts
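Before launching the pipeline, it can help to confirm that the three tools are reachable from the shell. The snippet below is a minimal sketch: `check_tools` is a hypothetical helper (it is not part of the repository), and it assumes that each package exposes a command with the same name as the tool (`fastqc`, `trimmomatic`, `featureCounts`) on your PATH.

```shell
# Hypothetical helper, not part of the repository:
# report which of the given commands are missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return 0 only if every tool was found.
    return "$missing"
}

# Assumption: the installed packages provide these command names.
check_tools fastqc trimmomatic featureCounts \
    || echo "Install the missing tools before running the pipeline."
```

If a tool is reported as missing although the package is installed, check that the corresponding conda environment is activated.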
The pipeline also requires a machine with at least 16 GB of free RAM, which is needed to build the index and to map the reads on chromosome 18 of the reference genome.
The pipeline creates a file named "hugo-counts.txt" that gives, for each gene, its HUGO identifier and the number of reads aligned in each observation. This file is available in the Data/Counts directory. The steps are as follows.
- Clone the GitHub repository to your machine
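The memory requirement can be checked before starting. The sketch below assumes a Linux machine where `/proc/meminfo` reports a `MemAvailable` field in kB; `available_gib` is a hypothetical helper, and the 16 threshold is compared against GiB, so treat the result as approximate.

```shell
# Hypothetical helper: print the available memory in whole GiB,
# read from a /proc/meminfo-style file (default: /proc/meminfo).
available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

required_gib=16
# On systems without /proc/meminfo the value stays empty and is treated as 0.
avail=$(available_gib 2>/dev/null || true)
if [ "${avail:-0}" -ge "$required_gib" ]; then
    echo "OK: ${avail} GiB available"
else
    echo "Warning: only ${avail:-0} GiB available, ${required_gib} GiB recommended"
fi
```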
git clone https://github.com/Theo-Roncalli/RNAseq-EMT.git
cd RNAseq-EMT
- Import the reads and the reference genome
bash install.sh
- Create the count file, which contains, for each HUGO identifier on chromosome 18, the number of reads per gene and per observation
bash counting.sh
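Once the steps above have finished, a quick look at the output confirms that the count file was produced. This is a minimal sketch: `preview_counts` is a hypothetical helper, and it makes no assumption about the column layout of "hugo-counts.txt" beyond it being a text file in Data/Counts.

```shell
# Hypothetical helper: show the first lines of a count file and its size.
# Usage: preview_counts FILE [NUMBER_OF_LINES]
preview_counts() {
    file=$1
    n=${2:-5}
    if [ ! -s "$file" ]; then
        echo "No counts file at $file yet; run counting.sh first."
        return 1
    fi
    head -n "$n" "$file"
    echo "total lines: $(wc -l < "$file" | tr -d ' ')"
}

# Run from the root of the cloned repository.
preview_counts "Data/Counts/hugo-counts.txt" || true
```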
To clean the repository (i.e. delete the Data and Figures folders), run:
bash clean.sh