The RNA-seq data simulator (RSDS) is a command-line tool implemented in Python 3. It simulates raw RNA-sequencing reads that emulate the characteristics of real RNA-seq data. The properties of the simulated data are controlled through tuneable settings, including the fragment-length distribution, a customized Phred quality-score model, a customized transcript expression profile, and the choice between paired-end and single-end reads.
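The transcript expression profile decides how many reads each transcript contributes. The following is a minimal sketch of one way this could work. It assumes abundances arrive as a mapping from transcript ID to a non-negative, count-like value, for example from a read count table. The function `allocate_reads` is hypothetical and is not taken from the RSDS source.

```python
import random
from collections import Counter


# Hypothetical sketch: not taken from the RSDS source.
def allocate_reads(expression, n_reads, seed=None):
    """Assign n_reads to transcripts in proportion to their abundance."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible allocation
    ids = list(expression)
    weights = [expression[t] for t in ids]
    if n_reads < 0 or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("need n_reads >= 0 and non-negative weights with a positive total")
    # One weighted draw per read, so the totals follow a multinomial distribution.
    draws = rng.choices(ids, weights=weights, k=n_reads)
    counts = Counter(draws)
    return {t: counts.get(t, 0) for t in ids}


expression = {"tx1": 500, "tx2": 300, "tx3": 0}  # made-up abundances
print(allocate_reads(expression, n_reads=1000, seed=1))
```

Each read is drawn independently, so in small simulations low-abundance transcripts may receive no reads at all.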
RSDS requires:
- Python >= 3.7
- pyfaidx
- NumPy
- pandas
- scikit-learn
- SciPy
The source code can be downloaded from GitHub. RSDS can then be installed from the cloned repository with pip, which uses the included setup.py script:

```bash
git clone https://github.com/darrynzimire/RNASeqDataSimulator.git
cd RNASeqDataSimulator
pip install .
```

Alternatively, to install in user space:

```bash
pip install --user .
```
To simulate an RNA-seq dataset, call rsds from the command line:

```bash
$ rsds
```
RSDS accepts the following parameters:
Parameters | Description |
---|---|
-r | Read length in bases (default: 100) |
-n | Number of reads to simulate |
-f | Reference transcriptome file in FASTA format |
-s | Random seed value for reproducibility |
-o | Output file prefix |
-seqmodel | Phred quality-score model |
-countModel | Read count table model file |
-FL | Fragment-length distribution parameters (default: mean = 250 bp, standard deviation = 25 bp) |
-SE | Simulate single-end RNA-seq data |
-PE | Simulate paired-end RNA-seq data |
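The following minimal sketch shows how -r, -FL, -s and -SE/-PE could fit together. It samples one fragment from a transcript and returns single-end or paired-end reads. It assumes a normal fragment-length distribution with the documented defaults (mean 250, standard deviation 25). It also assumes that read 2 is the reverse complement of the fragment's 3' end. The function names are hypothetical, and RSDS itself may differ, for example in how it handles fragments longer than the transcript.

```python
import random

# Hypothetical sketch: function names and defaults are illustrative only.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_fragment_reads(transcript, read_len=100, fl_mean=250, fl_sd=25,
                            paired=True, rng=None):
    """Sample one fragment from `transcript` and return its read(s)."""
    if rng is None:
        rng = random.Random()
    if len(transcript) < read_len:
        raise ValueError("transcript is shorter than the read length")
    # Normal fragment length, clamped to [read_len, len(transcript)].
    frag_len = round(rng.gauss(fl_mean, fl_sd))
    frag_len = max(read_len, min(frag_len, len(transcript)))
    start = rng.randint(0, len(transcript) - frag_len)  # randint is inclusive
    fragment = transcript[start:start + frag_len]
    read1 = fragment[:read_len]  # forward read from the 5' end
    if not paired:
        return (read1,)
    # Mate read from the 3' end, on the opposite strand.
    read2 = reverse_complement(fragment[-read_len:])
    return (read1, read2)


rng = random.Random(42)  # a fixed seed, in the spirit of the -s option
tx = "".join(rng.choice("ACGT") for _ in range(1000))  # random toy transcript
r1, r2 = simulate_fragment_reads(tx, read_len=100, paired=True, rng=rng)
print(len(r1), len(r2))
```

Clamping the fragment length keeps every fragment at least one read long and never longer than its transcript. A real simulator might instead redraw or discard such fragments.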
Some of the scripts in this repository require reference datasets that are not included. These datasets are:
- A reference transcriptome file in FASTA format
- RNA-sequencing data
The parameters for generating a quality-score model from real sequencing data are:
Parameters | Description |
---|---|
-i | FASTQ file (read 1) |
-i2 | FASTQ file (read 2) |
-o | Output file prefix |
-q | Quality-score offset |
-Q | Maximum quality score |
-n | Maximum number of reads to process |
-s | Number of simulation iterations |
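A quality-score model of this kind can be built by decoding the ASCII quality characters in a FASTQ file into Phred scores. Each score is the character code minus the offset given by -q, capped at the maximum given by -Q. The model then records how those scores are distributed at each read position. The following minimal sketch assumes an offset of 33 and a maximum of 41. The function names are hypothetical, and the sketch does not reproduce the RSDS script's model file format or its -s iteration behaviour.

```python
import random
from collections import Counter


# Hypothetical sketch: not the RSDS quality-model script.
def build_quality_model(quality_strings, offset=33, max_q=41):
    """Count Phred scores at each read position from FASTQ quality lines."""
    model = []  # model[i] is a Counter of the scores seen at position i
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            q = min(ord(ch) - offset, max_q)  # decode ASCII, cap at max_q
            if q < 0:
                raise ValueError(f"character {ch!r} is below offset {offset}")
            if i == len(model):
                model.append(Counter())
            model[i][q] += 1
    return model


def sample_quality_string(model, read_len, offset=33, rng=None):
    """Draw a quality string position by position from the empirical counts."""
    rng = rng if rng is not None else random.Random()
    chars = []
    for i in range(read_len):
        counts = model[min(i, len(model) - 1)]  # reuse the last position if needed
        scores = list(counts)
        q = rng.choices(scores, weights=[counts[s] for s in scores])[0]
        chars.append(chr(q + offset))
    return "".join(chars)


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


quals = ["IIIII", "IIIH#", "IIHH#"]  # toy quality lines ('I' = Q40, '#' = Q2)
model = build_quality_model(quals)
print(sample_quality_string(model, read_len=8, rng=random.Random(7)))
print(error_probability(20))  # Q20 corresponds to a 1% error probability
```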
Built with:
- Python 3.7
- NumPy
Author:
- Darryn Zimire
This project was funded in part by:
- South African National Research Foundation (SA NRF)
- South African Medical Research Council (SA MRC)
- Bill & Melinda Gates Foundation
This project is licensed under the MIT License; see the LICENSE.md file for details.
Acknowledgements:
- Professor Gerard Tromp (PhD), Master's degree supervisor
- South African Tuberculosis Bioinformatics Initiative
- University of Stellenbosch
- Stephens et al. (2016)