The scRNApipe pipeline was originally designed to preprocess and analyse scRNA-Seq data generated with the CEL-Seq2 protocol on Illumina platforms. Nevertheless, the read transformation provided by the umis package allows most single-cell protocols to be run through this pipeline.
Read transformation combines the reads into a single read whose name incorporates the cell, sample and UMI barcode sequences, plus a unique identifier (UID) created by concatenating those three barcodes. This UID allows UMI-tools to remove PCR duplicates from BAM files containing multiple cells.
In principle, the raw data are read-transformed and filtered (by umis), aligned, deduplicated per gene (by UMI-tools) and counted.
- Quality metrics (optional)
- Preprocessing
- Aligning
- Main Analysis
- Expression Matrix
Main options to tune in this pipeline:
- The Main Analysis can count and deduplicate per contig or in the default mode (instead of per gene)
- Skip deduplication
Detailed reports will be generated for each sample by FastQC. At the end, a summarised report will be available for an overall review of all samples at once.
- umis fastqtransform (read transformation)
- cb_filter (filtering reads with non-matching CELLULAR barcodes (CB) | 1 mismatch is allowed)
- sb_filter (filtering reads with non-matching SAMPLE barcodes (SB) | 1 mismatch is allowed)
- mb_filter (removing reads with ambiguous bases (e.g. N) in the UMI barcodes)
- add_uid (add the UID and save as fastq.gz)
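The filtering rules above can be illustrated with a minimal sketch. It is not the umis implementation: the function names, the whitelist argument and the choice to reject a barcode that is within one mismatch of several whitelist entries are assumptions made for illustration only.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, whitelist, max_mismatches=1):
    """Return the single whitelist barcode within max_mismatches, else None.

    Assumption: an observed barcode that is equally close to several
    whitelist entries is treated as non-matching.
    """
    hits = [wb for wb in whitelist
            if len(wb) == len(observed)
            and hamming(observed, wb) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def umi_is_clean(umi):
    """True if the UMI is non-empty and contains only A, C, G or T."""
    return bool(umi) and set(umi) <= set("ACGT")
```

A read would be kept only if its cell and sample barcodes both match and its UMI is clean, for example `match_barcode(cb, cell_whitelist) is not None and umi_is_clean(umi)`.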
The read name after preprocessing will include CELL_BARCODE:UMI_BARCODE:SAMPLE_BARCODE:UID_[[samplebarcode][cellbarcode][umi]]
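As a rough illustration of this naming scheme, the sketch below builds and parses such a read name. The exact field layout (`CELL_`, `UMI_` and `SAMPLE_` prefixes joined by `:`) is an assumption based on the pattern above, and the function names are hypothetical; only the UID order (sample barcode, cell barcode, UMI) is taken from the text.

```python
def build_uid(sample_bc, cell_bc, umi):
    """Concatenate sample barcode, cell barcode and UMI into the UID."""
    return sample_bc + cell_bc + umi


def tag_read_name(read_name, cell_bc, umi, sample_bc):
    """Append the barcode fields and the UID to a read name.

    Assumed layout: <name>:CELL_<cb>:UMI_<umi>:SAMPLE_<sb>:UID_<sb><cb><umi>
    """
    uid = build_uid(sample_bc, cell_bc, umi)
    return "{}:CELL_{}:UMI_{}:SAMPLE_{}:UID_{}".format(
        read_name, cell_bc, umi, sample_bc, uid)


def parse_tags(tagged_name):
    """Extract the CELL, UMI, SAMPLE and UID fields from a tagged read name."""
    tags = {}
    for field in tagged_name.split(":"):
        key, sep, value = field.partition("_")
        if sep and key in ("CELL", "UMI", "SAMPLE", "UID"):
            tags[key] = value
    return tags
```

Because the UID carries the sample and cell barcodes as well as the UMI, two reads from different cells with the same UMI still get different UIDs, which is what makes deduplication on a multi-cell BAM file possible.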
The preprocessed reads are aligned against the reference genome with the STAR aligner.
- Counting reads using featureCounts
- Adding XF:Z: tag to the BAM file containing the GeneID
- Deduplication using UMI-Tools
Generate the Expression Matrix based on the GeneID tags
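Conceptually, this step tallies the deduplicated reads per gene and per cell. The sketch below shows that tally on plain `(cell_barcode, gene_id)` pairs; it does not read BAM files, and the function name and input format are assumptions for illustration, not the pipeline's own code.

```python
from collections import Counter


def expression_matrix(records):
    """Build a genes x cells count matrix from (cell_barcode, gene_id) pairs.

    Returns (genes, cells, rows), where rows[i][j] is the number of
    records for genes[i] in cells[j].
    """
    counts = Counter(records)  # (cell, gene) -> number of reads
    cells = sorted({cell for cell, _ in counts})
    genes = sorted({gene for _, gene in counts})
    rows = [[counts[(cell, gene)] for cell in cells] for gene in genes]
    return genes, cells, rows
```

In a real run the pairs would come from the cell barcode in each read name and the GeneID stored in the `XF:Z:` tag of the deduplicated BAM file.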
If you'd like to work directly from the git repository:
$ git clone https://github.com/MarinusVL ...
Enter repository and run:
$ python setup.py install
After installation the pipeline can be used:
$ scRNApipe <configuration_file.txt>
For further information about each component of the pipeline you can run:
$ scRNApipe --help
scRNApipe depends on umis, umi_tools, numpy, pysam, STAR, featureCounts, fastqc, multiqc and Python 2.7.