This airqtl tutorial maps single-cell expression quantitative trait loci (sceQTLs) and infers causal gene regulatory networks (cGRNs) at the cell state level of specificity. It uses the dataset from the original Randolph et al. study.
This tutorial is being actively updated. Please check back often.
- Install airqtl and download this folder.
- (Optional) Customize the pipeline configuration in `Snakefile.config`, especially the `device` parameter if you prefer to use a CPU or a different GPU. See Understanding and customizing the tutorial.
- Run the pipeline with `snakemake -j 1` twice in a shell. The first run downloads the raw dataset from Zenodo. The second run reads in the cell states, then maps sceQTLs and infers cGRNs for each cell state.
- Check the sceQTL output files at `data/association` and the cGRN output file at `data/merge.tsv.gz`.
The whole run takes ~1 day on a top-end Dell Alienware Aurora R16®, of which sceQTL mapping takes ~10 minutes per cell state. The download step can take longer if your internet connection is slow.
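Once both runs have finished, the outputs can be inspected from Python. The following is a minimal sketch rather than part of airqtl: the helper names `preview_tsv_gz` and `list_outputs` are hypothetical, and the only thing assumed about `data/merge.tsv.gz` is that it is a gzip-compressed, tab-separated text file, as its suffix suggests. No column names are assumed.

```python
import gzip
import os


def preview_tsv_gz(path, n_rows=5):
    """Return the first n_rows lines of a gzip-compressed TSV, each split into fields."""
    rows = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if len(rows) >= n_rows:
                break
            rows.append(line.rstrip("\n").split("\t"))
    return rows


def list_outputs(folder):
    """Return the sorted entry names found directly in folder (empty list if it is absent)."""
    if not os.path.isdir(folder):
        return []
    return sorted(os.listdir(folder))


if __name__ == "__main__":
    # Paths are the ones named in this tutorial; run this from the tutorial folder.
    print("sceQTL outputs:", list_outputs("data/association"))
    if os.path.exists("data/merge.tsv.gz"):
        for fields in preview_tsv_gz("data/merge.tsv.gz"):
            print(fields)
```

An empty list for `data/association`, or no preview of `data/merge.tsv.gz`, suggests that the second `snakemake -j 1` run has not completed yet.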
After a successful run of this tutorial, you can repurpose it for your own dataset.
- Input files of the pipeline are described in the `datasetfiles_data` and `datasetfiles_meta` variables in `airqtl.pipeline.dataset`. Check the downloaded files in `data/raw/` to understand their format.
- Each step of the pipeline is defined as a rule, in sequence, in `Snakefile`. Taking sceQTL association as an example, it corresponds to i) the shell command `airqtl eqtl association` and ii) the Python function `airqtl.pipeline.eqtl.association`. You can therefore learn more from either the command `airqtl eqtl association -h` or the docstring of `airqtl.pipeline.eqtl.association`. The output files and logs of each step are located in `data/x` and `log/x.log` respectively, where x is the name of the step/rule and the output can be either a folder or a file with a name suffix. Some steps are run once for the whole dataset, while others are run separately for each cell state.
- To change pipeline parameters, modify `Snakefile.config`. You can pass custom command-line parameters to each step according to the parameters it accepts, such as those listed by `airqtl eqtl association -h`.
- To run the tutorial pipeline in parallel or on a cluster, modify `Snakefile`, which is based on Snakemake.
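The docstring of a pipeline step can also be read from a Python session. This is a minimal sketch: `get_docstring` is a hypothetical helper built only on the standard library, and the dotted path assumes the package module is spelled `airqtl.pipeline`, consistent with `airqtl.pipeline.dataset` above.

```python
import importlib
import inspect


def get_docstring(dotted_name):
    """Import the module part of dotted_name and return the docstring of its last attribute."""
    module_name, _, attr = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return inspect.getdoc(getattr(module, attr)) or ""


if __name__ == "__main__":
    try:
        # Assumed path of the sceQTL association step described in this tutorial.
        print(get_docstring("airqtl.pipeline.eqtl.association"))
    except (ImportError, AttributeError) as exc:
        print("Could not load the docstring:", exc)
```

The same helper should work for any other step once you know its dotted path, which should be equivalent to calling `help()` on the function interactively.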
- Run this tutorial pipeline successfully
- Understand the format of input files in the `data/raw` folder
- Perform initial quality control of your own dataset
- Download this tutorial folder to a new location on your computer
- Reformat your own dataset into the accepted format and place the files in the newly created `data/raw` folder
- Customize the pipeline as needed
- Run the pipeline for your own dataset
- Check the output files
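Before running the pipeline on your own dataset, a quick sanity check is to compare the file names in your new `data/raw` folder against those from the finished tutorial run. The sketch below is illustrative only: `compare_raw_folders` and both example paths are hypothetical, and it assumes your dataset should use the same file names as the tutorial's. It checks names only, not file contents or format.

```python
import os


def compare_raw_folders(reference_dir, new_dir):
    """Return (missing, extra): names only in reference_dir, and names only in new_dir."""
    reference = set(os.listdir(reference_dir))
    new = set(os.listdir(new_dir))
    return sorted(reference - new), sorted(new - reference)


if __name__ == "__main__":
    # Hypothetical locations: the finished tutorial run and the copy made for your dataset.
    reference_dir = "tutorial/data/raw"
    new_dir = "my_dataset/data/raw"
    if os.path.isdir(reference_dir) and os.path.isdir(new_dir):
        missing, extra = compare_raw_folders(reference_dir, new_dir)
        print("Missing from the new folder:", missing)
        print("Not present in the tutorial folder:", extra)
```

Two empty lists mean the file names match; whether each file has the accepted format still needs to be verified against the tutorial's files.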
If you encounter an error, we suggest troubleshooting on your own first. Error logs appear in the console output and in the `log` folder.
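When many steps have run, it can help to find which log files mention a failure. The following is a minimal sketch, not an airqtl feature: `find_logs_with_errors` is a hypothetical helper, and the keywords `Traceback` and `Error` are assumptions about what a failing step writes, not a documented log format.

```python
import os


def find_logs_with_errors(log_dir="log", keywords=("Traceback", "Error")):
    """Return {log path: first line containing a keyword} for every matching file in log_dir."""
    hits = {}
    # os.walk yields nothing if log_dir does not exist, so the result is then empty.
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                for line in handle:
                    if any(keyword in line for keyword in keywords):
                        hits[path] = line.strip()
                        break
    return hits


if __name__ == "__main__":
    # Run from the tutorial folder so that "log" resolves to the pipeline's log folder.
    for path, line in find_logs_with_errors().items():
        print(f"{path}: {line}")
```

Each reported path points to a log worth opening in full, since only the first matching line is shown.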
If you cannot resolve the error or have any other question, please check the FAQ or raise an issue.
If you applied any fix to the code or pipeline, we strongly suggest starting over from step 1 of Running the tutorial, unless you are experienced and know what you are doing.