
Commit

Improved code quality.
Updated documentation.
Version bump to 0.1.1.
lingfeiwang committed Jan 29, 2025
1 parent 3d4c815 commit 86c2e98
Showing 22 changed files with 143 additions and 589 deletions.
21 changes: 13 additions & 8 deletions README.rst
Original file line number Diff line number Diff line change
@@ -3,35 +3,40 @@ Airqtl
=========
Airqtl is an efficient method to map expression quantitative trait loci (eQTLs) and infer causal gene regulatory networks (cGRNs) from population-scale single-cell studies. The core of airqtl is the Array of Interleaved Repeats (AIR), an efficient data structure to store and process donor-level data in the cell-donor hierarchical setting. Airqtl offers over eight orders of magnitude of acceleration in eQTL mapping with linear mixed models, arising from its superior time complexity and Graphics Processing Unit (GPU) utilization.
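The interleaved-repeats idea can be illustrated with a toy sketch (hypothetical code, not airqtl's actual API): donor-level values are stored once alongside per-donor cell counts, and expanded to the cell level only on demand.

```python
# Toy illustration of the AIR idea: keep one value per donor plus per-donor
# cell counts, rather than materializing the full cell-level array upfront.
donor_values = [0.5, 1.2, 0.9]   # e.g. genotype dosages, one per donor
cells_per_donor = [2, 3, 1]      # repeat counts in the cell-donor hierarchy

def to_cell_level(values, repeats):
    """Expand donor-level values into the full cell-level array."""
    return [v for v, r in zip(values, repeats) for _ in range(r)]

print(to_cell_level(donor_values, cells_per_donor))
# → [0.5, 0.5, 1.2, 1.2, 1.2, 0.9]
```

Operations that are constant per donor can then run on the short donor-level array rather than once per cell, which is the kind of saving the acceleration claim rests on.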

**This repository is being actively updated. Please check back later.**

Installation
=============
Airqtl is on `PyPI <https://pypi.org/project/airqtl>`_. To install airqtl, first install `Pytorch 2 <https://pytorch.org/get-started/locally/>`_. Then install airqtl with pip: ``pip install airqtl`` or from github: ``pip install git+https://github.com/grnlab/airqtl.git``. Make sure you have added airqtl's install path to your PATH environment variable before using the command-line interface (see FAQ_). Installation can take several minutes, including dependencies.

Usage
=====
Airqtl provides command-line and python interfaces. For starters, you can run airqtl by typing ``airqtl -h`` on command-line. See our tutorials below.
Airqtl provides both command-line and python interfaces. For starters, run airqtl by typing ``airqtl -h`` on the command line. Try our tutorial below and adapt it to your own dataset.

Tutorials
==========================
Currently we provide one tutorial to map cell state-specific single-cell eQTLs and infer cGRNs from the Randolph et al dataset in `docs/tutorials`. We are working on better documentation so you can easily understand the tutorial and repurpose it for your own dataset.
Currently we provide `one tutorial <docs/tutorials/randolph>`_ to map cell state-specific single-cell eQTLs and infer cGRNs from the Randolph et al dataset in `docs/tutorials`.

Issues
==========================
Please raise an issue on `github <https://github.com/grnlab/airqtl/issues/new>`_.

References
==========================
TBA
* `"Airqtl dissects cell state-specific causal gene regulatory networks with efficient single-cell eQTL mapping" <https://www.biorxiv.org/content/10.1101/2025.01.15.633041>`_ (2025) by Lingfei Wang. bioRxiv 2025.01.15.633041.

FAQ
==========================
* What does airqtl stand for?
* **What does airqtl stand for**?
Array of Interleaved Repeats for Quantitative Trait Loci

* I installed airqtl but typing ``airqtl`` says 'command not found'.
* **Why do I see this error:** ``AssertionError: Torch not compiled with CUDA enabled``?

This is because you installed a CPU-only PyTorch build but tried to run it on a GPU. You have several options:

1. To run pytorch on **CPU**, set `device='cpu'` in `Snakefile.config` of the tutorial pipeline you use.
2. To run pytorch on **GPU**, reinstall pytorch with GPU support at `Installation`_.
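As a quick check before editing `Snakefile.config`, the appropriate `device` value can be probed from Python (a minimal sketch; it only assumes PyTorch's standard `torch.cuda.is_available` API):

```python
# Choose a device value for Snakefile.config from the installed PyTorch build.
# Falls back to 'cpu' when PyTorch is absent or was built without CUDA.
try:
    import torch
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # PyTorch not installed in this environment
print(device)
```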

* **I installed airqtl but typing ``airqtl`` says 'command not found'**.
See below.

* How do I use a specific python version for airqtl's command-line interface?
* **How do I use a specific python version for airqtl's command-line interface**?
You can always use the python command to run airqtl, such as ``python3 -m airqtl`` in place of the ``airqtl`` command. You can also use a specific python path or version, such as ``python3.12 -m airqtl`` or ``/usr/bin/python3.12 -m airqtl``. Make sure you have installed airqtl for that python version.
37 changes: 33 additions & 4 deletions docs/tutorials/randolph/README.md
@@ -1,9 +1,38 @@
# Cell state-specific single-cell eQTL mapping and cGRN inference

This is an [airqtl](https://github.com/grnlab/airqtl) tutorial to map expression quantitative trait loci (eQTLs) and infer causal gene regulatory networks (cGRNs) at the cell state level of specificity. The Randolph et al [dataset](https://zenodo.org/records/4273999) from their [original study](https://www.science.org/doi/full/10.1126/science.abg0928) is used.
This is an [airqtl](https://github.com/grnlab/airqtl) tutorial to map single-cell expression quantitative trait loci (sceQTLs) and infer causal gene regulatory networks (cGRNs) at the cell state level of specificity. The Randolph et al [dataset](https://zenodo.org/records/4273999) from their [original study](https://www.science.org/doi/full/10.1126/science.abg0928) is used.

To use this tutorial, first [install airqtl](https://github.com/grnlab/airqtl#installation) and download this folder. You may need to update `Snakefile.config`, especially the `device` parameter if you prefer to use a CPU or a different GPU. Then the pipeline can be run with `snakemake -j 1` in shell environment. It takes ~1 day on a top-end Dell Alienware Aurora R16®, in which single-cell eQTL mapping takes ~10mins for each cell state.
**This tutorial is being actively updated. Please check back often.**

More documentation is underway to help you understand and customize this pipeline and repurpose it for your own data.
## Running the tutorial
1. [Install airqtl](https://github.com/grnlab/airqtl#installation) and download this folder
2. (Optional) Customize pipeline configuration in `Snakefile.config`, especially the `device` parameter if you prefer to use a CPU or a different GPU. See [Understanding and customizing the tutorial](#Understanding-and-customizing-the-tutorial).
3. Run the pipeline with `snakemake -j 1` **twice** in shell. The first run downloads the raw dataset from Zenodo. The second run reads in the cell states, then maps sceQTLs and infers cGRNs for each cell state.
4. Check the sceQTL output files at `data/association` and cGRN output file at `data/merge.tsv.gz`.

If you face any issues or need any assistance, see [FAQ](https://github.com/grnlab/airqtl#faq) and [Issues](https://github.com/grnlab/airqtl#issues).
The whole run takes ~1 day on a top-end Dell Alienware Aurora R16®, in which single-cell eQTL mapping takes ~10 minutes per cell state. The download step can take longer if your internet connection is slow.

After a successful run of this tutorial, you can [repurpose it for your own dataset](#Repurposing-the-tutorial-pipeline-for-your-own-dataset).

## Understanding and customizing the tutorial
* Input files of the pipeline are described in the `datasetfiles_data` and `datasetfiles_meta` variables in [airqtl.pipeline.dataset](../../../src/airqtl/pipeline/dataset.py). Check the downloaded files in `data/raw/` to understand their format.
* Each step of the pipeline is defined as a rule sequentially in `Snakefile`. Take sceQTL association as an example: it corresponds to i) the shell command `airqtl eqtl association` and ii) the python function `airqtl.pipeline.eqtl.association`. Therefore, you can learn more from either the command `airqtl eqtl association -h` or the docstring of `airqtl.pipeline.eqtl.association`. The output files and logs of each step are located in `data/x` and `log/x.log` respectively, where x is the name of the step/rule; each output can be either a folder or a file with a name suffix. Some steps are run once for the whole dataset while others are run separately for each cell state.
* To change pipeline parameters, modify `Snakefile.config`. You can pass custom command-line parameters to each step, according to the accepted parameters shown by e.g. `airqtl eqtl association -h`.
* To run the tutorial pipeline in parallel or on a cluster, modify `Snakefile` which is based on [Snakemake](https://snakemake.readthedocs.io/en/stable/).

## Repurposing the tutorial pipeline for your own dataset
1. [Run this tutorial pipeline](#Running-the-tutorial) successfully
2. [Understand the format of input files](#Understanding-and-customizing-the-tutorial) in `data/raw` folder
3. Perform initial quality control of your own dataset
4. Download this tutorial folder to a new location on your computer
5. Reformat your own dataset into the accepted format and place the files in a newly created `data/raw` folder
6. [Customize the pipeline](#Understanding-and-customizing-the-tutorial) as needed
7. [Run the pipeline](#Running-the-tutorial) for your own dataset
8. Check the output files

## Issues
If you encounter an error, we suggest first troubleshooting on your own. Error logs are located in the console output and the `log` folder.

If you cannot resolve the error or have any other question, please [check the FAQ](../../../#faq) or [raise an issue](../../../#issues).

If you applied any fix to the code or pipeline, we strongly suggest starting over from step 1 of [Running the tutorial](#Running-the-tutorial), unless you are experienced and know what you are doing.
6 changes: 3 additions & 3 deletions docs/tutorials/randolph/Snakefile.config
@@ -5,9 +5,9 @@

#Device to use for association mapping. See Pytorch documentation for details.
device='cuda:0'
#Base directory for data for pipeline
#Base directory of data for pipeline
dirbase='data'
#Base directory for log for pipeline
#Base directory of log for pipeline
dirlbase='log'
#List of discrete cell and donor covariates to subset cells into distinct groups for separate sceQTL mapping
subset_covs=(['celltype'],['infection'])
@@ -16,7 +16,7 @@ covs="none"
#Whether to print verbose log messages
verbose=True

#Optional parameters to finetune each step. See `airqtl subset --help` or similar for details.
#Optional parameters to finetune each step. See `airqtl eqtl subset --help` or similar for details.
params_subset=''
params_qc=''
params_association=f'--device {device}'
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,14 +1,14 @@
[project]
name = "airqtl"
version = "0.1.0"
version = "0.1.1"
authors = [
{ name="Lingfei Wang", email="[email protected]" },
]
description = "Array of Interleaved Repeats for Quantitative Trait Loci"
readme = "README.rst"
license = {file = "LICENSE"}
keywords = ["qtl","eqtl","scqtl","sceqtl","lmm","quantitative trait loci","expression quantitative trait loci","single-cell quantitative trait loci","single-cell expression quantitative trait loci","linear mixed model","linear mixed models","population-scale scRNA-seq","gene regulatory network","network inference","causal inference","mendelian randomization"]
requires-python = ">=3.8"
requires-python = ">=3.12"
classifiers = [
"Development Status :: 4 - Beta",
"License :: OSI Approved :: BSD License",
9 changes: 9 additions & 0 deletions setup.cfg
@@ -0,0 +1,9 @@
[pylint.MASTER]
ignore-paths=.*/[.].*,^[.].*,__pycache__

[pylint.MESSAGES CONTROL]
disable=W0311,C0301,C0413,W0406,C0103,C0415,E0102,E1101,C0206,C0303,C0209,W0611,R0913,R0914,W1514,C0302,W0511,R0902,E1121,R0915,R0912,E1136,W1202,W0102,W0201,E0101,C0200,W1401,W1201,R1710,W1203,E0401,R0903,E1102,R0401,R0904,C0325,R1735

[flake8]
ignore=F405,E225,E231,F403,E402,N801,W191,E501,E226,W293,E123,E301,E265,E302,E303,E227,E741,W291,E228,C901,E252,E128,E126,N806,N803,E124
exclude = __pycache__,.*,src/dictys/net/layout.py,src/dictys/scripts
17 changes: 6 additions & 11 deletions src/airqtl/__init__.py
@@ -3,36 +3,31 @@
#
# This file is part of airqtl.

__all__ = ['air','association', 'cov', 'heritability', 'io', 'kinship', 'op', 'pipeline', 'sim', 'utils']
__all__ = ['air','association', 'cov', 'heritability', 'kinship', 'op', 'pipeline', 'sim', 'utils']

from . import *


def _main_func_parser(parser,funcs):
parser.add_argument('-v',
dest='verbose',
action='store_true',
help='Verbose mode.')
parser.add_argument('-v',dest='verbose',action='store_true',help='Verbose mode.')
return parser,funcs

def _main_func_args(args):
import logging
import sys
logging.basicConfig(
format=
'%(levelname)s:%(process)d:%(asctime)s:%(pathname)s:%(lineno)d:%(message)s',
level=logging.DEBUG if args.verbose else logging.WARNING)
logging.basicConfig(format='%(levelname)s:%(process)d:%(asctime)s:%(pathname)s:%(lineno)d:%(message)s',level=logging.DEBUG if args.verbose else logging.WARNING)
logging.info('Started: '+' '.join([f"'{x}'" for x in sys.argv]))
return args

def _main_func_ret(args,ret):
def _main_func_ret(_,ret):
import logging
import sys
logging.info('Completed: '+' '.join([f"'{x}'" for x in sys.argv]))
return ret

def main():
import docstring2argparse as d
d.docstringrunner('airqtl.pipeline',func_filter=lambda name,obj:name[0]!='_' and hasattr(obj,'_da') and obj._da==True,func_parser=_main_func_parser,func_args=_main_func_args,func_ret=_main_func_ret)
d.docstringrunner('airqtl.pipeline',func_filter=lambda name,obj:name[0]!='_' and hasattr(obj,'_da') and obj._da is True,func_parser=_main_func_parser,func_args=_main_func_args,func_ret=_main_func_ret)


assert __name__ != "__main__"
2 changes: 1 addition & 1 deletion src/airqtl/__main__.py
@@ -5,4 +5,4 @@

if __name__ == "__main__":
import airqtl
airqtl.main()
airqtl.main()
23 changes: 9 additions & 14 deletions src/airqtl/air.py
@@ -106,9 +106,8 @@ def tofull(self,axis:Union[list[Optional[int]],int,None]=None)->Union[torch.Tens
if any([self.r[x] is not None and x not in axis for x in range(self.ndim)]):
#Output self class
return self.__class__(v,[self.r[x] if x not in axis else None for x in range(self.ndim)]).reduce()
else:
#Output tensor
return v
#Output tensor
return v
def tensor(self)->torch.Tensor:
ans=self.tofull()
assert isinstance(ans,torch.Tensor)
@@ -218,8 +217,8 @@ def __getitem__(self,key:Tuple[Iterable[int],int,slice])->Union['air',torch.Tens
r[xi]=None
#Reduce dimensions
if len(reduce_dims)>0:
assert all([v.shape[x]==1 for x in reduce_dims])
assert all([r[x] is None for x in reduce_dims])
assert all(v.shape[x]==1 for x in reduce_dims)
assert all(r[x] is None for x in reduce_dims)
v=v.squeeze(reduce_dims)
r=[r[x] for x in range(self.ndim) if x not in reduce_dims]
if v.ndim==0:
@@ -276,7 +275,6 @@ def toreduce(self,method:str,axis:dict[int,torch.Tensor])->'air':
axis=dict(zip(*axis))
v=self.v
r=list(self.r)
change=False
for xi in axis:
if r[xi] is None:
d=torch.zeros(v.shape[:xi]+(len(axis[xi]),)+v.shape[xi+1:],dtype=v.dtype,device=v.device,requires_grad=self.requires_grad)
@@ -430,24 +428,21 @@ def reduce(self,inplace:bool=False)->Union[air,torch.Tensor,'composite',None]:
self.vs,self.axis=ans
self._refresh()
return
else:
return self.__class__(*ans)
return self.__class__(*ans)
if isinstance(ans,(torch.Tensor,air)):
if inplace:
self.vs=[ans]
self.axis=0
self._refresh()
return
else:
return ans
return ans
if isinstance(ans,self.__class__):
if inplace:
self.vs=ans.vs
self.axis=ans.axis
self._refresh()
return
else:
return ans
return ans
raise TypeError(f'Unsupported type {type(ans)}.')
def _resolve_axis(self,axis:int)->int:
"""
@@ -580,8 +575,7 @@ def __matmul__(self,other:Union[air,torch.Tensor,'composite'])->Union[air,torch.
if self.axis!=self.ndim-1:
if self.axis==self.ndim-2:
return self.__class__([x@other for x in self.vs],self.axis).reduce()
else:
return self.__class__([self.vs[x]@other.swapaxes(self.axis,0)[self.sizesc[x]:self.sizesc[x+1]].swapaxes(self.axis,0) for x in range(len(self.vs))],self.axis).reduce()
return self.__class__([self.vs[x]@other.swapaxes(self.axis,0)[self.sizesc[x]:self.sizesc[x+1]].swapaxes(self.axis,0) for x in range(len(self.vs))],self.axis).reduce()
if isinstance(other,self.__class__) and other.axis==other.ndim-2 and len(self.sizes)==len(other.sizes) and (self.sizes==other.sizes).all():
ans=reduce(add,[x@y for x,y in zip(self.vs,other.vs)])
else:
@@ -616,4 +610,5 @@ def sum(self,axis:int=None)->Union[air,torch.Tensor,'composite',float,int]:
return reduce(add,[x.sum(axis=axis) for x in self.vs])
return self.__class__([x.sum(axis=axis) for x in self.vs],self.axis-(axis<self.axis)).reduce()


assert __name__ != "__main__"
5 changes: 2 additions & 3 deletions src/airqtl/association.py
@@ -110,8 +110,7 @@ def multi(dx,dy,dc0,dc1,ncs,mkl,mku,l0,f0,f1,nxd,out,fmt,bsx:int=128,bsy:int=327
from scipy.stats import beta

from .air import air
from .op import (mmatmul, mmatmul1, mmatmul2, mmmatmul, mmsquare, msquare,
msquare1)
from .op import (mmatmul, mmatmul1, mmatmul2, mmmatmul, mmsquare, msquare,msquare1)
if mkl is None:
if mku is not None or l0 is not None:
raise TypeError('Set mkl, mku, l0 all to None for linear model or all to not None for linear mixed model.')
@@ -130,7 +129,6 @@ def multi(dx,dy,dc0,dc1,ncs,mkl,mku,l0,f0,f1,nxd,out,fmt,bsx:int=128,bsy:int=327
ns = ncs.sum()
if dc0 is None:
dc0=np.zeros([0,ns],dtype=float)
nc0 = dc0.shape[0]
nc1 = dc1.shape[0]
# Validity checks
if nd == 0:
@@ -526,4 +524,5 @@ def multi_gxc(dx,dy,dc0,dc1,ncs,mkl,mku,l0,nxd,out,fmt,dom:bool=False,**ka):
raise ValueError('nxd must be greater than zero and less than or equal to the number of intermediate covariates (dc0.shape[0]).')
return multi(dx,dy,dc0,dc1,ncs,mkl,mku,l0,partial(multi_gxc_f0,nxd,dom=dom),partial(multi_gxc_f1,nxd),nxd,out,fmt,**ka)


assert __name__ != "__main__"
1 change: 1 addition & 0 deletions src/airqtl/cov.py
@@ -92,4 +92,5 @@ def o2d(dc,missing={}):
dcnew=dcnew.loc[:,dcnew.nunique()!=1].copy()
return dcnew


assert __name__ != "__main__"
2 changes: 0 additions & 2 deletions src/airqtl/heritability.py
@@ -20,7 +20,6 @@ def nll2(vx, mkl, ns):
"""Analytical computation of negative log likelihood given the MLEs"""
import numpy as np
sx0, sx = vx[:2]
xb = vx[2:].ravel()

n = ns.sum()
beta = sx**2
@@ -32,7 +31,6 @@ def nll3_est(sx, mkl, dtr, dcr, dpt, dpc, dptc, ns):
"""MLE estimator of other variables, given sx"""
import numpy as np
from normalisr.association import inv_rank
nn = len(ns)

beta = sx**2
t0 = (1 - 1 / (1 + beta * mkl))
