This repository contains the source code to reproduce the results in the paper: "A versatile information retrieval framework for evaluating profile strength and similarity".
This repository supports Python 3.9+ and should work on all major modern operating systems (tested on macOS 13.5 and Ubuntu 18.04).
This code depends on widely used Python packages:
- numpy
- scipy
- pandas
- jupyter
- seaborn
- networkx
- umap-learn
- scikit-learn
It also uses pycytominer for preprocessing profiling data and copairs for grouping profiles and calculating mean average precision (mAP).
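To illustrate what the mAP calculation measures, here is a minimal NumPy sketch of the general idea. It is not the copairs API, and the function names (`average_precision`, `cosine_similarity`, `map_per_group`) and the grouping scheme are assumptions for illustration. Each replicate profile of a perturbation acts as a query. The other replicates of the same perturbation are its positives and the control profiles are its negatives. References are ranked by cosine similarity to the query, average precision (AP) is computed over that ranking, and the APs of all replicates are averaged into one mAP per perturbation.

```python
import numpy as np


def average_precision(is_positive: np.ndarray) -> float:
    """AP for one query, given a boolean array of references already
    sorted by decreasing similarity (True marks a positive)."""
    n_pos = int(is_positive.sum())
    if n_pos == 0:
        return float("nan")
    hits = np.cumsum(is_positive)                    # positives retrieved up to rank k
    ranks = np.arange(1, len(is_positive) + 1)
    precision_at_k = hits / ranks
    # Average precision@k over the ranks where a positive was retrieved.
    return float((precision_at_k * is_positive).sum() / n_pos)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def map_per_group(profiles: np.ndarray, labels, control_label: str) -> dict:
    """Hypothetical helper: mAP for every non-control label, using the
    other replicates as positives and the controls as negatives."""
    labels = np.asarray(labels)
    sims = cosine_similarity(profiles, profiles)
    controls = np.flatnonzero(labels == control_label)
    result = {}
    for label in np.unique(labels):
        if label == control_label:
            continue
        members = np.flatnonzero(labels == label)
        aps = []
        for q in members:
            positives = members[members != q]        # exclude the query itself
            refs = np.concatenate([positives, controls])
            is_pos = np.concatenate(
                [np.ones(len(positives), bool), np.zeros(len(controls), bool)]
            )
            order = np.argsort(-sims[q, refs], kind="stable")
            aps.append(average_precision(is_pos[order]))
        result[str(label)] = float(np.mean(aps))
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    controls = rng.normal(size=(6, 10))
    # A toy "strong" perturbation whose replicates are nearly identical.
    strong = 5.0 + rng.normal(scale=0.1, size=(3, 10))
    profiles = np.vstack([controls, strong])
    labels = ["ctrl"] * 6 + ["A"] * 3
    print(map_per_group(profiles, labels, "ctrl"))
```

In this toy example, replicates of perturbation `A` are far more similar to each other than to the controls, so they are retrieved first and their mAP approaches 1. A perturbation that is indistinguishable from controls would receive a lower mAP.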
We suggest using Conda for environment management. Run the following commands from the repository root to create the environment from scratch and install the required packages.
conda create -n map_eval "python>=3.9"
conda activate map_eval
pip install .
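After installation, you can optionally confirm that the dependencies are importable from the active environment. The script below is a minimal sketch and not part of this repository. It checks import names, which can differ from pip package names (for example, `umap-learn` is imported as `umap` and `scikit-learn` as `sklearn`). It omits `jupyter`, which is mainly used as an application rather than imported.

```python
import importlib.util

# Mapping of pip package name -> top-level import name (an assumption based on
# the dependency list above; jupyter is left out on purpose).
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "networkx": "networkx",
    "umap-learn": "umap",
    "scikit-learn": "sklearn",
    "pycytominer": "pycytominer",
    "copairs": "copairs",
}


def missing_packages(required: dict = REQUIRED_MODULES) -> list:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Because `find_spec` only locates modules without importing them, the check is fast and does not trigger heavy package initialization.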
Preprocessing of Perturb-seq data requires creating a separate R environment:
conda env create -f perturbseq_processing_environment.yml
Results are organized by dataset in the experiments subdirectory.
Each experiment directory includes a brief description of the dataset, along with scripts and/or Jupyter notebooks to download and preprocess the data, calculate metrics, and generate the figures for the paper:
- Simulations (Figures 2, S2-5)
- CellHealth data (Figures 3, S6)
- cpg0004 data (Figures S7A, S7C)
- cpg0016orf data (Figures S7B, S7D)
- nELISA data (Figures 4A-B)
- Perturb-seq data (Figures 4C-D, S5A-B, S8)
- Mitocheck data (Figures 5C-D, S9-10)
@article {Kalinin2024.04.01.587631,
author = {Kalinin, Alexandr A. and Arevalo, John and Vulliard, Loan and Serrano, Erik and Tsang, Hillary and Bornholdt, Michael and Rajwa, Bartek and Carpenter, Anne E. and Way, Gregory P. and Singh, Shantanu},
title = {A versatile information retrieval framework for evaluating profile strength and similarity},
elocation-id = {2024.04.01.587631},
year = {2024},
doi = {10.1101/2024.04.01.587631},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/04/02/2024.04.01.587631},
journal = {bioRxiv}
}