`ECLARE`: multi-teacher contrastive learning via ensemble distillation for diagonal integration of single-cell multi-omic data

This repository is dedicated to Ensemble knowledge distillation for Contrastive Learning of ATAC and RNA Embeddings, a.k.a. ECLARE ⚡🍰.

The manuscript is currently available on bioRxiv.

Installation

First, clone the repository:

```
git clone https://github.com/li-lab-mcgill/ECLARE.git
cd ECLARE
```

Create a virtual environment (use Python 3.9.6 for best reproducibility):
```
python -m venv eclare_env
```

Activate the virtual environment:

Windows:
```
eclare_env\Scripts\activate
```

macOS and Linux:
```
source eclare_env/bin/activate
```

Git Bash on Windows:
```
source eclare_env/Scripts/activate
```

Install the package:

For standard installation:
```
pip install .
```
For editable installation (recommended for development):
```
pip install -e .
```
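
To confirm that the installation succeeded, a quick check along these lines may help. This is a minimal sketch: the import name `eclare` is an assumption inferred from the repository name, and `is_installed` is a hypothetical helper, not part of the package.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the import system can locate a top-level package."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "eclare" is an assumed import name; adjust it if the package differs.
    name = "eclare"
    print(f"{name} importable: {is_installed(name)}")
```

Run it inside the activated virtual environment so that it checks the same interpreter that `pip` installed into.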

Configuration

Before running ECLARE, set up your configuration file as follows:

Copy the template configuration file:

```
cp config/config_template.yaml config/config.yaml
```

Edit config.yaml to suit your environment. Update paths and settings as necessary:

```
active_environment: "local_directories"

local_directories:
  outpath: "/your/custom/output/path"
  datapath: "/your/custom/data/path"
```
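
The relationship between `active_environment` and the section it names can be sketched in Python as below. This is an illustration of the file layout only, not how ECLARE itself reads the configuration: `resolve_paths` and `load_config` are hypothetical helpers, and `load_config` assumes PyYAML is installed.

```python
from pathlib import Path


def resolve_paths(config: dict) -> dict:
    """Return outpath/datapath of the active environment as Path objects."""
    env = config["active_environment"]
    if env not in config:
        raise KeyError(f"active_environment {env!r} has no matching section")
    section = config[env]
    return {key: Path(section[key]) for key in ("outpath", "datapath")}


def load_config(path: str = "config/config.yaml") -> dict:
    """Parse the YAML file; needs the third-party PyYAML package."""
    import yaml  # imported lazily so resolve_paths works without PyYAML

    with open(path) as handle:
        return yaml.safe_load(handle)


# Same structure as the template, written as a plain dict for illustration.
example = {
    "active_environment": "local_directories",
    "local_directories": {
        "outpath": "/your/custom/output/path",
        "datapath": "/your/custom/data/path",
    },
}

paths = resolve_paths(example)
print(paths["outpath"], paths["datapath"])
```

With a real file, `resolve_paths(load_config())` would return the two directories for whichever environment is marked active.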

Requirements

- Python ≥ 3.9 (3.9.6 for best reproducibility)
- See setup.py for a complete list of dependencies
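
The Python requirement can be checked from the interpreter itself. The snippet below is a small sketch; `MIN_VERSION` and `meets_requirement` are illustrative names, not part of the package.

```python
import sys

# Minimum interpreter version stated in the requirements.
MIN_VERSION = (3, 9)


def meets_requirement(version=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "ok" if meets_requirement() else "too old, need >= 3.9"
    print(f"Python {current}: {status}")
```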

Overview of ECLARE framework

ECLARE (Ensemble knowledge distillation for Contrastive Learning of ATAC and RNA Embeddings) is a framework designed to integrate single-cell multi-omic data, specifically scRNA-seq and scATAC-seq data, through these key components:

Multi-Teacher Knowledge Distillation:
- Multiple teacher models are trained on paired datasets (where RNA and ATAC data are available for the same cells)
- These teachers then guide a student model that works with unpaired data
- This approach helps transfer knowledge from well-understood paired samples to situations where only unpaired data is available

Contrastive Learning:
- Uses a refined contrastive learning objective to learn representations of both RNA and ATAC data
- Helps align features across different modalities (RNA and ATAC)
- Enables the model to understand relationships between different data types

Transport-based Loss:
- Implements a transport-based loss function for precise alignment between RNA and ATAC modalities
- Helps ensure that the learned representations are biologically meaningful
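
To make the first two components concrete, here is a toy sketch in plain Python. It is not the ECLARE objective: it uses a generic symmetric InfoNCE loss rather than the refined contrastive objective, omits the transport-based loss entirely, and combines teachers with fixed, hypothetical weights. The function names, the temperature of 0.1, and the use of cosine similarity are all assumptions made for illustration; a real implementation would use a tensor library rather than nested lists.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity_matrix(rna, atac):
    """Entry [i][j] is the cosine similarity of RNA cell i and ATAC cell j."""
    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    rna_u = [unit(v) for v in rna]
    atac_u = [unit(v) for v in atac]
    return [[sum(a * b for a, b in zip(r, c)) for c in atac_u] for r in rna_u]


def contrastive_loss(rna, atac, temperature=0.1):
    """Symmetric InfoNCE on paired cells: row i of each modality is a positive pair."""
    sim = cosine_similarity_matrix(rna, atac)
    n = len(sim)
    rows = [softmax([s / temperature for s in row]) for row in sim]
    cols = [softmax([sim[i][j] / temperature for i in range(n)]) for j in range(n)]
    rna_to_atac = -sum(math.log(rows[i][i]) for i in range(n)) / n
    atac_to_rna = -sum(math.log(cols[j][j]) for j in range(n)) / n
    return 0.5 * (rna_to_atac + atac_to_rna)


def ensemble_targets(teacher_sims, weights, temperature=0.1):
    """Weighted average of each teacher's row-wise softmax distribution."""
    total = sum(weights)
    n, m = len(teacher_sims[0]), len(teacher_sims[0][0])
    target = [[0.0] * m for _ in range(n)]
    for sim, weight in zip(teacher_sims, weights):
        for i, row in enumerate(sim):
            for j, p in enumerate(softmax([s / temperature for s in row])):
                target[i][j] += (weight / total) * p
    return target


def distillation_loss(student_sim, target, temperature=0.1):
    """Mean KL(target || student) over the rows of the similarity matrix."""
    loss = 0.0
    for row, t_row in zip(student_sim, target):
        s_row = softmax([s / temperature for s in row])
        loss += sum(t * math.log(t / s) for t, s in zip(t_row, s_row) if t > 0)
    return loss / len(student_sim)


if __name__ == "__main__":
    # Two toy cells with 2-dimensional embeddings per modality.
    rna = [[1.0, 0.0], [0.0, 1.0]]
    atac = [[0.9, 0.1], [0.1, 0.9]]
    print("contrastive loss:", contrastive_loss(rna, atac))

    # Pretend two teachers produced these similarity matrices for the same cells.
    teacher_a = cosine_similarity_matrix(rna, atac)
    teacher_b = cosine_similarity_matrix(rna, [[0.8, 0.2], [0.3, 0.7]])
    target = ensemble_targets([teacher_a, teacher_b], weights=[0.6, 0.4])
    print("distillation loss:", distillation_loss(teacher_a, target))
```

The contrastive term is lower when matching cells are the most similar across modalities, and the distillation term is zero only when the student's similarity distribution matches the weighted teacher ensemble.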

The framework is particularly valuable because it:

- Addresses the common problem of limited paired multi-omic data
- Enables integration of unpaired data through knowledge transfer
- Preserves biological structure in the integrated data
- Facilitates downstream analyses like gene regulatory network inference

Figure 1 from manuscript: Overview of ECLARE

Demo: analysis on sample paired datasets

We provide a demo notebook, sample_paired_datasets_analysis.ipynb, for analyzing the sample paired datasets.

The analysis uses DLPFC_Anderson and DLPFC_Ma as source datasets and PFC_Zhu as the target dataset. See Table 1 in the manuscript for more details about the datasets.

Sample data is available from Zenodo at https://doi.org/10.5281/zenodo.14794845. Instructions for downloading the data are available in the notebook.
