Skip to content

Latest commit

 

History

History
75 lines (52 loc) · 3.87 KB

README.md

File metadata and controls

75 lines (52 loc) · 3.87 KB

scCaT

An explainable capsulating architecture for sepsis diagnosis transferring from single-cell RNA sequencing Image text

Prerequisite

  • python 3.7.12
  • Tensorflow 2.4.1
  • cudatoolkit 11.8
  • R 4.3 (Only the drawing of AUPRC required)
  • R package (precrec 0.14.4, reticulate 1.34.0)

Getting started

1.Use Anaconda to create a Python virtual environment. Here, we will create a Python 3.7 environment named scCaT:

conda create -n scCaT python=3.7.12

2.You can check whether the virtual environment was successfully created with the following command:

conda env list

3.Activate your virtual environment:

conda activate scCaT

4.Install tensorflow-gpu==2.4.1:

python -m pip install tensorflow-gpu==2.4.1

5.Add the current environment to the Jupyter Notebook kernel. Note that you should be in the "base" environment when running the following command:

python -m ipykernel install --user --name=scCaT --display-name scCaT

6.Use jupyter notebook and run './code/trainCaT.ipynb' to review the code. Other ipynb files can be found by name to understand the corresponding experiments

Data PreProcessing

The data preprocessing procedures were under 'data preprocessing' folder.

./data/dataPreprocessing/readdata.py contains related codes for data preprocessing stage.
./data/dataPreprocessing/changefdrlimit.py changes the Fdr Threshold during experiment.
./data/dataPreprocessing/deleteSparseData.py changes the Sparse Threshold during experiment.
./data/dataPreprocessing/read_specgenes.py selects specific genes for comparison with other biomarkers.

Building scCaT

scCaT was built based on capsule network and transformer. It was trained on single-cell RNA-seq data and then transferred to bulk RNA data for clinical practice. The details of building and training scCaT can be found below.

./code/IntersectSC&Bulk.ipynb extracts the common genes included in the scRNA-seq data and bulk RNA data.
./code/trainCaT.ipynb builds and trains the model on scRNA-seq.
./code/CompBiomarker_onSC.ipynb compares scCaT to the existing biomarkers and traditional machine learning models.
./code/TransferToBulk.ipynb transferrs scCaT to bulk RNA data and evaluates its performance on validation cohorts.
./code/Rotation_testing.ipynb performs transferring on one cohorts and testing on the other cohorts.
./code/visualization.ipynb includes the visualization of the primary capsules, the capsule outputs, and each of the capsule dimensions.

Data and Results contained in the folders

As the size of models and figures are large, we did not upload in this repo. If you are interested in our method, please download from google drive (link below). All the data can be access from the accession number stated in our paper.

./biomarkers contains the predicted results of the existing biomarkers and traditional machine learning models on scRNA-seq data.
./data/dataBulk contains the sepsis cohorts of microarray and bulk RNA-seq used in this study for the evaluation. The data can be downloaded from Gene Expression Omnibus (GEO) database.
./data/dataSC contains the raw data and the processed data of the single-cell RNA-seq data of sepsis. The preprocess data can be access through: https://drive.google.com/drive/folders/1mFuhzhLleHsGR6kBk4JZkgh9TGIB2QaJ?usp=drive_link.
modelsave contains the model called trained on single-cell RNA-seq data and the model fine-tuned on bulk RNA data.scModel:https://drive.google.com/drive/folders/1QKS6s3lBgbheFMtpiI3wpF3LQnEOxvP-?usp=sharing. TransferModel: https://drive.google.com/drive/folders/1WVb_qvGlZf3r9Ug2H6EGmobayjacGTgr?usp=drive_link.

This framework can be generalized to rare disease diagnosis and phenotype detection that has only a few samples available. The details of the study can be found in our paper.