GitHub - fanzhanglab/pyCellPhenoX: An eXplainable Cell-specific machine learning method to predict clinical Phenotypes using single-cell multi-omics

Getting Started...

Here, we introduce CellPhenoX, an eXplainable machine learning method to identify cell-specific phenotypes that influence clinical outcomes for single-cell data. CellPhenoX integrates robust classification models, explainable AI techniques, and a statistical covariate framework to generate interpretable, cell-specific scores that uncover cell populations associated with a clinical phenotype of interest.

Figure 1. CellPhenoX leverages cell neighborhood co-abundance embeddings, Xi, across samples and clinical variable Y as inputs. By applying an adapted SHAP framework for classification models, CellPhenoX generates Interpretable Scores that quantify the contribution of each feature Xi, along with covariates and interaction term Xi, to the prediction of a clinically relevant phenotype Y. The results are visualized at the single-cell level, showing Interpretable Scores in low-dimensional space, correlated cell type annotations, and associated marker genes.

You can install pyCellPhenoX from PyPI:

pip install pyCellPhenoX

Alternatively, clone the repository from GitHub:

# clone pyCellPhenoX directly from GitHub
git clone git@github.com:fanzhanglab/pyCellPhenoX.git
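
After installing, it can help to confirm which version ended up in the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of pyCellPhenoX.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether the PyPI distribution named in this README is available.
version = installed_version("pyCellPhenoX")
if version is None:
    print("pyCellPhenoX is not installed in this environment")
else:
    print(f"pyCellPhenoX {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that `pip` installed into a different interpreter or virtual environment than the one running the script.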

Dependencies/ Requirements

When using pyCellPhenoX, please make sure your environment satisfies the following version requirements:

python = "^3.9"
pandas = "^2.2.3"
numpy = "^2.1.1"
xgboost = "^2.0"
numba = ">=0.54"
shap = "^0.46.0"
scikit-learn = "^1.5.2"
matplotlib = "^3.9.2"
statsmodels = "^0.14.3"
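
The `^` entries above appear to use Poetry-style caret notation, in which a version may increase as long as its leftmost non-zero component stays the same (so `^2.2.3` would mean `>=2.2.3, <3.0.0`, and `^0.46.0` would mean `>=0.46.0, <0.47.0`). Assuming that reading is correct, the sketch below shows how the bounds work out. The helper names are hypothetical, and only plain numeric versions such as `2.2.3` are handled (no pre-release suffixes).

```python
def parse_version(text, width=3):
    """Turn '3.9' into (3, 9, 0); pads with zeros up to `width` components."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def caret_bounds(spec):
    """Return (inclusive lower, exclusive upper) bounds for a spec like '^2.2.3'."""
    raw = spec.lstrip("^")
    given = [int(p) for p in raw.split(".")]
    # Bump the leftmost non-zero component; if all are zero, bump the last one given.
    idx = next((i for i, p in enumerate(given) if p != 0), len(given) - 1)
    upper = given[:idx] + [given[idx] + 1]
    return parse_version(raw), parse_version(".".join(str(p) for p in upper))


def satisfies(installed, spec):
    """Check an installed version string against a '^x.y.z' or '>=x.y' spec."""
    version = parse_version(installed)
    if spec.startswith("^"):
        lower, upper = caret_bounds(spec)
        return lower <= version < upper
    if spec.startswith(">="):
        return version >= parse_version(spec[2:])
    raise ValueError(f"unsupported constraint: {spec}")


print(caret_bounds("^2.2.3"))
print(satisfies("0.47.1", "^0.46.0"))
```

Under this interpretation, a shap release in the 0.47 series would fall outside the stated range, while any pandas 2.x release from 2.2.3 onward would be accepted.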

Tutorials

Please see the Command-line Reference for details. Additionally, please see the Vignettes on the documentation page.

API

pyCellPhenoX has four major functions that are part of the main object:

split_data() - Split the data into training, testing, and validation sets
model_train_shap_values() - Train the model using a nested cross-validation strategy and generate SHAP values for each fold/CV repeat
get_shap_values() - Aggregate SHAP values for each sample
get_intepretable_score() - Calculate the interpretable score based on SHAP values.
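
To illustrate the idea behind computing values per fold/CV repeat and then aggregating them per sample, here is a simplified, standard-library sketch. It is not pyCellPhenoX's implementation: the function names are hypothetical, the inner hyperparameter-tuning loop of nested cross-validation is omitted, the per-sample values are toy placeholders standing in for SHAP values, and averaging across repeats is an assumed aggregation rule.

```python
import random
from collections import defaultdict
from statistics import mean


def repeated_kfold(n_samples, n_splits, n_repeats, seed=0):
    """Yield (repeat, train_idx, test_idx) for shuffled k-fold splits."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_splits):
            # Every n_splits-th sample of the shuffled order forms one held-out fold.
            test_idx = order[fold::n_splits]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, train_idx, test_idx


def aggregate_per_sample(records):
    """Average per-feature values for each sample across all repeats.

    records: iterable of (sample_index, [one value per feature]).
    """
    collected = defaultdict(list)
    for sample, values in records:
        collected[sample].append(values)
    return {s: [mean(col) for col in zip(*rows)] for s, rows in collected.items()}


records = []
for repeat, train_idx, test_idx in repeated_kfold(n_samples=6, n_splits=3, n_repeats=2):
    # A real workflow would fit a classifier on train_idx here and compute
    # SHAP values for the held-out samples; toy numbers are used instead.
    for sample in test_idx:
        records.append((sample, [sample + repeat, -sample]))

aggregated = aggregate_per_sample(records)
print(aggregated[3])
```

Because each sample is held out exactly once per repeat, every sample ends up with one row of values per repeat, and the aggregation step reduces those rows to a single vector per sample.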

Additional major functions associated with pyCellPhenoX are:

marker_discovery() - Identify markers correlated with the discriminatory power of the Interpretable Score.
nonNegativeMatrixFactorization() - Perform Non-negative Matrix Factorization (NMF)
preprocessing() - Prepare the data to be in the correct format for CellPhenoX
principleComponentAnalysis() - Perform Principal Component Analysis (PCA)

Each function has unique arguments; see our documentation for more information.

License

Distributed under the terms of the MIT license, pyCellPhenoX is free and open source software.

Code of Conduct

For more information, please see Code of Conduct or Code of Conduct Documentation.

Contributing

For more information, please see Contributing or Contributing Documentation.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Citation

If you have used pyCellPhenoX in your project, please use the citation below:

Young, J., Inamo, J., Caterer, Z., Krishna, R., Zhang, F. CellPhenoX: An eXplainable Cell-specific machine learning method to predict clinical Phenotypes using single-cell multi-omics, bioRxiv 2025.01.24.634132; doi: https://doi.org/10.1101/2025.01.24.634132

Contact

Please contact [email protected] for further questions or potential collaborative opportunities!
