Applies to the UKB datasets, UKB dementia, AD and PD classification and SHAP
This is the codebase for IDEARs - Integrated Disease Explanation and Associations Risk Scoring. Its overall architecture is shown below.
To ease configuration, please install Anaconda and set this up in a virtual environment.
- Install Anaconda:
https://www.anaconda.com/products/individual
- Create the environment:
conda env create -f .\conda-env.yml
- Activate the environment:
conda activate conda-env
Then on Windows, run startlocal_woDocker.bat
and on Linux, run startlocal_woDocker.sh
- data_gen.py is used to perform ETL on the data and to create the model datasets
- data_proc.py is used for extra data processing including the creation of normalised datasets
- ml.py is used to run the models, including logistic regression and XGBoost, and for model interpretability using SHAP
- analysis.py is used to create charts and perform extra statistical tests, including paired t-tests
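As a rough illustration of the paired t-test mentioned for analysis.py, the sketch below computes the t statistic and degrees of freedom for two paired samples using only the Python standard library. The function name `paired_t_statistic` and the per-fold AUC values are hypothetical and are not taken from the IDEARs code or its results; the actual implementation in analysis.py may differ.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples.

    Hypothetical helper for illustration; not part of the IDEARs codebase.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")

    # The test operates on the per-pair differences.
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up per-fold AUC scores for two models, for illustration only.
xgb_auc = [0.81, 0.79, 0.83, 0.80, 0.82]
logreg_auc = [0.78, 0.77, 0.80, 0.79, 0.79]

t_stat, dof = paired_t_statistic(xgb_auc, logreg_auc)
print(f"t = {t_stat:.3f}, df = {dof}")
```

This sketch stops at the t statistic because the standard library has no t-distribution CDF. If SciPy is available in the environment, `scipy.stats.ttest_rel` returns both the statistic and the p-value.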
The Jupyter notebooks used for AD are:
- AD_ml_part_1.ipynb
- Master_ml.ipynb
Import modules etc.
This folder shows the implementation of the IDEARs platform.
Michael Allwright - [email protected]