Stroke Prediction Project

Problem Statement

By the Numbers

Global: Strokes are a global epidemic. They are the second leading cause of death, and the number of strokes increased by 70% from 1990 to 2019, with deaths from stroke increasing by 43% (source). The WHO estimates the annual cost of strokes to be over US$721 billion (source).

United States: Although strokes have been declining for decades in the US, they still impose a large financial burden of roughly $34-65 billion annually (Source 1, Source 2, Source 3). Stroke is currently the 5th leading cause of death in the US (source).

Machine Learning (ML)

Machine Learning Goal

Provide stroke risk predictions that people can understand in a meaningful way. Predictions are reported as Low, Moderate, or High. These categories were chosen because they give the average person better context than a risk percentage. For example, a risk of 15% does not make clear whether there is cause for concern.
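As a minimal sketch of the idea above, a predicted probability can be mapped to one of the three labels with two cut-offs. The function name and both threshold values are hypothetical placeholders chosen for illustration; the project's actual cut-offs are not stated here.

```python
def risk_category(probability: float,
                  moderate_at: float = 0.10,
                  high_at: float = 0.30) -> str:
    """Map a predicted stroke probability to a coarse label.

    The default cut-offs are illustrative assumptions, not the project's values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_at:
        return "HIGH"
    if probability >= moderate_at:
        return "MODERATE"
    return "LOW"


for p in (0.03, 0.15, 0.42):
    print(f"{p:.0%} -> Stroke Risk: {risk_category(p)}")
```

With these placeholder cut-offs, the 15% risk from the example would be reported as MODERATE rather than as a bare percentage.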

Why Machine Learning?

ML is best suited to complex problems that cannot be answered by simple logic. In healthcare, disease epidemiology is often complex and our understanding of it keeps changing. This makes diseases such as stroke prime candidates for ML.

Optimal Outcomes

Model Goal: Predict a patient's probability of having a stroke.

Global: Predicting a stroke can provide an opportunity to take corrective actions before a stroke occurs. Most importantly, this results in fewer deaths and disabilities.

Additionally, the money no longer lost to strokes would boost economies. Assuming cost scales linearly with stroke occurrence, reducing strokes by just 5% would inject about $36 billion into world economies.

United States: The same 5% reduction would add about $2.65 billion to the US economy.
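The savings figures follow from simple proportional arithmetic, sketched below under the same linear assumption. The function name is hypothetical; the cost inputs are the figures quoted in By the Numbers.

```python
def savings(annual_cost_billion: float, reduction: float = 0.05) -> float:
    """Savings in billions if cost scales linearly with the number of strokes."""
    return annual_cost_billion * reduction


global_savings = savings(721)               # global annual cost quoted above
us_low, us_high = savings(34), savings(65)  # US cost range quoted above
us_quoted_base = 2.65 / 0.05                # base cost implied by the $2.65 billion figure

print(f"Global savings: ${global_savings:.2f} billion")
print(f"US savings range: ${us_low:.2f}-{us_high:.2f} billion")
print(f"US base cost implied by $2.65 billion: ${us_quoted_base:.0f} billion")
```

The $2.65 billion US figure corresponds to a base cost of $53 billion, which falls inside the quoted $34-65 billion range.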

Instructions

Environment

Python venv
- python -m venv .venv
- Linux
  - source .venv/bin/activate
- Windows
  - .venv\Scripts\activate

Anaconda or Miniconda
- Using Anaconda or Miniconda is strongly advised.
- Anaconda installation instructions, if not already installed.
- Miniconda installation instructions
- conda create -n stroke
- conda activate stroke

Tip

In VS Code, Ctrl+Shift+P opens the Command Palette, where you can select the Python interpreter.

After activating environment

The PDM pyproject.toml file is in the main directory.
1. Activate environment of choice.
2. pip install pdm
3. pdm install

Notebooks

Run All for 1_data_cleaning.ipynb
Run All for 2_modeling.ipynb

Throughout the notebooks you will find links that explain or clarify various concepts.

Containerization with BentoML

Build bento: bentoml build
Docker container: bentoml containerize [bento name:code given after 'bentoml build']

Make sure the Docker service is running if setting up locally. Then run this command:

docker run -it --rm -p 3000:3000 [bento name:code given after 'bentoml build'] serve --production

All created Bentos are stored in ~/bentoml/bentos/ (for example, /home/user/bentoml/bentos/) by default.

Cloud deployment

Log in to the AWS Console.
Go to Elastic Container Registry and select Create Registry.
In the registry, select View push commands.
- On a local Windows PC, use Git Bash and follow the macOS/Linux commands. NOTE: AWS CLI must be installed.
- In the AWS console, log in via the prompts.
- aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin [censored].dkr.ecr.us-east-1.amazonaws.com
- Skip this command, because the Docker build is done through BentoML: docker build -t stroke_prediction .
- docker tag stroke_prediction:latest [censored].dkr.ecr.us-east-1.amazonaws.com/stroke_prediction:latest
- docker push [censored].dkr.ecr.us-east-1.amazonaws.com/stroke_prediction:latest
Move to Elastic Container Service, then select Create new Task Definition.
- Follow the prompts, and be sure to select the image uploaded to the registry.
- Then select Create.
Select Clusters in the left pane, then select Create cluster.
- Follow the prompts to create a cluster.
- Select the cluster.
- Select Services and then Create.
- Follow the prompts and select the created task.
  - Set the security group to allow public access to port 3000.
  - Select Run Service.
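The push commands above all share one registry address, so they can be generated from the account ID and region. This is a rough sketch only: the function name is hypothetical, and the account ID in the usage line is a made-up placeholder standing in for the censored value.

```python
from __future__ import annotations


def ecr_commands(account_id: str,
                 region: str = "us-east-1",
                 repo: str = "stroke_prediction",
                 tag: str = "latest",
                 local_image: str = "stroke_prediction:latest") -> list[str]:
    """Build the login, tag, and push commands listed above as strings."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote_image = f"{registry}/{repo}:{tag}"
    return [
        # Authenticate Docker against the registry using the AWS CLI.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Point the locally built image at the remote repository.
        f"docker tag {local_image} {remote_image}",
        # Upload the image.
        f"docker push {remote_image}",
    ]


# "123456789012" is a placeholder account ID, not a real one.
for command in ecr_commands("123456789012"):
    print(command)
```

If the local image carries the name produced by bentoml containerize rather than stroke_prediction:latest, pass that name as local_image.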

Production App Access

Instructions

1. Access the production site on port 3000.
2. Select POST, then Try it out on the right of the POST section.
3. In the Request body, enter patient information based on the template or use the example patient. Select Execute.

Template

All values must be filled in.
Strings must be within double quotes (" ").
Float values must include a decimal point, for example 0.0.
Capitalization of values must match the template exactly.

{
  "gender": "Male" or "Female",
  "age": float,
  "hypertension": 1 or 0,
  "heart_disease": 1 or 0,
  "ever_married": 1 or 0,
  "work_type": "Private" or "Self-employed" or "children" or "Govt_job" or "Never_worked",
  "residence_type": "Urban" or "Rural",
  "avg_glucose_level": float,
  "bmi": float,
  "smoking_status": "smokes" or "never smoked" or "Unknown" or "formerly smoked",
  "obese": 1 or 0,
  "clearly_diabetes": 1 or 0
}
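The template rules above can be checked locally before a request is sent. The sketch below is one possible validator, not part of the project: the function and constant names are hypothetical, and the "Unknown" spelling for smoking_status is an assumption based on the dataset's value.

```python
# Allowed string values, copied from the template ("Unknown" spelling is an assumption).
ALLOWED_VALUES = {
    "gender": {"Male", "Female"},
    "work_type": {"Private", "Self-employed", "children", "Govt_job", "Never_worked"},
    "residence_type": {"Urban", "Rural"},
    "smoking_status": {"smokes", "never smoked", "Unknown", "formerly smoked"},
}
FLOAT_FIELDS = ("age", "avg_glucose_level", "bmi")
FLAG_FIELDS = ("hypertension", "heart_disease", "ever_married", "obese", "clearly_diabetes")


def validate_patient(patient: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the template."""
    expected = set(ALLOWED_VALUES) | set(FLOAT_FIELDS) | set(FLAG_FIELDS)
    errors = [f"missing field: {name}" for name in sorted(expected - set(patient))]
    errors += [f"unexpected field: {name}" for name in sorted(set(patient) - expected)]
    for name, choices in ALLOWED_VALUES.items():
        value = patient.get(name)
        # Values are case-sensitive and must match the template exactly.
        if name in patient and (not isinstance(value, str) or value not in choices):
            errors.append(f"{name}: {value!r} is not one of {sorted(choices)}")
    for name in FLOAT_FIELDS:
        if name in patient and not isinstance(patient[name], float):
            errors.append(f"{name}: expected a float such as 0.0")
    for name in FLAG_FIELDS:
        value = patient.get(name)
        # Exclude booleans, which Python would otherwise treat as 1 and 0.
        if name in patient and not (type(value) is int and value in (0, 1)):
            errors.append(f"{name}: expected 1 or 0")
    return errors


sample = {
    "gender": "Female", "age": 54.0, "hypertension": 1, "heart_disease": 0,
    "ever_married": 1, "work_type": "Private", "residence_type": "Rural",
    "avg_glucose_level": 101.5, "bmi": 31.2, "smoking_status": "never smoked",
    "obese": 1, "clearly_diabetes": 0,
}
print(validate_patient(sample))
```

The sample values are invented for illustration and do not come from the dataset.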

Example Patient

{
  "gender": "Male",
  "age": 69.0,
  "hypertension": 0,
  "heart_disease": 1,
  "ever_married": 1,
  "work_type": "Self-employed",
  "residence_type": "Urban",
  "avg_glucose_level": 195.23,
  "bmi": 28.3,
  "smoking_status": "smokes",
  "obese": 0,
  "clearly_diabetes": 1
}

4. Scroll down to Server response and check the Response body. Possible responses:

"Stroke Risk: HIGH"
"Stroke Risk: MODERATE"
"Stroke Risk: LOW"
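The same POST request can also be sent from a script instead of the browser page. The sketch below uses only the Python standard library. The endpoint path "/predict" is a hypothetical placeholder, since the actual path depends on how the API is named in service.py, and both function names are illustrative.

```python
import json
import urllib.request


def build_request(base_url: str, patient: dict,
                  endpoint: str = "/predict") -> urllib.request.Request:
    """Build a JSON POST request; the endpoint path is an assumed placeholder."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, patient: dict,
            endpoint: str = "/predict", timeout: float = 10.0) -> str:
    """Send the patient to a running service and return the raw response text."""
    request = build_request(base_url, patient, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For a container started locally with the docker run command shown earlier, the base URL would be http://localhost:3000, and predict would be called with a patient dictionary in the template's shape.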

Files

Data Documentation: DATA.md

Conda environment: requirements.txt - used to create a conda environment with conda create --name <env> --file requirements.txt.

Note: This file is not in the format pip expects.

Notebooks: 1_data_cleaning.ipynb, 2_modeling.ipynb

Data: healthcare-dataset-stroke-data.csv

ML script: service.py

Note: The model is saved locally via BentoML.

Dependency and environment management: bentofile.yaml

Standalone Model Creation Script: training.py

Repository: gregorywmorris/stroke-prediction
