In this model we predict the risk of heart disease using a real-time dataset obtained from a medical university in Chennai.
DATASET
The dataset we used contains more than 150 patient records together with their daily-routine data, which helps our model train on real-time data. It includes features such as age, dietary habits, and alcohol consumption.
Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Data Preprocessing
We preprocessed our dataset using scikit-learn.
scikit-learn
The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation more suitable for downstream estimators. In general, learning algorithms benefit from standardization of the dataset.
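As a minimal sketch of this standardization step, the snippet below applies scikit-learn's StandardScaler to a toy feature matrix; the columns (age, cholesterol, resting blood pressure) are illustrative stand-ins, not the exact columns of the Chennai dataset:

```python
# Sketch of standardization with sklearn.preprocessing.StandardScaler.
# The three columns are hypothetical features, not the real dataset.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: [age, cholesterol, resting blood pressure]
X = np.array([
    [29, 204, 120],
    [54, 239, 140],
    [63, 268, 160],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has zero mean and unit variance.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Fitting the scaler on the training split only (and reusing it to transform the test split) avoids leaking test-set statistics into training.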
Models Used:
- SVM
- Naive Bayes
- Logistic Regression
- Decision Tree
- Random Forest
- LightGBM
- XGBoost
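The scikit-learn models in the list above can be trained and compared with a loop like the one below. This is a hedged sketch: it uses a synthetic binary-classification dataset of 150 samples as a stand-in for the real patient records, and default hyperparameters rather than the exact settings used for the reported results.

```python
# Sketch: train several scikit-learn classifiers and compare
# training vs. testing accuracy. Synthetic data stands in for
# the real Chennai dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=150, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```

Comparing the training score against the testing score in this way is exactly how the per-model results below were obtained: a large gap between the two indicates overfitting.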
Model Accuracy Results (fraction of correct predictions):
SVM:
Support Vector Machine
~Training Set Accuracy : 0.6694214876033058
~Testing Set Accuracy : 0.5737704918032787
Naive Bayes:
A probabilistic classifier based on Bayes' theorem with a strong (naive) assumption of independence between features
~Training Set Accuracy : 0.8677685950413223
~Testing Set Accuracy : 0.7868852459016393
Logistic Regression:
Logistic regression estimates the probability of a disease based on a given dataset of independent variables
~Training Set Accuracy : 0.8636363636363636
~Testing Set Accuracy : 0.8032786885245902
Decision Tree:
A decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression tasks.
~Training Set Accuracy : 1.0
~Testing Set Accuracy : 0.7704918032786885
Random Forest:
An ensemble method that fits many decision trees on random subsets of the data and features, and averages their predictions
~Training Set Accuracy : 1.0
~Testing Set Accuracy : 0.7704918032786885
LightGBM:
Implements the Gradient Boosting Decision Tree (GBDT) algorithm with two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB)
~Training Set Accuracy : 0.9958677685950413
~Testing Set Accuracy : 0.7704918032786885
XGBoost:
A gradient-boosting library that builds one decision tree per boosting iteration; its GPU implementation builds each tree one level at a time, processing the entire dataset concurrently
~Training Set Accuracy : 0.987603305785124
~Testing Set Accuracy : 0.7540983606557377
CONCLUDED MODEL FROM PREDICTION:
From the prediction results, **Logistic Regression** generalizes best: it achieves the highest testing accuracy (0.803) with only a small gap from its training accuracy, whereas the tree-based models reach near-perfect training accuracy but lower testing accuracy, indicating overfitting.