The Stroke Prediction Binary Classification project aims to develop a machine learning model that predicts the likelihood of a stroke occurring in patients based on various health and lifestyle factors. This project utilizes a binary classification approach, where the output indicates whether a patient is at risk of having a stroke (1) or not (0).
- Analyze and preprocess health-related data to prepare it for modeling.
- Handle missing values and encode categorical variables.
- Implement various machine learning algorithms to classify stroke risk.
- Evaluate and compare the trained models using accuracy, precision, recall, and F1-score.
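As a reference for how these metrics are defined, here is a minimal sketch in plain Python. The function name `classification_metrics` and the example labels are illustrative assumptions, not part of the project code; in practice Scikit-learn's metric functions would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")

    # Confusion-matrix counts for the positive class (stroke = 1).
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only: 8 patients without stroke, 2 with.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On these made-up labels the accuracy is 0.8 while precision, recall, and F1 are all 0.5. A model that predicted 0 for every patient would also reach 0.8 accuracy here, which is why precision, recall, and F1 are tracked alongside accuracy when the positive class is rare.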
The project uses a dataset that includes features such as:
- Age
- Hypertension
- Heart Disease
- Marital Status
- Work Type
- Residence Type
- Average Glucose Level
- Body Mass Index (BMI)
- Smoking Status
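The features above mix numeric and categorical types, which determines how each one is preprocessed. The sketch below builds a tiny in-memory table to show one way of separating the two groups and counting missing values with Pandas. The column names and all values are made up for illustration and may not match the actual dataset.

```python
import pandas as pd

# Hypothetical column names and made-up rows; the real dataset may differ.
sample = pd.DataFrame(
    {
        "age": [61, 34, 78],
        "hypertension": [0, 1, 0],
        "heart_disease": [1, 0, 0],
        "ever_married": ["Yes", "No", "Yes"],
        "work_type": ["Private", "Self-employed", "Private"],
        "residence_type": ["Urban", "Rural", "Rural"],
        "avg_glucose_level": [180.2, 92.4, 110.7],
        "bmi": [31.5, None, 27.8],  # None becomes NaN (a missing value)
        "smoking_status": ["formerly smoked", "never smoked", "smokes"],
    }
)

# Numeric columns can be imputed and scaled; the rest need categorical encoding.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in sample.columns if c not in numeric_cols]

# Number of missing entries per column.
missing_per_column = sample.isna().sum()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
print(missing_per_column[missing_per_column > 0])
```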
The project is built with the following tools:
- Python
- Pandas: for data manipulation
- NumPy: for numerical operations
- Scikit-learn: for machine learning algorithms and evaluation metrics
- Matplotlib and Seaborn: for data visualization
To set up the project, follow these steps:
- Clone the repository:
- Navigate to the project directory:
cd Stroke-Prediction-Binary-Classification
To run the analysis:
- Load the dataset using the provided Jupyter Notebook.
- Explore and preprocess the data (handling missing values, encoding categorical variables).
- Split the dataset into training and testing sets.
- Train various classification models (e.g., Logistic Regression, Random Forest, Support Vector Machine).
- Evaluate model performance using metrics such as accuracy, precision, recall, and F1-score.
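The steps above can be sketched end to end with Scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is randomly generated, the column names are guesses at a subset of the listed features, and the hyperparameters are arbitrary. Median imputation, standard scaling, and one-hot encoding are one reasonable choice of preprocessing, not necessarily the one the project uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the real dataset.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "age": rng.integers(20, 90, n),
        "hypertension": rng.integers(0, 2, n),
        "avg_glucose_level": rng.normal(105, 30, n),
        "bmi": rng.normal(28, 5, n),
        "smoking_status": rng.choice(["never smoked", "formerly smoked", "smokes"], n),
    }
)
df.loc[rng.choice(n, 20, replace=False), "bmi"] = np.nan  # simulate missing BMI
# Synthetic target with no clinical meaning, only there to make the code run.
risk = 0.05 * (df["age"] - 55) + 0.8 * df["hypertension"]
df["stroke"] = (risk + rng.normal(0, 1, n) > 1.0).astype(int)

numeric_cols = ["age", "hypertension", "avg_glucose_level", "bmi"]
categorical_cols = ["smoking_status"]


def make_preprocessor():
    """Impute and scale numeric columns; one-hot encode categorical columns."""
    numeric = Pipeline(
        [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
    )
    return ColumnTransformer(
        [
            ("num", numeric, numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )


X = df[numeric_cols + categorical_cols]
y = df["stroke"]
# Stratify so both splits keep the same share of stroke cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(),
}

results = {}
for name, model in models.items():
    # Preprocessing lives inside the pipeline, so it is fitted on training data only.
    clf = Pipeline([("preprocess", make_preprocessor()), ("model", model)])
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }
    print(name, results[name])
```

Keeping the imputer, scaler, and encoder inside the `Pipeline` means their statistics are learned from the training split alone, which avoids leaking information from the test set into the model.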
Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to all contributors and resources that made this project possible.