Stroke: Risk prediction based on machine learning, exploratory data analysis (EDA), and statistical analysis

Overview

This repository contains an analysis of stroke risk based on the Stroke Prediction Dataset. It covers exploratory data analysis (EDA), statistical testing, and machine learning prediction. The work is implemented in R, with RMarkdown for documentation and Flexdashboard for interactive visualization. View it here.

A quick preview of how it looks:

[Image: preview screenshot]

Key Features

Data Preprocessing:
- Handling missing values (NA) through imputation
- Renaming columns for clarity, removing redundant columns, and organizing data into categorical and numerical variables
- Converting categorical variables to factors for analysis
Exploratory Data Analysis (EDA):
- Visualizations and statistical summaries to understand variable distributions and relationships
Statistical Testing:
- Evaluation of associations between variables using appropriate statistical tests
Machine Learning Models (implementation and assessment):
- Random Forest
- Logistic Regression
- Gradient Boosting
Interactive Dashboard:
- A Flexdashboard for interactive data exploration and visualization. View Dashboard Here.

Contents

- TESIS.Rmd: main RMarkdown file with the full analysis, including data preprocessing, EDA, statistical tests, and model evaluations.
- DashboardTFM.Rmd: RMarkdown file that generates the Flexdashboard, providing an interactive interface for data exploration.
- stroke.csv: the Stroke Prediction Dataset, which you can download from Kaggle.

Setup and Usage

To run the analysis or explore the dashboard locally:

Clone this repository:

    git clone https://github.com/santi-souza/stroke-eda-ml.git

Open the files:

Open TESIS.Rmd or DashboardTFM.Rmd in RStudio to view the analysis or render the Flexdashboard.
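
The files can also be rendered from an R console instead of the RStudio Knit button. The following is a minimal sketch, assuming the `rmarkdown` package is installed (plus `flexdashboard` for the dashboard) and that the working directory is the cloned repository. `render_if_present` is a hypothetical helper written for this example, not part of the repository.

```r
# Hypothetical helper: render an RMarkdown file when both the file and
# the rmarkdown package are available.
# Returns TRUE after a successful render, FALSE if something is missing.
render_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(FALSE)
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    message("Install rmarkdown first: install.packages(\"rmarkdown\")")
    return(FALSE)
  }
  rmarkdown::render(path)  # output format is read from the file's YAML header
  TRUE
}

render_if_present("TESIS.Rmd")         # full analysis report
render_if_present("DashboardTFM.Rmd")  # Flexdashboard
```

If the dashboard declares `runtime: shiny` in its header, `rmarkdown::run()` would be the call to use instead of `rmarkdown::render()`.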

Analysis Highlights

Data Preprocessing:

Addressed missing values with imputation, clarified column names, removed unnecessary columns, and separated data into categorical (converted to factors) and numerical variables.
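
The preprocessing steps above can be sketched in base R. This is an illustration only: the toy rows are made up, the column names follow the Kaggle file's layout, and median imputation is an assumed strategy that may differ from what TESIS.Rmd actually does.

```r
# Toy rows mimicking the layout of the Kaggle file (values are made up).
raw <- data.frame(
  id = c(1, 2, 3, 4),
  gender = c("Male", "Female", "Female", "Male"),
  age = c(67, 61, 80, 49),
  Residence_type = c("Urban", "Rural", "Rural", "Urban"),
  bmi = c("36.6", "N/A", "32.5", "34.4"),
  stroke = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Turn "N/A" strings into real NA values, then make bmi numeric.
raw$bmi[raw$bmi == "N/A"] <- NA
raw$bmi <- as.numeric(raw$bmi)

# Median imputation (assumed strategy for this sketch).
raw$bmi[is.na(raw$bmi)] <- median(raw$bmi, na.rm = TRUE)

# Rename a column for consistency and drop the identifier column.
names(raw)[names(raw) == "Residence_type"] <- "residence_type"
clean <- raw[, setdiff(names(raw), "id")]

# Convert categorical variables to factors.
cat_cols <- c("gender", "residence_type", "stroke")
clean[cat_cols] <- lapply(clean[cat_cols], factor)

str(clean)
```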

Exploratory Data Analysis (EDA):

Generated visualizations and statistical summaries to identify trends and patterns in the data.
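
As a rough illustration of this kind of summary, the sketch below uses base R on a small made-up data frame. The column names follow the Kaggle file, and the specific summaries are assumptions for the example, not the ones used in TESIS.Rmd.

```r
# Made-up sample with two numeric variables and the outcome.
df <- data.frame(
  age = c(67, 61, 80, 49, 79, 25, 33, 58),
  avg_glucose_level = c(228.7, 202.2, 105.9, 171.2, 174.1, 85.3, 92.0, 110.5),
  stroke = factor(c(1, 1, 1, 1, 1, 0, 0, 0))
)

# Per-column summary statistics.
summary(df)

# Class balance of the outcome.
prop.table(table(df$stroke))

# Mean of each numeric variable by outcome.
by_outcome <- aggregate(cbind(age, avg_glucose_level) ~ stroke,
                        data = df, FUN = mean)
print(by_outcome)

# Binned age distribution, computed without opening a graphics device.
age_bins <- hist(df$age, breaks = seq(20, 80, by = 20), plot = FALSE)
print(age_bins$counts)
```

Dropping `plot = FALSE` draws the histogram instead of only returning the bin counts.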

Statistical Testing:

Performed hypothesis testing to assess relationships between various features and stroke occurrence.
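
The README does not name the tests that were used, so the following is only a sketch of one plausible pairing: Fisher's exact test for a categorical feature against stroke, and a Wilcoxon rank-sum test for a numeric feature. The data are made up and the choice of tests is an assumption.

```r
# Made-up sample: one categorical feature, one numeric feature, and the outcome.
df <- data.frame(
  hypertension = factor(c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0)),
  age = c(67, 61, 80, 49, 79, 81, 25, 33, 58, 44, 19, 37),
  stroke = factor(c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0))
)

# Categorical vs. categorical: Fisher's exact test on the contingency table.
tab <- table(hypertension = df$hypertension, stroke = df$stroke)
fisher_res <- fisher.test(tab)

# Numeric vs. categorical: Wilcoxon rank-sum test of age between outcome groups.
wilcox_res <- wilcox.test(age ~ stroke, data = df)

print(tab)
print(fisher_res$p.value)
print(wilcox_res$p.value)
```

On the full dataset, where cell counts are larger, `chisq.test()` is a common alternative to Fisher's exact test.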

Machine Learning:

Built and evaluated Random Forest, Logistic Regression, and Gradient Boosting models to predict stroke occurrence and identify key risk factors.
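
A minimal sketch of the modeling workflow, shown for logistic regression only because it needs nothing beyond base R. The data are synthetic, and the coefficients, the 70/30 split, and the 0.5 threshold are arbitrary assumptions for the example. Random Forest and Gradient Boosting would need additional packages (for example `randomForest` or `gbm`), and the repository may use different ones.

```r
set.seed(42)

# Synthetic data: stroke risk rises with age and glucose (arbitrary coefficients).
n <- 400
age <- round(runif(n, 20, 85))
avg_glucose_level <- rnorm(n, mean = 105, sd = 40)
p <- plogis(-6 + 0.06 * age + 0.01 * avg_glucose_level)
stroke <- rbinom(n, 1, p)
df <- data.frame(age, avg_glucose_level, stroke)

# 70/30 train/test split.
train_idx <- sample(n, size = 0.7 * n)
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Fit logistic regression on the training set.
fit <- glm(stroke ~ age + avg_glucose_level, data = train, family = binomial)

# Predict probabilities on the test set and threshold at 0.5.
pred_prob <- predict(fit, newdata = test, type = "response")
pred_class <- as.integer(pred_prob > 0.5)

# Simple evaluation: accuracy and confusion matrix.
accuracy <- mean(pred_class == test$stroke)
conf_mat <- table(predicted = pred_class, actual = test$stroke)

print(summary(fit)$coefficients)
print(conf_mat)
print(accuracy)
```

Because stroke cases are typically a small minority, accuracy alone can be misleading; the confusion matrix gives a clearer view of how many positive cases the model actually catches.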

Interactive Dashboard:

Developed an interactive Flexdashboard to enable users to explore the data and model results.

Contributing

Contributions are welcome! If you have suggestions for improvement or find a problem, feel free to open an issue or submit a pull request.
