mayhixza/insurance-dataset-analysis

Medical Cost Insurance Dataset Analysis

This repository contains the code and documentation for an Exploratory Data Analysis (EDA) of Kaggle's Medical Cost Insurance dataset. The project explores how patient attributes such as age, BMI, smoking status, and region relate to medical insurance charges, using data visualization and linear regression models. The dataset is also available here.

Project Overview:

Dataset: Kaggle's Medical Cost Insurance dataset
Objective: Explore factors influencing medical insurance costs and build predictive models.
Techniques Used: Exploratory Data Analysis, Data Visualization, Linear Regression
Tools Used: Python, Jupyter Notebook, Pandas, Matplotlib, Seaborn, Scikit-learn, SciPy

Variables in dataset

age: age of patient in years
sex: binary value, either 'male' or 'female'
bmi: body mass index of patient
children: number of children of a patient
smoker: binary value, 'yes' or 'no'
region: region where patient resides
charges: insurance charges for a patient
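
Since sex, smoker, and region are categorical, they generally need to be converted to numbers before regression. The following is a minimal sketch of one way to do that with Pandas, not necessarily the encoding used in the notebooks. The function name `encode_features` and the sample rows are hypothetical; the values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows laid out like the dataset's columns.
# All values are invented for illustration only.
sample = pd.DataFrame({
    "age": [19, 45, 33, 60],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 30.1, 22.7, 35.4],
    "children": [0, 2, 1, 3],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "northeast", "northwest", "southeast"],
    "charges": [16000.0, 8000.0, 4500.0, 47000.0],
})


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric copy of df (hypothetical helper, not from the repo)."""
    out = df.copy()
    # Binary columns become 0/1 flags.
    out["sex"] = (out["sex"] == "male").astype(int)
    out["smoker"] = (out["smoker"] == "yes").astype(int)
    # One-hot encode region; drop the first level to avoid a redundant column.
    out = pd.get_dummies(out, columns=["region"], drop_first=True, dtype=int)
    return out


encoded = encode_features(sample)
print(encoded.dtypes)
```

Dropping one region level keeps the encoded columns from being perfectly collinear with the model's intercept, which matters for interpreting linear regression coefficients.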

Key Findings:

  1. Identification of significant factors affecting medical insurance costs.
  2. Visualization of relationships between variables using various plots.
  3. Development of predictive models to estimate insurance costs.
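
The modeling step can be sketched roughly as follows. This is an illustrative workflow with Scikit-learn, not the notebooks' actual code: the data is synthetic, and the coefficients used to generate `charges` (250, 300, 20000) are arbitrary assumptions chosen only so the example runs without the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption: NOT real insurance data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 5, size=n),
    "smoker": rng.integers(0, 2, size=n),  # already encoded as 0/1
})
# Arbitrary, made-up linear relationship plus noise.
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * df["smoker"]
    + rng.normal(0, 1000, size=n)
)

X = df[["age", "bmi", "smoker"]]
y = df["charges"]

# Hold out a test set so the score reflects unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)
coefs = dict(zip(X.columns, model.coef_))

print(f"R^2 on held-out data: {r2:.3f}")
print(coefs)
```

With the real dataset, `X` would instead hold the encoded feature columns and `y` the charges column; the fitted coefficients then indicate how strongly each factor is associated with cost.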

Usage:

  1. Clone the repository.
  2. Install necessary dependencies (listed in requirements.txt).
  3. Explore the notebooks for detailed analysis and findings.