This repository contains the code and documentation for an exploratory data analysis (EDA) of Kaggle's Medical Cost Insurance dataset. The project explores how various factors relate to medical insurance costs, using data visualization and linear regression models. The dataset is also available here.
Dataset: Kaggle's Medical Cost Insurance dataset
Objective: Explore factors influencing medical insurance costs and build predictive models.
Techniques Used: Exploratory Data Analysis, Data Visualization, Linear Regression
Tools Used: Python, Jupyter Notebook, Pandas, Matplotlib, Seaborn, Scikit-learn, SciPy
age: age of the patient in years
sex: binary value, either 'male' or 'female'
bmi: body mass index of the patient
children: number of children the patient has
smoker: binary value, either 'yes' or 'no'
region: region where the patient resides
charges: insurance charges for the patient
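As a rough illustration of how the columns above might be checked and prepared for modeling, here is a minimal sketch using Pandas. The helper names `check_schema` and `encode_categoricals` are hypothetical and not part of this repository, the sample rows are made up, and the region labels `A` and `B` are placeholders rather than values from the dataset.

```python
import pandas as pd

# Column names as documented above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def check_schema(df):
    """Return the documented columns that are missing from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def encode_categoricals(df):
    """Map the two binary columns to 0/1 and one-hot encode region (hypothetical helper)."""
    out = df.copy()
    out["sex"] = out["sex"].map({"female": 0, "male": 1})
    out["smoker"] = out["smoker"].map({"no": 0, "yes": 1})
    return pd.get_dummies(out, columns=["region"], prefix="region", dtype=int)


# Made-up rows for illustration only; these are not taken from the dataset,
# and the region labels "A" and "B" are placeholders.
sample = pd.DataFrame(
    {
        "age": [25, 40, 58],
        "sex": ["male", "female", "female"],
        "bmi": [22.5, 31.0, 27.8],
        "children": [0, 2, 1],
        "smoker": ["no", "yes", "no"],
        "region": ["A", "B", "A"],
        "charges": [3000.0, 25000.0, 11000.0],
    }
)

missing = check_schema(sample)
encoded = encode_categoricals(sample)
```

With the real data, `sample` would presumably be replaced by a DataFrame read from the downloaded CSV file.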
- Identification of significant factors affecting medical insurance costs.
- Visualization of relationships between variables using various plots.
- Development of predictive models to estimate insurance costs.
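The modeling goal above can be sketched with Scikit-learn's `LinearRegression`. This is a minimal example under stated assumptions, not the notebooks' actual code: the data is synthetic, generated from arbitrary made-up coefficients, so the fitted numbers say nothing about the real dataset. It only shows the fit-and-evaluate workflow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the coefficients below (250, 300, 20000) are
# arbitrary choices for illustration, not findings from the dataset.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame(
    {
        "age": rng.integers(18, 65, size=n),
        "bmi": rng.normal(30.0, 5.0, size=n),
        "smoker": rng.integers(0, 2, size=n),  # already encoded as 0/1
    }
)
data["charges"] = (
    250 * data["age"]
    + 300 * data["bmi"]
    + 20000 * data["smoker"]
    + rng.normal(0, 500, size=n)  # random noise
)

X = data[["age", "bmi", "smoker"]]
y = data["charges"]

# Hold out a quarter of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# One coefficient per feature: the estimated change in charges per unit change.
coefficients = dict(zip(X.columns, model.coef_))

# R^2 on the held-out rows.
r2 = model.score(X_test, y_test)
```

Inspecting `coefficients` is one simple way to see which features carry the most weight in a linear model, though comparing magnitudes is only meaningful when the features are on comparable scales.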
- Clone the repository.
- Install the necessary dependencies (listed in requirements.txt).
- Explore the notebooks for detailed analysis and findings.