This project focuses on building machine learning models to predict the income level of individuals based on various demographic and employment-related features.
The dataset used in this project contains information about individuals, including their age, education, and occupation, as well as their income level (whether it is above or below $50K per year). The dataset is available in the file adult_income_data.csv.
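Because the target is a two-class label, a common first step is to load the CSV and map the income column to 0/1. The sketch below is an illustration, not the notebook's code. It assumes pandas, a target column named `income`, and UCI Adult-style label strings (">50K" / "<=50K", sometimes with a leading space or a trailing period). Check all of these against the actual file.

```python
import pandas as pd

# Assumptions: the target column is called "income" and uses UCI Adult-style
# labels (">50K" / "<=50K"); adjust both to match adult_income_data.csv.
TARGET_COLUMN = "income"
POSITIVE, NEGATIVE = ">50K", "<=50K"


def load_dataset(path="adult_income_data.csv"):
    """Read the CSV, skipping whitespace that follows each delimiter."""
    return pd.read_csv(path, skipinitialspace=True)


def binarize_income(df, column=TARGET_COLUMN):
    """Return 1 for incomes above $50K and 0 otherwise."""
    # Normalize stray spaces and the trailing "." that some Adult files use.
    labels = df[column].astype(str).str.strip().str.rstrip(".")
    unknown = ~labels.isin([POSITIVE, NEGATIVE])
    if unknown.any():
        raise ValueError(f"unexpected income labels: {sorted(labels[unknown].unique())}")
    return (labels == POSITIVE).astype(int)


# Hypothetical rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({"age": [39, 52, 28], "income": ["<=50K", " >50K", ">50K."]})
print(binarize_income(sample).tolist())  # [0, 1, 1]
```

Raising on unrecognized labels, rather than silently mapping them to 0, surfaces typos or an unexpected label format early.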
- adult_income_data.csv: the dataset containing information about individuals and their income levels.
- Adult_Income_Prediction.ipynb: Jupyter Notebook containing the code for data analysis, exploratory data analysis (EDA), outlier analysis, visualization, and building machine learning models.

The notebook covers the following steps:
- Exploratory data analysis (EDA) is performed to understand the structure and characteristics of the dataset.
- Outlier analysis is conducted to identify and handle outliers in the data.
- The project covers a wide range of data analysis and data manipulation methods, such as encoding and data discretization, alongside classification.
- Machine learning models are built using several classification algorithms, such as logistic regression, decision trees, and random forests.
- Hyperparameter tuning is performed to optimize the performance of the models.
- The performance of the machine learning models is evaluated using classification metrics such as accuracy, precision, recall, and F1-score.
- The predictions of the models are compared with actual income levels to assess their effectiveness.
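The workflow in the list above (outlier handling, discretization, encoding, several classifiers, hyperparameter tuning, and metric-based evaluation) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code. The column names (`age`, `hours_per_week`, `education`, `occupation`), the IQR clipping rule for outliers, the five uniform-width age bins, and the hyperparameter grids are all hypothetical choices. The data is synthetic so the block runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column groups; replace with the real columns of adult_income_data.csv.
NUMERIC = ["hours_per_week"]
DISCRETIZE = ["age"]
CATEGORICAL = ["education", "occupation"]


def make_synthetic_data(n=400, seed=0):
    """Generate toy Adult-like rows so the sketch runs without the CSV."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(17, 80, n),
        "hours_per_week": rng.normal(40, 12, n).round(),
        "education": rng.choice(["HS-grad", "Bachelors", "Masters"], n),
        "occupation": rng.choice(["Sales", "Tech-support", "Exec-managerial"], n),
    })
    score = ((X["age"] > 35).astype(int) + (X["education"] != "HS-grad").astype(int)
             + (X["hours_per_week"] > 45).astype(int))
    y = (score + rng.normal(0, 0.5, n) > 1.5).astype(int)
    return X, y


def fit_iqr_bounds(df, columns, k=1.5):
    """Compute Tukey fences [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    bounds = {}
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        bounds[col] = (q1 - k * (q3 - q1), q3 + k * (q3 - q1))
    return bounds


def clip_with_bounds(df, bounds):
    """Clip each column to its fences (winsorizing instead of dropping rows)."""
    df = df.copy()
    for col, (low, high) in bounds.items():
        df[col] = df[col].clip(low, high)
    return df


def make_preprocessor():
    """Scale numeric columns, bin age, and one-hot encode categoricals."""
    return ColumnTransformer([
        ("scale", StandardScaler(), NUMERIC),
        ("bins", KBinsDiscretizer(n_bins=5, encode="onehot", strategy="uniform"), DISCRETIZE),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


# Each candidate: an estimator plus a small, illustrative hyperparameter grid.
CANDIDATES = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"model__C": [0.1, 1.0, 10.0]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"model__max_depth": [3, 5, None]}),
    "random_forest": (RandomForestClassifier(n_estimators=100, random_state=0),
                      {"model__max_depth": [5, None]}),
}


def run_comparison(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)
    # Fit outlier fences on the training split only, then apply them to both splits.
    bounds = fit_iqr_bounds(X_train, NUMERIC)
    X_train, X_test = clip_with_bounds(X_train, bounds), clip_with_bounds(X_test, bounds)
    results = {}
    for name, (model, grid) in CANDIDATES.items():
        pipe = Pipeline([("prep", make_preprocessor()), ("model", model)])
        search = GridSearchCV(pipe, grid, scoring="f1", cv=3)
        search.fit(X_train, y_train)
        pred = search.predict(X_test)  # compare predictions with the actual labels
        results[name] = {"best_params": search.best_params_,
                         "accuracy": accuracy_score(y_test, pred),
                         "f1": f1_score(y_test, pred)}
        print(name, search.best_params_)
        print(classification_report(y_test, pred, digits=3))  # precision, recall, F1
    return results


results = run_comparison(*make_synthetic_data())
```

To run this on the real data, one would replace `make_synthetic_data()` with the loaded CSV and a binary target, and adjust the column lists. F1 is used as the grid-search score here because the ">50K" class is typically the minority in Adult-style data, where accuracy alone can look optimistic. Keeping preprocessing inside the `Pipeline` means each cross-validation fold fits its own scaler, binner, and encoder.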