Goal:
To build machine learning models that predict customer churn for a telco company.
Dataset:
Telco Customer Churn dataset from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn/data
Process:
- Clean the dataset by setting missing values to 0.
- Explore the data with univariate and bivariate analysis.
- Transform the features with one-hot encoding and standardization.
- Split the data into training and test sets, stratifying on the churn label so that both sets keep the same class proportions despite the imbalanced dataset.
- Compare different machine learning models:
  - Logistic Regression
  - Random Forest
  - Support Vector Machine (SVM)
  - AdaBoost
  - XGBoost
- Explore XGBoost in depth with hyperparameter tuning.
- Evaluate the model with a confusion matrix.
- Build a single tree to better understand the role of the different features.
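The cleaning step (setting missing values to 0) can be sketched without third-party libraries. This assumes missing values appear as blank strings in a numeric column; `TotalCharges` is used only as an example column, the two rows are made up, and `to_float_or_zero` / `clean_column` are hypothetical helper names.

```python
import csv
import io


def to_float_or_zero(value: str) -> float:
    """Parse a numeric string; blank or non-numeric values become 0.0."""
    try:
        return float(value)
    except ValueError:
        # float("") and float(" ") both raise ValueError, so treat them as missing.
        return 0.0


def clean_column(rows: list[dict], column: str) -> list[dict]:
    """Return copies of the rows with `column` parsed as float (missing -> 0.0)."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        new_row[column] = to_float_or_zero(row[column])
        cleaned.append(new_row)
    return cleaned


# Two made-up rows; the first has a blank TotalCharges value.
sample_csv = "customerID,tenure,TotalCharges,Churn\nA-1,0, ,No\nB-2,12,845.5,Yes\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
cleaned = clean_column(rows, "TotalCharges")
print([r["TotalCharges"] for r in cleaned])  # [0.0, 845.5]
```

With pandas, the same effect is usually obtained with `pd.to_numeric(df[col], errors="coerce").fillna(0)`.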
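The two feature transformations can be illustrated with a minimal, dependency-free sketch: one-hot encoding turns a categorical column into 0/1 indicator columns, and standardization rescales a numeric column to zero mean and unit standard deviation (the population standard deviation, as scikit-learn's `StandardScaler` uses). The helper names and sample values are illustrative assumptions; `Contract` and `tenure` are columns of the Kaggle CSV.

```python
import statistics


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values: list[float]) -> list[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Made-up sample values for a categorical and a numeric column.
contracts = ["Month-to-month", "One year", "Month-to-month", "Two year"]
cats, enc = one_hot(contracts)
print(cats)  # ['Month-to-month', 'One year', 'Two year']
print(enc)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

tenure = [1.0, 12.0, 24.0, 72.0]
print([round(z, 3) for z in standardize(tenure)])
```

In a full pipeline the mean and standard deviation would be computed on the training split only and then reused to transform the test split, so no test information leaks into training.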
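A stratified split samples each class separately, so the churn rate in the training and test sets matches the overall churn rate. The sketch below assumes the labels are the `Churn` values and uses made-up 80/20 labels; `stratified_split` and `churn_share` are hypothetical helpers. In scikit-learn the equivalent is `train_test_split(X, y, stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return sorted (train_idx, test_idx); each class is split in the same ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)  # seeded for reproducible splits
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def churn_share(idx, labels):
    """Fraction of the given rows whose label is "Yes"."""
    return sum(labels[i] == "Yes" for i in idx) / len(idx)


# Made-up, imbalanced labels: 80 "No" and 20 "Yes" (20% churn).
labels = ["No"] * 80 + ["Yes"] * 20
train, test = stratified_split(labels, test_fraction=0.25)
print(len(train), len(test))                                  # 75 25
print(churn_share(train, labels), churn_share(test, labels))  # 0.2 0.2
```

Note that stratification does not rebalance the classes; it only keeps the minority class from being under-represented in either split.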
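Hyperparameter tuning can take several forms; the sketch below assumes an exhaustive grid search. `max_depth`, `learning_rate` and `n_estimators` are real XGBoost parameters, but the grid values are arbitrary, and `toy_score` is a hypothetical stand-in for fitting the model and returning a cross-validated score. scikit-learn's `GridSearchCV` automates the same loop with cross-validation built in.

```python
import itertools


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score).

    score_fn(params) must return a validation score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the values are arbitrary examples.
grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}


def toy_score(params):
    # Hypothetical stand-in for "fit XGBoost with params, return mean CV score".
    return (
        -abs(params["max_depth"] - 5)
        - abs(params["learning_rate"] - 0.1)
        + params["n_estimators"] / 1000
    )


best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 300}
```

The number of fits grows multiplicatively with each added parameter (here 3 × 2 × 2 = 12), which is why randomized search is a common alternative for larger grids.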
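For evaluation, a binary confusion matrix counts true negatives, false positives, false negatives and true positives. This sketch assumes churn (`"Yes"`) is the positive class; the labels and predictions are made up and `binary_confusion_matrix` is a hypothetical helper. Precision and recall derived from the matrix are usually more informative than accuracy on an imbalanced dataset.

```python
def binary_confusion_matrix(y_true, y_pred, positive="Yes"):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns predicted."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# Made-up labels and predictions for six customers.
y_true = ["No", "No", "Yes", "Yes", "No", "Yes"]
y_pred = ["No", "Yes", "Yes", "No", "No", "Yes"]
(tn, fp), (fn, tp) = binary_confusion_matrix(y_true, y_pred)
print([[tn, fp], [fn, tp]])  # [[2, 1], [1, 2]]

precision = tp / (tp + fp)  # of predicted churners, how many actually churned
recall = tp / (tp + fn)     # of actual churners, how many were caught
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

The `[[TN, FP], [FN, TP]]` layout matches what scikit-learn's `confusion_matrix` returns when the labels are ordered `["No", "Yes"]`.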
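To see how a single tree uses the features, it helps to look at how one split is chosen. This sketch assumes a CART-style classification tree scored with Gini impurity (the default criterion of scikit-learn's `DecisionTreeClassifier`); XGBoost's trees score splits with a gradient-based gain instead, so this shows the general idea rather than XGBoost's exact rule. The tenure and churn values are made up, and `gini` / `best_threshold` are hypothetical helpers.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs, ys):
    """Find the split x <= t that most reduces the weighted Gini impurity."""
    parent = gini(ys)
    best_t, best_gain = None, 0.0
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Made-up data: short-tenure customers churn in this toy sample.
tenure = [1, 2, 3, 10, 40, 60]
churn = ["Yes", "Yes", "No", "No", "No", "No"]
t, gain = best_threshold(tenure, churn)
print(t, round(gain, 4))  # 2.5 0.4444
```

A full tree applies this search to every feature at every node and recurses on the two halves; features chosen near the root with large impurity reductions are the ones that matter most for the prediction.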