This repository contains a machine learning pipeline for customer churn prediction. It implements regularized Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost models in Python, all trained on customer demographic and transaction data. The project covers exploratory data analysis, data preprocessing, model development and tuning, interpretation of results, and strategic recommendations drawn from the findings. Everything is documented in a report and a presentation tailored to both technical and non-technical stakeholders.
The core components of this project include:
- Extensive Exploratory Data Analysis (EDA): A thorough examination of the dataset to uncover key patterns and relationships.
- Data Preprocessing and Splitting: Transforming the raw data into a form suitable for machine learning algorithms, so that model inputs are clean, consistent, and reliable.
- Model Development and Hyperparameter Tuning: Systematically building and optimizing several models to achieve the best possible performance.
- Interpretation of Results: Detailed analysis of the models' predictions, focusing on global and local feature importance.
- Strategic Recommendations: Drawing actionable insights from the data to inform decision-making and improve customer retention.
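The preprocessing, splitting, and tuning steps can be illustrated with a minimal scikit-learn sketch for the regularized Logistic Regression model. It assumes a numeric feature matrix `X` and binary churn labels `y`. The helper name `tune_logistic_regression`, the synthetic data from `make_classification`, the 80/20 stratified split, the parameter grid, and ROC AUC scoring are all illustrative assumptions, not the repository's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_logistic_regression(X, y, seed=0):
    """Hypothetical helper (not part of this repository): split, scale, and tune
    a regularized logistic regression, returning (best_params, cv_auc, test_auc)."""
    # Stratified hold-out split keeps the churn / non-churn ratio in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling inside the pipeline fits the scaler on each CV training fold only,
    # which avoids leaking validation statistics into training.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(solver="liblinear", max_iter=1000)),
    ])
    # liblinear supports both L1 (lasso) and L2 (ridge) penalties.
    param_grid = {
        "clf__penalty": ["l1", "l2"],
        "clf__C": [0.01, 0.1, 1.0, 10.0],  # smaller C = stronger regularization
    }
    search = GridSearchCV(
        pipe,
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)
    # Evaluate the refitted best model once on the untouched test set.
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
    return search.best_params_, search.best_score_, test_auc


if __name__ == "__main__":
    # Synthetic stand-in for the churn data: imbalanced target (~20% churners).
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=5,
        weights=[0.8, 0.2], random_state=0,
    )
    best_params, cv_auc, test_auc = tune_logistic_regression(X, y)
    print(best_params, round(cv_auc, 3), round(test_auc, 3))
```

The same pattern (a `Pipeline` wrapped in `GridSearchCV`) would extend to the Random Forest, SVM, and XGBoost models by swapping the final estimator and its grid.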
Tuning Parameters:
Model Performance:
Performance Visualization:
Feature Importance for XGBoost using the 'Gain' metric:
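For a trained model, XGBoost exposes this metric through `Booster.get_score(importance_type="gain")`, which reports the average loss reduction over all splits that use a feature (`"total_gain"` gives the sum instead). The stdlib-only sketch below shows what that number means. It uses the split-gain formula from the XGBoost paper, with `reg_lambda` as the L2 leaf penalty and `gamma` as the per-split complexity cost. The helper names and feature names are hypothetical. XGBoost's internal bookkeeping may scale the stored values differently (for example, by dropping the 1/2 factor), so absolute numbers may not match `get_score`, although the idea is the same.

```python
from collections import defaultdict


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of one split (XGBoost paper formula).
    g_* = sum of gradients in a child, h_* = sum of hessians in a child."""
    def score(g, h):
        # Structure score of a node: G^2 / (H + lambda)
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def average_gain_importance(splits):
    """Hypothetical helper: splits is an iterable of (feature_name, gain) pairs,
    one per split in the ensemble. Returns the mean gain per feature,
    which corresponds to the 'gain' importance type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, gain in splits:
        totals[feature] += gain
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


if __name__ == "__main__":
    # One split: parent G=2, H=8 divided into children (G=-4, H=4) and (G=6, H=4).
    print(round(split_gain(-4.0, 4.0, 6.0, 4.0), 4))
    # Hypothetical feature names, for illustration only.
    print(average_gain_importance([("feature_a", 4.0), ("feature_a", 2.0), ("feature_b", 3.0)]))
```

Because 'gain' is averaged per split, a feature used rarely but decisively can outrank one used in many low-value splits. This is why 'gain' is often preferred over the split-count ('weight') metric when ranking churn drivers.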