
Predict customer churn using machine learning models with the Telco Customer Churn dataset. Includes EDA, feature engineering, and Random Forest classification.


n-liyana/customer-churn-prediction


Customer Churn Prediction

Overview

This project predicts customer churn using machine learning. Customer churn occurs when a customer stops using a service or cancels a subscription. By identifying customers who are likely to churn, businesses can target them with retention strategies and reduce revenue loss.

Features

  • Data Preprocessing: Handled missing values, scaled numerical features, and encoded categorical variables.
  • Exploratory Data Analysis (EDA): Visualized data distributions and relationships.
  • Modeling: Built a Random Forest Classifier to predict customer churn.
  • Evaluation: Assessed the model's performance with the ROC-AUC score, a confusion matrix, and a classification report.
  • Feature Importance: Identified the most influential features in predicting churn.
  • Optimization: Performed hyperparameter tuning using GridSearchCV.
  • Deployment: Built a Streamlit application for making predictions interactively.
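The preprocessing, modeling, and tuning steps listed above fit naturally into a single scikit-learn pipeline. The sketch below is an illustration under stated assumptions, not the repository's code: the column groups come from the dataset description, the hyperparameter grid is hypothetical, ranking candidates by ROC-AUC is an assumption, and `make_toy_data` is a hypothetical helper that generates random stand-in rows so the example runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups from the dataset description; the real CSV has more columns.
NUMERIC = ["tenure", "MonthlyCharges", "TotalCharges"]
CATEGORICAL = ["gender", "Partner", "Dependents"]


def build_search() -> GridSearchCV:
    """Scale numeric columns, one-hot encode categoricals, then tune a Random Forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    pipeline = Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(random_state=42)),
    ])
    # Hypothetical search space; the repository's actual grid is not documented here.
    param_grid = {
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 8],
    }
    # Assumption: rank candidates by ROC-AUC, the headline metric in this README.
    return GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)


def make_toy_data(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: random rows with the real column names, NOT Telco data."""
    rng = np.random.default_rng(seed)
    tenure = rng.integers(0, 73, n)
    monthly = rng.uniform(18, 120, n).round(2)
    df = pd.DataFrame({
        "gender": rng.choice(["Male", "Female"], n),
        "Partner": rng.choice(["Yes", "No"], n),
        "Dependents": rng.choice(["Yes", "No"], n),
        "tenure": tenure,
        "MonthlyCharges": monthly,
        "TotalCharges": (tenure * monthly).round(2),
    })
    # Synthetic 0/1 label: short-tenure customers churn more often.
    df["Churn"] = (rng.random(n) < np.where(tenure < 12, 0.6, 0.15)).astype(int)
    return df


if __name__ == "__main__":
    data = make_toy_data()
    X, y = data.drop(columns="Churn"), data["Churn"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    search = build_search().fit(X_train, y_train)
    print("Best params:", search.best_params_)
    # GridSearchCV.score uses the same ROC-AUC scorer on the held-out split.
    print("Held-out ROC-AUC:", round(search.score(X_test, y_test), 3))
```

Keeping scaling and encoding inside the pipeline means each cross-validation fold fits its own scaler and encoder, so no information leaks from the validation folds into preprocessing.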

Dataset

The dataset used is Telco-Customer-Churn.csv, which contains information about customers, their subscription details, and whether they churned or not.

Key Columns in the Dataset:

  • gender, Partner, Dependents: Categorical demographic information.
  • tenure: Number of months a customer has been with the company.
  • MonthlyCharges, TotalCharges: The customer's current monthly charge and the total amount charged to date.
  • Churn: Target variable (Yes/No) indicating whether the customer left the company.
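In the commonly distributed version of this CSV, `TotalCharges` is read as text because a few rows (new customers with zero tenure) contain a blank space, and `Churn` holds the strings Yes/No. The hypothetical `clean_telco` helper below shows one way to handle both. Filling blank totals with 0 is an assumption; dropping those rows is an equally common choice. The two sample rows are hand-written for illustration and do not come from the real file.

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning sketch for the Telco churn columns described above."""
    out = df.copy()
    # TotalCharges may be stored as text; blank entries become NaN after coercion.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Assumption: a missing total means nothing has been billed yet, so use 0.
    out["TotalCharges"] = out["TotalCharges"].fillna(0.0)
    # Encode the Yes/No target as 1/0 for scikit-learn.
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0})
    return out


# Two illustrative rows (not from the real CSV).
sample = pd.DataFrame({
    "tenure": [0, 12],
    "MonthlyCharges": [20.0, 70.5],
    "TotalCharges": [" ", "846.0"],
    "Churn": ["No", "Yes"],
})
print(clean_telco(sample))
```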

Requirements

Python Libraries:

  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • joblib
  • imbalanced-learn
  • streamlit

To install the dependencies, run: pip install -r requirements.txt

How to Run

  1. Clone the repository: git clone https://github.com/n-liyana/customer-churn-prediction

  2. Run the Jupyter Notebook or Python script: open the notebook, or run the main script (churn_analysis.py) in your preferred IDE.

  3. Launch the Streamlit app: streamlit run app.py
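For orientation, a Streamlit front end for the saved model might look like the sketch below. It is hypothetical and not the repository's app.py: `build_input_row` and `main` are illustrative names, the form covers only the columns listed under Dataset, approximating `TotalCharges` as tenure times the monthly charge is an assumption, and it assumes `churn_model.pkl` holds a fitted pipeline that accepts raw column values and exposes `predict_proba`.

```python
# Hypothetical app.py sketch; not the repository's actual app.
import pandas as pd

MODEL_PATH = "churn_model.pkl"  # file name from the project structure


def build_input_row(gender, partner, dependents, tenure, monthly_charges):
    """Turn form values into a one-row DataFrame using the dataset's column names."""
    return pd.DataFrame([{
        "gender": gender,
        "Partner": partner,
        "Dependents": dependents,
        "tenure": tenure,
        "MonthlyCharges": monthly_charges,
        # Assumption: approximate TotalCharges as tenure x monthly charge.
        "TotalCharges": tenure * monthly_charges,
    }])


def main():
    # Imported here so build_input_row can be used without Streamlit installed.
    import joblib
    import streamlit as st

    st.title("Customer Churn Prediction")
    gender = st.selectbox("Gender", ["Male", "Female"])
    partner = st.selectbox("Partner", ["Yes", "No"])
    dependents = st.selectbox("Dependents", ["Yes", "No"])
    tenure = st.number_input("Tenure (months)", min_value=0, value=12)
    monthly = st.number_input("Monthly charges", min_value=0.0, value=70.0)

    if st.button("Predict"):
        model = joblib.load(MODEL_PATH)
        row = build_input_row(gender, partner, dependents, tenure, monthly)
        # predict_proba columns follow model.classes_; index 1 is the churn
        # class for 0/1 or No/Yes labels.
        churn_proba = model.predict_proba(row)[0, 1]
        st.write(f"Estimated churn probability: {churn_proba:.1%}")
```

Because `streamlit run app.py` executes the file top to bottom, such an app.py would end with a plain call to `main()`. Keeping the Streamlit and joblib imports inside `main` lets the input-building helper be tested on its own.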

Results

Evaluation Metrics:

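  • ROC-AUC Score: Area under the ROC curve, measuring how well the model ranks churners above non-churners (0.5 is no better than random, 1.0 is perfect separation).
  • Confusion Matrix: Counts of true positives, true negatives, false positives, and false negatives.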
  • Feature Importance: Identifies key factors influencing customer churn.
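Each metric listed above has a direct counterpart in `sklearn.metrics`. The hypothetical `evaluate` helper below is a minimal sketch assuming a fitted estimator that exposes `predict_proba` and `feature_importances_`, such as a bare `RandomForestClassifier` (for a pipeline, pass its final step along with the transformed feature names). The data comes from `make_classification`, so the printed numbers say nothing about the Telco model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test, feature_names, top_k=5):
    """Hypothetical helper: ROC-AUC, confusion matrix, report, and top importances."""
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive (churn) class
    preds = model.predict(X_test)
    auc = roc_auc_score(y_test, proba)  # computed from scores, not hard labels
    cm = confusion_matrix(y_test, preds)  # rows: actual class, columns: predicted class
    report = classification_report(y_test, preds)
    order = np.argsort(model.feature_importances_)[::-1]  # most important first
    top = [(feature_names[i], float(model.feature_importances_[i])) for i in order[:top_k]]
    return auc, cm, report, top


# Synthetic, imbalanced stand-in data (not the Telco dataset).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.75], random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

auc, cm, report, top = evaluate(rf, X_test, y_test, names)
print(f"ROC-AUC: {auc:.3f}")
print(cm)
print(report)
print(top)
```

Impurity-based `feature_importances_` can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.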

Outputs:

  • Churn Distribution: Visual representation of churn vs. non-churn customers.
  • Correlation Matrix: Heatmap showing relationships between features.
  • Confusion Matrix: Plot of correct and incorrect predictions for each class.
  • ROC Curve: True positive rate plotted against false positive rate across classification thresholds.

Project Structure

project/
├── data/
│   └── Telco-Customer-Churn.csv
├── outputs/
│   ├── Churn Distribution.png
│   ├── Correlation Matrix.png
│   ├── Confusion Matrix.png
│   └── ROC Curve.png
├── churn_model.pkl       # Trained model
├── app.py                # Streamlit app
├── requirements.txt      # Required Python libraries
├── README.txt            # Project description
└── churn_analysis.py     # Main script
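The trained model is stored as `churn_model.pkl`. With joblib, which is listed in the requirements, saving and reloading a fitted estimator can look like the minimal sketch below; `save_model` and `load_model` are hypothetical helper names, and the model is trained on synthetic data. Load pickle files only from trusted sources, since unpickling can execute arbitrary code, and preferably with the same scikit-learn version used for training.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def save_model(model, path="churn_model.pkl"):
    """Hypothetical helper: serialize a fitted estimator to disk."""
    joblib.dump(model, path)


def load_model(path="churn_model.pkl"):
    """Hypothetical helper: only load pickles you trust."""
    return joblib.load(path)


# Round-trip a model trained on synthetic data, using a temporary directory.
X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
path = os.path.join(tempfile.mkdtemp(), "churn_model.pkl")
save_model(model, path)
restored = load_model(path)
print((restored.predict(X) == model.predict(X)).all())  # True
```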

License

This project is open source and free to use under the MIT License.
