Customer Churn Prediction
This project predicts customer churn using machine learning. Customer churn occurs when customers stop using a service or cancel a subscription, and the model estimates how likely each customer is to churn. By identifying likely churners early, businesses can target them with retention strategies and reduce revenue loss.
- Data Preprocessing: Handled missing values, scaled numerical features, and encoded categorical variables.
- Exploratory Data Analysis (EDA): Visualized data distributions and relationships.
- Modeling: Built a Random Forest Classifier to predict customer churn.
- Evaluation: Assessed model performance with the ROC-AUC score, a confusion matrix, and a classification report.
- Feature Importance: Identified the most influential features in predicting churn.
- Optimization: Performed hyperparameter tuning using GridSearchCV.
- Deployment: Developed a Streamlit application that serves churn predictions through a user-friendly interface.
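The modeling and optimization steps above can be sketched as a grid search over a Random Forest, scored by ROC-AUC. This is a minimal sketch rather than the project's code: the parameter grid, the 3-fold cross-validation, and the helper name tune_random_forest are illustrative assumptions, and synthetic data from make_classification stands in for the preprocessed churn features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_random_forest(X_train, y_train, cv=3):
    """Grid-search RandomForestClassifier hyperparameters, scored by ROC-AUC."""
    # Illustrative search space (assumption); the project's actual grid is not documented.
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "min_samples_leaf": [1, 3],
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid=param_grid,
        scoring="roc_auc",  # same metric the README uses for evaluation
        cv=cv,              # an int cv uses stratified k-fold for classifiers
    )
    search.fit(X_train, y_train)  # also refits the best model on all of X_train
    return search


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the churn features (roughly 25% positives).
    X, y = make_classification(n_samples=500, n_features=10, weights=[0.75], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    search = tune_random_forest(X_train, y_train)
    print("Best parameters:", search.best_params_)
    print("Best cross-validated ROC-AUC:", round(search.best_score_, 3))
    # With scoring set, GridSearchCV.score reports that metric (ROC-AUC) on new data.
    print("Held-out ROC-AUC:", round(search.score(X_test, y_test), 3))
```

Scoring by ROC-AUC instead of accuracy is a deliberate choice here: churn data is usually imbalanced, and accuracy can look high even when few churners are caught.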
The dataset used is Telco-Customer-Churn.csv, which contains information about customers, their subscription details, and whether or not they churned.
Columns in the Dataset:
- gender, Partner, Dependents: Categorical demographic information.
- tenure: Number of months a customer has been with the company.
- MonthlyCharges, TotalCharges: Numerical subscription metrics.
- Churn: Target variable indicating if the customer churned.
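Loading and preprocessing these columns might look like the sketch below. The helper names load_churn_data and build_preprocessor are hypothetical, and two details are assumptions about the raw file rather than documented facts: that TotalCharges can contain blank strings (as in the commonly distributed Telco dataset, where it is blank for a few brand-new customers) and that Churn is stored as "Yes"/"No". Median imputation, standard scaling, and one-hot encoding are one reasonable way to implement the preprocessing step, not necessarily the project's exact choices.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]
CATEGORICAL_COLS = ["gender", "Partner", "Dependents"]


def load_churn_data(source):
    """Read the CSV and return (features, target) with Churn encoded as 0/1."""
    df = pd.read_csv(source)
    # Assumption: blank TotalCharges entries are coerced to NaN so they can be imputed.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: a customerID column, if present, carries no predictive signal.
    df = df.drop(columns=["customerID"], errors="ignore")
    y = df.pop("Churn").map({"Yes": 1, "No": 0})  # assumes "Yes"/"No" labels
    return df, y


def build_preprocessor():
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # tolerate unseen categories
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])


if __name__ == "__main__":
    # Three made-up rows in the assumed file layout; the last has a blank TotalCharges.
    sample = io.StringIO(
        "customerID,gender,Partner,Dependents,tenure,MonthlyCharges,TotalCharges,Churn\n"
        "0001,Female,Yes,No,1,29.85,29.85,No\n"
        "0002,Male,No,No,34,56.95,1889.5,No\n"
        "0003,Male,No,No,0,53.85, ,Yes\n"
    )
    X, y = load_churn_data(sample)
    print(build_preprocessor().fit_transform(X).shape)
```

Keeping the preprocessing in a ColumnTransformer means it can be chained with the classifier in a single pipeline, so the same transformations are fitted on training data only and reapplied unchanged at prediction time.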
Python Libraries:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- joblib
- imbalanced-learn
- streamlit
Getting Started:
- Clone the repository: git clone https://github.com/n-liyana/customer-churn-prediction
- Install the dependencies: pip install -r requirements.txt
- Run the analysis: open the Jupyter Notebook, or run the Python script (python churn_analysis.py) in your preferred IDE.
- Launch the Streamlit app: streamlit run app.py
Evaluation Metrics:
- ROC-AUC Score: Measures how well the model ranks churners above non-churners across all classification thresholds (0.5 is no better than random, 1.0 is perfect).
- Confusion Matrix: Shows the number of correct and incorrect predictions for each class.
- Feature Importance: Identifies the features that most influence the churn predictions.
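These evaluation outputs map directly onto scikit-learn functions. The sketch below assumes a fitted RandomForestClassifier (whose impurity-based feature_importances_ supplies the ranking; a full Pipeline would need its final step extracted first) with churn encoded as class 1. evaluate_classifier is a hypothetical helper, and synthetic data stands in for the real test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_classifier(model, X_test, y_test, feature_names, top_k=5):
    """Return ROC-AUC, confusion matrix, classification report, and top features."""
    churn_proba = model.predict_proba(X_test)[:, 1]  # assumes class 1 = churn
    preds = model.predict(X_test)
    importances = model.feature_importances_         # impurity-based importances
    top = np.argsort(importances)[::-1][:top_k]      # indices, most important first
    return {
        # ROC-AUC needs probability scores, not hard 0/1 predictions.
        "roc_auc": roc_auc_score(y_test, churn_proba),
        # Rows are true classes, columns are predicted: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_test, preds),
        "report": classification_report(y_test, preds, target_names=["No churn", "Churn"]),
        "top_features": [(feature_names[i], float(importances[i])) for i in top],
    }


if __name__ == "__main__":
    X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                               weights=[0.75], random_state=0)
    names = [f"feature_{i}" for i in range(X.shape[1])]
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    results = evaluate_classifier(model, X_test, y_test, names)
    print("ROC-AUC:", round(results["roc_auc"], 3))
    print(results["confusion_matrix"])
    print(results["report"])
    print(results["top_features"])
```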
Outputs:
- Churn Distribution: Visual representation of the number of churn vs. non-churn customers.
- Correlation Matrix: Heatmap of pairwise correlations between features.
- Confusion Matrix: Counts of correct and incorrect predictions for each class.
- ROC Curve: True positive rate plotted against false positive rate across classification thresholds.
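Two of these image files could be produced with matplotlib along the lines of the sketch below. It uses plain matplotlib rather than seaborn, covers only the churn distribution and ROC curve, and the function names are hypothetical; only the output file names are taken from the project's outputs/ folder.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve


def _save(fig, out_dir, filename):
    """Save a figure into out_dir (created if needed) and free its memory."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    fig.savefig(path, dpi=100, bbox_inches="tight")
    plt.close(fig)
    return path


def save_churn_distribution(y, out_dir="outputs"):
    """Bar chart of non-churn (0) vs. churn (1) customer counts."""
    y = np.asarray(y)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(["No churn", "Churn"], [int(np.sum(y == 0)), int(np.sum(y == 1))])
    ax.set_ylabel("Number of customers")
    ax.set_title("Churn Distribution")
    return _save(fig, out_dir, "Churn Distribution.png")


def save_roc_curve(y_true, y_score, out_dir="outputs"):
    """ROC curve with its AUC; y_score is the predicted probability of churn."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(fpr, tpr, label=f"Model (AUC = {auc(fpr, tpr):.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Random guess")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title("ROC Curve")
    ax.legend(loc="lower right")
    return _save(fig, out_dir, "ROC Curve.png")


if __name__ == "__main__":
    # Made-up labels and scores; written to a temp dir so real outputs are untouched.
    demo_dir = tempfile.mkdtemp()
    labels = np.array([0, 0, 1, 1, 0, 1])
    scores = np.array([0.1, 0.3, 0.8, 0.6, 0.2, 0.9])
    print(save_churn_distribution(labels, demo_dir))
    print(save_roc_curve(labels, scores, demo_dir))
```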
project/
├── data/
│   └── Telco-Customer-Churn.csv
├── outputs/
│   ├── Churn Distribution.png
│   ├── Correlation Matrix.png
│   ├── Confusion Matrix.png
│   └── ROC Curve.png
├── churn_model.pkl        # Trained model
├── app.py                 # Streamlit app
├── requirements.txt       # Required Python libraries
├── README.txt             # Project description
└── churn_analysis.py      # Main script
This project is open source and free to use under the MIT License.