This project predicts whether a customer will subscribe to a term deposit based on a variety of customer features. Using the Bank Marketing dataset, which contains data from previous marketing campaigns, it builds, compares, and evaluates multiple machine learning models to find the one with the best predictive performance.

Tools and libraries used:

- NumPy and pandas for efficient data manipulation and analysis.
- Seaborn and Plotly for data visualization to explore relationships between features and the target variable.
- Scikit-learn (sklearn) for model building and evaluation.
- StandardScaler (from scikit-learn) for feature scaling, which helps scale-sensitive algorithms such as Logistic Regression and KNN.

Models compared:

- Logistic Regression: A linear model used as the baseline for classification.
- DecisionTreeClassifier: A non-parametric model that learns decision rules from data features.
- RandomForestClassifier: An ensemble model that improves accuracy through bagging and multiple decision trees.
- XGBClassifier (XGBoost): A powerful gradient boosting algorithm optimized for accuracy and speed.
- GaussianNB: A probabilistic model based on Bayes' theorem that assumes features are conditionally independent and normally distributed within each class.
- K-Nearest Neighbors (KNN): A simple, instance-based learning method that classifies data points based on the majority class of their nearest neighbors.
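
As a rough illustration of how the classifiers listed above might be compared, the sketch below scores each one with 5-fold cross-validation. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are arbitrary, and XGBoost is included only if the `xgboost` package happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic, imbalanced data standing in for the encoded
# Bank Marketing features; the real dataset would be loaded with pandas.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.85, 0.15], random_state=42
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler so
# that scaling is fitted on each training fold only.
models = {
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "GaussianNB": GaussianNB(),
}

# XGBoost is an optional dependency in this sketch.
try:
    from xgboost import XGBClassifier

    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    pass

# Mean ROC AUC over 5 stratified folds for each model.
cv_auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in sorted(cv_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: mean ROC AUC = {score:.3f}")
```

Putting the scaler inside the pipeline, rather than scaling the whole dataset up front, keeps statistics from the validation folds out of training.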

The models are tuned and compared to select the best-performing classifier. Performance is evaluated with accuracy, precision, recall, and AUC-ROC.
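
The tuning and evaluation step could look roughly like the following. This is a minimal sketch under stated assumptions: the data is synthetic, a random forest is picked arbitrarily as the model to tune, and the `max_depth` grid is hypothetical rather than the one used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic, imbalanced stand-in for the Bank Marketing data.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical search grid; tuning uses the training split only.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_grid={"max_depth": [3, 6, None]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Hard predictions feed accuracy, precision, and recall; ROC AUC needs
# the predicted probability of the positive class.
y_pred = best_model.predict(X_test)
y_score = best_model.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}

print("best params:", search.best_params_)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.3f}")
```

Because subscribers are typically the minority class in this kind of data, accuracy alone can look good for a model that rarely predicts a subscription, which is why precision, recall, and AUC-ROC are reported alongside it.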