
To get started with ensemble methods, clone the repository and follow the examples provided in the examples directory. Ensure you have the necessary dependencies installed, which can be done using pip install -r requirements.txt.

Ensemble methods

An ensemble refers to a group of models working together. Ensemble methods are techniques for building a hybrid model by combining multiple models. The intuition behind this approach is to capitalize on each model's strengths while mitigating its weaknesses. Because each model captures a slightly different aspect of the data, the ensemble is more robust, and ensemble methods often perform better than individual models. The results from all the models in the ensemble are aggregated to form the final result: in classification tasks the class with the most votes is predicted as the final result, whereas in regression tasks the average of all the results is used.

Note : Models that are used to build an ensemble (the strong classifier) are referred to as base models or weak classifiers.

Types of Ensemble methods are

     1. Bagging

     2. Boosting

     3. Stacking

     4. Voting

1. Bagging : Also referred to as bootstrap aggregating. In this ensemble approach, multiple instances of the same base model are trained on different subsets of the training data. This method aims to capture various patterns in the data by creating diverse training sets. Each subset is drawn randomly with replacement, a process known as bootstrap sampling, and each subset has the same number of samples as the original dataset. The models are trained independently on their respective subsets. A well-known example of this technique is Random Forest, which combines multiple decision trees to improve overall performance.
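
As an illustration, here is a minimal bagging sketch using scikit-learn's BaggingClassifier with decision trees as the base model. The synthetic dataset and all parameter values are arbitrary placeholders, not recommendations.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    # Toy dataset as a stand-in for real data
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Each tree is trained on a bootstrap sample the same size as the training set
    bagging = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # base model (called base_estimator in older scikit-learn releases)
        n_estimators=50,                     # number of bootstrap samples / models
        bootstrap=True,                      # sample with replacement
        random_state=42,
    )
    bagging.fit(X_train, y_train)
    print("Bagging accuracy:", bagging.score(X_test, y_test))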

2. Boosting : In this ensemble approach, models are trained in sequence, where each subsequent model focuses on the errors made by the previous models. The intuition behind this approach is to learn from the mistakes.
Popular boosting methods are
1. AdaBoost : Short for Adaptive Boosting. In this approach, classifiers are trained sequentially, giving more weight to data points that were misclassified in previous rounds.
Training process :
Step 1. At the start of training, every sample is given the same importance. If there are N samples, each one is assigned a weight of 1/N. The training data looks like the following:

    Features  | Target   | Weights
    feature A | target X | 1/N
    feature B | target Y | 1/N
    feature C | target X | 1/N
    feature D | target Z | 1/N
Step 2. A weak classifier is trained on the weighted training data. After training, we calculate its weighted error ε by evaluating the same weak classifier on the same training data, and from ε we compute the classifier's weight α (in the classic formulation, α = 1/2 · ln((1 − ε) / ε)). We use α to increase the weights of the misclassified examples and decrease the weights of the correctly classified ones, after which the weights are re-normalized. The updated training data looks like the following:
    Features  | Target   | Old Weights | Correctly predicted | New Weights
    feature A | target X | 1/N         | Yes                 | (1/N) × exp(−α)
    feature B | target Y | 1/N         | No                  | (1/N) × exp(α)
    feature C | target X | 1/N         | No                  | (1/N) × exp(α)
    feature D | target Z | 1/N         | Yes                 | (1/N) × exp(−α)
Step 3. The second weak classifier is trained on the weight-adjusted data, then evaluated, and the sample weights are updated accordingly. This process is repeated until the desired number of weak classifiers has been trained.
→ AdaBoost improves accuracy by focusing on mistakes. The loss function is designed to pay more attention to the samples with higher weights.
→ The scikit-learn version of AdaBoost offers two boosting algorithms, SAMME (the original version) and SAMME.R (an updated version that uses class probabilities). SAMME.R tends to perform better than SAMME.
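For illustration, here is a minimal AdaBoost sketch with scikit-learn, using decision stumps (depth-1 trees) as the weak classifiers. The synthetic data and parameter values are placeholders.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Decision stumps are the classic weak classifiers for AdaBoost
    ada = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # base_estimator in older scikit-learn releases
        n_estimators=100,     # number of sequentially trained weak classifiers
        learning_rate=1.0,    # shrinks each classifier's contribution (its alpha)
        random_state=0,
    )
    ada.fit(X, y)
    print("Training accuracy:", ada.score(X, y))
    # In older scikit-learn versions, algorithm="SAMME" or "SAMME.R" can be
    # passed to choose between the two boosting variants mentioned above.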
2. Gradient Boosting Machines (GBM) : In GBM, models are trained in sequence, with each subsequent model fitting the residuals of the current ensemble to improve its accuracy.
Training process :
Step 1. Initially, a weak learner is trained on the entire training data and then evaluated to calculate the residuals.
For Regression (House Price Prediction):

    Median Income | Number of Bedrooms | Target Price | Predicted Price | Residual
    $60,000       | 3                  | $350,000     | $340,000        | $10,000
    $75,000       | 4                  | $450,000     | $460,000        | -$10,000
    $50,000       | 2                  | $250,000     | $255,000        | -$5,000
    $80,000       | 5                  | $500,000     | $490,000        | $10,000
For Classification (binary classification with a 0.5 threshold):

    Feature 1 | Feature 2 | True Label | Predicted Probability | Residual
    321       | 23        | 1          | 0.75                  | 0.25
    512       | 21        | 0          | 0.65                  | -0.65
    599       | 312       | 1          | 0.85                  | 0.15
    621       | 311       | 0          | 0.70                  | -0.70
Step 2. The next model is trained to predict these residuals. The training data for the subsequent model looks like the following.
For Regression (House Price Prediction):

    Median Income | Number of Bedrooms | Residual
    $60,000       | 3                  | $10,000
    $75,000       | 4                  | -$10,000
    $50,000       | 2                  | -$5,000
    $80,000       | 5                  | $10,000
For Classification (binary classification with a 0.5 threshold):

    Feature 1 | Feature 2 | Residual
    321       | 23        | 0.25
    512       | 21        | -0.65
    599       | 312       | 0.15
    621       | 311       | -0.70
Step 3. After training the second model, calculate the residuals of the entire ensemble (the first and second weak learners combined) and use these residuals to train the next model. Repeat this process until the desired number of weak learners has been trained.
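As an illustration of steps 1–3 for regression with squared-error loss, here is a minimal hand-rolled sketch in which each new tree is fit to the current residuals. The synthetic data, tree depth, and learning rate are arbitrary placeholders.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

    learning_rate = 0.1
    n_rounds = 50
    trees = []

    # Step 1: start from a simple initial prediction (the mean of the targets)
    prediction = np.full_like(y, y.mean(), dtype=float)

    for _ in range(n_rounds):
        residuals = y - prediction                      # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)                          # Step 2: fit the next model to the residuals
        prediction += learning_rate * tree.predict(X)   # Step 3: update the ensemble's prediction
        trees.append(tree)

    print("Final training MSE:", np.mean((y - prediction) ** 2))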
Advanced Gradient Boosting Techniques :
1. XGBoost (Extreme Gradient Boosting): This is an optimized version of GBM with additional features for better performance and efficiency (a minimal usage sketch appears after this list).
Features :
Regularization: Adds L1 and L2 regularization to reduce overfitting.
Tree Pruning: Grows trees to a maximum depth and then prunes splits backward when their gain falls below a threshold.
Missing Values Handling: Learns a default direction for missing values at each split, so missing data is handled natively.
Parallel Processing: Parallelizes split finding across CPU cores, with optional GPU acceleration.
Histogram-Based Splitting: Uses bins to split data, improving efficiency.
2. LightGBM (Light Gradient Boosting Machine): Designed for faster training and lower memory usage with large datasets.
Features :
Histogram-Based Algorithms: Buckets continuous features into discrete bins, reducing memory usage and speeding up split finding on large datasets.
Speed and Efficiency: Grows trees leaf-wise (best-first) rather than level-wise, which typically trains faster for comparable accuracy.
3. CatBoost : Handles categorical features automatically and is robust to various data distributions.
Features :
Categorical Feature Handling: Directly processes categorical features.
Regularization: Uses advanced regularization techniques to reduce overfitting.
4. Lightning Boost : An extension of XGBoost, focusing on faster training and scalability.
Features :
Improved Efficiency: Better optimization for large datasets compared to XGBoost.
Accelerated Training: Supports both CPU and GPU acceleration.
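
Assuming the xgboost package is installed (pip install xgboost), a minimal usage sketch of its scikit-learn-style interface might look like the following; LightGBM and CatBoost expose very similar estimator classes. The parameter values are placeholders, not tuned recommendations.

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    model = XGBClassifier(
        n_estimators=200,    # number of boosted trees
        learning_rate=0.1,   # shrinkage applied to each tree's contribution
        max_depth=4,         # depth limit per tree
        reg_lambda=1.0,      # L2 regularization on leaf weights
    )
    model.fit(X, y)
    print("Training accuracy:", model.score(X, y))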

3. Stacking ensemble : Stacking is an ensemble learning method in which models are combined in layers. Several base models, which can be of different types, are trained first; a meta-model is then trained on the predictions of these base models. The idea is that the meta-model learns which base models to trust in different situations. Stacking is used less often than the other ensemble methods.
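
For illustration, here is a minimal stacking sketch using scikit-learn's StackingClassifier, with a logistic regression meta-model on top of two base models of different types. The data and all settings are placeholders.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Base models of different types
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ]

    # The meta-model is trained on the base models' cross-validated predictions
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        cv=5,
    )
    stack.fit(X, y)
    print("Training accuracy:", stack.score(X, y))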


4. Voting ensemble : Voting is a straightforward ensemble learning technique that combines the predictions of several, usually different, models to make a final decision. With hard voting the majority class is chosen, while with soft voting the predicted class probabilities are averaged.
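
As an illustration, here is a minimal voting sketch using scikit-learn's VotingClassifier with three different base models; the models, data, and settings are placeholders.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    voter = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("nb", GaussianNB()),
        ],
        voting="soft",  # average predicted probabilities; use "hard" for a majority vote
    )
    voter.fit(X, y)
    print("Training accuracy:", voter.score(X, y))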
