Tree Ensembles
MLTK implements popular tree ensemble methods:
- Random Forests
- Boosted Trees
- Additive Groves
MLTK supports the following boosting algorithms:
- Least-squares Regression (mltk.predictor.tree.ensemble.brt.LSBoostLearner)
- Least-absolute-deviation Regression (mltk.predictor.tree.ensemble.brt.LADBoostLearner)
- Logit Boost (mltk.predictor.tree.ensemble.brt.LogitBoostLearner)
- Boosted Decision Tables
All learners under mltk.predictor.tree.ensemble.brt produce a BRT (boosted regression tree) object. The default tree learner for most boosted tree ensemble learners is mltk.predictor.tree.RegressionTreeLearner. For LogitBoostLearner, a robust tree learner is required. The base learner is independent of the boosting framework, so different boosting algorithms can be configured with different base learners flexibly.
When using a validation set to select the best model, one can specify a convergence criterion to terminate the algorithm early once it converges. See this page for details.
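One common convergence criterion can be sketched as follows; the names and the patience rule here are illustrative, not MLTK's API. Training is considered converged once the validation metric has not improved for a fixed number of consecutive iterations:

```java
// Sketch of early stopping on a validation metric; names are illustrative,
// not MLTK's API. Returns the iteration with the best (lowest) error,
// scanning until the metric has not improved for `patience` iterations.
public class EarlyStoppingSketch {
    public static int bestIteration(double[] validError, int patience) {
        int best = 0;
        int stale = 0;
        for (int m = 1; m < validError.length; m++) {
            if (validError[m] < validError[best]) {
                best = m;   // new best model found on the validation set
                stale = 0;
            } else if (++stale >= patience) {
                break;      // considered converged: stop early
            }
        }
        return best;
    }
}
```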
LSBoostLearner only works on regression problems. The following code trains a BRT object using the standard method.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LSBoostLearner learner = new LSBoostLearner();
learner.setLearningRate(0.01);
learner.setMaxNumIters(1000);
learner.setTreeLearner(rtLearner);
BRT brt = learner.build(trainSet);
This code trains a BRT object by passing the number of boosting iterations directly.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LSBoostLearner learner = new LSBoostLearner();
learner.setLearningRate(0.01);
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildRegressor(trainSet, 1000);
It is possible to use root mean squared error (RMSE) as the metric to select the best model on a validation set.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LSBoostLearner learner = new LSBoostLearner();
learner.setLearningRate(0.01);
learner.setMetric(new RMSE());
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildRegressor(trainSet, validSet, 1000);
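For reference, RMSE on a held-out set is the square root of the mean squared residual. A standalone sketch of the computation (the standard formula, not MLTK's RMSE class):

```java
// Root mean squared error on a held-out set (standard formula,
// not MLTK's RMSE class).
public class RmseSketch {
    public static double rmse(double[] actual, double[] predicted) {
        double sse = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - predicted[i];
            sse += d * d;   // accumulate squared residuals
        }
        return Math.sqrt(sse / actual.length);
    }
}
```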
LADBoostLearner only works on regression problems. The following code trains a BRT object using the standard method.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LADBoostLearner learner = new LADBoostLearner();
learner.setLearningRate(0.01);
learner.setMaxNumIters(1000);
learner.setTreeLearner(rtLearner);
BRT brt = learner.build(trainSet);
This code trains a BRT object by passing the number of boosting iterations directly.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LADBoostLearner learner = new LADBoostLearner();
learner.setLearningRate(0.01);
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildRegressor(trainSet, 1000);
It is possible to use mean absolute error (MAE) as the metric to select the best model on a validation set.
RegressionTreeLearner rtLearner = new RegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LADBoostLearner learner = new LADBoostLearner();
learner.setLearningRate(0.01);
learner.setMetric(new MAE());
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildRegressor(trainSet, validSet, 1000);
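MAE is the mean of the absolute residuals, the quantity LAD boosting optimizes. A standalone sketch of the computation (the standard formula, not MLTK's MAE class):

```java
// Mean absolute error on a held-out set (standard formula,
// not MLTK's MAE class).
public class MaeSketch {
    public static double mae(double[] actual, double[] predicted) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);   // absolute residuals
        }
        return sum / actual.length;
    }
}
```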
LogitBoostLearner only works on classification problems. In LogitBoostLearner, the tree learner must be a robust tree learner. The following code trains a BRT object using the standard method.
RobustRegressionTreeLearner rtLearner = new RobustRegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LogitBoostLearner learner = new LogitBoostLearner();
learner.setLearningRate(0.01);
learner.setMaxNumIters(1000);
learner.setTreeLearner(rtLearner);
BRT brt = learner.build(trainSet);
This code trains a BRT object by passing the number of boosting iterations directly.
RobustRegressionTreeLearner rtLearner = new RobustRegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LogitBoostLearner learner = new LogitBoostLearner();
learner.setLearningRate(0.01);
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildClassifier(trainSet, 1000);
For binary classification problems, it is possible to use AUC as the metric to select the best model on a validation set.
RobustRegressionTreeLearner rtLearner = new RobustRegressionTreeLearner();
rtLearner.setConstructionMode(Mode.NUM_LEAVES_LIMITED);
rtLearner.setMaxNumLeaves(30);
LogitBoostLearner learner = new LogitBoostLearner();
learner.setLearningRate(0.01);
learner.setMetric(new AUC());
learner.setTreeLearner(rtLearner);
BRT brt = learner.buildBinaryClassifier(trainSet, validSet, 1000);
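AUC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as half. A standalone sketch of that pairwise computation (the standard definition, not MLTK's AUC class):

```java
// AUC as the probability that a random positive is scored above a random
// negative, with ties counted as half (standard pairwise definition,
// not MLTK's AUC class). Labels are 1 (positive) or 0 (negative).
public class AucSketch {
    public static double auc(double[] scores, int[] labels) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) continue;               // positives only
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) continue;           // negatives only
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;        // positive ranked above negative
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;        // tie counts as half
                }
            }
        }
        return wins / pairs;
    }
}
```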
Currently, for multi-class LogitBoost, only the misclassification rate is supported as the metric.
Boosted decision tables (BDTs) use decision tables as the base learner. BDTs are faster at scoring and sometimes more accurate than BRTs. To train a BDT, use mltk.predictor.tree.DecisionTableLearner as the tree learner in any of the boosting algorithms above that return a BRT object (for LogitBoostLearner, the robust variant is required). The following code trains and constructs a BDT.
RobustDecisionTableLearner dtLearner = new RobustDecisionTableLearner();
dtLearner.setConstructionMode(Mode.MULTI_PASS_RANDOM);
dtLearner.setNumPasses(2);
dtLearner.setMaxDepth(5);
LogitBoostLearner learner = new LogitBoostLearner();
learner.setLearningRate(0.01);
learner.setMetric(new AUC());
learner.setTreeLearner(dtLearner);
BRT brt = learner.buildClassifier(trainSet, 1000);
BDT bdt = BDT.constructBDT(brt);
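The scoring-speed advantage comes from the shape of a decision table: a depth-d table tests a fixed list of d features against d thresholds and packs the outcomes into a bit index over a 2^d array of predictions, so scoring needs no tree traversal. A minimal illustrative sketch (not MLTK's BDT implementation):

```java
// Why decision table scoring is fast (illustrative sketch, not MLTK's
// implementation): a depth-d table tests a fixed list of d features
// against d thresholds and packs the boolean outcomes into a bit index
// over a 2^d array of predictions -- no branching tree traversal.
public class DecisionTableSketch {
    private final int[] featureIdx;   // which feature is tested at each depth
    private final double[] threshold; // threshold tested at each depth
    private final double[] leaf;      // 2^depth predictions

    public DecisionTableSketch(int[] featureIdx, double[] threshold, double[] leaf) {
        this.featureIdx = featureIdx;
        this.threshold = threshold;
        this.leaf = leaf;
    }

    public double predict(double[] x) {
        int index = 0;
        for (int d = 0; d < featureIdx.length; d++) {
            // Shift in one bit per test: 1 if the feature exceeds its threshold.
            index = (index << 1) | (x[featureIdx[d]] > threshold[d] ? 1 : 0);
        }
        return leaf[index];
    }
}
```

Scoring a boosted ensemble of such tables is then a sum of array lookups, one per table.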