xgboost loss functions
xgboost is an extremely fast R package for learning nonlinear machine learning models using gradient boosting algorithms. It supports several different kinds of outputs via the objective argument. However, it is missing:
- For (left, right, and interval) censored outputs: AFT losses (Gaussian, Logistic) for the accelerated failure time model (https://en.wikipedia.org/wiki/Accelerated_failure_time_model).
- For count data with an upper bound: the binomial loss, i.e. the negative log likelihood of the binomial distribution (https://en.wikipedia.org/wiki/Binomial_distribution).
Other R packages such as gbm implement the Cox loss for boosting with censored output regression. However, gbm supports neither the AFT nor the binomial loss.
Other R packages such as glmnet implement the binomial loss for regularized linear models. However, glmnet fits only linear models, so it may not be as accurate as boosting for some applications/data sets (a sketch of its two-column count interface is given below).
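For reference, here is a rough sketch of the glmnet interface mentioned above, using simulated data as an illustrative assumption; the binomial response is a two-column matrix of counts, with the second column treated as the target class.

```r
library(glmnet)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)               # toy feature matrix (assumption)
trials <- rpois(n, 10) + 1                    # toy number of trials per row
success <- rbinom(n, trials, plogis(X[, 1]))  # toy success counts
failure <- trials - success
## two-column count response: the second column (success) is the target class
fit <- glmnet(X, cbind(failure, success), family = "binomial")
```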
Figure out a method for passing these outputs to xgboost. In both cases (binomial/censored) the outputs can be represented as a 2-column matrix. Typically in R,
- censored outputs would be specified via Surv(lower.limit, upper.limit, type="interval2"),
- binomial/count outputs would be specified as in glmnet: a "two-column matrix of counts or proportions (the second column is treated as the target class)." The loss function in this case is the negative log likelihood of the binomial distribution. It can actually be used (inefficiently) in the current version of xgboost by duplicating the feature matrix, then using the logistic loss with non-uniform weights: y = [n-vector of ones, n-vector of zeros], w = [n-vector of success counts, n-vector of failure counts] (a rough sketch of this workaround is given below). However this is relatively inefficient because a data set of size 2n must be constructed. In the proposed GSOC project we should implement a loss that works on the original data set of size n.
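For reference, here is a rough sketch of the weighted-logistic workaround described above in the current xgboost R package; the simulated feature matrix and success/failure counts are illustrative assumptions.

```r
library(xgboost)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)               # toy feature matrix (assumption)
trials <- rpois(n, 10) + 1                    # toy number of trials per row
success <- rbinom(n, trials, plogis(X[, 1]))  # toy success counts
failure <- trials - success
## Duplicate the data set (2n rows): label-1 copies weighted by success
## counts, label-0 copies weighted by failure counts.
X2 <- rbind(X, X)
y2 <- rep(c(1, 0), each = n)
w2 <- c(success, failure)
dtrain <- xgb.DMatrix(X2, label = y2, weight = w2)
fit <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 30, verbose = 0)
prob <- predict(fit, xgb.DMatrix(X))          # predicted success probabilities
```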
In xgboost, implement the binomial loss for count outputs and the Gaussian/Logistic AFT losses for censored outputs, along with docs and tests for the new objectives.
This project will provide support for two common but currently unimplemented loss functions / output data types in the xgboost package.
Students, please contact the mentors below after completing at least one of the tests further down the page.
- Toby Hocking <[email protected]> is a machine learning researcher and R package developer.
- Hyunsu Cho <[email protected]> is an expert in XGBoost internals and the core C++ stack.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: create an Rmd file with source code that performs 5-fold cross-validation to evaluate the predictive accuracy of xgboost for a non-default objective function, e.g. count:poisson. Make a figure that shows the test error for xgboost and for an un-informed baseline that ignores all input features (i.e., all predictions equal to the mean of the labels in the train data); a rough sketch of the intended comparison is given below.
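  A minimal sketch of the kind of comparison this test asks for; the simulated count data, the number of boosting rounds, and the use of mean squared test error are illustrative assumptions rather than requirements.

  ```r
  library(xgboost)
  set.seed(1)
  n <- 500
  X <- matrix(rnorm(n * 5), n, 5)             # toy feature matrix (assumption)
  y <- rpois(n, lambda = exp(0.5 * X[, 1]))   # toy count labels (assumption)
  fold <- sample(rep(1:5, length.out = n))    # random 5-fold assignment

  err.df <- do.call(rbind, lapply(1:5, function(k) {
    is.test <- fold == k
    dtrain <- xgb.DMatrix(X[!is.test, ], label = y[!is.test])
    fit <- xgb.train(params = list(objective = "count:poisson"),
                     data = dtrain, nrounds = 50, verbose = 0)
    pred.xgb <- predict(fit, xgb.DMatrix(X[is.test, ]))
    pred.baseline <- mean(y[!is.test])        # un-informed baseline
    data.frame(fold = k,
               xgboost = mean((y[is.test] - pred.xgb)^2),
               baseline = mean((y[is.test] - pred.baseline)^2))
  }))
  err.df  # one row of test error per fold; plot these, e.g. with ggplot2
  ```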
- Easy: Compile the XGBoost R package from the latest source using CMake:

  ```bash
  git clone --recursive https://github.com/dmlc/xgboost
  cd xgboost
  mkdir build
  cd build
  cmake .. -DR_LIB=ON   # the R_LIB option enables the R package
  make -j4
  make install          # this command installs the XGBoost R package
  ```
- Easy: Write a customized objective function in XGBoost-R. Use the following function, which penalizes over-estimation twice as much as under-estimation:

  ```
  my_loss(y, yhat) = max(yhat - y, 0.5 * (y - yhat))^2
  ```

  See the example of a customized objective at https://github.com/dmlc/xgboost/blob/master/R-package/demo/custom_objective.R. The first and second partial derivatives of my_loss with respect to the second argument are

  ```
  grad(y, yhat) = ifelse(yhat > y, 2 * (yhat - y), -0.5 * (y - yhat))
  hess(y, yhat) = ifelse(yhat > y, 2, 0.5)
  ```

  (A sketch of how such an objective can be passed to xgb.train is given below.)
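  For comparison with the linked demo, here is a rough sketch of how a user-defined objective like this could be plugged into xgb.train via its obj argument; the toy regression data and tuning parameters are illustrative assumptions, not part of the test.

  ```r
  library(xgboost)
  set.seed(1)
  n <- 300
  X <- matrix(rnorm(n * 4), n, 4)      # toy features (assumption)
  y <- X[, 1] + rnorm(n)               # toy labels (assumption)
  dtrain <- xgb.DMatrix(X, label = y)

  ## Custom objective: first/second derivatives of
  ## my_loss(y, yhat) = max(yhat - y, 0.5 * (y - yhat))^2 with respect to yhat.
  my.objective <- function(preds, dtrain) {
    y <- getinfo(dtrain, "label")
    grad <- ifelse(preds > y, 2 * (preds - y), -0.5 * (y - preds))
    hess <- ifelse(preds > y, 2, 0.5)
    list(grad = grad, hess = hess)
  }

  fit <- xgb.train(params = list(max_depth = 3, eta = 0.1),
                   data = dtrain, nrounds = 50,
                   obj = my.objective, verbose = 0)
  ```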
- Medium: write a vignette in LaTeX or MathJax explaining how to use the logistic loss with non-uniform weights to get the binomial loss function in xgboost (the key identity is sketched below).
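  For reference, the identity such a vignette would build on is standard: if row i records s_i successes out of n_i trials and the model predicts success probability p_i, then, up to the constant binomial coefficient, the binomial negative log likelihood is a weighted sum of two logistic-loss terms, one for a label-1 copy of the row with weight s_i and one for a label-0 copy with weight n_i - s_i.

  ```latex
  % binomial negative log likelihood for row i, up to the constant term
  -\log\left[ \binom{n_i}{s_i} \, p_i^{s_i} \, (1-p_i)^{n_i - s_i} \right]
    = s_i \left( -\log p_i \right)
    + (n_i - s_i) \left( -\log (1 - p_i) \right)
    + \mathrm{const}
  ```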
- Medium: Add diagnostic logging in the C++ codebase. Diagnostic logging is helpful for understanding what the C++ code does and also for debugging it. For this test, add an std::cout statement to print the prediction and true label, per data point, inside the gradient computation logic for the Cox regression. Hint: look into the src/objective directory. Tip: make sure to re-compile before running your R script.
- Medium: Derive the formulas for the first- and second-order partial derivatives of the loss function for binary classification. The probability of obtaining the i-th label (y_i) given the i-th training data point (x_i) is as follows:

  ```
  P(y_i | x_i) = ifelse(y_i = 1, sigmoid(yhat_i), 1 - sigmoid(yhat_i))
               = sigmoid(yhat_i)^(y_i) * (1 - sigmoid(yhat_i))^(1 - y_i)
  ```

  where
  - yhat_i is a prediction score (ranging from -inf to inf) for x_i produced by our model,
  - the label y_i is either 0 or 1,
  - sigmoid(*) is the sigmoid function, sigmoid(t) = 1 / (1 + exp(-t)).

  Note that the sigmoid function converts any real number into a probability value between 0 and 1.
  - Q1. Explain why the first expression is equivalent to the second expression.

  Using the principle of Maximum Likelihood Estimation, we will choose the best yhat_i so as to maximize the value of P(y_i | x_i), i.e. choose yhat_i to make the training data most probable. The "distance" between the prediction yhat_i and the true label y_i is given as the negative logarithm of P(y_i | x_i):

  ```
  loss(y_i, yhat_i) = -log(P(y_i | x_i))
                    = -log(sigmoid(yhat_i)^(y_i) * (1 - sigmoid(yhat_i))^(1 - y_i))
  ```

  - Q2. Explain how minimizing the loss function loss(y_i, yhat_i) is equivalent to maximizing the probability P(y_i | x_i).
  - Q3. Simplify the expression for loss(y_i, yhat_i). Show your steps (i.e. don't just write the answer, show how you got it).
  - Q4. Now compute the first and second partial derivatives of loss(y_i, yhat_i) with respect to the second variable yhat_i. Then express the two derivatives in terms of sigmoid(yhat_i). Notice how simple the expressions become. Again, show your steps (i.e. don't just write the answer, show how you got it).
  - Q5. In the source code src/objective/regression_loss.h, locate the structure that implements this loss function.
- Hard: Create your own loss function in C++. Create a new file my_obj.cc inside the src/objective directory with the following content:

  ```cpp
  #include <dmlc/omp.h>
  #include <xgboost/logging.h>
  #include <xgboost/objective.h>
  #include <vector>
  #include <algorithm>
  #include <utility>

  namespace xgboost {
  namespace obj {

  DMLC_REGISTRY_FILE_TAG(my_obj);

  class MyLossObj : public ObjFunction {
   public:
    void Configure(
        const std::vector<std::pair<std::string, std::string> >& args) override {}
    void GetGradient(const HostDeviceVector<bst_float>& preds,
                     const MetaInfo& info, int iter,
                     HostDeviceVector<GradientPair>* out_gpair) override {
      /* Boilerplate */
      CHECK_EQ(preds.Size(), info.labels_.Size());
      const auto& yhat = preds.HostVector();
      const auto& y = info.labels_.HostVector();
      out_gpair->Resize(y.size());
      std::vector<GradientPair>& gpair = out_gpair->HostVector();
      // Implementation for your loss function goes here
      // TODO: Read from yhat (predicted labels) and y (true labels) and
      //       assign first/second-order gradients to gpair, as follows:
      //
      //       gpair[i] = GradientPair( [first-order grad], [second-order grad] )
      // ...
    }
    const char* DefaultEvalMetric() const override { return "rmse"; }
  };

  // register the objective functions
  XGBOOST_REGISTER_OBJECTIVE(MyLossObj, "my:loss")
  .describe("My very first loss function")
  .set_body([]() { return new MyLossObj(); });

  }  // namespace obj
  }  // namespace xgboost
  ```
  Now implement the my_loss function in C++. Recall that my_loss was defined to be

  ```
  my_loss(y, yhat) = max(yhat - y, 0.5 * (y - yhat))^2
  ```

  Hint: to test, set the objective parameter to my:loss (as sketched below).
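  Once my_obj.cc has been added and the package re-compiled and installed, here is a minimal R sketch of how the new objective could be exercised; the toy data is an illustrative assumption.

  ```r
  library(xgboost)
  set.seed(1)
  X <- matrix(rnorm(100 * 3), 100, 3)   # toy features (assumption)
  y <- X[, 1] + rnorm(100)              # toy labels (assumption)
  dtrain <- xgb.DMatrix(X, label = y)
  ## "my:loss" is only available after my_obj.cc has been compiled in,
  ## because XGBOOST_REGISTER_OBJECTIVE registers that name.
  fit <- xgb.train(params = list(objective = "my:loss"),
                   data = dtrain, nrounds = 20, verbose = 0)
  pred <- predict(fit, dtrain)
  ```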
Students, please post a link to your test results here.
Name : Aditya Samantaray
Email : [email protected], [email protected]
University : International Institute of Information Technology, Bhubaneswar
Course : Computer Engineering
Solution to Easy Test 1 : EasyTest1 RmdFile
Solution to Easy Test 2 : EasyTest2
Solution to Easy Test 3 : EasyTest3 RmdFile
Solution to Medium Test 1 : MediumTest1 Tex
Solution to Medium Test 2 : MediumTest2
Solution to Medium Test 3 : MediumTest3 Tex
Solution to Hard Test : HardTest
Name : Divyansh Chawla
Email : [email protected], [email protected]
University : Indian Institute of Technology, Kanpur
Course : Integrated B.S. & M.S. in Economics (mainly Econometrics)
Solution to Easy Test 1: Solution , rmd File
Solution to Easy Test 2: Solution
Solution to Easy Test 3: Solution , rmd File
Solution to Medium Test 1: Solution , TeX File
Solution to Medium Test 3: Solution , TeX File
Solution to Hard Test: Solution Files
Name: Avinash Barnwal
Email : [email protected], [email protected]
University: Stony Brook University
Department: Applied Mathematics and Statistics
Course : Ph.D. - Statistics
Solution to Easy Test 1 : EasyTest1 RmdFile
Solution to Easy Test 2 : EasyTest2
Solution to Easy Test 3 : EasyTest3 RmdFile
Solution to Medium Test 1 : MediumTest1 Tex
Solution to Medium Test 2 : MediumTest2 RmdFile Change
Solution to Medium Test 3 : MediumTest3 Tex