Parameters
kalenxixi edited this page May 28, 2014 · 48 revisions
Before running XGBoost, we must set three types of parameters: general parameters, booster parameters, and task parameters.
- General parameters relate to which booster we are using to do boosting, commonly a tree or linear model
- Booster parameters depend on which booster you have chosen
- Task parameters depend on the learning scenario; for example, regression tasks may use different parameters from ranking tasks.
- booster_type [default=0]
- which booster to use. 0 means using tree boosters, 1 means using linear boosters. The details about different boosters are described here.
- silent [default=1]
- 0 means printing running messages, 1 means silent mode.
- nthread
- number of parallel threads used to run xgboost
- num_pbuffer [set automatically by xgboost, no need to be set by user]
- size of the prediction buffer, normally set to the number of training instances. The buffers are used to save the prediction results of the last boosting step.
- bst:num_feature [set automatically by xgboost, no need to be set by user]
- feature dimension used in boosting, set to maximum dimension of the feature
- bst:eta [default=0.3]
- step size shrinkage used in each update to prevent overfitting. After each boosting step, we can directly get the weights of the new features, and bst:eta shrinks these weights to make the boosting process more conservative.
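A minimal sketch of the shrinkage step, in plain Python rather than XGBoost code. The function name `apply_shrinkage` and the sample weights are hypothetical; the only assumption is that the weights learned in a boosting step are multiplied by bst:eta before they are added to the model.

```python
def apply_shrinkage(new_weights, eta=0.3):
    """Scale the weights learned in one boosting step by eta."""
    return [eta * w for w in new_weights]


# Hypothetical weights produced by a single boosting step.
step_weights = [2.0, -1.0, 0.5]

# With eta < 1 every weight moves toward zero, so each step changes
# the model less and more rounds are needed to fit the data.
shrunk = apply_shrinkage(step_weights, eta=0.3)
print(shrunk)
```

Setting eta to 1 leaves the weights unchanged, which corresponds to no shrinkage.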
- bst:gamma
- minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be.
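The role of bst:gamma as a split threshold can be sketched as follows. This assumes the second-order loss reduction formula described in later XGBoost documentation; whether this revision computes exactly that quantity is an assumption, and the names `split_gain` and `should_split` are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=0.0):
    """Loss reduction of a split from gradient (g) and hessian (h) sums.

    Assumed formula, not taken from this page:
    0.5 * (G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)).
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)


def should_split(gain, gamma):
    """Keep the split only if the loss reduction exceeds gamma."""
    return gain > gamma


# Hypothetical gradient and hessian sums for the two candidate children.
gain = split_gain(g_left=-4.0, h_left=4.0, g_right=4.0, h_right=4.0)
print(gain, should_split(gain, gamma=1.0), should_split(gain, gamma=5.0))
```

A larger gamma rejects more candidate splits, which is why the trees become more conservative.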
- bst:max_depth [default=6]
- maximum depth of a tree
- bst:min_child_weight [default=1]
- minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than bst:min_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be.
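To illustrate why the hessian sum equals the instance count in linear regression mode, here is a small sketch. It assumes squared error loss of the form 0.5 * (pred - label)^2, whose second derivative is 1, and logistic loss, whose second derivative is p * (1 - p). All function names are hypothetical.

```python
import math


def hessian_squared_error():
    """Second derivative of 0.5 * (pred - label)^2 with respect to pred."""
    return 1.0


def hessian_logistic(margin):
    """Second derivative of logistic loss with respect to the raw score."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p * (1.0 - p)


def child_is_allowed(hessians, min_child_weight=1.0):
    """A child is kept only if its hessian sum reaches min_child_weight."""
    return sum(hessians) >= min_child_weight


# Three instances under squared error: the sum is simply the count, 3.
regression_child = [hessian_squared_error() for _ in range(3)]
# Three instances under logistic loss at margin 0: each contributes 0.25.
logistic_child = [hessian_logistic(0.0) for _ in range(3)]

print(child_is_allowed(regression_child), child_is_allowed(logistic_child))
```

With the default of 1, the three-instance logistic child (sum 0.75) would not be created, while the regression child would.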
- bst:subsample [default=1]
- subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting.
- bst:colsample_bytree [default=1]
- subsample ratio of the features when building a tree booster. Setting it to 0.5 means that XGBoost randomly samples half of the features to grow a tree in each iteration, which helps prevent overfitting.
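The two sampling ratios can be pictured with the sketch below. It draws an exact fraction of rows and features without replacement, which is a simplification; how XGBoost actually draws its samples is not stated on this page. The helper `sample_fraction` is hypothetical.

```python
import random


def sample_fraction(items, ratio, rng):
    """Pick round(ratio * len(items)) items at random, without replacement."""
    k = int(round(ratio * len(items)))
    return sorted(rng.sample(items, k))


rng = random.Random(0)        # fixed seed so the draw is repeatable
rows = list(range(10))        # indices of 10 training instances
features = list(range(6))     # indices of 6 features

rows_used = sample_fraction(rows, 0.5, rng)          # like bst:subsample=0.5
features_used = sample_fraction(features, 0.5, rng)  # like bst:colsample_bytree=0.5

print(rows_used, features_used)
```

Each tree then sees only part of the data, so no single tree can fit every instance or rely on every feature.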
- bst:tree_maker [default=1]
- method used to construct a tree. The methods differ slightly in efficiency; the default option is recommended.
- 0: Old tree maker adapted from SVDFeature, normally not needed
- 1: Column major expansion parallel tree maker, this is the most memory efficient one
- 2: Row major expansion parallel tree maker
- bst:lambda [default=0]
- L2 regularization term on weights
- bst:alpha [default=0]
- L1 regularization term on weights
- bst:lambda_bias
- L2 regularization term on bias, default 0 (there is no L1 regularization on bias because it is not important)
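The three regularization terms above can be combined into one penalty, as in this sketch. Constant factors such as 1/2 are omitted and the exact form used inside XGBoost may differ, so treat this as an illustration of what each parameter penalizes. The function name and sample values are hypothetical.

```python
def regularization_penalty(weights, bias, reg_lambda=0.0, reg_alpha=0.0,
                           reg_lambda_bias=0.0):
    """Penalty added to the loss for a model with given weights and bias."""
    l2 = reg_lambda * sum(w * w for w in weights)      # bst:lambda
    l1 = reg_alpha * sum(abs(w) for w in weights)      # bst:alpha
    l2_bias = reg_lambda_bias * bias * bias            # bst:lambda_bias
    return l2 + l1 + l2_bias


# Hypothetical model: two feature weights and a bias.
penalty = regularization_penalty([1.0, -2.0], bias=3.0, reg_lambda=0.5,
                                 reg_alpha=0.1, reg_lambda_bias=1.0)
print(penalty)
```

With all three parameters left at their default of 0, the penalty is 0 and the model is not regularized.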
- objective [ default=reg:linear ]
- specifies the learning task and the corresponding learning objective. The options are:
- "reg:linear" --linear regression
- "reg:logistic" --logistic regression
- "binary:logistic" --logistic regression for binary classification, output probability
- "binary:logitraw" --logistic regression for binary classification, output score before logistic transformation
- "multi:softmax" --multiclass classification using the softmax objective; you also need to set num_class (number of classes)
- "rank:pairwise" --ranking task that minimizes the pairwise loss
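The difference between the raw score of "binary:logitraw" and the probability of "binary:logistic" is the logistic transformation, and "multi:softmax" relies on the softmax function. The sketch below shows both transformations in plain Python. The function names are hypothetical, and taking the highest-scoring class as the multiclass prediction is an assumption about the output.

```python
import math


def sigmoid(score):
    """Logistic transformation: raw score -> probability."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    m = max(scores)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_score = 0.0
probability = sigmoid(raw_score)              # 0.5 for a raw score of 0

class_scores = [1.0, 3.0, 0.5]                # hypothetical, num_class = 3
class_probs = softmax(class_scores)
predicted_class = class_probs.index(max(class_probs))

print(probability, class_probs, predicted_class)
```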
- num_round
- the number of boosting rounds.
- data
- The path of training data
- test:data
- The path of test data to do prediction
- base_score [ default=0.5 ]
- the initial prediction score of all instances, global bias
- eval_metric [ default according to objective ] options: rmse, error
- evaluation metric for validation data. A default metric is assigned according to the objective (rmse for regression, error for classification).
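For reference, the two metrics can be written out as below. This is a sketch of the standard definitions, not XGBoost's own code; the 0.5 decision threshold used for error is an assumption, and the sample predictions are made up.

```python
import math


def rmse(preds, labels):
    """Root mean squared error between predictions and labels."""
    squared = sum((p - y) ** 2 for p, y in zip(preds, labels))
    return math.sqrt(squared / len(labels))


def error_rate(probs, labels, threshold=0.5):
    """Fraction of instances whose thresholded prediction is wrong."""
    wrong = sum(1 for p, y in zip(probs, labels)
                if (p > threshold) != (y == 1))
    return wrong / len(labels)


regression_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
classification_error = error_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])

print(regression_rmse, classification_error)
```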
- use_buffer [ default=1 ]
- whether to create a binary buffer for text input; this normally speeds up loading when training repeatedly
- seed [ default=0 ]
- random number seed.
- save_period [default=0]
- the period for saving the model. Setting save_period=10 means that XGBoost saves the model every 10 rounds; setting it to 0 means no model is saved during training.
- task [default=train] options: train, pred, eval, dump
- train: training using data
- pred: make predictions for test:data
- eval: evaluate the statistics specified by eval[name]=filename
- dump: dump the learned model into text format (preliminary)
- model_in [default=NULL]
- path to the input model, needed for the pred, eval, and dump tasks. If it is specified in training, xgboost will continue training from the input model.
- model_out [default=NULL]
- path to the output model after training finishes. If not specified, the output is named like 0003.model, where 0003 is the number of boosting rounds.
- model_dir [default=models]
- The output directory of the saved models during training
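The sketch below shows how save_period, model_dir, and the 0003.model naming pattern might fit together. It assumes rounds are counted from 1, that intermediate models follow the same four-digit naming as the final one, and that they are written under model_dir; none of these details is confirmed by this page. Both helper names are hypothetical.

```python
def checkpoint_rounds(num_round, save_period):
    """Rounds after which a model would be saved during training."""
    if save_period <= 0:
        return []                      # save_period=0: nothing saved
    return [r for r in range(1, num_round + 1) if r % save_period == 0]


def default_model_path(round_number, model_dir="models"):
    """File name in the 0003.model style, placed under model_dir."""
    return "%s/%04d.model" % (model_dir, round_number)


saved = checkpoint_rounds(num_round=25, save_period=10)
paths = [default_model_path(r) for r in saved]

print(saved, paths, default_model_path(3))
```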
- fmap
- feature map, used when dumping the model
- name_dump [default=dump.txt]
- name of model dump file
- name_pred [default=pred.txt]
- name of prediction file, used in pred mode
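To see how the parameters above might be collected for one run, here is a small sketch that renders them as text. It assumes the settings are passed as `name = value` lines in a configuration file, which this page does not itself state. The file names `train.txt` and `test.txt` and the helper `render_config` are hypothetical.

```python
def render_config(params):
    """Render parameters as one 'name = value' line each (assumed format)."""
    return "\n".join("%s = %s" % (name, value)
                     for name, value in params.items())


params = {
    "booster_type": 0,                 # general: tree booster
    "objective": "binary:logistic",    # task: binary classification
    "bst:eta": 0.3,                    # booster: shrinkage
    "bst:max_depth": 6,                # booster: maximum tree depth
    "num_round": 10,                   # task: boosting rounds
    "data": "train.txt",               # hypothetical training file
    "test:data": "test.txt",           # hypothetical test file
}

config = render_config(params)
print(config)
```

Grouping the entries by general, booster, and task parameters mirrors the three types described at the top of this page.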