Hello!
I guess you have a lot of categorical features in your dataset (possibly with high cardinality).
When we are training models, we generate CTR tables for categorical features on the fly as they are needed, so it's totally normal that GPU memory usage shows practically no correlation with the resulting model size - we calculate all selected CTR tables after training and save them in the model object.
To reduce model size, in 0.24 we have finally implemented model size regularization - we now penalize model splits that use large CTR tables. model_size_reg is turned on by default both on CPU and GPU and set to 0.5. You can play with this parameter, raising it to get a smaller model (see the sketch below).
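A minimal sketch of raising model_size_reg above its 0.5 default; the tiny synthetic dataset and the value 2.0 are just placeholders for illustration:

```python
from catboost import CatBoostClassifier, Pool

# Synthetic toy data; column 0 is categorical.
X = [["a", 1.0], ["b", 2.0], ["a", 3.0], ["c", 4.0]]
y = [0, 1, 0, 1]
train_pool = Pool(X, y, cat_features=[0])

model = CatBoostClassifier(
    iterations=50,
    model_size_reg=2.0,  # default is 0.5; larger values penalize large CTR tables harder
    verbose=False,
)
model.fit(train_pool)
model.save_model("model.cbm")  # compare the file size for different model_size_reg values
```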
Also, you could reduce model size by limiting CTR complexity via the max_ctr_complexity parameter - by default we greedily try to build combinations of up to 4 categorical features (sketch below).
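A minimal sketch of limiting max_ctr_complexity; setting it to 1 keeps only single-feature CTRs and skips categorical feature combinations entirely (the data here is again a placeholder):

```python
from catboost import CatBoostClassifier, Pool

# Synthetic toy data with two categorical columns.
X = [["a", "x"], ["b", "y"], ["a", "y"], ["c", "x"]]
y = [0, 1, 0, 1]
train_pool = Pool(X, y, cat_features=[0, 1])

model = CatBoostClassifier(
    iterations=50,
    max_ctr_complexity=1,  # default is 4; 1 disables combinations of categorical features
    verbose=False,
)
model.fit(train_pool)
```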
You can read about these parameters in the new blog post on towardsdatascience and in the tutorial covering categorical feature parameters:
catboost/catboost#1023
catboost/catboost#1028