Replace random forest #1116

Open
benjamc opened this issue Jun 24, 2024 · 1 comment
Labels
dependency

Comments

benjamc (Contributor) commented Jun 24, 2024

Issue: Installation of the C++ random forest (pyrfr) is difficult; replace it with something pythonic.

H.S.:

  • The RF is implemented as described in Section 4.3.2 of "Algorithm runtime prediction: Methods & evaluation" (Frank Hutter, Lin Xu, Holger H. Hoos, Kevin Leyton-Brown). SMAC uses the same implementation, but with different HPs.
  • The changes relate to bias/variance → the forest reduces variance.
  • In SMAC, the law of total variance is not used to compute the predictive variance. H.S. tried using it, which led to worse performance.
  • max_features is really function dependent, but should not be a problem. Maybe optimizing the HPs of the scikit-learn forest is enough. On BBOB, extremely randomized forests (scikit-learn/skopt; bias/variance) work a bit better.
  • Idea: integrate the skopt models into SMAC (see the sketch after this list).
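
For illustration, a minimal sketch of what such a pythonic surrogate could look like, in the spirit of the skopt forest models: an extremely randomized forest from scikit-learn, with the predictive uncertainty estimated from the spread of the individual trees. All data and hyperparameters below are made up for the example; this is not the SMAC or skopt implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(100, 2))   # toy inputs, stand-in for configurations
y = np.sum(X**2, axis=1)                # toy objective (sphere function)

# Extremely randomized forest as a pythonic drop-in surrogate.
forest = ExtraTreesRegressor(n_estimators=100, min_samples_leaf=3, random_state=0)
forest.fit(X, y)

X_test = rng.uniform(-5, 5, size=(10, 2))
# Per-tree predictions, shape (n_trees, n_points); the empirical spread across
# trees gives a simple uncertainty estimate (skopt's wrappers do something similar).
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
mean, var = per_tree.mean(axis=0), per_tree.var(axis=0)
```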
hadarshavit commented
I investigated it a bit more.
In the original SMAC (see the extended version, Section 4.1 "Transformations of the Cost Metric": https://www.cs.ubc.ca/labs/algorithms/Projects/SMAC/papers/10-TR-SMAC.pdf), the transformation is explained as part of the aggregation of the leaf samples, which happens at line 222 of the current SMAC implementation:

preds_as_array = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)

Note that the current implementation computes each leaf's value for every sample, which can produce huge matrices (the preds_as_array matrix).
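
To make the shapes concrete, here is a small self-contained illustration of that aggregation step. The array layout and the VERY_SMALL_NUMBER value are assumptions for the example, not the actual SMAC values:

```python
import numpy as np

VERY_SMALL_NUMBER = 1e-10  # assumed value, only to guard the log against zeros

# Assumed layout: (n_points, n_trees, n_samples_per_leaf), stored in log space.
# Every leaf's stored samples are materialized for every query point, which is
# where the huge matrices come from.
preds_as_array = np.log(np.random.rand(1000, 100, 50) + VERY_SMALL_NUMBER)

# The SMAC aggregation: back to the original scale, average the leaf samples,
# then return to log space.
out = np.log(np.nanmean(np.exp(preds_as_array), axis=2) + VERY_SMALL_NUMBER)
print(out.shape)  # (1000, 100): one aggregated log-value per (point, tree)
```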

I checked the scikit-learn implementation of random forests. There is an option to set the DecisionTreeRegressor splitter to "random" instead of "best" (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor), which I think is more similar to the SMAC implementation.
To get the log-transformations, a change to the criterion is required (i.e., the node value has to be computed in a different way: https://github.com/scikit-learn/scikit-learn/blob/4aeb191100f409c880d033683972ab9f47963fa4/sklearn/tree/_criterion.pyx#L1032). Such a change should be possible, as different criteria already use different terminal values ("MSE and Poisson deviance both set the predicted value of terminal nodes to the learned mean value of the node whereas the MAE sets the predicted value of terminal nodes to the median", from https://scikit-learn.org/stable/modules/tree.html#regression-criteria). A rough approximation with stock scikit-learn is sketched below.
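
A sketch of how far stock scikit-learn gets without that criterion change (toy data and hyperparameters are illustrative; `estimator=` requires a recent scikit-learn): train randomized-split trees on log-costs inside a bagging ensemble and apply the exp/mean/log aggregation across trees. Applying that aggregation *within* each leaf, as SMAC does, is exactly the part that needs the custom criterion.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
cost = np.exp(rng.normal(size=200))      # toy positive "runtimes"

# Random splits, closer to the SMAC forest, via splitter="random".
forest = BaggingRegressor(
    estimator=DecisionTreeRegressor(splitter="random", min_samples_leaf=3),
    n_estimators=50,
    random_state=0,
)
forest.fit(X, np.log(cost))              # fit in log space

X_test = rng.uniform(0, 1, size=(5, 3))
per_tree = np.stack([t.predict(X_test) for t in forest.estimators_])
# Tree-level analogue of SMAC's leaf-level aggregation; each leaf value here
# is still the arithmetic mean of log-costs (the part a custom criterion
# would change).
pred = np.log(np.mean(np.exp(per_tree), axis=0))
```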
