### Description

This proposes a new operator, `TreeEnsemble`, that supersedes the existing `TreeEnsembleRegressor` and `TreeEnsembleClassifier` operators. It requires bumping the `ai.onnx.ml` opset to 5. Further details can be found in onnx#5851.

A summary of the updates:

1. `TreeEnsemble` supports double-precision outputs.
2. A `'SET_MEMBER'` node mode is added to encode set membership.
3. Type errors are raised if the split values do not have the same type as the input, or if the `nodes_*` attributes do not all have the same length (and likewise for the `leaf_*` attributes).
4. Integer input types are dropped.
   - Since the remaining attributes are represented only in floating point, the old behaviour can be replicated by placing a standard `Cast` operator before the tree ensemble, with no change in results.
5. `base_values` is dropped.
   - This attribute only specified an offset added after the target values are aggregated, which can be implemented with the standard `Add` operator.
6. The general encoding has been changed to reduce redundancy. Previously, every node carried fields such as `truenodeids` that are relevant only to interior nodes, not to leaves. Since leaves make up just over half of the nodes in a binary decision tree (a tree with n interior nodes has n + 1 leaves), this is highly wasteful. This representation therefore uses `nodes_*` fields for interior nodes and `leaf_*` fields for leaf nodes.
   - Each leaf now contributes to exactly one target (while a target may still receive contributions from many leaves). This nuance is discussed [here](onnx#5851 (comment)).
7. Enumerations (`aggregate_function`, `post_transform`, `nodes_modes`) are held in integer attributes rather than strings.
8. Tree IDs and node IDs (`treeids`, `nodeids`) are dropped in favour of using indices into the `nodes_*` and `leaf_*` attributes, which define the tree structure directly with no indirection. A `tree_roots` attribute denotes the root of each decision tree in the ensemble.
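To make the index-based encoding in items 2, 6, and 8 concrete, here is a minimal pure-Python sketch of evaluating such an ensemble with SUM aggregation and no post-transform. It is not the ONNX reference implementation: the `Mode` and `Ensemble` names, the integer mode codes, the per-node `nodes_members` sets (standing in for however membership values are actually stored), and the exact `nodes_*`/`leaf_*` field names are illustrative assumptions. It also assumes every entry in `tree_roots` points at an interior node.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Sequence, Set


class Mode(IntEnum):
    # Illustrative integer codes for nodes_modes; the codes used by the
    # actual operator specification may differ.
    LEQ = 0
    LT = 1
    GTE = 2
    GT = 3
    EQ = 4
    NEQ = 5
    SET_MEMBER = 6


@dataclass
class Ensemble:
    # Interior nodes: parallel nodes_* lists, all of the same length.
    nodes_featureids: List[int]
    nodes_modes: List[Mode]
    nodes_splits: List[float]
    nodes_members: List[Set[float]]  # hypothetical; only read for SET_MEMBER
    nodes_truenodeids: List[int]
    nodes_trueleafs: List[bool]      # True -> index into leaf_*, else nodes_*
    nodes_falsenodeids: List[int]
    nodes_falseleafs: List[bool]
    # Leaves: parallel leaf_* lists; each leaf feeds exactly one target.
    leaf_targetids: List[int]
    leaf_weights: List[float]
    tree_roots: List[int]            # index of each tree's root in nodes_*
    n_targets: int


def goes_true(e: Ensemble, i: int, x: Sequence[float]) -> bool:
    """Evaluate the split condition of interior node i on input row x."""
    v = x[e.nodes_featureids[i]]
    s = e.nodes_splits[i]
    mode = e.nodes_modes[i]
    if mode == Mode.LEQ:
        return v <= s
    if mode == Mode.LT:
        return v < s
    if mode == Mode.GTE:
        return v >= s
    if mode == Mode.GT:
        return v > s
    if mode == Mode.EQ:
        return v == s
    if mode == Mode.NEQ:
        return v != s
    if mode == Mode.SET_MEMBER:
        return v in e.nodes_members[i]
    raise ValueError(f"unknown mode {mode}")


def predict(e: Ensemble, x: Sequence[float]) -> List[float]:
    """Walk every tree by index and SUM the reached leaf weights per target."""
    scores = [0.0] * e.n_targets
    for root in e.tree_roots:
        i, is_leaf = root, False
        while not is_leaf:
            if goes_true(e, i, x):
                i, is_leaf = e.nodes_truenodeids[i], e.nodes_trueleafs[i]
            else:
                i, is_leaf = e.nodes_falsenodeids[i], e.nodes_falseleafs[i]
        # i is now a leaf index; the leaf contributes to its single target.
        scores[e.leaf_targetids[i]] += e.leaf_weights[i]
    return scores


# Two one-split trees with a single target:
#   tree 0: x[0] <= 0.5        ? leaf 0 (1.0)  : leaf 1 (2.0)
#   tree 1: x[1] in {1.0, 3.0} ? leaf 2 (10.0) : leaf 3 (20.0)
ens = Ensemble(
    nodes_featureids=[0, 1],
    nodes_modes=[Mode.LEQ, Mode.SET_MEMBER],
    nodes_splits=[0.5, 0.0],
    nodes_members=[set(), {1.0, 3.0}],
    nodes_truenodeids=[0, 2],
    nodes_trueleafs=[True, True],
    nodes_falsenodeids=[1, 3],
    nodes_falseleafs=[True, True],
    leaf_targetids=[0, 0, 0, 0],
    leaf_weights=[1.0, 2.0, 10.0, 20.0],
    tree_roots=[0, 1],
    n_targets=1,
)
print(predict(ens, [0.2, 3.0]))  # [11.0]
print(predict(ens, [0.9, 2.0]))  # [22.0]
```

Because children are addressed by list index plus a leaf flag, no ID-to-position lookup is needed while walking a tree. In this sketch, the dropped `base_values` would correspond to an elementwise `Add` applied to the returned scores, and a classifier would pick the index of the largest score.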
The `TreeEnsembleRegressor` can be implemented directly with this operator. The `TreeEnsembleClassifier` can be implemented by using this operator, computing the top class for each input by applying `ArgMax` to the output scores, and then using `LabelEncoder`/`GatherND` to produce the required label. As the reference implementation tests show, this representation can perform the same operations as before while adding new capability through set membership.

### Motivation and Context

Addresses onnx#5851.

Signed-off-by: Aditya Goel <[email protected]>
Signed-off-by: Aditya Goel <[email protected]>