Commit

another minor update to feature_binarizer_from_trees.ipynb
floidgilbert committed Mar 11, 2020
1 parent ca927ee commit 0be6808
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions examples/rbm/feature_binarizer_from_trees.ipynb
@@ -6,7 +6,7 @@
 "source": [
 "# `FeatureBinarizerFromTrees`\n",
 "\n",
-"The `FeatureBinarizerFromTrees` transformer binarizes features for BooleanRuleCG (BRCG), LogisticRuleRegression (LogRR), and LinearRuleRegression (LinearRR) models. It generates binary features based on the splits in fitted decision trees. This approach naturally creates optimal thresholds and returns only important features. Compared to `FeatureBinarizer`, the `FeatureBinarizerFromTrees` transformer reduces the number of features required to produce an accurate model. Not only does this shorten training times, but more importantly, it often results in simpler rule sets.\n",
+"The `FeatureBinarizerFromTrees` transformer binarizes features for BooleanRuleCG (BRCG), LogisticRuleRegression (LogRR), and LinearRuleRegression (LinearRR) models. It generates binary features (i.e. rules) based on the splits in fitted decision trees. This approach naturally creates optimal thresholds and returns only important features. Compared to `FeatureBinarizer`, the `FeatureBinarizerFromTrees` transformer reduces the number of features required to produce an accurate model. Not only does this shorten training times, but more importantly, it often results in simpler rule sets.\n",
 "\n",
 "This notebook demonstrates basic `FeatureBinarizerFromTrees`, compares `FeatureBinarizer`, and concludes with a formal performance comparison."
 ]
@@ -360,7 +360,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The model trains in around 10 seconds and appears to improve accuracy significantly. Though more features improved the fit in this case, it is important to point out that more features is not always better. For both explainability and accuracy, we suggest starting with a small number of features. From there, increase the number of features incrementally until accuracy plateaus or the explanation is sufficient."
+"The model trains in around 10 seconds and appears to improve accuracy significantly. Though more features improved the fit in this case, it is important to point out that more features are not always better. For both explainability and accuracy, we suggest starting with a small number of features. From there, increase the number of features incrementally until accuracy plateaus or the explanation is sufficient."
 ]
 },
 {
@@ -398,7 +398,7 @@
 "source": [
 "## Using `FeatureBinarizerFromTrees` with Linear Models\n",
 "\n",
-"To use `FeatureBinarizerFromTrees` with LogRR and LinearRR, set `returnOrd=True`. The transformer will return a standardized data frame of ordinal features in addition to the binarized features. The standardized features can then be passed to the linear model to improve accuracy. (Make sure to set `useOrd=True` for the linear model.)"
+"To use `FeatureBinarizerFromTrees` with LogRR and LinearRR, set `returnOrd=True`. Like the standard `FeatureBinarizer`, the transformer will return a standardized data frame of ordinal features in addition to the binarized features. The standardized features can then be passed to the linear model to improve accuracy. (Make sure to set `useOrd=True` for the linear model.)"
 ]
 },
 {
@@ -604,7 +604,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The standard `FeatureBinarizer` creates thresholds by binning the data in a user-specified number of quantiles. The default setting of 9 thresholds creates 1,528 features for these data when negations are enabled. This is a very large feature space."
+"The standard `FeatureBinarizer` creates thresholds by binning the data into a user-specified number of quantiles. The default setting of 9 thresholds creates 1,528 features for these data when negations are enabled. This is a very large feature space."
 ]
 },
 {