From 0be680835503a124515501d79cec422f825adaf5 Mon Sep 17 00:00:00 2001
From: Floid Gilbert
Date: Wed, 11 Mar 2020 16:54:49 -0600
Subject: [PATCH] another minor update to feature_binarizer_from_trees.ipynb

---
 examples/rbm/feature_binarizer_from_trees.ipynb | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/examples/rbm/feature_binarizer_from_trees.ipynb b/examples/rbm/feature_binarizer_from_trees.ipynb
index 11b3895..685b7d4 100644
--- a/examples/rbm/feature_binarizer_from_trees.ipynb
+++ b/examples/rbm/feature_binarizer_from_trees.ipynb
@@ -6,7 +6,7 @@
 "source": [
 "# `FeatureBinarizerFromTrees`\n",
 "\n",
- "The `FeatureBinarizerFromTrees` transformer binarizes features for BooleanRuleCG (BRCG), LogisticRuleRegression (LogRR), and LinearRuleRegression (LinearRR) models. It generates binary features based on the splits in fitted decision trees. This approach naturally creates optimal thresholds and returns only important features. Compared to `FeatureBinarizer`, the `FeatureBinarizerFromTrees` transformer reduces the number of features required to produce an accurate model. Not only does this shorten training times, but more importantly, it often results in simpler rule sets.\n",
+ "The `FeatureBinarizerFromTrees` transformer binarizes features for BooleanRuleCG (BRCG), LogisticRuleRegression (LogRR), and LinearRuleRegression (LinearRR) models. It generates binary features (i.e. rules) based on the splits in fitted decision trees. This approach naturally creates optimal thresholds and returns only important features. Compared to `FeatureBinarizer`, the `FeatureBinarizerFromTrees` transformer reduces the number of features required to produce an accurate model. Not only does this shorten training times, but more importantly, it often results in simpler rule sets.\n",
 "\n",
 "This notebook demonstrates basic `FeatureBinarizerFromTrees`, compares `FeatureBinarizer`, and concludes with a formal performance comparison."
 ]
@@ -360,7 +360,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The model trains in around 10 seconds and appears to improve accuracy significantly. Though more features improved the fit in this case, it is important to point out that more features is not always better. For both explainability and accuracy, we suggest starting with a small number of features. From there, increase the number of features incrementally until accuracy plateaus or the explanation is sufficient."
+ "The model trains in around 10 seconds and appears to improve accuracy significantly. Though more features improved the fit in this case, it is important to point out that more features are not always better. For both explainability and accuracy, we suggest starting with a small number of features. From there, increase the number of features incrementally until accuracy plateaus or the explanation is sufficient."
 ]
 },
 {
@@ -398,7 +398,7 @@
 "source": [
 "## Using `FeatureBinarizerFromTrees` with Linear Models\n",
 "\n",
- "To use `FeatureBinarizerFromTrees` with LogRR and LinearRR, set `returnOrd=True`. The transformer will return a standardized data frame of ordinal features in addition to the binarized features. The standardized features can then be passed to the linear model to improve accuracy. (Make sure to set `useOrd=True` for the linear model.)"
+ "To use `FeatureBinarizerFromTrees` with LogRR and LinearRR, set `returnOrd=True`. Like the standard `FeatureBinarizer`, the transformer will return a standardized data frame of ordinal features in addition to the binarized features. The standardized features can then be passed to the linear model to improve accuracy. (Make sure to set `useOrd=True` for the linear model.)"
 ]
 },
 {
@@ -604,7 +604,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The standard `FeatureBinarizer` creates thresholds by binning the data in a user-specified number of quantiles. The default setting of 9 thresholds creates 1,528 features for these data when negations are enabled. This is a very large feature space."
+ "The standard `FeatureBinarizer` creates thresholds by binning the data into a user-specified number of quantiles. The default setting of 9 thresholds creates 1,528 features for these data when negations are enabled. This is a very large feature space."
 ]
 },
 {