Better Ideas for Classifiers with Categoricals #35

shaycrk · 2017-10-20T19:15:59Z

The two classifiers in #34 are a first pass at handling categorical features more intelligently than treating them as a bunch of independent dummies. Here are a few ideas for going further. Each would need a deeper rewrite of the sklearn decision tree code, and ideally would be more efficient than trying all 2^n partitions of a categorical:

Respect categoricals when subsetting features at each node (using the same logic as in #34, "Classifiers that respect categoricals", but at each decision point rather than just once for the entire tree)
Respect categoricals when subsetting features at each node. Then, if a categorical value is chosen to split on, ensure that categorical is included in the considered features at subsequent nodes (or maybe with a selection probability falling off with some decay as you go down the tree)
Respect categoricals when subsetting features at each node. Then, for each categorical in the selected subset, train a simple logit model for the outcome across the categorical values, possibly on a sub-sample of the data for efficiency, and consider the resulting score as a continuous variable to split on rather than the categorical value columns themselves. (Since triage categoricals are actually aggregations of categoricals, a model is a better choice than simple correlations or conditional averages.) If the score is chosen, the node will need to keep track of the categorical columns and the logit model for predicting on new examples. This approach would allow splitting on all values of the categorical at once without having to attempt all possible combinations.
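
To make the first and third ideas a bit more concrete, here is a rough sketch of what the per-node work might look like outside of the tree code itself. It is only an illustration under some assumptions: the helper names (`sample_feature_groups`, `categorical_score`), the `groups` mapping, and the column names `age` and `offense_type` are all made up for this example and are not part of triage or sklearn. A real implementation would have to live inside the tree's splitter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sample_feature_groups(groups, n_select, rng):
    """Pick whole feature groups (not single columns) for one node.

    `groups` (hypothetical structure) maps a group name to the column
    indices it owns: a continuous feature is a group with one column, a
    categorical is a group holding all of its dummy columns.
    """
    names = sorted(groups)
    chosen = rng.choice(len(names), size=min(n_select, len(names)), replace=False)
    return [names[i] for i in chosen]


def categorical_score(X, y, columns, sample_size=None, rng=None):
    """Collapse one categorical's dummy columns into a continuous score.

    Fits a logit of the outcome on the dummy columns, optionally on a
    sub-sample of rows, and returns the fitted model together with the
    predicted probability for every row. A tree could threshold that
    probability like any other numeric feature.
    Note: the sub-sample must contain both classes or the fit will fail.
    """
    rows = np.arange(X.shape[0])
    if sample_size is not None and sample_size < len(rows):
        if rng is None:
            rng = np.random.default_rng()
        rows = rng.choice(rows, size=sample_size, replace=False)
    model = LogisticRegression()
    model.fit(X[rows][:, columns], y[rows])
    return model, model.predict_proba(X[:, columns])[:, 1]


# Toy data: one continuous column plus a 4-level categorical as dummies.
rng = np.random.default_rng(0)
n = 600
cat = rng.integers(0, 4, size=n)
X = np.column_stack([rng.normal(size=n), np.eye(4)[cat]])
rates = np.array([0.1, 0.3, 0.6, 0.9])  # outcome rate per category level
y = (rng.random(n) < rates[cat]).astype(int)

groups = {"age": [0], "offense_type": [1, 2, 3, 4]}  # hypothetical names

picked = sample_feature_groups(groups, n_select=1, rng=rng)
model, score = categorical_score(
    X, y, groups["offense_type"], sample_size=300, rng=rng
)
# The node would need to store `groups["offense_type"]` and `model` so it
# can recompute the score when predicting on new examples.
```

Because the score is a function of the dummy columns only, every row with the same categorical value gets the same score, so a threshold on it corresponds to one particular partition of the values rather than a search over all of them.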

jesteria · 2017-12-12T22:15:33Z

This issue was moved to dssg/triage#296

shaycrk added the new-feature label Oct 20, 2017

jesteria closed this as completed Dec 12, 2017

jesteria mentioned this issue Dec 12, 2017

Better Ideas for Classifiers with Categoricals dssg/triage#296

Open
