We interpret random forests, and bagging ensemble estimators in general, via the framework of nonparametric Bayesian (npB) analysis. The ensemble strategies are revealed as approximations to posterior mean inference for complex summaries of the population data generating process. This insight motivates a class of fully Bayesian forest algorithms that provide gains in interpretability (from a Bayesian perspective) and predictive performance over their classically bagged predecessors. The npB framework is then used to reinterpret common distributed algorithms for efficient forest estimation on Big Data. From this, we propose a novel blocking strategy for fitting tree ensemble predictors on internet-scale data stored in a distributed file system (such as HDFS).
-
Notifications
You must be signed in to change notification settings - Fork 14
TaddyLab/bayesian-forest
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published