All objective functions based on absolute values (squared error, absolute error, Huber, ...) carry an implicit parametric distribution assumption in disguise (Gaussian, Laplace, ...), and thus fail badly when that assumption is not satisfied.
To build a robust (outlier-resistant) regression tree, we should use an objective function that depends only on relative values (ranks).
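To see the hidden assumption, recall the standard identity that minimizing these losses is maximum-likelihood estimation under the corresponding noise model:

$$\arg\min_{\mu}\sum_i (y_i-\mu)^2 \;=\; \arg\max_{\mu}\prod_i \tfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(y_i-\mu)^2}{2\sigma^2}}, \qquad \arg\min_{\mu}\sum_i |y_i-\mu| \;=\; \arg\max_{\mu}\prod_i \tfrac{1}{2b}\,e^{-\frac{|y_i-\mu|}{b}}.$$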
Let's say we have a decision stump on the $d$-th dimension that splits the data points into reds ($x_i = 1$) and blues ($x_i = 0$).
To find the best split, first define an objective function that depends only on the relative values of the data points.
Kendall's tau, also known as the bubble-sort distance, is defined as follows:

$$\tau = \frac{\sum_{i<j} \operatorname{sgn}(x_i - x_j)\,\operatorname{sgn}(y_i - y_j)}{\binom{n}{2}}$$

When $i$ and $j$ have the same color, $\operatorname{sgn}(x_i - x_j)$ goes to zero, so it is equivalent to a sum over red–blue pairs only:

$$\tau = \frac{\sum_{i\,\text{red},\ j\,\text{blue}} \operatorname{sgn}(y_i - y_j)}{\binom{n}{2}}$$
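A minimal sketch of this quantity (the function name and the brute-force $O(n_r n_b)$ loop are mine; note that `scipy.stats.kendalltau` defaults to the tie-corrected tau-b, which normalizes differently when one variable is binary, so its values will not match exactly):

```python
import numpy as np

def kendall_tau_binary(color, y):
    """Kendall's tau between a binary color vector and targets y.

    Sums sgn(y_i - y_j) over red-blue pairs and normalizes by C(n, 2).
    The name and the brute-force O(n_r * n_b) loop are illustrative only.
    """
    n = len(y)
    reds, blues = y[color == 1], y[color == 0]
    s = np.sign(reds[:, None] - blues[None, :]).sum()
    return 2.0 * s / (n * (n - 1))

color = np.array([0, 0, 1, 1, 0])
y = np.array([1.0, 2.0, 5.0, 4.0, 3.0])
print(kendall_tau_binary(color, y))   # all 6 red-blue pairs concordant -> 6/10 = 0.6
```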
There is a trade-off between balance (how evenly the stump splits the points) and difference (how separated the two colors' targets are).
In practice, we can use other impurities (entropy, ...) and differences (log odds, ...) as long as the objective still depends only on the relative values (ranks) of the data points.
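As one sketch of such a swap (my own construction, not from the original): entropy as the balance/impurity term and the log odds of $P(y_{\text{red}} > y_{\text{blue}})$ as the difference term, both still rank-only:

```python
def entropy_logodds_score(color, y):
    """Illustrative variant: entropy as the balance term, log odds of
    P(y_red > y_blue) as the difference term. Still depends only on ranks."""
    n = len(y)
    reds, blues = y[color == 1], y[color == 0]
    if len(reds) == 0 or len(blues) == 0:
        return 0.0
    p = len(reds) / n
    balance = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # split entropy
    gt = (reds[:, None] > blues[None, :]).mean()           # P(y_red > y_blue)
    lt = (reds[:, None] < blues[None, :]).mean()           # P(y_red < y_blue)
    eps = 1e-9                                             # avoid log(0)
    return balance * abs(np.log((gt + eps) / (lt + eps)))  # balance * |log odds|
```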
Note that:

$$\tau = \frac{n_r n_b}{\binom{n}{2}}\Big(P\big(y_i > y_j \mid i\ \text{red},\ j\ \text{blue}\big) - P\big(y_i < y_j \mid i\ \text{red},\ j\ \text{blue}\big)\Big)$$

(conditional probability: draw one blue and one red without replacement), where $\tau$ is the Kendall rank correlation coefficient shown before. The first factor rewards balance and the second rewards difference, which is exactly the trade-off above. There is also the classical bivariate-normal relation

$$\tau = \frac{2}{\pi}\arcsin(r),$$

where $r$ is the Pearson correlation coefficient.
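A quick numeric check of the balance-times-difference decomposition (reusing `kendall_tau_binary` and the sample data from the sketch above; the helper is likewise illustrative):

```python
def tau_decomposed(color, y):
    """Balance-times-difference form of tau; agrees with
    kendall_tau_binary on data without ties in y."""
    n = len(y)
    reds, blues = y[color == 1], y[color == 0]
    n_r, n_b = len(reds), len(blues)
    if n_r == 0 or n_b == 0:
        return 0.0
    difference = np.sign(reds[:, None] - blues[None, :]).mean()  # P(>) - P(<)
    balance = n_r * n_b / (n * (n - 1) / 2)                      # n_r*n_b / C(n,2)
    return balance * difference

print(tau_decomposed(color, y))   # same 0.6 as kendall_tau_binary above
```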
Calculate the following for every possible stump:

1. Initialize: set all points to blue, $\tau = 0$, $maximum = 0$, and sort the samples by $X_d$ into a queue.
2. Get the next item $i$ from the queue and turn it red.
3. Calculate $\tau$; if $|\tau| > maximum$, record the stump and set $maximum = |\tau|$.
4. Repeat step 2 until all points are red.
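A sketch of this sweep for a single dimension (names and structure are my own; each flip updates the pair sums incrementally in $O(n)$, so the whole sweep is $O(n^2)$, ignoring ties in $x$ for brevity):

```python
import numpy as np

def best_stump_1d(x, y):
    """Sweep thresholds along one feature, tracking tau incrementally.

    Returns (best_threshold, best_tau). Illustrative sketch only.
    """
    n = len(y)
    order = np.argsort(x)                 # queue: samples sorted by x
    total_pairs = n * (n - 1) / 2
    red = np.zeros(n, dtype=bool)         # all points start blue
    s = 0.0                               # sum of sgn(y_red - y_blue) over red-blue pairs
    best_tau, best_thr = 0.0, None
    for k, i in enumerate(order[:-1]):    # flipping the last point gives tau = 0
        s -= np.sign(y[red] - y[i]).sum()         # i no longer pairs as a blue
        red[i] = True
        s += np.sign(y[i] - y[~red]).sum()        # i now pairs as a red
        tau = s / total_pairs
        if abs(tau) > abs(best_tau):              # record stump if |tau| > maximum
            best_tau = tau
            best_thr = (x[i] + x[order[k + 1]]) / 2   # midpoint threshold
    return best_thr, best_tau
```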
In short: with some precomputation (sorting each feature and ranking the targets once), then for every depth of the tree, calculate a cumulative sum for each leaf.
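One concrete way to realize the cumulative-sum idea — a sketch under my own assumptions, not necessarily the author's exact scheme — is a Fenwick (binary indexed) tree over the within-leaf ranks of $y$, so each color flip updates the pair sums in $O(\log n)$ and the whole sweep costs $O(n \log n)$:

```python
import numpy as np

class Fenwick:
    """Binary indexed tree over positions 1..n for prefix counts."""
    def __init__(self, n):
        self.t = [0] * (n + 1)
    def add(self, i, v=1):
        while i < len(self.t):
            self.t[i] += v
            i += i & -i
    def prefix(self, i):                  # sum of counts at positions 1..i
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def best_stump_1d_fast(x, y):
    """O(n log n) tau sweep, assuming no ties in y. Illustrative sketch."""
    n = len(y)
    rank = np.empty(n, dtype=int)
    rank[np.argsort(y)] = np.arange(1, n + 1)     # 1-based ranks of y
    blue, red = Fenwick(n), Fenwick(n)
    for r in rank:
        blue.add(int(r))                          # all points start blue
    total_pairs = n * (n - 1) / 2
    order = np.argsort(x)
    s, best_tau, best_thr = 0.0, 0.0, None
    for k, i in enumerate(order[:-1]):
        ri = int(rank[i])
        blue.add(ri, -1)                          # i stops being blue
        # contribution lost as a blue: sum over reds of sgn(y_red - y_i)
        reds_below = red.prefix(ri - 1)
        s -= (red.prefix(n) - reds_below) - reds_below
        red.add(ri)                               # i becomes red
        # contribution gained as a red: sum over blues of sgn(y_i - y_blue)
        blues_below = blue.prefix(ri - 1)
        s += blues_below - (blue.prefix(n) - blues_below)
        tau = s / total_pairs
        if abs(tau) > abs(best_tau):
            best_tau, best_thr = tau, (x[i] + x[order[k + 1]]) / 2
    return best_thr, best_tau
```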
Just like what the elastic net does, we can consider both absolute and relative value by taking a weighted sum of the two objectives.
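For instance, a hypothetical blended split score (the mixing parameter `alpha`, the variance-reduction term, and its normalization are all my illustrative choices), reusing `kendall_tau_binary` from above:

```python
def blended_score(color, y, alpha=0.5):
    """Weighted sum of an absolute-value criterion (variance reduction) and a
    relative-value criterion (|tau|), in the spirit of elastic net's mixing
    parameter. alpha=1 is pure rank-based, alpha=0 pure variance-based."""
    reds, blues = y[color == 1], y[color == 0]
    if len(reds) == 0 or len(blues) == 0:
        return 0.0
    n = len(y)
    var_reduction = y.var() - (len(reds) * reds.var() + len(blues) * blues.var()) / n
    var_reduction /= y.var() + 1e-12      # normalize to a comparable [0, 1] scale
    return alpha * abs(kendall_tau_binary(color, y)) + (1 - alpha) * var_reduction
```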