- Simple case: 2 variables
- Linear equation: $y = mx + b$, where $m$ = slope of the line and $b$ = intersection of the line with the $y$ axis
- Estimation of parameters (see the sketch below):
  - The estimate for $m$ should be statistically significantly different from 0
  - The estimate for $b$ may or may not be statistically significantly different from 0
- Assume errors are independently and identically distributed, with constant variance.
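A minimal sketch of this estimation step, assuming synthetic data and SciPy's `linregress` (its p-value tests the null hypothesis that the slope is 0):

```python
# Least-squares estimates of m and b on made-up data, plus the p-value
# for the slope being different from 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)  # i.i.d. errors, constant variance

fit = stats.linregress(x, y)
print(f"m (slope)     = {fit.slope:.3f}")
print(f"b (intercept) = {fit.intercept:.3f}")
print(f"p-value for H0: m = 0 -> {fit.pvalue:.2e}")  # well below 0.05 here
```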
- Given the value of $x$, the model estimates the value of $y$ ($\hat{y}$)
- Error: $\hat{y} - y$, where $\hat{y}$ = estimated value and $y$ = true value
- Do not use the mean error (positive and negative errors cancel out)
- Mean absolute error: estimates "typical" error
- Mean squared error: assigns more weight to larger errors (may be dominated by a few cases)
- Values depend on the scale of the target variable
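A small sketch of these error measures on made-up predictions:

```python
# Error measures on hypothetical predictions (values are made up).
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 12.0])

err = y_pred - y_true
print("mean error:", err.mean())    # signed errors partly cancel, so it understates the error
print("MAE:", np.abs(err).mean())   # "typical" error size
print("MSE:", (err ** 2).mean())    # dominated by the single large error (9 -> 12)
```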
- Trivial model: predict the mean value of the target for every instance
- Regression is only useful if its error is lower than the one obtained with the trivial prediction
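A sketch of that comparison, assuming scikit-learn and synthetic data (the linear model is only an example):

```python
# Compare a regression model's MAE against the trivial mean-prediction baseline.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=2.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
baseline = np.full_like(y_te, y_tr.mean())  # trivial model: always predict the training mean

print("model MAE:   ", mean_absolute_error(y_te, model.predict(X_te)))
print("baseline MAE:", mean_absolute_error(y_te, baseline))
# The regression is only worth using if the first number is clearly lower.
```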
- kNN
  - Predict the average of the nearest neighbours' target values (instead of majority voting)
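A minimal kNN-regression sketch in plain NumPy (made-up data), showing the averaging step:

```python
# kNN regression: average the k nearest neighbours' targets instead of voting.
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances to the query
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return y_train[nearest].mean()                     # average target, not a majority vote

X_train = np.array([[1.0], [2.0], [3.0], [8.0], [9.0]])
y_train = np.array([1.1, 2.0, 2.9, 8.2, 9.1])
print(knn_regress(X_train, y_train, np.array([2.5]), k=3))  # mean(1.1, 2.0, 2.9) = 2.0
```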
- Decision trees
  - Training: splitting criterion based on the sum of the variances, instead of Gini or entropy
  - Prediction: average of the targets in the leaf, instead of majority voting
  - Variants
    - Model trees (using MLR or kNN in the leaves instead of the average)
    - MARS (multivariate adaptive regression splines)
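A sketch of a regression tree with scikit-learn's `DecisionTreeRegressor`, whose `squared_error` criterion is a variance-based splitting rule and whose leaves predict the mean target:

```python
# Regression tree: variance-based splits, leaf prediction = mean of the leaf's targets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

tree = DecisionTreeRegressor(criterion="squared_error", max_depth=3).fit(X, y)
print(tree.predict([[1.5], [7.5]]))  # each prediction is a leaf's mean target value
```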
- Artificial neural networks
  - Single output node (predicted $y$ = output score)
  - Continuous activation function
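A sketch using scikit-learn's `MLPRegressor`, whose output activation is the identity function, so the predicted value is the raw output score (data is synthetic):

```python
# Neural network for regression: one output node, continuous activations.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.2, size=500)

net = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                   max_iter=2000, random_state=0).fit(X, y)
print(net.predict([[1.0], [-2.0]]))  # continuous outputs, not class labels
```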
- SVMs (support vector regression)
  - Margin: minimize the tube "around" the data (instead of maximizing the distance to the closest examples from each class)
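A sketch with scikit-learn's `SVR`, where `epsilon` sets the width of the tube around the data (synthetic data, illustrative hyperparameters):

```python
# Support vector regression: points inside the epsilon-tube contribute no loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=200)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X, y)
print(svr.predict([[1.0], [4.0]]))
print("support vectors:", len(svr.support_))  # points on or outside the tube
```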
- Bias: type of model an algorithm is able to learn given a set of training data; related to hypothesis language (e.g. linear vs. quadratic)
- Variance: variation in the model an algorithm learns, given different training data (e.g. how much the model changes with small changes in the data)
- Bias-Variance trade-off: low bias implies high variance and vice-versa
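A sketch of the trade-off, assuming a noisy sine as the true function: a low-degree polynomial (high bias, low variance) versus a high-degree one (low bias, high variance), each refitted on many resampled training sets:

```python
# Bias-variance illustration: compare predictions at a fixed point across
# many training sets for a degree-1 and a degree-12 polynomial fit.
import numpy as np

rng = np.random.default_rng(5)

def sample_fit(degree):
    # draw a fresh training set and fit a polynomial of the given degree
    x = rng.uniform(0, 1, 20)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=20)
    return np.polynomial.Polynomial.fit(x, y, degree)

for degree in (1, 12):
    preds = [sample_fit(degree)(0.25) for _ in range(200)]
    print(f"degree {degree:2d}: mean prediction {np.mean(preds):+.3f}, "
          f"std across training sets {np.std(preds):.3f}")
# True value at x = 0.25 is 1.0: the straight line is biased (mean well below 1)
# but stable, while the degree-12 fit is typically closer to 1 on average but
# varies far more from one training set to the next.
```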