Linear Regression

Definition

  • Linear Regression is a technique used to predict the unknown value of a variable (the dependent variable) from the known value of another variable (the independent variable)

  • Linear Regression is used when we want to predict real, continuous values as the output

  • Regression is of two types

    • Simple Linear Regression
    • Multivariate Linear Regression
  • Simple Linear Regression

  • It contains only one input variable, so the model is a single straight line: Y = a + bX, where

    • Y = Dependent Variable
    • a = Y intercept
    • b = slope of the line
    • X = Independent variable
  • Y intercept (a): the value of the dependent variable (Y) when the independent variable is zero (0). This is the point at which the line cuts the y-axis.

  • Slope (b): the change in the dependent variable (Y) for a unit increase in the independent variable. It is the tangent of the angle the line makes with the x-axis. A code sketch of computing a and b from data follows.
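A minimal sketch of computing a and b with NumPy (the x and y arrays are made-up illustration data):

```python
# Simple linear regression by hand: slope b is covariance(x, y) over
# variance(x); intercept a follows from the means. Illustrative data only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable X
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable Y

# Slope b: change in Y per unit increase in X.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept a: value of Y when X is zero (where the line cuts the y-axis).
a = y.mean() - b * x.mean()

print(f"Y = {a:.2f} + {b:.2f}X")
```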

  • Multiple Regression Models
  • Here the model has several input variables, so we need to find all the coefficients: Y = β0 + β1X1 + β2X2 + … + βnXn.
  • Each beta creates a kind of relationship, or slope, with respect to the output variable.
  • The task is to find the relationship between the betas and the output variable, as sketched below.
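A sketch of finding all the betas at once with NumPy's least-squares solver (made-up data; the column of ones supplies the intercept β0):

```python
# Multiple regression: solve for every coefficient simultaneously.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                 # two independent variables
y = np.array([5.0, 4.5, 10.0, 9.5, 13.0])  # output variable

# Prepend a column of ones so the first coefficient is the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept and betas:", betas)
```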

(Figure: linear.png)

  • The deviation of each actual value from the line's prediction is the error for that point.
  • This deviation is called the residual.
  • The main goal is to reduce this error.
  • Negative Y actuals (points below the line) are also there; that's why we take the square of all the residuals, so that positive and negative deviations don't cancel out.
  • The Residual Sum of Squares (RSS) is the error (loss) function that Ordinary Least Squares (OLS) minimizes; a sketch of computing it follows.
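A minimal sketch of computing the RSS (the actual and predicted arrays are illustrative):

```python
# Residual sum of squares: residuals are squared so that negative and
# positive deviations don't cancel each other out.
import numpy as np

def rss(y_actual, y_predicted):
    residuals = y_actual - y_predicted  # deviation of each point
    return np.sum(residuals ** 2)

y_actual = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_predicted = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(rss(y_actual, y_predicted))
```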

  • Now we have to take the minimum of all these errors. That will give us the best-fit line.

  • In order to find the minimum of a function we use the first-order and second-order derivatives to locate the minima and maxima.

  • We get the best-fit line when we find the 'a' and 'b' that give the minimum error; the sketch below works through this derivative step.
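A rough sketch of that derivative step with SymPy: set both partial derivatives of the RSS to zero and solve for a and b (the data points are made up, matching the earlier examples):

```python
# Minimizing RSS analytically: take partial derivatives of RSS with
# respect to a and b, set them to zero, and solve (the first-order
# condition for a minimum). Illustrative data only.
import sympy as sp

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = sp.symbols("a b")
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

# Both partial derivatives must vanish at the minimum.
best = sp.solve([sp.diff(rss, a), sp.diff(rss, b)], [a, b])
print(best)  # {a: intercept, b: slope} of the best-fit line
```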

Linear Regression

  • Linear Regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) by finding the best fit of a straight line.
  • The equation for the linear model is Y = mX + c, where m is the slope and c is the intercept.
  • In general, no straight line runs through all the data points.
  • So the objective is to fit the straight line that minimizes the error between the expected and actual values, as in the sketch below.
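A minimal sketch of fitting Y = mX + c with scikit-learn (illustrative data):

```python
# Fit a line with scikit-learn and read off the slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (samples, features)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("prediction at X = 6:", model.predict([[6.0]])[0])
```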

Where to use Linear Regression?

Use it where:

  1. the data is free from outliers;
  2. the data has no null values in any feature;
  3. the outcome is in a linear relationship with all the features;
  4. there is no collinearity between features.

A sketch of checking these conditions follows.
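A rough sketch of checking conditions 1, 2, and 4 with pandas (the toy DataFrame and the 1.5 × IQR outlier rule are illustrative assumptions):

```python
# Pre-flight checks on a toy dataset before fitting linear regression.
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4, 100],   # 100 is a deliberate outlier
                   "x2": [2, 4, 6, 8, 10],
                   "y":  [3, 6, 9, 12, 15]})

print(df.isnull().sum())        # 2. null values per column

q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1                   # 1. flag points outside the 1.5 * IQR fences
print(((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum())

print(df.corr())                # 3. & 4. eyeball linearity and collinearity
```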

Why no outliers?

  • Suppose we have built a model and the training data contains outliers; the best-fit line is then pulled away from where it is supposed to be, so we need to refit it in order to get an optimal result. The sketch below shows how a single outlier shifts the line.
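A small illustration with made-up data of how one corrupted observation changes the fitted slope and intercept:

```python
# Fit the same data with and without an outlier and compare the lines.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_outlier = y_clean.copy()
y_outlier[-1] = 30.0                     # one corrupted observation

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    b, a = np.polyfit(x, y, deg=1)       # returns slope, then intercept
    print(f"{label}: Y = {a:.2f} + {b:.2f}X")
```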

Collinearity

Multicollinearity is the scenario where two or more independent features are highly correlated with each other, i.e. they carry largely the same information about the output variable. This hurts model performance and interpretability because we are effectively feeding the model the same thing repeatedly.

Solving the multicollinearity problem:

  • Dataset with a small number of features: programmatically build a correlation heat map, find which feature pairs are more than 90% correlated, and drop one of each pair. The drawback is a loss of information.
  • Dataset with a large number of features: use Ridge or Lasso regression. Both approaches are sketched below.
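A sketch of both fixes on synthetic data (the 90% threshold follows the text above; the alpha values are arbitrary illustrations):

```python
# Option 1: drop one of each highly correlated feature pair.
# Option 2: keep everything and let Ridge/Lasso shrink redundant features.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
df = pd.DataFrame({"x1": x1,
                   "x2": x1 + rng.normal(scale=0.01, size=100),  # near-copy of x1
                   "x3": rng.normal(size=100)})
y = 2 * df["x1"] + 3 * df["x3"] + rng.normal(scale=0.1, size=100)

# Correlation "heat map" data: keep the upper triangle, find pairs > 0.9.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print("dropping:", to_drop)

print("ridge coefficients:", Ridge(alpha=1.0).fit(df, y).coef_)
print("lasso coefficients:", Lasso(alpha=0.1).fit(df, y).coef_)
```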