Linear Regression

Definition

  • Linear Regression is a technique used to predict the unknown value of a variable (the dependent variable) from the known value of another variable (the independent variable)

  • Linear Regression is used when we want to predict real, continuous values as the output

  • Regression is of two types

    • Simple Linear Regression
    • Multivariate Linear Regression
  • Simple Linear Regression

  • It contains only one input variable, so the model is a single straight line: Y = a + bX, where

    • Y = Dependent Variable
    • a = Y intercept
    • b = slope of the line
    • X = Independent variable
  • Y intercept (a): the value of the dependent variable (Y) when the independent variable is zero (0). This is the point at which the line cuts the y-axis.

  • Slope (b): the change in the dependent variable (Y) for a unit increase in the independent variable. It is the tangent of the angle the line makes with the x-axis. A code sketch of computing a and b from data follows.
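A minimal sketch of computing a and b with NumPy (the x and y arrays are made-up illustration data):

```python
# Simple linear regression by hand: slope b is covariance(x, y) over
# variance(x); intercept a follows from the means. Illustrative data only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable X
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable Y

# Slope b: change in Y per unit increase in X.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept a: value of Y when X is zero (where the line cuts the y-axis).
a = y.mean() - b * x.mean()

print(f"Y = {a:.2f} + {b:.2f}X")
```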

  • Multiple Regression Models
  • Here the model has several input variables, so we need to find all the coefficients: Y = β0 + β1X1 + β2X2 + … + βnXn.
  • Each beta creates a kind of relationship, or slope, with respect to the output variable.
  • The task is to find the relationship between the betas and the output variable, as sketched below.
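A sketch of finding all the betas at once with NumPy's least-squares solver (made-up data; the column of ones supplies the intercept β0):

```python
# Multiple regression: solve for every coefficient simultaneously.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                 # two independent variables
y = np.array([5.0, 4.5, 10.0, 9.5, 13.0])  # output variable

# Prepend a column of ones so the first coefficient is the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept and betas:", betas)
```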

(Figure: linear.png)

  • The deviation of each actual value from the line's prediction is the error for that point.
  • This deviation is called the residual.
  • The main goal is to reduce this error.
  • Negative Y actuals (points below the line) are also there; that's why we take the square of all the residuals, so that positive and negative deviations don't cancel out.
  • The Residual Sum of Squares (RSS) is the error (loss) function that Ordinary Least Squares (OLS) minimizes; a sketch of computing it follows.
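A minimal sketch of computing the RSS (the actual and predicted arrays are illustrative):

```python
# Residual sum of squares: residuals are squared so that negative and
# positive deviations don't cancel each other out.
import numpy as np

def rss(y_actual, y_predicted):
    residuals = y_actual - y_predicted  # deviation of each point
    return np.sum(residuals ** 2)

y_actual = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_predicted = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(rss(y_actual, y_predicted))
```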

  • Now we have to take the minimum of all these errors. That will give us the best-fit line.

  • In order to find the minimum of a function we use the first-order and second-order derivatives to locate the minima and maxima.

  • We get the best-fit line when we find the 'a' and 'b' that give the minimum error; the sketch below works through this derivative step.
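A rough sketch of that derivative step with SymPy: set both partial derivatives of the RSS to zero and solve for a and b (the data points are made up, matching the earlier examples):

```python
# Minimizing RSS analytically: take partial derivatives of RSS with
# respect to a and b, set them to zero, and solve (the first-order
# condition for a minimum). Illustrative data only.
import sympy as sp

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = sp.symbols("a b")
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

# Both partial derivatives must vanish at the minimum.
best = sp.solve([sp.diff(rss, a), sp.diff(rss, b)], [a, b])
print(best)  # {a: intercept, b: slope} of the best-fit line
```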

Linear Regression

  • Linear Regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) by finding the best fit of a straight line.
  • The equation for the linear model is Y = mX + c, where m is the slope and c is the intercept.
  • In general, no straight line runs through all the data points.
  • So the objective is to fit the straight line that minimizes the error between the expected and actual values, as in the sketch below.
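A minimal sketch of fitting Y = mX + c with scikit-learn (illustrative data):

```python
# Fit a line with scikit-learn and read off the slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (samples, features)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("prediction at X = 6:", model.predict([[6.0]])[0])
```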

Where to use Linear Regression?

Use it where:

  1. the data is free from outliers;
  2. the data has no null values in any feature;
  3. the outcome is in a linear relationship with all the features;
  4. there is no collinearity between features.

A sketch of checking these conditions follows.
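A rough sketch of checking conditions 1, 2, and 4 with pandas (the toy DataFrame and the 1.5 × IQR outlier rule are illustrative assumptions):

```python
# Pre-flight checks on a toy dataset before fitting linear regression.
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4, 100],   # 100 is a deliberate outlier
                   "x2": [2, 4, 6, 8, 10],
                   "y":  [3, 6, 9, 12, 15]})

print(df.isnull().sum())        # 2. null values per column

q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1                   # 1. flag points outside the 1.5 * IQR fences
print(((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum())

print(df.corr())                # 3. & 4. eyeball linearity and collinearity
```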

Why no outliers?

  • Suppose we have built a model and the training data contains outliers; the best-fit line is then pulled away from where it is supposed to be, so we need to refit it in order to get an optimal result. The sketch below shows how a single outlier shifts the line.
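A small illustration with made-up data of how one corrupted observation changes the fitted slope and intercept:

```python
# Fit the same data with and without an outlier and compare the lines.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_outlier = y_clean.copy()
y_outlier[-1] = 30.0                     # one corrupted observation

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    b, a = np.polyfit(x, y, deg=1)       # returns slope, then intercept
    print(f"{label}: Y = {a:.2f} + {b:.2f}X")
```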

Collinearity

Multicollinearity is the scenario where two or more independent features are highly correlated with each other, i.e. they carry largely the same information about the output variable. This hurts model performance and interpretability because we are effectively feeding the model the same thing repeatedly.

Solving the multicollinearity problem:

  • Dataset with a small number of features: programmatically build a correlation heat map, find which feature pairs are more than 90% correlated, and drop one of each pair. The drawback is a loss of information.
  • Dataset with a large number of features: use Ridge or Lasso regression. Both approaches are sketched below.
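A sketch of both fixes on synthetic data (the 90% threshold follows the text above; the alpha values are arbitrary illustrations):

```python
# Option 1: drop one of each highly correlated feature pair.
# Option 2: keep everything and let Ridge/Lasso shrink redundant features.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
df = pd.DataFrame({"x1": x1,
                   "x2": x1 + rng.normal(scale=0.01, size=100),  # near-copy of x1
                   "x3": rng.normal(size=100)})
y = 2 * df["x1"] + 3 * df["x3"] + rng.normal(scale=0.1, size=100)

# Correlation "heat map" data: keep the upper triangle, find pairs > 0.9.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print("dropping:", to_drop)

print("ridge coefficients:", Ridge(alpha=1.0).fit(df, y).coef_)
print("lasso coefficients:", Lasso(alpha=0.1).fit(df, y).coef_)
```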