Linear Regression is the basic algorithm with which almost everybody starts their learning in Data Science.
Now, what exactly is Linear Regression?
1. Linear Regression is a supervised learning algorithm whose main aim is to find the line that best fits the given data.
2. Here, 'fitting the best line for the given data' means finding the relationship between the dependent and independent variables present in the data.
Note 1: You should use Linear Regression only when your dependent and independent variables have a linear relationship.
Note 2: The independent variables can be either discrete or continuous, but the dependent variable should be continuous.
Ok, let me explain with a good example.
Source: https://miro.medium.com/max/327/1*cFq7XW-Z69fDBil9wjyEBQ.png
In the above example, if we observe the data, as 'Years of Experience' increases, 'Salary' also increases. This means they have a linear relationship, so we can apply Linear Regression here.
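To make this concrete, here is a minimal sketch (using made-up salary numbers, since the figure's exact values are not given) that checks how linear the relationship looks by computing the Pearson correlation coefficient with NumPy:

```python
import numpy as np

# Hypothetical data resembling the example: years of experience vs. salary
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8])
salary = np.array([35000, 42000, 50000, 55000, 63000, 68000, 76000, 82000])

# A Pearson correlation close to +1 suggests a strong positive linear relationship
r = np.corrcoef(experience, salary)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```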
Ok, we have observed the linear relationship. Now, how can we find the best fit line?
We know that 'Salary' (S) is directly proportional to 'Years of Experience' (Y), which means we can write it as follows:
S = m * Y
Now we need to add a bias 'b' so that the line can fit the data more accurately:
S = m * Y + b
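As a quick illustration, here is a tiny sketch of that line equation as a prediction function (the values of m and b below are just placeholders; finding good ones is exactly what the rest of this article is about):

```python
def predict_salary(experience, m, b):
    """Predict salary from years of experience using the line S = m * Y + b."""
    return m * experience + b

# Placeholder parameters, not learned from any data
print(predict_salary(5, m=7000.0, b=30000.0))  # 65000.0
```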
Ok, we got the line equation. What now?
Yes, we got a line equation, but for the same data we can get many different lines, because 'm' & 'b' can take any values.
Source: https://miro.medium.com/max/801/1*VLNSWcbBYZddA1WVRr8jNQ.png
Now comes the most important part: finding 'm' & 'b' such that the resulting line best fits the data and outperforms all other candidate lines with the least error.
What exactly does the error mean, and how do we quantify it in Linear Regression? (Quantifying the error is nothing but the cost function, which we can also call the loss function or error function.)
1. To find the best fit line, we need to decrease the error between the original values and the predicted values.
2. While computing the errors we get both positive and negative values. To quantify both together, we square them, which gives a cost function known as Mean Squared Error (its square root is the Root Mean Squared Error).
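A minimal sketch of that cost function in NumPy (with made-up data again, and two arbitrary candidate lines so you can see that different choices of 'm' & 'b' give different errors):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: squaring makes positive and negative errors count equally."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical data: years of experience vs. salary (in units of 10k for readability)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.5, 4.2, 5.0, 5.5, 6.3, 6.8, 7.6, 8.2])

# Two candidate lines: the one with the lower cost fits the data better
for m, b in [(0.5, 3.5), (0.7, 2.8)]:
    print(f"m={m}, b={b} -> MSE = {mse(y, m * x + b):.4f}")

# RMSE is just the square root of MSE, in the same units as y
print("RMSE of second line:", np.sqrt(mse(y, 0.7 * x + 2.8)))
```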
Now, our main aim is to minimize this cost function.
How do we do it?
This is where optimizers like Gradient Descent, Stochastic Gradient Descent, Adagrad, Adam, etc. come in.
So, with the help of an optimizer we update 'm' & 'b' until we reach their optimal values, thereby getting the best fit line. With the help of that line we are able to predict future data.
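In practice you rarely write the optimizer yourself. As one example (a sketch with hypothetical data, not the article's figures), scikit-learn's SGDRegressor fits a linear model using Stochastic Gradient Descent, one of the optimizers mentioned above:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: years of experience -> salary
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([35000, 42000, 50000, 55000, 63000, 68000, 76000, 82000], dtype=float)

# SGD is sensitive to feature scale, so standardize the input first.
# SGDRegressor's default loss is the squared error, matching the MSE cost above.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=5000, random_state=0),
)
model.fit(X, y)

print("Predicted salary for 10 years of experience:",
      model.predict(np.array([[10.0]]))[0])
```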
Gradient Descent is the most basic optimizer, so let's discuss it.
We use Gradient Descent to minimize the cost function iteratively by updating the parameters 'm' & 'b'.
Step 1: Initialize 'm' & 'b' randomly.
Step 2: Update 'm' & 'b' using the gradient of the cost function J and a learning rate α:
m_new = m_old - α * (∂J/∂m)
Similarly, 'b' is updated accordingly:
b_new = b_old - α * (∂J/∂b)
We need to repeat this iteratively until:
1. m_new is almost equal to m_old
2. b_new is almost equal to b_old
Thus we get the optimal parameters and thereby the best fitted line, which we can then use for prediction.
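Putting the steps together, here is a minimal from-scratch sketch of Gradient Descent for this single-feature case (the data, learning rate, and stopping threshold are illustrative choices, not values from the article):

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, tol=1e-8, max_iters=100_000):
    """Fit y ≈ m * x + b by iteratively updating m and b with gradient descent."""
    m, b = np.random.randn(), np.random.randn()  # Step 1: random initialization
    for _ in range(max_iters):
        y_pred = m * x + b
        # Gradients of the MSE cost J = mean((y - y_pred)^2) with respect to m and b
        dJ_dm = -2 * np.mean(x * (y - y_pred))
        dJ_db = -2 * np.mean(y - y_pred)
        m_new = m - lr * dJ_dm   # Step 2: update m
        b_new = b - lr * dJ_db   # ...and b
        # Stop once the parameters barely change any more (m_new ≈ m_old, b_new ≈ b_old)
        if abs(m_new - m) < tol and abs(b_new - b) < tol:
            m, b = m_new, b_new
            break
        m, b = m_new, b_new
    return m, b

# Hypothetical, roughly linear data (salary in units of 10k to keep the learning rate simple)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.5, 4.2, 5.0, 5.5, 6.3, 6.8, 7.6, 8.2])

m, b = gradient_descent(x, y)
print(f"m = {m:.3f}, b = {b:.3f}")
```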
ASSUMPTIONS:
1. The dependent and independent variables are linearly related.
2. The errors (residuals) must be Gaussian distributed.
3. The features must be non-multicollinear.
4. The variance of the residuals is the same for any value of X (homoscedasticity).
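These assumptions can be eyeballed with a few quick checks. Here is a rough sketch using the hypothetical data from earlier (the thresholds are informal, not statistical tests, and the multicollinearity check is skipped because there is only one feature here):

```python
import numpy as np

# Hypothetical data and a least squares fit (m, b) to check against
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.5, 4.2, 5.0, 5.5, 6.3, 6.8, 7.6, 8.2])
m, b = np.polyfit(x, y, deg=1)

residuals = y - (m * x + b)

# 1. Linearity: the correlation between x and y should be strong
print("corr(x, y):", np.corrcoef(x, y)[0, 1])

# 2. Gaussian errors: residuals should be roughly centered at zero
print("mean residual:", residuals.mean(), "std:", residuals.std())

# 4. Homoscedasticity: residual spread should look similar for small and large x
half = len(x) // 2
print("residual std (low x):", residuals[:half].std(),
      "residual std (high x):", residuals[half:].std())
```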
APPLICATIONS:
1. Weather forecasting.
2. Predicting the price of a house.
3. Predicting stock prices in the stock market.