

Linear Regression - SIMPLIFIED

Linear Regression is the basic algorithm with which most people like to start their learning in Data Science.

      

Now, what exactly is Linear Regression?

1. Linear Regression is a supervised learning algorithm whose main aim is to find the line that best fits the given data.

2. Here, ‘fitting the best line for the given data’ means finding the relation between the dependent and independent variables present in the data.

 

Note 1: Use Linear Regression only when your dependent and independent variables have a linear relationship.

Note 2: The independent variables can be either discrete or continuous, but the dependent variable should be continuous.

Ok, let me explain with a good example.

Source: https://miro.medium.com/max/327/1*cFq7XW-Z69fDBil9wjyEBQ.png

In the above example, if we observe the data, as ‘years of Experience’ increases, ‘Salary’ also increases at a roughly steady rate. This means they have a linear relationship, so here we can apply Linear Regression.
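One quick way to check a relationship like this before applying Linear Regression is to compute the Pearson correlation between the two columns. The snippet below is only a minimal sketch: the numbers are made up for illustration (they are not the values from the figure), and the variable names are my own.

```python
import numpy as np

# Hypothetical data, invented for illustration (not the values from the figure).
years_experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
salary = np.array([30000.0, 35000.0, 41000.0, 44000.0, 50000.0, 56000.0])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is Pearson's r.
r = np.corrcoef(years_experience, salary)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship. It is still worth looking at a scatter plot, because a single number can hide curves and outliers.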

Ok, we observed the linear relation. Now, how can we find the best-fit line?

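We know that ‘Salary’ (S), the value we want to predict, is directly proportional to ‘years of Experience’ (Y), which means we can write it as follows.

 S = m * Y

Here ‘m’ is the slope of the line. Now we add a bias ‘b’ (the intercept) so that the line is not forced to pass through the origin and can fit the data more accurately.

 S = m * Y + b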

Ok, we got the line equation, what now?

Yes, we got a line equation, but for the same data we can draw many different lines, because ‘m’ & ‘b’ can take any values.

Source: https://miro.medium.com/max/801/1*VLNSWcbBYZddA1WVRr8jNQ.png

Now comes the important part: we need to find the values of ‘m’ & ‘b’ that give the optimal line, the one that fits the data best and has a smaller error than every other line.

 

What exactly does the error mean, and how do we quantify it in Linear Regression? (The function that quantifies the error is called the cost function; it is also known as the loss function or error function.)

1. The error is the difference between the original (actual) value and the predicted value. The best-fit line is the one that makes this error as small as possible.

2. The individual errors can be both positive and negative, so they would cancel each other out if we simply added them up. To quantify both, we square them, using a cost function known as Root Mean Squared Error (RMSE).
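For reference, the standard definition of Root Mean Squared Error can be written as follows. The symbols are my own choice, since they do not appear in the text: n is the number of data points, y_i is the actual value and \hat{y}_i is the value predicted by the line.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Squaring makes every term non-negative, so positive and negative errors cannot cancel, and the square root brings the result back to the same units as the dependent variable.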


Source: GEEKFORGEEKS

Source: https://www.jmp.com/en_us/statistics-knowledge-portal/what-is-multiple-regression/fitting-multiple-regression-model/_jcr_content/par/styledcontainer_2069/par/lightbox_4130/lightboxImage.img.png/1548702778023.png
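To make the cost function concrete, here is a minimal sketch that computes the Root Mean Squared Error for two candidate lines on a tiny data set. The data, the helper name `rmse` and the two candidate lines are all made up for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical data that lies exactly on the line salary = 2 * years + 1.
years = np.array([1.0, 2.0, 3.0, 4.0])
salary = np.array([3.0, 5.0, 7.0, 9.0])

# Two candidate lines with different values of m (both use b = 1).
good_line = 2.0 * years + 1.0
poor_line = 1.0 * years + 1.0

print("RMSE of good line:", rmse(salary, good_line))
print("RMSE of poor line:", rmse(salary, poor_line))
```

The line with the smaller RMSE fits the data better, which is exactly the comparison an optimizer performs while searching for ‘m’ & ‘b’.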

 

Now, our main aim is to minimize the cost function. How do we do it?

This is where optimizers such as Gradient Descent, Stochastic Gradient Descent, Adagrad, and Adam come in.

With the help of an optimizer we update ‘m’ & ‘b’ until we reach their optimal values, which gives us the best-fit line. With that line we can predict future data.

Gradient Descent is the most basic optimizer, so let’s discuss it.

We use Gradient Descent to minimize the cost function iteratively by updating the parameters ‘m’ & ‘b’.

Step 1: Let’s initialize ‘m’ & ‘b’ randomly.

Step 2: Now we need to update ‘m’ as follows:

              

Similarly, we update ‘b’ in the same way.
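The update equation itself is not reproduced in the text here, so below is the standard gradient descent rule written out as a sketch. It rests on a few assumptions that are not in the original: α is the learning rate (step size), x_i and y_i are the independent and dependent values of the i-th data point, and the cost J is the mean squared error. Dropping the square root from RMSE keeps the derivatives simple and does not change which ‘m’ & ‘b’ give the minimum.

```latex
J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

m_{new} = m_{old} - \alpha \, \frac{\partial J}{\partial m},
\qquad
\frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - (m_{old} x_i + b_{old}) \bigr)

b_{new} = b_{old} - \alpha \, \frac{\partial J}{\partial b},
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - (m_{old} x_i + b_{old}) \bigr)
```

Each update moves the parameter a small step in the direction that reduces the cost, and α controls how big that step is.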

We repeat the update iteratively until:

1. m_new is almost equal to m_old

2. b_new is almost equal to b_old

At that point we have the optimal parameters, and therefore the best-fit line, which we can use for prediction.
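The two steps and the stopping rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not a definitive implementation: it uses mean squared error as the cost, and the function name, learning rate, tolerance and data are all my own choices.

```python
import numpy as np

def fit_line_gradient_descent(x, y, learning_rate=0.01, tolerance=1e-9,
                              max_iters=100_000, seed=0):
    """Fit y = m * x + b by gradient descent on the mean squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: initialize m and b randomly.
    rng = np.random.default_rng(seed)
    m, b = rng.normal(size=2)

    for _ in range(max_iters):
        # Step 2: compute the gradients and update both parameters.
        residual = y - (m * x + b)
        grad_m = -2.0 * np.mean(x * residual)
        grad_b = -2.0 * np.mean(residual)
        m_new = m - learning_rate * grad_m
        b_new = b - learning_rate * grad_b

        # Stop when the new values are almost equal to the old ones.
        converged = abs(m_new - m) < tolerance and abs(b_new - b) < tolerance
        m, b = m_new, b_new
        if converged:
            break

    return float(m), float(b)

# Hypothetical data that lies exactly on the line y = 2 * x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

m, b = fit_line_gradient_descent(x, y)
print(f"m = {m:.4f}, b = {b:.4f}")
```

If the learning rate is too large the updates can overshoot and diverge, and if it is too small convergence becomes very slow, so in practice this value usually needs some tuning.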

 

ASSUMPTIONS:  

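1. The dependent and independent variables must be linearly related.

2. The errors (residuals) must be Gaussian (normally) distributed.

3. The features must not be multicollinear, that is, the independent variables should not be strongly correlated with each other.

4. The variance of the residuals must be the same for any value of X (homoscedasticity).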
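A rough way to look at the residual-related assumptions is to fit a line and inspect what is left over. The sketch below uses generated data and a very crude variance comparison of my own choosing; it is meant as a starting point, not a formal statistical test.

```python
import numpy as np

# Hypothetical data: a noisy straight line, generated only for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Fit a straight line by least squares; for deg=1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# With an intercept in the model, least-squares residuals average to about zero.
print("mean residual:", residuals.mean())

# Crude homoscedasticity check: compare residual variance for small x and large x.
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()
print("variance ratio (large x / small x):", variance_ratio)
```

A ratio far from 1 hints that the spread of the residuals changes with X. A histogram of the residuals is a similarly quick check for the Gaussian assumption.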

 

APPLICATIONS: 

1. Weather forecasting.

2. Predicting the price of a house.

3. Predicting stock prices in the stock market.

 
