AlexNet is a state-of-the-art convolutional neural network: it outperformed all other algorithms in terms of accuracy in the ImageNet competition in 2012.
The ImageNet competition is a challenge that provides a dataset with 1000 categories and millions of images to train on for image recognition.
ALEXNET
ARCHITECTURE:
Source: https://i0.wp.com/ramok.tech/wp-content/uploads/2017/12/2017-12-31_01h31_40.jpg
In the whole architecture, there are only convolution and max pooling operations, followed by fully connected layers (yellow color) at the end for classification.
For a simple and fast understanding of the architecture, you only need to remember one formula for the output size of a convolution or max pooling layer (a small helper implementing it is shown below the legend):
[(n-k+2p)/s]+1   (the brackets denote rounding down)
n -> size of image
k -> kernel size
p -> padding
s -> stride
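As a quick check, here is a minimal Python helper (my own sketch, not part of the original post) that implements this formula:

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size of a convolution or max pooling layer.

    n: input size (height or width)
    k: kernel size
    p: padding
    s: stride
    """
    return (n - k + 2 * p) // s + 1

# Example: a 3x3 convolution with padding 1 and stride 1 keeps the size unchanged
print(conv_output_size(224, k=3, p=1, s=1))  # -> 224
```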
· Now, if we observe the architecture, we pass a 224x224x3 image as input to the first layer.
Let's look at what is happening in the first layer, so that you can easily analyze the other layers yourself:
· The first layer performs a convolution operation with kernel size 11x11 and stride s=4.
Note 1: Remember that 96 such kernels are used for this convolution operation, which is why it is written as 11x11x96.
Note 2: If the padding is 'same', a padding operation is applied; otherwise there is no padding.
Now, let's take our formula,
[(n-k+2p)/s]+1
Substituting n=224, k=11, s=4, p=0 gives [(224-11)/4]+1 = [53.25]+1, which is not a whole number. This is a well-known quirk of the paper: the numbers only work out if the effective input size is 227x227 (for example, after a small amount of padding). Using n=227:
[(227-11)/4]+1 = [216/4]+1 = 54+1 = 55
So the output we get is 55x55, and since 96 kernels were applied, the resulting output has size 55x55x96.
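To sanity-check this result, here is a tiny PyTorch sketch (PyTorch is my choice for illustration; the post itself does not specify a framework) that passes a dummy image through such a layer:

```python
import torch
import torch.nn as nn

# First AlexNet convolution: 96 kernels of size 11x11, stride 4, no padding
conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)

x = torch.randn(1, 3, 227, 227)    # one dummy RGB image at the effective input size
print(conv1(x).shape)              # -> torch.Size([1, 96, 55, 55])
```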
Similarly, the above math is applicable to all the layers, both convolution and max pooling. Please try it on your own; a sketch of the full stack is given below.
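For reference, here is a minimal sketch of the convolution and max pooling stack with the kernel counts from the original paper (96, 256, 384, 384, 256); the padding values are assumptions chosen so that the sizes in the figure work out:

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),     # 227x227x3  -> 55x55x96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),          # 55x55x96   -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2),   # 27x27x96   -> 27x27x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),          # 27x27x256  -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1),  # 13x13x256  -> 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),  # 13x13x384  -> 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),  # 13x13x384  -> 13x13x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),          # 13x13x256  -> 6x6x256
)

print(features(torch.randn(1, 3, 227, 227)).shape)  # -> torch.Size([1, 256, 6, 6])
```

You can verify each arrow in the comments with the output size formula above.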
· In our figure, at the end of the blue color region the result is flattened and fed into the hidden (fully connected) layers.
· Up to the blue color region the network is extracting image features, and after that it is doing image classification (see the classifier sketch below).
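As an illustration, a minimal sketch of that classification head could look as follows (the 4096-4096-1000 layer sizes and dropout placement follow the original paper; 6x6x256 = 9216 is the flattened feature size from the stack above):

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),                   # 6x6x256 feature maps -> 9216-dimensional vector
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),          # 1000 ImageNet categories
)
```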
Reasons AlexNet outperformed all the algorithms up to 2012:
1. AlexNet uses advanced concepts like data augmentation, dropout layers, the ReLU activation unit, and Local Response Normalization (normalizing the channels corresponding to a pixel instead of normalizing the whole tensor); a small illustration is given after this list.
2. The whole architecture was built and trained on GPUs.
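To make point 1 concrete, here is a small illustration of those pieces in PyTorch; the Local Response Normalization hyperparameters shown are the ones reported in the original paper, the rest is just an example:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)   # dummy feature maps from the first layer

relu = nn.ReLU()                                                   # ReLU activation unit
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)   # normalizes across neighboring channels at each pixel
dropout = nn.Dropout(p=0.5)                                        # randomly zeroes activations during training

out = dropout(lrn(relu(x)))
print(out.shape)                 # -> torch.Size([1, 96, 55, 55]); the shape is unchanged
```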
Disadvantages:
1. Different kernel sizes like 11x11, 5x5, and 3x3 are used across the layers, with different paddings and strides.
2. This makes the architecture difficult to remember.