Training a Neural Network with MNIST Data

Training a model on MNIST data is straightforward: it is a classification model that looks at an image and predicts the handwritten digit. MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels, and centered, which reduces preprocessing and makes it easy to get started.

Anyone can do it, but to make it a little more fun we have added some constraints. Let's understand the model input, output, and architecture before getting into the code. Here is how this README is structured:

  1. Input
  2. Constraints
  3. Network
  4. Parameters
  5. Output
  6. How we arrived at it
  7. Takeaway

Setting up the Environment

In this article we will use PyTorch to train a convolutional neural network to recognize MNIST's handwritten digits. PyTorch is a very popular deep learning framework, like TensorFlow, CNTK, and Caffe2, but unlike those frameworks PyTorch uses dynamic execution graphs, meaning the computation graph is created on the fly.

Now that we have set up the basics, let's experiment to achieve 99.4% or above validation accuracy with less than 8,000 parameters. The following gives a very high-level view of the 13 steps taken toward that target (the details are in the individual notebooks).
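
All of the steps below assume a standard PyTorch data pipeline along these lines. This is a minimal sketch, not the exact notebook code: the normalization constants are the standard MNIST mean/std, and the initial batch size of 128 comes from Step 7's note that it was later reduced to 64.

```python
import torch
from torchvision import datasets, transforms

# Standard MNIST normalization (dataset mean/std); batch size and
# augmentation are tweaked in later steps.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transform),
    batch_size=128, shuffle=False)
```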

Step 1

Typical Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling (see the sketch below)
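
As a rough illustration of how these pieces fit together (the channel counts are made up for the sketch, not the ones used in the notebooks):

```python
import torch.nn as nn

# One "typical" block: 3x3 convs (padding 1) with ReLU, then a max-pool
# transition. Channel counts here are illustrative only.
block = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

# The "additional" pieces: a 1x1 conv mixes channels without looking at
# neighbouring pixels, and GAP collapses each channel map to one number.
head = nn.Sequential(
    nn.Conv2d(16, 10, kernel_size=1),  # 10 output channels, one per digit
    nn.AdaptiveAvgPool2d(1),           # (N, 10, H, W) -> (N, 10, 1, 1)
)
```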

Target:

In this step

  1. I have chosen a vanilla architecture of six convolution layers and two transition blocks (max pooling)
  2. Used GAP in the last layer
  3. My target is to achieve 99% validation accuracy
  4. Once I get 99% accuracy, I know I can refine the model further to achieve higher accuracy with fewer parameters
  5. Run it for 20 epochs to study how the accuracy changes in the vanilla architecture

Result:

  1. Train accuracy: 99.58%
  2. Validation accuracy: 99.23%
  3. Number of parameters: 40,202

Analysis:

  1. Validation accuracy increased steadily over the epochs and finished at 99.23%. This tells me the architecture is worth exploring further.

  2. Train accuracy is 99.58%, noticeably higher than the validation accuracy of 99.23%, which means the model is possibly overfitting. But since the parameter count of 40,202 is around 4x my target, I will first reduce the parameters in the next step and observe the impact before trying other options to increase validation accuracy.

Step 2

Typical Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling

Target:

In this step

  1. I have chosen the same vanilla architecture of six convolution layers and two transition blocks (max pooling), but reduced the number of channels to keep the parameter count low
  2. Used GAP in the last layer
  3. My target is to achieve near 99% validation accuracy with less than 10,000 parameters
  4. Once I achieve this, I can refine the model further
  5. Need to achieve the above accuracy within 15 epochs

Result:

  1. Train accuracy: 98.96%
  2. Validation accuracy: 98.9%
  3. Number of parameters: 8,442

Analysis:

  1. As expected, validation accuracy dropped to 98.9% from the vanilla architecture's 99.23%, but it is close enough to make this a candidate for tuning
  2. The drop in accuracy comes from the reduction in parameters. Unlike the vanilla architecture, though, the gap between train accuracy and validation accuracy is very small, which means it is not overfitting
  3. Also, the parameter count of 8,442 is well within my target of 10K parameters

Step 3

Typical Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling (see the block sketch below)
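
A sketch of how the two new components slot into one conv block. The ordering, channel counts, and dropout rate are illustrative, not the exact notebook values:

```python
import torch.nn as nn

# One Step 3 block with batch norm and dropout added.
block = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(16),  # normalize activations per channel
    nn.Dropout(0.1),     # randomly zero activations as regularization
)
```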

Target:

In this step

  1. Added Dropout to the model architecture from Step 2
  2. Used GAP in the last layer
  3. Target is to achieve over 99% validation accuracy with roughly the same number of parameters as Step 2 (8,582 in this step)
  4. If I am able to achieve over 99% accuracy, I will consider the change positive and try to enhance it further to improve the accuracy
  5. Run it for 15 epochs to study how the change affects accuracy and whether the increase is stable

Result:

  1. Train accuracy: 98.69%
  2. Validation accuracy: 99.35%
  3. Number of parameters: 8,582

Analysis:

  1. As expected, validation accuracy improved past 99% and touched 99.35% in this step. It is also encouraging that many epochs achieved over 99% accuracy
  2. I also observe that the validation accuracy of 99.35% is noticeably higher than the training accuracy of 98.69%. This could be due to the regularization effect of the batch normalization and dropout introduced in this step
  3. Also, the parameter count of 8,582 is well within my target of 10K parameters

Step 4

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Added image augmentation: random rotation between -7 and +7 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling

Target:

In this step

  1. Target is to achieve over 99.4% validation accuracy with the same number of parameters. Random rotation should help: visual inspection of the input images shows that a few are rotated, so augmenting with rotated images should help the model learn them and reach the required accuracy (see the transform sketch below)
  2. Run it for 15 epochs to study how the accuracy changes with the image augmentation technique
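
A sketch of the training transform with the rotation augmentation added (the ±7 degree range is from the step above; the normalization values are the standard MNIST constants):

```python
from torchvision import transforms

# Random rotation of up to ±7 degrees; fill=0 keeps the corners exposed
# by rotation black, matching the MNIST background.
train_transforms = transforms.Compose([
    transforms.RandomRotation((-7.0, 7.0), fill=0),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
```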

Result:

  1. Train accuracy: 98.5%
  2. Validation accuracy: 99.37%
  3. Number of parameters: 8,582

Analysis:

  1. I expected this change to take the model over 99.4% accuracy, but it stopped at 99.37% in this step. Still, that is slightly higher than Step 3's 99.35%
  2. I also observe that the validation accuracy of 99.37% is noticeably higher than the training accuracy of 98.5%, partly because of the image augmentation
  3. The point to note is that even after applying a few techniques, I am still behind the target of 99.40% validation accuracy and slightly over 8,000 parameters
  4. As this showed some improvement, I believe further tuning may improve the accuracy

Step 5

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -7 and +7 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 6 and gamma value 0.1, introduced in this step as the addition over the Step 4 model

Target:

In this step

  1. StepLR changes the learning rate after a fixed number of epochs, which at times helps convergence, so my hope is that it will improve validation accuracy. The target is to achieve over 99.4% validation accuracy with the same number of parameters (see the scheduler sketch below)
  2. Run it for less than 15 epochs to study how the accuracy changes with StepLR
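
A runnable sketch of the StepLR wiring. The step size and gamma are from the step above; the base learning rate is illustrative, and the stand-in model replaces the real network from the notebook:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)  # stand-in model; the real one is in the notebook

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # base LR illustrative
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=6, gamma=0.1)

for epoch in range(1, 15):
    # the notebook's train/test passes would run here
    scheduler.step()  # once per epoch: LR is multiplied by 0.1 at epochs 6 and 12
    print(epoch, scheduler.get_last_lr())
```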

Result:

  1. Train accuracy: 98.39%
  2. Validation accuracy: 99.25%
  3. Number of parameters: 8,582

Analysis:

  1. Surprisingly, the validation accuracy dropped from the previous step, which means either StepLR is not helping or the step size is wrong
  2. I now have two options: drop StepLR, or keep it and tune some other mechanism to improve the accuracy

Step 6

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -7 and +7 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 6 and gamma value 0.1

Target:

In this step

  1. Use a lower dropout value and see if that helps to improve the validation accuracy (see the sketch below)
  2. Target is to achieve over 99.4% validation accuracy with the same number of parameters
  3. Run it for less than 15 epochs to study how the accuracy changes with the lower dropout
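
The change itself is a one-liner in the block definition. The exact rates are not recorded in this README, so the numbers below are placeholders:

```python
import torch.nn as nn

# Lower the dropout rate, e.g. 0.1 -> 0.05 (placeholder values; the
# actual rates are in the notebook for this step).
dropout = nn.Dropout(0.05)
```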

Result:

  1. Train accuracy: 98.68%
  2. Validation accuracy: 99.4%
  3. Number of parameters: 8,582

Analysis:

  1. As expected, validation accuracy increased to 99.4% from the previous step, where it had dropped to 99.25%
  2. Although the accuracy hit the 99.4% target, it did so in only one epoch. The accuracy also jumped around a lot rather than stabilizing from earlier epochs to later ones
  3. This model needs further improvement, as it does not seem very stable

Step 7

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 6 and gamma value 0.1

Target:

In this step

  1. MNIST is a simple dataset with very little variation, so we may not need a large batch size for the learning rate we are using. At the same time, the rotation range of (-7, +7) might be slightly too high
  2. Hence I want to make the following changes (sketched below): a) change the batch size from 128 to 64, b) reduce the rotation from (-7, +7) to (-5, +5)
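
The two changes, sketched against the pipeline from earlier (the values are taken from the list above; normalization constants as before):

```python
import torch
from torchvision import datasets, transforms

# a) rotation narrowed from (-7, +7) to (-5, +5) degrees
train_transforms = transforms.Compose([
    transforms.RandomRotation((-5.0, 5.0), fill=0),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# b) batch size reduced from 128 to 64
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=train_transforms),
    batch_size=64, shuffle=True)
```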

Result:

  1. Train accuracy: 99%
  2. Validation accuracy: 99.42%
  3. Number of parameters: 8,582

Analysis:

  1. The validation accuracy increased to 99.42%
  2. It can also be seen that the accuracy stabilized over the last few epochs, which is a good sign
  3. Now that we have achieved our target accuracy, I should try optimizing further

Step 8

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 6 and gamma value 0.1

Target:

In this step

  1. The requirement is to achieve better accuracy in fewer epochs
  2. Knowing the MNIST dataset, I believe a slightly higher learning rate may reach the minimum faster, so I am going to increase the LR a little (see the sketch below)
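
The README does not record the exact learning rates, so the values below are placeholders showing where the change is made (stand-in model as in the earlier sketch):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)  # stand-in model
# Raise the base LR a little (0.01 -> 0.05 is purely illustrative; the
# actual values are in the notebook). StepLR still decays it by 10x later.
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=6, gamma=0.1)
```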

Result:

  1. Train accuracy: 99.04%
  2. Validation accuracy: 99.44%
  3. Number of parameters: 8,582

Analysis:

  1. As expected, the validation accuracy improved and stabilized much better
  2. The model also improved from one epoch to the next and converged faster

Step 9

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 6 and gamma value 0.1

Target:

In this step

  1. I am going to try to reduce the number of parameters to below 8,000 (see the parameter check below)
  2. I am also going to increase the LR a little more to see if that helps convergence, since the parameter count is being reduced slightly from the last step
  3. Target is to achieve over 99.4% validation accuracy with less than 8,000 parameters
  4. Run it for less than 15 epochs
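
A standard way to verify the parameter budget; the stand-in model here would be replaced by the real network from the notebook:

```python
import torch.nn as nn

def count_parameters(model):
    """Count trainable parameters (what the 8,000 budget refers to)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = nn.Linear(784, 10)  # stand-in; use the real model from the notebook
print(count_parameters(model))  # the Step 9 model reports 7,836
```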

Result:

  1. Train accuracy: 99.04%
  2. Validation accuracy: 99.42%
  3. Number of parameters: 7,836

Analysis:

  1. After the model completed all 14 epochs, I saw that it exceeded 99.4% accuracy in multiple epochs, which is a good sign
  2. I achieved the target with fewer parameters
  3. However, I want to experiment with reducing the parameters further and see whether the model stabilizes even more

Step 10

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 7 and gamma value 0.1

Target:

In this step

  1. I reduced the number of parameters further, to below 6,000
  2. I am also going to increase the StepLR step size to 7, assuming that the first 6 epochs may take the loss closer to the minimum (see the one-line change below)
  3. Target is to achieve over 99.4% validation accuracy with less than 6,000 parameters
  4. Run it for less than 15 epochs
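
The scheduler change is again a one-liner (stand-in model and illustrative LR, as in the earlier sketches):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)  # stand-in model
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9)  # LR illustrative
# Step size raised from 6 to 7: the LR now holds its base value for the
# first 7 epochs before the first 10x decay.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
```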

Result:

  1. Train accuracy: 98.91%
  2. Validation accuracy: 99.38%
  3. Number of parameters: 5,854

Analysis:

  1. With fewer parameters the model could not reach the required accuracy
  2. That said, looking at the pattern of validation accuracy increasing from one epoch to the next, I think I can optimize further while keeping the parameter count the same

Step 11

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 7 and gamma value 0.1

Target:

In this step

  1. In this step I increased the parameters a little, to just over 6,000, to confirm how big a role the parameter count plays here
  2. Target is to achieve over 99.4% validation accuracy with around 6K parameters and, more importantly, consistency
  3. Run it for less than 15 epochs

Result:

  1. Train accuracy: 98.86%
  2. Validation accuracy: 99.45%
  3. Number of parameters: 6,254

Analysis:

  1. As expected, the model's validation accuracy increased to 99.45%
  2. The best part of this model is the way the accuracy stabilized from one epoch to the next
  3. In my view, this is the best model I have trained across all the steps so far
  4. Now that I have achieved the expected accuracy, let me again try to bring the parameters below 6,000 while still achieving 99.4% accuracy

Step 12

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 7 and gamma value 0.1

Target:

In this step

  1. I decreased the number of parameters, and the count is now only slightly over 6K
  2. Target is to achieve over 99.4% validation accuracy
  3. Run it for less than 15 epochs

Result:

  1. Train accuracy: 98.92%
  2. Validation accuracy: 99.4%
  3. Number of parameters: 6,054

Analysis:

  1. The accuracy just touched 99.4% but does not seem well stabilized. I did not expect this behaviour
  2. I believe the channel distribution could be wrong. Let me try one last option and play around with the channels

Step 13

Structure

  1. 3x3 convolution layer with padding of 1
  2. ReLU
  3. Max pooling
  4. Dropout (regularization technique)
  5. Batch Normalization
  6. Image augmentation: random rotation between -5 and +5 degrees

Additional Structures

  1. 1x1 convolution layer to consolidate channels without a spatial convolution
  2. Global Average Pooling
  3. StepLR with step size 7 and gamma value 0.1

Target:

In this step

  1. Reduced the number of parameters, this time distributing different channel sizes across the conv layers, especially keeping the 4-layer structure in mind
  2. Target is to achieve over 99.4% validation accuracy with less than 6,000 parameters
  3. Run it for less than 15 epochs

Result:

  1. Train accuracy: 98.88%
  2. Validation accuracy: 99.4%
  3. Number of parameters: 5,854

Analysis:

  1. The model finally achieved 99.4% accuracy and seems to be pretty stable
  2. Even though I achieved 99.4% with less than 6K parameters, in my view the best model remains Step 11, which is more consistent
  3. I will keep trying to bring the parameters down further while holding the 99.4% target

Summary

I achieved the requirements. The best next step would be to reduce the parameters further and see if the model can still reach 99.4% accuracy.
