Handwritten Digit Recognition using MLP

Aim:

To recognize handwritten digits using a multilayer perceptron (MLP).

Theory:

We have used an MLP to recognize the digits.
A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs. An MLP consists of an input layer, one or more hidden layers, and an output layer, with the nodes of each layer connected to the next as a directed graph. Because the graph is directed, the signal path through the nodes only goes one way. Each node, apart from the input nodes, has a nonlinear activation function. An MLP is trained with backpropagation, a supervised learning technique, and is one of the basic architectures of deep learning.
MLPs are widely used for problems that require supervised learning, as well as in research into computational neuroscience and parallel distributed processing. Applications include speech recognition, image recognition and machine translation.

MLP has the following features:

Adjusts the synaptic weights based on the error-correction rule
Adopts the least mean squares (LMS) criterion
Uses the backpropagation algorithm to propagate the error backward through the network
Consists of two passes
- Feed-forward pass
- Backward pass

Learning process – Backpropagation

Computationally efficient method

3 Distinctive Characteristics of MLP:

Each neuron in the network includes a non-linear activation function
The network contains one or more hidden layers of hidden neurons
The network exhibits a high degree of connectivity, determined by its synapses
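
Why the non-linear activation matters can be shown with a small sketch. The layer sizes and random weights below are hypothetical, chosen only for illustration: without an activation between them, two layers behave exactly like one linear layer, while inserting ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden-layer weights (hypothetical sizes)
W2 = rng.normal(size=(2, 4))   # output-layer weights
X = rng.normal(size=(3, 5))    # five input column vectors

# Without an activation, two layers equal one linear layer with weights W2 @ W1.
linear_stack = W2 @ (W1 @ X)
single_layer = (W2 @ W1) @ X
collapses = np.allclose(linear_stack, single_layer)

# With ReLU in between, the output no longer matches that single linear map.
relu_stack = W2 @ np.maximum(W1 @ X, 0)
differs = not np.allclose(relu_stack, single_layer)

print(collapses, differs)
```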

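2 Signals involved in MLP are:

Function Signal
Is an input signal that comes in at the input end of the network
Propagates forward neuron by neuron through the network and emerges at the output end as an output signal
Is computed as a function F(x,w) of the inputs and weights at each neuron it passes through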

Error Signal

Originates at an output neuron
Propagates backward through the network, layer by layer
Is computed at every neuron using an error-dependent function in one form or another

Each hidden neuron or output neuron of an MLP is designed to perform two computations:
- The computation of the function signal appearing at the output of the neuron, which is expressed as a continuous non-linear function of the input signal and the synaptic weights associated with that neuron
- The computation of an estimate of the gradient vector, which is needed for the backward pass through the network
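
The two computations can be sketched for a single output neuron. This is a minimal illustration, not the code used in the program below: the tanh activation, the weights, the desired output of 1.0 and the learning rate `eta` are all assumptions made for the example.

```python
import numpy as np

def neuron_forward(w, b, x):
    """Function signal: a non-linear function of the inputs and synaptic weights."""
    v = np.dot(w, x) + b          # weighted sum of the inputs plus bias
    y = np.tanh(v)                # activation (tanh assumed for illustration)
    return v, y

def neuron_local_gradient(v, error):
    """Local gradient of an output neuron: error times the activation derivative."""
    return error * (1.0 - np.tanh(v) ** 2)

w = np.array([0.5, -0.25])        # hypothetical synaptic weights
b = 0.1
x = np.array([1.0, 2.0])

v, y = neuron_forward(w, b, x)
delta = neuron_local_gradient(v, error=1.0 - y)   # desired output assumed to be 1.0

eta = 0.1                         # hypothetical learning rate
weight_update = eta * delta * x   # correction applied to w in the backward pass
```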

TWO PASSES OF COMPUTATION:

In the forward pass:
- Synaptic weights remain unaltered
- Function signals are computed neuron by neuron
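- Function signal of the jth neuron is yj(n) = φ(vj(n)), where vj(n) is the weighted sum of the neuron's inputs plus its bias and φ is its activation function
- If the jth neuron is an output neuron, then m = mL and the output of the jth neuron is the network output oj(n)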
- The forward phase begins in the first hidden layer and ends by computing the error signal ej(n) in the output layer
In the backward pass:
- It starts from the output layer and passes the error signal leftward, layer by layer, to compute the local gradient recursively in each neuron
- It changes the synaptic weights according to the delta rule
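
For reference, the two passes are commonly written as follows in the standard backpropagation notation. The symbols d_j (desired output), φ_j (activation function), δ_j (local gradient) and η (learning rate) are assumptions taken from that standard formulation and are not defined in this document.

```latex
% Forward pass: weighted sum and function signal of neuron j
v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n), \qquad y_j(n) = \varphi_j\big(v_j(n)\big)

% Error signal at output neuron j, where o_j(n) = y_j(n)
e_j(n) = d_j(n) - o_j(n)

% Backward pass: local gradient of an output neuron and of a hidden neuron
\delta_j(n) = e_j(n)\, \varphi_j'\big(v_j(n)\big)
\qquad
\delta_j(n) = \varphi_j'\big(v_j(n)\big) \sum_{k} \delta_k(n)\, w_{kj}(n)

% Delta rule for the weight correction
\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)
```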
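Gradient descent is used as the optimisation algorithm here.
Gradient descent is an iterative first-order optimisation algorithm used to find a local minimum of a given function. At each step the parameters are moved a small distance against the gradient of the function being minimised, which here is the network's error.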
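
The update rule can be tried on a one-dimensional function before it is applied to a network. The sketch below is illustrative only: the function f(x) = (x - 3)^2, the starting point and the step size are arbitrary choices, and `gradient_descent_1d` is a hypothetical helper that is not part of the program.

```python
def gradient_descent_1d(grad, x0, alpha, iterations):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(iterations):
        x = x - alpha * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent_1d(lambda x: 2.0 * (x - 3.0), x0=0.0, alpha=0.1, iterations=200)
print(x_min)
```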

Algorithm:

Import the necessary Python libraries.
Load the dataset into a dataframe by calling the read_csv() function of the pandas library with the name of the CSV file.
Divide the dataset into two parts, one for training and the other for testing.
Define all the basic functions needed to create an MLP.
Find the weights and bias of each neuron using the gradient descent algorithm.
Make predictions using the defined functions.
Create a function that tests a prediction and plots the corresponding image.
Now, test the predictions and find the accuracy.

Program

Importing Libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('train.csv')

Splitting Dataset

data = np.array(data)
m, n = data.shape
np.random.shuffle(data) ## shuffle before splitting into dev and training sets

data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.

data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.
_,m_train = X_train.shape
Y_train
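
The same split can be wrapped in a function and checked on synthetic data without needing train.csv. This is a sketch under assumptions: `split_dev_train` is a hypothetical helper, and the random array merely imitates the layout used above (label in the first column, 784 pixel values after it).

```python
import numpy as np

def split_dev_train(data, n_dev, seed=0):
    """Shuffle rows, then return (X_dev, Y_dev, X_train, Y_train) with samples as columns."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)          # shuffled copy; the input is left untouched
    dev, train = shuffled[:n_dev].T, shuffled[n_dev:].T
    # Row 0 holds the labels; the remaining rows hold pixels scaled to [0, 1].
    return dev[1:] / 255.0, dev[0], train[1:] / 255.0, train[0]

# Synthetic stand-in for train.csv: 50 rows of [label, 784 pixel values].
rng = np.random.default_rng(1)
fake = np.column_stack([rng.integers(0, 10, size=50),
                        rng.integers(0, 256, size=(50, 784))])

Xd, Yd, Xt, Yt = split_dev_train(fake, n_dev=10)
print(Xd.shape, Yd.shape, Xt.shape, Yt.shape)
```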

Defining Basic Functions

def init_params():
    W1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    W2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return W1, b1, W2, b2

def ReLU(Z):
    return np.maximum(Z, 0)

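def softmax(Z):
    # Subtract the column-wise max before exponentiating to avoid overflow
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    # Normalise each column (one sample) so its entries sum to 1
    return expZ / np.sum(expZ, axis=0, keepdims=True)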
    
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def ReLU_deriv(Z):
    return Z > 0

def one_hot(Y):
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y

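def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    m = Y.size  # number of samples in this batch, not the size of the full dataset
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    # Sum over samples (axis=1) so that each neuron gets its own bias gradient
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2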

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1    
    W2 = W2 - alpha * dW2  
    b2 = b2 - alpha * db2    
    return W1, b1, W2, b2

def get_predictions(A2):
    return np.argmax(A2, 0)

def get_accuracy(predictions, Y):
    print(predictions, Y)
    return np.sum(predictions == Y) / Y.size

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2
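
The expression dZ2 = A2 - one_hot_Y in backward_prop is the gradient of the cross-entropy loss with respect to the softmax inputs. The program never names its loss, so cross-entropy is an assumption here. Under that assumption, the sketch below compares the analytic gradient with central finite differences on a small random example; all names in it are local to the sketch.

```python
import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def cross_entropy(Z, Y_onehot):
    """Mean cross-entropy over the columns (samples) of Z."""
    return -np.sum(Y_onehot * np.log(softmax(Z))) / Z.shape[1]

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 4))                  # softmax inputs for 4 samples
labels = np.array([3, 0, 9, 3])
Y_onehot = np.zeros((10, 4))
Y_onehot[labels, np.arange(4)] = 1

analytic = (softmax(Z) - Y_onehot) / Z.shape[1]   # same form as dZ2 / m

# Central finite differences, perturbing one entry of Z at a time.
numeric = np.zeros_like(Z)
eps = 1e-6
for idx in np.ndindex(*Z.shape):
    Zp, Zm = Z.copy(), Z.copy()
    Zp[idx] += eps
    Zm[idx] -= eps
    numeric[idx] = (cross_entropy(Zp, Y_onehot) - cross_entropy(Zm, Y_onehot)) / (2 * eps)

max_gap = np.abs(analytic - numeric).max()
print(max_gap)
```

A small max_gap indicates that the analytic and numerical gradients agree, which is a useful check before training on the full dataset.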

Gradient Descent

W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.10, 500)

Make Predictions & Plot Image

def make_predictions(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = get_predictions(A2)
    return predictions

def test_prediction(index, W1, b1, W2, b2):
    current_image = X_train[:, index, None]
    prediction = make_predictions(X_train[:, index, None], W1, b1, W2, b2)
    label = Y_train[index]
    print("Prediction: ", prediction)
    print("Label: ", label)
    
    current_image = current_image.reshape((28, 28)) * 255
    plt.gray()
    plt.imshow(current_image, interpolation='nearest')
    plt.show()

Test the predictions

test_prediction(0, W1, b1, W2, b2)
test_prediction(1, W1, b1, W2, b2)
test_prediction(2, W1, b1, W2, b2)
test_prediction(3, W1, b1, W2, b2)

Find the accuracy

dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
print("Dev accuracy:", get_accuracy(dev_predictions, Y_dev))
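
A single accuracy figure hides which digits are confused with one another. As an optional extension, a confusion matrix can be built with NumPy alone. `confusion_matrix` is a hypothetical helper that is not part of the program above, and the tiny label arrays are made up for illustration.

```python
import numpy as np

def confusion_matrix(predictions, labels, num_classes=10):
    """Rows are true digits, columns are predicted digits."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (labels, predictions), 1)   # count each (true, predicted) pair
    return cm

labels = np.array([0, 1, 1, 2, 2, 2])
preds = np.array([0, 1, 2, 2, 2, 1])
cm = confusion_matrix(preds, labels, num_classes=3)
accuracy = np.trace(cm) / cm.sum()            # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

With the variables from the program, the call would be confusion_matrix(dev_predictions, Y_dev).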


