ex-6-handwritten-digit-recognition-using-mlp's Introduction

Handwritten Digit Recognition using MLP

Aim:

To recognize handwritten digits using a multilayer perceptron (MLP).

Theory:

  • We use an MLP to recognize the digits.
  • A multilayer perceptron (MLP) is a feedforward artificial neural network that maps a set of inputs to a set of outputs. It consists of an input layer, one or more hidden layers, and an output layer, with the nodes connected as a directed graph from input to output. MLPs with several hidden layers are commonly counted among deep learning methods.
  • Because the graph is directed, the signal path through the nodes only goes one way. Each node, apart from the input nodes, applies a nonlinear activation function. The network is trained with backpropagation, a supervised learning technique.
  • MLPs are widely used for problems that require supervised learning, and in research into computational neuroscience and parallel distributed processing. Applications include speech recognition, image recognition and machine translation.

MLP has the following features:

  • Adjusts the synaptic weights based on the error-correction rule
  • Adopts the least mean squares (LMS) rule
  • Uses the backpropagation algorithm to propagate the error recursively, layer by layer
  • Consists of two passes
    • Feed Forward pass
    • Backward pass

Learning process: Backpropagation

Computationally efficient method

Three distinctive characteristics of an MLP:

  • Each neuron in the network includes a non-linear activation function

  • The network contains one or more hidden layers of hidden neurons

  • The network exhibits a high degree of connectivity, determined by its synapses
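
To illustrate the first characteristic, here is a minimal sketch of three common non-linear activation functions. The choice of sigmoid, tanh and ReLU is an assumption made for illustration; the program later in this document uses ReLU in its hidden layer.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(z))
print("tanh:   ", tanh(z))
print("relu:   ", relu(z))
```

Without such a non-linearity, stacking layers would collapse into a single linear map, so the hidden layers would add no expressive power.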

Signals involved in an MLP:

Function Signal

  • Enters the network as an input signal
  • Propagates forward, neuron by neuron, through the network and emerges at the output end as an output signal
  • Is computed at each neuron it passes as a function F(x, w) of the inputs x and the associated weights w

Error Signal

  • Originates at an output neuron
  • Propagates backward, layer by layer, through the network
  • Is computed at every neuron by an error-dependent function

Each hidden or output neuron of an MLP is designed to perform two computations:

  • The computation of the function signal appearing at the output of the neuron, expressed as a continuous non-linear function of the input signal and the synaptic weights associated with that neuron
  • The computation of an estimate of the gradient vector, which is needed for the backward pass through the network
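
The two computations performed by a neuron can be sketched for a single output neuron. This is a minimal illustration under stated assumptions: a sigmoid activation, a squared-error loss E = 0.5 * e^2, and hypothetical names (`neuron_forward`, `neuron_local_gradient`) that do not appear in the program below.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def neuron_forward(x, w, b):
    """Computation 1: the function signal y = phi(w . x + b)."""
    v = np.dot(w, x) + b   # weighted sum of the inputs
    y = sigmoid(v)         # non-linear activation (sigmoid is an assumption)
    return v, y

def neuron_local_gradient(y, error):
    """Computation 2: local gradient delta = e * phi'(v).
    For the sigmoid, phi'(v) = y * (1 - y)."""
    return error * y * (1.0 - y)

x = np.array([0.5, -1.0, 2.0])   # input signal
w = np.array([0.1, 0.2, -0.3])   # synaptic weights
b = 0.05                         # bias
d = 1.0                          # desired response

v, y = neuron_forward(x, w, b)
e = d - y                        # error signal at this output neuron
delta = neuron_local_gradient(y, e)
grad_w = -delta * x              # gradient of 0.5 * e**2 with respect to w

print("function signal:", y)
print("local gradient: ", delta)
print("dE/dw:          ", grad_w)
```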

TWO PASSES OF COMPUTATION:

  • In the forward pass:

    • Synaptic weights remain unaltered

    • Function signals are computed neuron by neuron

    • The function signal of the j-th neuron is
      image
      image
      image

    • If the j-th neuron is an output neuron, then m = mL and the output of the j-th neuron is
      image

    • The forward phase begins at the first hidden layer and ends by computing the error signal ej(n) for each neuron in the output layer
      image

  • In the backward pass:

    • It starts at the output layer and passes the error signals backward (leftward) through the network, layer by layer, recursively computing the local gradient of each neuron

    • It changes the synaptic weights according to the delta rule image

  • Gradient descent is used as the optimisation algorithm here.

  • Gradient descent is an iterative first-order optimisation algorithm used to find a local minimum of a differentiable function (gradient ascent is its counterpart for finding a local maximum).
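
Since the equations in this section appear only as image placeholders, the following is a hedged reconstruction rather than a copy of the original figures. It assumes the standard textbook notation for backpropagation with a squared-error loss: induced local field $v_j$, activation function $\varphi_j$, desired response $d_j$, learning rate $\eta$ and local gradient $\delta_j$. None of these symbols except $e_j(n)$ is taken from the text above.

```latex
\begin{align}
% Forward pass: function signal of neuron j, fed by the outputs y_i of the previous layer
v_j(n) &= \sum_{i} w_{ji}(n)\, y_i(n) \\
y_j(n) &= \varphi_j\big(v_j(n)\big) \\
% Error signal at output neuron j
e_j(n) &= d_j(n) - y_j(n) \\
% Backward pass: local gradient of an output neuron
\delta_j(n) &= e_j(n)\, \varphi_j'\big(v_j(n)\big) \\
% Backward pass: local gradient of a hidden neuron, from the neurons k it feeds
\delta_j(n) &= \varphi_j'\big(v_j(n)\big) \sum_{k} \delta_k(n)\, w_{kj}(n) \\
% Delta rule for the weight update
\Delta w_{ji}(n) &= \eta\, \delta_j(n)\, y_i(n)
\end{align}
```

The program below uses a softmax output with one-hot targets instead, so its output-layer term is written directly as `A2 - one_hot_Y`, which has the opposite sign convention to $e_j(n)$ and is subtracted in the update step.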

Algorithm :

  1. Import the necessary Python libraries.

  2. Load the dataset into a DataFrame by calling the read_csv() function of the pandas library with the name of the CSV file containing the dataset.

  3. Divide the dataset into two parts: the first for training and the second for testing.

  4. Define all the basic functions needed to create an MLP.

  5. Find the weights and biases of each neuron using the gradient descent algorithm.

  6. Make predictions using the defined functions.

  7. Create a function that tests a prediction and also plots the corresponding image.

  8. Now, test the predictions and find the accuracy.

Program

Importing Libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('train.csv')

Splitting Dataset

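data = np.array(data)
m, n = data.shape  # m examples; n = 1 label column + 784 pixel columns
np.random.shuffle(data)  # shuffle before splitting into dev and training sets

# First 1000 examples form the dev set; transpose so each column is one example
data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.  # scale pixel values to [0, 1]

# Remaining examples form the training set
data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.
_,m_train = X_train.shape
Y_train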

Defining Basic Functions

def init_params():
    W1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    W2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return W1, b1, W2, b2

def ReLU(Z):
    return np.maximum(Z, 0)

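def softmax(Z):
    # Subtract the column-wise max before exponentiating for numerical stability
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
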
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def ReLU_deriv(Z):
    return Z > 0

def one_hot(Y):
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y

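def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    m = Y.size  # number of examples in this batch, not the size of the whole dataset
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)  # one bias gradient per output neuron
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)  # one bias gradient per hidden neuron
    return dW1, db1, dW2, db2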

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1    
    W2 = W2 - alpha * dW2  
    b2 = b2 - alpha * db2    
    return W1, b1, W2, b2

def get_predictions(A2):
    return np.argmax(A2, 0)

def get_accuracy(predictions, Y):
    print(predictions, Y)
    return np.sum(predictions == Y) / Y.size

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2
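
Before training, the helper functions can be sanity-checked on tiny inputs. The sketch below redefines `softmax` and `one_hot` locally so that it is self-contained. Here `softmax` subtracts the column-wise maximum, a common numerical-stability trick, and the `num_classes` parameter is a hypothetical addition for illustration.

```python
import numpy as np

def softmax(Z):
    # Subtracting the column-wise max leaves the result unchanged
    # but prevents overflow in np.exp for large inputs.
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

def one_hot(Y, num_classes=10):
    # One column per example, one row per class.
    out = np.zeros((num_classes, Y.size))
    out[Y, np.arange(Y.size)] = 1
    return out

# Each column is one example; the second column would overflow a naive softmax.
Z = np.array([[1.0, 1000.0],
              [2.0, 1000.0],
              [3.0, 1000.0]])
A = softmax(Z)
print("column sums:", A.sum(axis=0))

Y = np.array([3, 0, 9])
H = one_hot(Y)
print("one-hot shape:", H.shape)
```

Each column of `A` should sum to 1, and each column of `H` should contain a single 1 in the row given by the label.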

Gradient Descent

W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.10, 500)

Make Predictions & Plot image

def make_predictions(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = get_predictions(A2)
    return predictions

def test_prediction(index, W1, b1, W2, b2):
    current_image = X_train[:, index, None]
    prediction = make_predictions(X_train[:, index, None], W1, b1, W2, b2)
    label = Y_train[index]
    print("Prediction: ", prediction)
    print("Label: ", label)
    
    current_image = current_image.reshape((28, 28)) * 255
    plt.gray()
    plt.imshow(current_image, interpolation='nearest')
    plt.show()

Test the predictions

test_prediction(0, W1, b1, W2, b2)
test_prediction(1, W1, b1, W2, b2)
test_prediction(2, W1, b1, W2, b2)
test_prediction(3, W1, b1, W2, b2)

Find the accuracy

dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
get_accuracy(dev_predictions, Y_dev)
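
A single accuracy figure can hide digits that the network confuses more often than others. As a possible follow-up, the sketch below computes the accuracy separately for each class. The function name `per_class_accuracy` and the toy arrays are hypothetical; in practice one would pass `dev_predictions` and `Y_dev`.

```python
import numpy as np

def per_class_accuracy(predictions, labels, num_classes=10):
    """Fraction of correctly predicted examples for each class.
    Classes with no examples get NaN."""
    correct = np.bincount(labels[predictions == labels], minlength=num_classes)
    total = np.bincount(labels, minlength=num_classes)
    result = np.full(num_classes, np.nan)
    np.divide(correct, total, out=result, where=total > 0)
    return result

# Toy data with three classes, for illustration only
labels = np.array([0, 0, 1, 1, 2])
preds = np.array([0, 1, 1, 1, 0])
acc = per_class_accuracy(preds, labels, num_classes=3)
print(acc)
```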

Output

Y_train

(image: y)

Gradient Descent

(images: gd1, gd2)

Test Predictions

(images: p1, p2, p3, p4)

Accuracy

(image: acc)

Result:

Thus, an MLP was created to recognize handwritten digits.

Contributors

lavanyajoyce, shafeeqahameds
