
Self-Driving Car Engineer Nanodegree

Luis Miguel Zapata


Traffic Sign Recognition Classifier

This project aims to develop a deep neural network classifier for traffic signs from the German Traffic Sign Dataset.

1. Dataset.

The dataset consists of three pickle files containing images and labels for training, validation, and testing.

  • Number of training examples = 34799
  • Number of validation examples = 4410
  • Number of testing examples = 12630

As the histograms below show, the class distributions of the three datasets have similar shapes: even though the distribution is not uniform, the number of examples per class in the training set is roughly proportional to the count for the same class in the validation and test sets.

[Class-count histograms: Training / Validation / Testing]

The images come in RGB format and are already stored as numpy arrays; the labels come as a list of integers.

  • Image data shape = (32, 32, 3)
  • Number of classes = 43

Every image has a corresponding label, and each label corresponds to a traffic sign category. These categories are listed in the file signnames.csv.

[Sample images: Example 1 / Example 2 / Example 3]
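The per-class counts behind the histograms can be computed directly from the label arrays; a minimal sketch using numpy's bincount (the labels here are illustrative, not the real dataset):

```python
import numpy as np

def class_distribution(labels, n_classes=43):
    """Count how many examples fall into each of the n_classes categories."""
    return np.bincount(labels, minlength=n_classes)

# Illustrative labels standing in for y_train
y_example = np.array([0, 1, 1, 2, 2, 2])
counts = class_distribution(y_example, n_classes=5)
print(counts)  # [1 2 3 0 0]
```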

2. Preprocessing

The pre-processing pipeline consists of three steps.

  • Grayscale: the three RGB channels are collapsed into a single intensity channel.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  • Histogram equalization: the contrast of the image is enhanced, yielding a more uniform intensity histogram.
gray_equalized = cv2.equalizeHist(gray)
[Original / Grayscale / Histogram-equalized]
  • Normalization: the intensity values no longer range from 0 to 255 but from -1 to 1 in floating point. Visualized with Matplotlib, the image looks identical to the equalized one.

norm_image = (gray_equalized - 128.0)/ 128.0

This preprocessing improves the results because the inputs are better conditioned and their information is more meaningful to the network.
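The three steps can be combined into one helper. The writeup uses cv2 for grayscale conversion and equalization; the sketch below reimplements both with numpy only, so exact pixel values may differ slightly from the cv2 versions:

```python
import numpy as np

def preprocess(img):
    """Grayscale -> histogram equalization -> [-1, 1] normalization."""
    # Luminance-weighted grayscale of a BGR image (BT.601 weights, as cv2 uses)
    gray = 0.299 * img[..., 2] + 0.587 * img[..., 1] + 0.114 * img[..., 0]
    gray = gray.astype(np.uint8)
    # Histogram equalization via the cumulative distribution function
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) * 255.0 / (cdf.max() - cdf.min())
    equalized = cdf[gray].astype(np.uint8)
    # Scale intensities from [0, 255] to [-1, 1]
    return (equalized - 128.0) / 128.0

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape, out.min(), out.max())
```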

3. Model Architecture.

The model used here is inspired by the LeNet architecture and consists of the following layers:

  • 2D convolution: 6 filters of 5x5 + a bias, stride 1x1. Input = 32x32x1. Output = 28x28x6.
  • ReLU activation.
  • Max pooling: filter 2x2, stride 2x2. Input = 28x28x6. Output = 14x14x6.
  • 2D convolution: 16 filters of 5x5 + a bias, stride 1x1. Input = 14x14x6. Output = 10x10x16.
  • ReLU activation.
  • Max pooling: filter 2x2, stride 2x2. Input = 10x10x16. Output = 5x5x16.
  • Flatten: Input = 5x5x16. Output = 400.
  • Fully connected layer: Input = 400. Output = 120.
  • Fully connected layer: Input = 120. Output = 84.
  • ReLU activation.
  • Fully connected layer (output): Input = 84. Output = 43.
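The input/output sizes in the list above follow from the standard "valid" (no padding) convolution and pooling formulas; a quick sketch to verify them:

```python
def conv_out(size, kernel, stride):
    """Output width of a 'valid' convolution (no padding)."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output width of max pooling; 2x2 with stride 2 halves the width."""
    return (size - kernel) // stride + 1

print(conv_out(32, 5, 1))  # 28
print(pool_out(28, 2, 2))  # 14
print(conv_out(14, 5, 1))  # 10
print(pool_out(10, 2, 2))  # 5
print(5 * 5 * 16)          # 400, the flattened size
```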

4. Model Training.

Placeholders for the features and the labels are defined, and one-hot encoding is applied to the labels.

x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, 43)
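One-hot encoding maps each integer label to a 43-dimensional indicator vector; a numpy equivalent of what tf.one_hot does here:

```python
import numpy as np

def one_hot(labels, n_classes=43):
    """Row i is all zeros except a 1 at column labels[i]."""
    return np.eye(n_classes)[labels]

print(one_hot(np.array([0, 2]), n_classes=4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]]
```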

The learning rate is set to 0.001 experimentally, and the loss function is optimized with the Adam optimization algorithm. The loss is the cross entropy between the one-hot encoded labels and the logits produced by the model described above.

rate = 0.001

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
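What tf.nn.softmax_cross_entropy_with_logits_v2 computes can be written out by hand; a numpy sketch with made-up logits for 3 classes (not the real 43-class model):

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating, for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(one_hot, logits):
    """Mean over the batch of -sum(one_hot * log softmax(logits))."""
    p = softmax(logits)
    return -np.sum(one_hot * np.log(p), axis=-1).mean()

logits = np.array([[2.0, 1.0, 0.1]])
one_hot = np.array([[1.0, 0.0, 0.0]])
print(cross_entropy(one_hot, logits))
```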

The accuracy of the model is the mean of the correct predictions, evaluated at the end of every epoch over the whole validation set in batches. Training runs for 100 epochs with a batch size of 128 images.

# A model to evaluate the accuracy is defined
EPOCHS = 100
BATCH_SIZE = 128

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
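Note that evaluate() weights each batch accuracy by its batch size; with a final batch smaller than 128, a plain mean of the batch accuracies would be biased. A numpy sketch of the same weighted averaging (the batch accuracies here are illustrative):

```python
import numpy as np

def weighted_accuracy(batch_accuracies, batch_sizes):
    """Size-weighted mean of per-batch accuracies, as in evaluate()."""
    acc = np.asarray(batch_accuracies, dtype=np.float64)
    n = np.asarray(batch_sizes, dtype=np.float64)
    return float((acc * n).sum() / n.sum())

# Two full batches of 128 plus a final batch of 44 examples
print(weighted_accuracy([0.9, 0.8, 0.5], [128, 128, 44]))
```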

Training consists of optimizing the CNN weights to minimize the cross entropy, epoch by epoch.

# The features and labels are shuffled every epoch
from sklearn.utils import shuffle

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train)
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
            
        validation_accuracy = evaluate(X_valid, y_valid)
        print("EPOCH {} ...".format(i+1))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()
        
    saver.save(sess, './lenet')
    print("Model saved")

The resulting validation accuracy for this configuration of hyperparameters and network model is around 0.948, with a score of 0.932 on the test set.

# Check performance in the test data set
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))

    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))

5. New images.

A set of 5 new images is downloaded from the internet; these images do not belong to the original dataset and are used to test the model's performance on unseen data. To feed them through the neural network, the images have to be resized, and their labels are loaded into a Python list: [31 4 26 17 22].

images.append(cv2.resize(np_image, (32, 32)))
[Test images 1-5]

Before making predictions on these images, they have to go through the same pre-processing steps used during training; the predictions obtained are: [23 4 18 17 34].

# Preprocess the images
test_images_processed = preprocessing(test_images)

prediction = tf.argmax(logits, 1)
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    results = sess.run(prediction, feed_dict={x:test_images_processed})
    print("Prediction = {}".format(results))   

Only two out of the 5 images were predicted correctly, for an accuracy of 0.40, meaning that the network does not yet fully handle the differences between these images and the ones it was trained on. By inspection, images 1 and 5 suggest that the network relies on the sign not being rotated, and image number 3 differs from the dataset images in lacking the black traffic-light silhouette, which might explain its misclassification.

Top 5 predictions

Using the function tf.nn.top_k(), the classes with the highest probabilities for each image can be obtained. This is useful to gain some insight into what the network is doing and why it misclassified some inputs.

k = 5
softmax = tf.nn.softmax(logits)
top_values, top_indices = tf.nn.top_k(softmax, k)
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    val, ind = sess.run([top_values, top_indices], feed_dict={x:test_images_processed})
    print("Values per image = {}".format(val))
    print("Index per image = {}".format(ind))

Notice that the second-largest probability for the first image is its correct class, and the third image has its correct class in fifth place; for image number 5 the correct class does not appear in the top 5 at all. This suggests the CNN layers produce only a few correct activations for these inputs.

  • Correct class: 31, 5 most probable classes: [23 31 19 5 30]
  • Correct class: 4, 5 most probable classes: [4 0 1 31 37]
  • Correct class: 26, 5 most probable classes: [18 11 37 27 26]
  • Correct class: 17, 5 most probable classes: [17 13 34 32 14]
  • Correct class: 22, 5 most probable classes: [34 28 35 1 30]
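The same top-k selection can be reproduced without TensorFlow by sorting the softmax probabilities; a numpy sketch (the probabilities here are illustrative, with only 5 classes):

```python
import numpy as np

def top_k(probs, k=5):
    """Return the k highest probabilities and their class indices, per row."""
    idx = np.argsort(probs, axis=-1)[:, ::-1][:, :k]
    vals = np.take_along_axis(probs, idx, axis=-1)
    return vals, idx

probs = np.array([[0.1, 0.5, 0.05, 0.3, 0.05]])
vals, idx = top_k(probs, k=3)
print(idx)   # [[1 3 0]]
print(vals)  # [[0.5 0.3 0.1]]
```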
