arend100 / final

Jupyter Notebook 99.77% HTML 0.23%

final's Introduction

Final_Project

English Alphabet Machine Learning Model

This project uses the EMNIST Letters image dataset, an extension of the MNIST database of handwritten digits. EMNIST datasets can be found at: https://www.nist.gov/node/1298471/emnist-dataset. MNIST datasets can be found at: http://yann.lecun.com/exdb/mnist/. The data was imported by pip-installing the emnist (and mnist) libraries. Note: the "data/python-mnist" folder comes from sorki/python-mnist on GitHub. That repository can also be used for EMNIST testing, but this project did not use the sorki folder.
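As a rough illustration of the import step, the sketch below assumes the `emnist` package from PyPI (`pip install emnist`) and its `extract_training_samples` and `extract_test_samples` functions. The helper name `load_emnist_letters` is hypothetical, and whether the notebook called exactly these functions is an assumption.

```python
def load_emnist_letters(dataset="letters"):
    """Return ((X_train, y_train), (X_test, y_test)) for an EMNIST split.

    Hypothetical helper: the import is done lazily so this file can be
    read or imported even when the emnist package is not installed.
    """
    from emnist import extract_training_samples, extract_test_samples

    # Each call returns a pair: (images, labels)
    X_train, y_train = extract_training_samples(dataset)
    X_test, y_test = extract_test_samples(dataset)
    return (X_train, y_train), (X_test, y_test)
```

Calling `load_emnist_letters()` is expected to fetch the dataset on first use, so it needs network access and some disk space.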

Notes on EMNIST 'Letters' Dataset

"The EMNIST Balanced dataset contains a set of characters with an equal number of samples per class. The EMNIST Letters dataset merges a balanced set of the uppercase and lowercase letters into a single 26-class task. The EMNIST Digits and EMNIST MNIST dataset provide balanced handwritten digit datasets directly compatible with the original MNIST dataset. The EMNIST Letters dataset seeks to further reduce the errors occurring from case confusion by merging all the uppercase and lowercase classes to form a balanced 26-class classification task. In a similar vein, the EMNIST Digits class contains a balanced subset of the digits dataset containing 28,000 samples of each digit."
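The "balanced" property described in the quote can be checked on any label array. The following is a minimal sketch, not part of the original notebook; the function name `is_balanced` and its `tolerance` parameter are hypothetical.

```python
from collections import Counter


def is_balanced(labels, tolerance=0):
    """True if the largest and smallest class counts differ by at most tolerance."""
    counts = Counter(labels)
    if not counts:
        return True  # an empty label list has nothing to be unbalanced about
    return max(counts.values()) - min(counts.values()) <= tolerance
```

For example, `is_balanced(y_train)` could be run on the training labels before fitting a model.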

About Model

This model uses the EMNIST 'Letters' dataset. Refer to 'EMNIST.ipynb' for a detailed look at the model. The model reached about 95% accuracy on the training data and about 91% accuracy on the test data.

Training call and the last epoch of its output:

    model.fit(X_train, y_train, batch_size=128, epochs=10, shuffle=True, verbose=2)

    Epoch 10/10
    1248000/1248000 - 19s - loss: 0.1345 - acc: 0.9512

Evaluation call and its output:

    model_loss, model_accuracy = model.evaluate(X_test, y_test, verbose=2)
    print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

    20800/20800 - 2s - loss: 0.4131 - acc: 0.9130
    Loss: 0.41310153188356846, Accuracy: 0.9129807949066162
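To put the two accuracy figures in context, the short calculation below uses only the numbers reported above. The function names are hypothetical and the snippet is an illustration, not code from the notebook.

```python
# Values copied from the training and evaluation output
train_acc = 0.9512
test_acc = 0.9129807949066162
n_test = 20800


def generalization_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy."""
    return train_accuracy - test_accuracy


def misclassified(accuracy, n_samples):
    """Number of samples predicted incorrectly at the given accuracy."""
    return round((1.0 - accuracy) * n_samples)


gap = generalization_gap(train_acc, test_acc)
errors = misclassified(test_acc, n_test)
print(f"gap: {gap:.4f}, misclassified test samples: {errors}")
```

This gives a gap of about 0.038 between training and test accuracy, and 1810 misclassified images out of the 20800 test samples.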

Predicting the Model

A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8, I: 9, J: 10, K: 11, L: 12, M: 13, N: 14, O: 15, P: 16, Q: 17, R: 18, S: 19, T: 20, U: 21, V: 22, W: 23, X: 24, Y: 25, Z: 26
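The mapping above can be written as a pair of small helpers. This is a sketch with hypothetical function names. It assumes the predicted class has already been converted to the 1 to 26 label range; depending on how the labels were encoded for training, an offset may be needed first.

```python
import string


def label_to_letter(label):
    """Convert an EMNIST Letters label (1..26) to its capital letter."""
    if not 1 <= label <= 26:
        raise ValueError(f"label must be in 1..26, got {label}")
    return string.ascii_uppercase[label - 1]


def letter_to_label(letter):
    """Convert a letter (either case) to its EMNIST Letters label (1..26)."""
    index = string.ascii_uppercase.find(letter.upper())
    if len(letter) != 1 or index < 0:
        raise ValueError(f"expected a single letter A-Z, got {letter!r}")
    return index + 1
```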
A couple of letters from the test set were tried first, and both were predicted correctly. Some images were also collected from the internet (https://graphemica.com) and stored in the 'images' folder. The first of these was predicted incorrectly, but the same letter in a different font was predicted correctly; the second image appears to be more similar to the EMNIST samples.

The model was then tried on newly handwritten data (written and photographed by me) to compare the EMNIST-trained model against actual handwriting. The results were not as expected. Each capital letter of the alphabet was photographed and uploaded in four sets: Pencil, Pen, Sharpie, and Marker. The goal was to determine which writing utensil would be predicted most accurately by the EMNIST model. However, the imported pictures are all sideways, and rotating them made the images much lighter and illegible, so every image tested was predicted incorrectly (only one image per utensil was predicted in EMNIST.ipynb).

A separate model trained only on the 26 uploaded images per utensil reached an accuracy of 0. More photos would need to be taken, and better pixel conversion is needed to get the image data from 4D down to 2D.
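One possible direction for the pixel conversion is sketched below in plain Python. It assumes that "4D" refers to four-channel RGBA pixels and that the photos show dark ink on light paper; both are assumptions, and all function names are hypothetical. The rotation only rearranges pixels, so it cannot lighten the image, and the inversion matches the light-strokes-on-dark-background look of EMNIST samples. Resizing to 28x28 is left out.

```python
def rgba_to_gray(pixels):
    """Rows of (r, g, b, a) tuples (0-255) -> rows of gray ints (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b, _alpha in row]
        for row in pixels
    ]


def rotate_90_clockwise(grid):
    """Rotate a 2D grid by 90 degrees clockwise without resampling."""
    return [list(column) for column in zip(*grid[::-1])]


def invert(grid):
    """Swap dark and light so ink becomes bright on a dark background."""
    return [[255 - value for value in row] for row in grid]


def prepare(pixels):
    """RGBA photo (sideways, dark ink) -> upright 2D grid, light ink on dark."""
    return invert(rotate_90_clockwise(rgba_to_gray(pixels)))
```

With the 'pillow' library, the grayscale step corresponds roughly to `Image.convert("L")`; the pure-Python version here is only meant to make the individual steps visible.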

Conclusions

The EMNIST model itself performs well, at about 91% accuracy on the test set. The model run on the downloaded ('api') images needs more testing, although it did predict correct results. The handwritten data needs to be redone to get better predictions. For the self-made model, this is a case of bad data in, bad data out. Better-quality pictures need to be taken and uploaded in the correct orientation so that rotating them with the 'pillow' library is not needed.
