
Sign language recognition

Hi! This repository is a blog about a sign language recognition project I did. The final program was able to identify 100 sign language words in real-time. Here are two videos demonstrating the final program.

tall.woman.decide.change.pink.shirt.mov
black.bird.eat.apple.before.brown.cow.mov

Pretty cool, don't you think? As I mentioned, this repository is structured as a blog and includes 15 posts detailing my approach, the issues I encountered, and my thoughts along the way.

Project description

American Sign Language (ASL) is commonly used by the deaf community in North America. The language is entirely visual and involves making complex gestures with the hands.

My goal for this project was to create a sign language interpretation program that could recognize American sign language letters and words. I wanted to use various deep learning models and methods such as convolutional and recurrent networks. I also wanted to practice with libraries such as TensorFlow, Keras and OpenCV.

The project is separated into multiple phases.

Phase 1 - ASL alphabet recognition


For this first phase, I had two main goals.

  1. Train a model that can classify still images of ASL letters.
  2. Run the model in real time using a live video from my webcam.

Blog posts 1 - 4 are related to Phase 1.

  1. Introduction
  2. Building a basic CNN model
  3. Testing the model
  4. Improving the model

The following code files contain the code for this phase:

"Code files/Phase 1 development files/sign_alphabet_train_model.py"
"Code files/Phase 1 development files/run_sign_language_alphabet_detector.py"

Here is a video of the final result of Phase 1.

Sign.language.recognition.final.mp4
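
The real-time half of the phase boils down to a capture-preprocess-predict loop. Here is a rough sketch using OpenCV; the model file name and the 64x64 grayscale preprocessing are assumptions carried over from the sketch above.

```python
# Sketch of a real-time webcam classification loop (model file name and
# preprocessing are assumptions, not the repository's actual values).
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("sign_alphabet_model.h5")  # hypothetical path
labels = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Preprocess the frame to match the assumed training input.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (64, 64)).astype("float32") / 255.0
    probs = model.predict(small[None, ..., None], verbose=0)[0]
    letter = labels[int(np.argmax(probs))]
    cv2.putText(frame, letter, (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("ASL alphabet detector", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```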

Phase 1 was a success (mostly). I created a custom neural network that could classify images from my webcam and identify letters of the alphabet. The model accuracy wasn't perfect, but it was enough of a proof of concept to justify moving to phase 2.

Phase 2 - Word level recognition

In sign language, words are more complex than the alphabet. Here is a video of someone signing the word 'book'.

07068.mov

As you can see, there is movement involved. In sign language, most words cannot be identified from a single image. Phase 2 of the project had essentially the same goals as Phase 1, but using videos instead of still images.

For Phase 2, I had two main goals.

  1. Train a model that can classify video clips of ASL words.
  2. Run the model in real time using video from my webcam.

Blog posts 5 - 15 are related to Phase 2.

  5. Graduating from the alphabet to words
  6. Training a model for video identification
  7. Reorganizing and inspecting the dataset
  8. Training a model for video identification
  9. Setting up the real-time video classification
  10. Scaling up the model
  11. Switching to a pose estimation approach
  12. Increasing the holistic feature model vocabulary
  13. Refactoring to improve program speed
  14. Implementing the new holistic cropping approach
  15. Testing the final model

The code for these posts is available in 'Code files/Phase 2 development and test files'. However, these files are not well organized and may be difficult to follow. I recommend reading through the blog posts, which include the important sections of code along with accompanying explanations. The final program is available in 'Code files/ASL_word_detector_main.py'.

Several different methods were used in Phase 2. A generic CNN-based feature extractor was tested, as was the YOLOv5 object detection model. In the end, both of these approaches were abandoned in favour of the MediaPipe Holistic model, which tracks body landmarks. A custom cropping function then draws a bounding box around the points generated by the Holistic model. Below is an example of the holistic landmark tracking. The coordinates of these landmarks were used as features and passed to a custom classification model.

holistic.and.cropping.shown.mov
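
The full holistic pipeline is walked through in the blog posts, but the core idea, extracting landmark coordinates with MediaPipe Holistic and cropping to a box drawn around them, can be sketched roughly as below. The helper names and the margin value are hypothetical.

```python
# Sketch of landmark extraction with MediaPipe Holistic plus a bounding-box
# crop around the detected points (helper names and margin are illustrative).
import cv2
import numpy as np
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                          min_tracking_confidence=0.5)

def extract_landmarks(frame_bgr):
    """Return an (N, 2) array of normalized landmark coordinates for one frame."""
    results = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    points = []
    for group in (results.pose_landmarks,
                  results.left_hand_landmarks,
                  results.right_hand_landmarks):
        if group is not None:
            points.extend((lm.x, lm.y) for lm in group.landmark)
    return np.array(points, dtype=np.float32)

def crop_to_landmarks(frame_bgr, points, margin=0.05):
    """Crop the frame to a bounding box drawn around the landmarks."""
    if points.size == 0:
        return frame_bgr
    h, w = frame_bgr.shape[:2]
    x0, y0 = np.clip(points.min(axis=0) - margin, 0.0, 1.0)
    x1, y1 = np.clip(points.max(axis=0) + margin, 0.0, 1.0)
    return frame_bgr[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]
```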

The custom classification model used GRU layers followed by several dense layers and was trained to identify 100 different words. Below is an example of the program in action: the program recorded 2 seconds of video, then classified that 2-second clip. In later versions of the code (such as the examples at the top of the page), this interval was reduced to 1 second.

mother.want.son.study.but.son.decide.play.basketball.mov
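
The classifier itself isn't reproduced here, but a minimal sketch of a GRU-plus-dense model of this shape might look like the following. Only the 100-word vocabulary comes from the project; the sequence length, feature count, and layer sizes are assumptions.

```python
# Minimal sketch of the GRU-based word classifier. Only NUM_WORDS comes from
# the post; sequence length, feature count, and layer sizes are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_WORDS = 100      # 100-word vocabulary, per the post
SEQ_LEN = 30         # assumption: landmark frames per recorded clip
NUM_FEATURES = 150   # assumption: flattened landmark coordinates per frame

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
    layers.GRU(128, return_sequences=True),   # GRU layers over the sequence
    layers.GRU(64),
    layers.Dense(64, activation="relu"),      # followed by dense layers
    layers.Dense(NUM_WORDS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# At inference time, landmark features for one recorded clip are buffered into
# a (1, SEQ_LEN, NUM_FEATURES) array and classified in a single call:
clip = np.zeros((1, SEQ_LEN, NUM_FEATURES), dtype=np.float32)  # placeholder
word_index = int(np.argmax(model.predict(clip, verbose=0)))
```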

The final classification model reached a validation accuracy of 91.74%. When tested on myself performing all 100 sign language words, the model correctly identified 78 words on the first try and 94 of the words in 3 tries or fewer. Overall I am pleased with the model performance.

What I learned

The project provided an opportunity to practice with a variety of Python libraries and machine learning techniques.

In this project I worked with the TensorFlow, Keras, OpenCV, pandas, and NumPy libraries, among others. I used transfer learning for feature extraction as well as object detection with the YOLOv5 model. I identified problems with the model and dataset and came up with solutions to improve classification performance, including abandoning the aforementioned YOLOv5 model and refactoring the code so it could run in real time. The final program uses the MediaPipe Holistic model for feature extraction and a custom GRU neural network for classification.

Thank you for taking the time to check out my project!
