This project uses computer vision and deep learning to convert American Sign Language finger spelling into text in real time, enabling deaf and hard-of-hearing individuals to communicate more easily with people unfamiliar with sign language.
- Python 3.8+
- OpenCV
- TensorFlow 2.0+
- Keras
- NumPy
- Tkinter
- Hunspell
- Clone the repo:
```bash
git clone https://github.com/yourusername/Sign-Language-to-Text.git
```
- Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate     # Linux/Mac
.\venv\Scripts\activate      # Windows
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
Run the main application notebook (it is a Jupyter notebook, so it cannot be launched directly with `python`):

```bash
jupyter notebook app_working.ipynb
```
This will open the sign language to text conversion application window:
![Application Window][]
Position your hand within the detection frame and perform ASL finger spelling gestures. The application will recognize the signs in real-time and display:
- The predicted letter
- The predicted word
- The predicted sentence
Suggested word completions are displayed at the bottom which can be selected to autocomplete the current word.
The high-level methodology is:
- Frame capture and ROI extraction
- Preprocessing (grayscale, blur, thresholding)
- Prediction using CNN model
- Post-processing of predictions
- Displaying results
Each captured frame undergoes:
- Grayscale conversion
- Gaussian blur
- Adaptive thresholding
- Binary thresholding
This isolates the hand gesture and reduces noise.

![Preprocessing][]
The core of the system is a Convolutional Neural Network which classifies the preprocessed image into one of 27 classes (the letters A-Z plus a blank class, matching the 27-unit output layer below).
The model architecture is:
- Conv2D layer (32 filters, 3x3 kernel)
- Max Pooling (2x2)
- Conv2D layer (32 filters, 3x3 kernel)
- Max Pooling (2x2)
- Flatten
- Dense layer (128 units, ReLU)
- Dropout (0.4)
- Dense layer (96 units, ReLU)
- Dropout (0.4)
- Dense layer (64 units, ReLU)
- Output Dense layer (27 units, Softmax)
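The architecture above translates to Keras roughly as follows. The input shape, optimizer, and loss are assumptions; the layer sizes follow the list above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=27):
    """Build the CNN described above; input shape is an assumption."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(96, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```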
The model is trained on a custom dataset of ASL finger spelling images. Data augmentation is used to improve robustness.
For each frame:
- Preprocess frame
- Get CNN prediction
- If high confidence, update current letter
- Else if timeout, update word and sentence
- Display results
- Get word suggestions from Hunspell
- Display suggestions
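The confidence-based letter commit in the loop above can be sketched as a small voting buffer. The window size, vote threshold, and the "blank" class name are illustrative assumptions:

```python
from collections import Counter

class PredictionBuffer:
    """Commit a letter once it dominates recent CNN predictions.

    window / min_count values are illustrative, not the project's tuning.
    """
    def __init__(self, window=25, min_count=20):
        self.window = window
        self.min_count = min_count
        self.history = []
        self.word = ""

    def update(self, letter):
        """Record one prediction; return a letter when it is committed."""
        self.history.append(letter)
        if len(self.history) > self.window:
            self.history.pop(0)
        top, count = Counter(self.history).most_common(1)[0]
        if count >= self.min_count:
            self.history.clear()
            if top == "blank":  # hypothetical blank class ends the letter
                return None
            self.word += top
            return top
        return None
```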
Suppose the user finger spells "H-E-L-L-O".
- "H" is held, CNN predicts "H". Current letter becomes "H".
- "E" is held, CNN predicts "E". Current letter becomes "E", word becomes "HE".
- "L" is held, CNN predicts "L". Current letter becomes "L", word becomes "HEL".
- "L" is held, CNN predicts "L". Current letter stays "L", word becomes "HELL".
- "O" is held, CNN predicts "O". Current letter becomes "O", word becomes "HELLO".
- Hunspell suggests completions like "HELLOS", "HELLOED", etc.
- User can select a suggestion or continue finger spelling.
The sentence continues to grow until the user clears it with a keyboard interrupt.
This real-time sign language to text conversion system using deep learning enables easier communication between deaf/hard of hearing individuals and others. The CNN model accurately classifies ASL finger spelling gestures, while the Hunspell integration provides intelligent word completions for faster communication.
Future work could expand this to complete ASL gestures beyond finger spelling, and potentially other sign languages as well. It could also be ported to mobile devices for even greater accessibility.