This project uses computer vision and deep learning to convert American Sign Language finger spelling into text in real time, enabling deaf and hard-of-hearing individuals to communicate more easily with people unfamiliar with sign language.
- Python 3.8+
- OpenCV
- TensorFlow 2.0+
- Keras
- NumPy
- Tkinter
- Hunspell
- Clone the repo:
```bash
git clone https://github.com/yourusername/Sign-Language-to-Text.git
```
- Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate     # Linux/Mac
.\venv\Scripts\activate      # Windows
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
Run the main application notebook (it is a Jupyter notebook, so it cannot be launched directly with `python`):

```bash
jupyter notebook app_working.ipynb
```
This will open the sign language to text conversion application window:
![Application Window][]
Position your hand within the detection frame and perform ASL finger spelling gestures. The application will recognize the signs in real-time and display:
- The predicted letter
- The predicted word
- The predicted sentence
Suggested word completions are displayed at the bottom which can be selected to autocomplete the current word.
The high-level methodology is:
- Frame capture and ROI extraction
- Preprocessing (grayscale, blur, thresholding)
- Prediction using CNN model
- Post-processing of predictions
- Displaying results
Each captured frame undergoes:
- Grayscale conversion
- Gaussian blur
- Adaptive thresholding
- Binary thresholding
This isolates the hand gesture and reduces noise.

![Preprocessing][]
The core of the system is a Convolutional Neural Network which classifies the preprocessed image into one of 27 classes (the letters A-Z plus a blank class, matching the 27-unit output layer below).
The model architecture is:
- Conv2D layer (32 filters, 3x3 kernel)
- Max Pooling (2x2)
- Conv2D layer (32 filters, 3x3 kernel)
- Max Pooling (2x2)
- Flatten
- Dense layer (128 units, ReLU)
- Dropout (0.4)
- Dense layer (96 units, ReLU)
- Dropout (0.4)
- Dense layer (64 units, ReLU)
- Output Dense layer (27 units, Softmax)
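The architecture above translates to Keras roughly as follows. The input shape, optimizer, and loss are assumptions; the layer sizes follow the list above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=27):
    """Build the CNN described above; input shape is an assumption."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(96, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```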
The model is trained on a custom dataset of ASL finger spelling images. Data augmentation is used to improve robustness.
For each frame:
- Preprocess frame
- Get CNN prediction
- If high confidence, update current letter
- Else if timeout, update word and sentence
- Display results
- Get word suggestions from Hunspell
- Display suggestions
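The confidence-based letter commit in the loop above can be sketched as a small voting buffer. The window size, vote threshold, and the "blank" class name are illustrative assumptions:

```python
from collections import Counter

class PredictionBuffer:
    """Commit a letter once it dominates recent CNN predictions.

    window / min_count values are illustrative, not the project's tuning.
    """
    def __init__(self, window=25, min_count=20):
        self.window = window
        self.min_count = min_count
        self.history = []
        self.word = ""

    def update(self, letter):
        """Record one prediction; return a letter when it is committed."""
        self.history.append(letter)
        if len(self.history) > self.window:
            self.history.pop(0)
        top, count = Counter(self.history).most_common(1)[0]
        if count >= self.min_count:
            self.history.clear()
            if top == "blank":  # hypothetical blank class ends the letter
                return None
            self.word += top
            return top
        return None
```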
Suppose the user finger spells "H-E-L-L-O".
- "H" is held, CNN predicts "H". Current letter becomes "H".
- "E" is held, CNN predicts "E". Current letter becomes "E", word becomes "HE".
- "L" is held, CNN predicts "L". Current letter becomes "L", word becomes "HEL".
- "L" is held, CNN predicts "L". Current letter stays "L", word becomes "HELL".
- "O" is held, CNN predicts "O". Current letter becomes "O", word becomes "HELLO".
- Hunspell suggests completions like "HELLOS", "HELLOED", etc.
- User can select a suggestion or continue finger spelling.
The sentence continues to grow until the user clears it with a keyboard interrupt.
This real-time sign language to text conversion system using deep learning enables easier communication between deaf/hard of hearing individuals and others. The CNN model accurately classifies ASL finger spelling gestures, while the Hunspell integration provides intelligent word completions for faster communication.
Future work could expand this to complete ASL gestures beyond finger spelling, and potentially other sign languages as well. It could also be ported to mobile devices for even greater accessibility.