wlasl-recognition-and-translation's Introduction

WLASL-Recognition-and-Translation

This repository contains the "WLASL Recognition and Translation" project, which uses the WLASL dataset described in "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison" by Dongxu Li et al.

The project uses CUDA and PyTorch, so a system with an NVIDIA GPU is required. Running the system also needs at least 4-5 GB of dedicated GPU memory.
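
As a quick sanity check before running anything, the sketch below reports whether PyTorch can see a CUDA device and how much memory that device has. The function name `check_gpu` and the 4 GiB default threshold are illustrative assumptions, not part of this repo; the check also tolerates PyTorch not being installed yet.

```python
from typing import Tuple


def check_gpu(min_gib: float = 4.0) -> Tuple[bool, str]:
    """Return (ok, message) describing whether a suitable CUDA GPU is visible."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return False, "PyTorch is not installed"

    if not torch.cuda.is_available():
        return False, "PyTorch is installed, but no CUDA device is available"

    # Inspect the first CUDA device only (assumption: a single-GPU machine).
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    ok = total_gib >= min_gib
    return ok, f"{props.name}: {total_gib:.1f} GiB total memory"


if __name__ == "__main__":
    ok, message = check_gpu()
    print("OK" if ok else "NOT OK", "-", message)
```

If this prints NOT OK, the likely causes are a missing NVIDIA driver, a CPU-only PyTorch build, or a GPU with less memory than the threshold.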

Download Dataset


The dataset used in this project is the "WLASL" dataset, which can be found here on Kaggle.

Download the dataset and place it in data/ (in the same path as the WLASL directory).
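
One way to confirm the dataset landed in the right place is a small layout check like the sketch below. It assumes that data/ and WLASL/ sit side by side under the repo root, which is one reading of the instruction above; the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def missing_dirs(repo_root: str, expected: Sequence[str] = ("data", "WLASL")) -> List[str]:
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repo root (assumed layout: data/ next to WLASL/).
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("data/ and WLASL/ are both present")
```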

Steps to Run


To run the project, follow these steps:

  1. Clone the repo

git clone https://github.com/alanjeremiah/WLASL-Recognition-and-Translation.git

  2. Install the packages listed in the requirements.txt file

Note: The cudatoolkit version must be compatible with the installed PyTorch version. The compatible version, along with its install command, can be found here. Below is the command used in this project:


conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

  3. Open the WLASL/I3D folder and unzip the NLP folder in that path

  4. Run the run.py file to start the application


python run.py
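
The unzip step above can also be scripted. The sketch below uses Python's standard `zipfile` module and first checks that the file really is a zip archive, so a corrupt or placeholder file fails early with a clear error. The archive path `WLASL/I3D/nlp.zip` and the helper name `extract_archive` are assumptions for illustration; adjust them to the actual file in your checkout.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, target_dir: str) -> List[str]:
    """Extract every member of archive_path into target_dir; return the member names."""
    archive = Path(archive_path)
    # is_zipfile returns False for missing files and for non-zip content,
    # such as a small text placeholder downloaded instead of the real archive.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a valid zip archive")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical location; the real archive name may differ.
    archive = Path("WLASL/I3D/nlp.zip")
    if archive.exists():
        names = extract_archive(str(archive), "WLASL/I3D")
        print(f"Extracted {len(names)} entries")
    else:
        print(f"{archive} not found")
```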

Model


This repo uses the I3D model. To train the model, see the original "WLASL" repo here.

NLP


The NLP models used in this project are KeyToText and an NGram model.

KeyToText was built by Gagan on top of the T5 model; the repo can be found here.
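
To illustrate the general idea behind an n-gram model, the sketch below builds a bigram (n = 2) table from a few toy sentences and suggests the most frequent next word. This is not the repo's implementation: the class name `BigramModel`, the training sentences, and the choice of n = 2 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional


class BigramModel:
    """Count-based bigram model: predicts the most frequent follower of a word."""

    def __init__(self) -> None:
        # Maps each word to a counter of the words observed right after it.
        self.followers = defaultdict(Counter)

    def train(self, sentences: Iterable[str]) -> None:
        for sentence in sentences:
            tokens = sentence.lower().split()
            for prev, nxt in zip(tokens, tokens[1:]):
                self.followers[prev][nxt] += 1

    def predict_next(self, word: str) -> Optional[str]:
        counts = self.followers.get(word.lower())
        if not counts:
            return None  # word never seen with a follower
        return counts.most_common(1)[0][0]


if __name__ == "__main__":
    model = BigramModel()
    model.train(["i want water", "i want food", "i want water now"])
    print(model.predict_next("want"))  # prints "water" (seen twice vs. "food" once)
```

In a pipeline like this one, such a model could plausibly help order or complete recognized words before sentence generation, but how the repo actually combines it with KeyToText is not stated here.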

Demo


The end result of the project looks like this.

Conversion of sign language to spoken language:

Test.mp4

wlasl-recognition-and-translation's People

Contributors

alanjeremiah


wlasl-recognition-and-translation's Issues

Live Translation?

Do any of these scripts run the model on live input and convert it into text?
I can't find anything like this.

Training Query

I wanted to know how you are training the model.
Are you training only the top classification layer, or the whole model?
If the whole model is being trained, isn't the dataset too small to prevent overfitting?
Also, this model only replaces the I3D classification layer (600 classes) with one sized for the number of classes we specify, right? Or is it doing something else?

Unable to Fetch nlp.zip due to Git LFS Quota Exceeded

I am encountering an issue while trying to clone the repository and fetch the nlp.zip file due to a Git LFS quota exceeded error. Upon inspecting the raw file in the repository, I see the following Git LFS pointer:

version https://git-lfs.github.com/spec/v1
oid sha256:a66ba8f8a2a8bac6722cb0775a18c9206e2603aa8fd8ca6e3b5af4a8d3caa219
size 493238288
Since the repository has exceeded its data quota, I'm unable to access this file.

Steps to Reproduce:

1. Clone the repository.
2. Attempt to fetch nlp.zip.

Expected Behavior:

I should be able to fetch the nlp.zip file without encountering a Git LFS quota exceeded error.
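
For anyone hitting the same error, a quick way to tell whether a local nlp.zip is the real archive or only a Git LFS pointer is to inspect the file itself. The sketch below parses the pointer format shown above; the function name `parse_lfs_pointer` and the 1024-byte size cutoff are assumptions used as a heuristic, not part of this repo.

```python
from pathlib import Path
from typing import Dict, Optional

LFS_SPEC_PREFIX = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(path: str) -> Optional[Dict[str, str]]:
    """Return the pointer's key/value fields, or None if the file is not an LFS pointer."""
    file = Path(path)
    # Heuristic: pointer files are tiny text files, so large files are real content.
    if file.stat().st_size > 1024:
        return None
    try:
        text = file.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return None  # binary content, so not a pointer
    if not text.startswith(LFS_SPEC_PREFIX):
        return None
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


if __name__ == "__main__":
    target = Path("nlp.zip")  # hypothetical location of the downloaded file
    if target.exists():
        info = parse_lfs_pointer(str(target))
        if info is None:
            print("nlp.zip looks like real content, not an LFS pointer")
        else:
            print(f"LFS pointer only; the real file is {info.get('size')} bytes")
    else:
        print("nlp.zip not found")
```

If the file turns out to be a pointer, the actual archive still has to come from Git LFS or another source; this check only explains why unzipping fails.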
