WLASL-Recognition-and-Translation

This repository contains the "WLASL Recognition and Translation" project, which uses the WLASL dataset described in "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison" by Dongxu Li et al.

The project uses CUDA and PyTorch, so a system with an NVIDIA GPU is required. Running the system also needs at least 4-5 GB of dedicated GPU memory.
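Once PyTorch is installed, the GPU requirement can be checked before running anything. The following is a minimal sketch, not part of the repository: the function names and the 4 GiB threshold (the lower end of the range stated above) are assumptions made for illustration.

```python
import importlib.util

# Assumed threshold: the lower end of the 4-5 GB range mentioned in this README.
MIN_GPU_BYTES = 4 * 1024**3


def meets_minimum(total_bytes, minimum=MIN_GPU_BYTES):
    """Return True when the reported GPU memory reaches the assumed minimum."""
    return total_bytes >= minimum


def describe_gpu():
    """Return a one-line status string for the first CUDA device, if any."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "PyTorch cannot see a CUDA-capable GPU"
    props = torch.cuda.get_device_properties(0)
    size_gib = props.total_memory / 1024**3
    verdict = "OK" if meets_minimum(props.total_memory) else "below the assumed minimum"
    return f"{props.name}: {size_gib:.1f} GiB ({verdict})"


if __name__ == "__main__":
    print(describe_gpu())
```

A card marketed as 4 GB may report slightly less than 4 GiB, so treat a "below the assumed minimum" result on such a card as a warning rather than a hard failure.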

Download Dataset


The dataset used in this project is the "WLASL" dataset; it can be found here on Kaggle.

Download the dataset and place it in data/ (in the same path as the WLASL directory).
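The placement described above can be verified with a short script. This is a sketch under the assumption that the repository root holds the WLASL and data directories side by side; the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Assumed layout: both directories sit directly under the repository root.
REQUIRED_DIRS = ("WLASL", "data")


def missing_dirs(repo_root):
    """Return the names of required directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks correct")
```

Run it from the repository root; an empty result means both directories are in place.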

Steps to Run


To run the project, follow these steps:

  1. Clone the repo

git clone https://github.com/alanjeremiah/WLASL-Recognition-and-Translation.git

  2. Install the packages listed in the requirements.txt file

Note: Install the cudatoolkit version that is compatible with your PyTorch build. The compatible versions and their install commands can be found here. The command used in this project is:


conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

  3. Open the WLASL/I3D folder and unzip the NLP folder in that path

  4. Run the run.py file to start the application


python run.py
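The unzip step above can also be done from Python, which helps on systems without a graphical archive tool. This is a minimal sketch: the archive name NLP.zip is an assumption (the README only says to unzip the NLP folder), and `extract_archive` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the extracted names."""
    archive_path = Path(archive_path)
    if not archive_path.is_file():
        raise FileNotFoundError(f"Archive not found: {archive_path}")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(Path(target_dir))
        return zf.namelist()


if __name__ == "__main__":
    i3d_dir = Path("WLASL") / "I3D"
    archive = i3d_dir / "NLP.zip"  # assumed file name, adjust to the actual archive
    if archive.is_file():
        print(f"Extracted {len(extract_archive(archive, i3d_dir))} entries")
    else:
        print(f"{archive} not found; check the archive name and location")
```

Extracting into WLASL/I3D keeps the NLP folder in the same path as the archive, as the step describes.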

Model


This repo uses the I3D model. To train the model, see the original "WLASL" repo here.

NLP


The NLP models used in this project are the KeyToText model and the NGram model.

KeyToText is built on the T5 model by Gagan; the repo can be found here.
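For readers unfamiliar with n-gram models, the idea can be illustrated with a generic bigram predictor. This is a teaching sketch only, not the repository's NGram implementation; the function names and the toy corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of word, or None if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration.
corpus = ["i want water", "i want food", "i want water now"]
model = train_bigrams(corpus)
print(predict_next(model, "want"))  # prints "water"
```

In the toy corpus "water" follows "want" twice and "food" once, so "water" is predicted. A real model would be trained on a much larger corpus and usually adds smoothing for unseen word pairs.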

Demo


The end result of the project looks like this.

Conversion of sign language to spoken language:

Test.mp4