Skip to content

A web-based tool that translates English text into International Sign Language using pre-rendered 3D sign videos and AI-generated gloss using Chatgpt.

Notifications You must be signed in to change notification settings

Ataba29/SoundSigns

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SoundSigns: Speech to Sign Language Translator

Overview

SoundSigns is a comprehensive web-based application that translates spoken English into International Sign Language (ISL) in real-time. The system captures speech, converts it to text, translates it into ISL gloss, and displays the translation through a 3D animated avatar using pre-rendered video clips.

Features

  • Real-time Speech Recognition: Browser-based speech-to-text conversion using Web Speech API
  • ISL Gloss Translation: Converts English text to International Sign Language gloss using ChatGPT API
  • 3D Avatar Animation: Visual sign language representation through pre-rendered MP4 video clips
  • Video Assembly: Seamless concatenation of individual sign videos into coherent sentences
  • Interactive Interface: Clean, responsive UI with microphone controls and video playback
  • Multi-format Support: Covers alphabet letters (A-Z), numbers (0-9), and common vocabulary
  • Download Functionality: Save translated sign language videos for offline use
  • Cross-browser Compatibility: Works on modern browsers supporting Web Speech API

Architecture

The application follows a modular three-tier architecture:

  • Frontend: React.js with Tailwind CSS handling user interaction and video processing
  • Backend: Flask server managing API communications and text-to-gloss conversion
  • Dataset: Curated collection of ~150 pre-rendered ISL sign videos

Prerequisites

  • Python 3.8+
  • Node.js 14+
  • OpenAI API Key
  • Modern web browser with Web Speech API support (Chrome, Edge recommended)

Installation

Backend Setup

  1. Install Python dependencies:
pip install sounddevice numpy openai flask flask-cors python-dotenv
  1. Create a .env file in the backend/ directory:
OPENAI_API_KEY=your_openai_key_here

Security Note: Never commit the .env file to version control.

Frontend Setup

  1. Navigate to the frontend directory and install dependencies:
cd frontend
npm install

Running the Application

  1. Start the frontend development server:
cd frontend
npm run dev
  1. In a separate terminal, start the backend server from the project root:
py backend/transcription.py
  1. Access the application at http://localhost:3000 (or the port specified by your dev server)

Usage

  1. Voice Input: Click the microphone button and speak clearly in English
  2. Transcription: View the real-time speech-to-text conversion
  3. Translation: See the ISL gloss translation displayed
  4. Video Playback: Watch the 3D avatar perform the signed translation
  5. Controls: Use play, replay, and download buttons to control video playback

Project Structure

project-root/
├── backend/
│   ├── .env                 # Environment variables (not in version control)
│   └── transcription.py     # Flask server and API logic
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   └── App.jsx         # Main application file
│   └── assets/
│       └── videos/         # Pre-rendered sign language videos
│           ├── letters/    # A-Z alphabet signs
│           ├── numbers/    # 0-9 numerical signs
│           └── words/      # Common vocabulary signs

Technologies Used

  • Frontend: React.js, Tailwind CSS, Web Speech API
  • Backend: Python, Flask, Flask-CORS
  • Translation: OpenAI GPT-3.5-turbo API
  • Video Processing: Browser-based video concatenation
  • Dataset: Pre-rendered MP4 videos with 3D ISL avatar

System Requirements

  • Browser: Chrome, Edge, or other browsers with Web Speech API support
  • Microphone: Required for speech input
  • Internet Connection: Required for OpenAI API access

Known Limitations

  • Limited vocabulary dataset (~150 signs)
  • Words not in the dataset are finger-spelled letter by letter
  • Translation accuracy depends on ChatGPT's ISL gloss generation
  • Requires a quiet environment for optimal speech recognition
  • System latency of 3-5 seconds for the complete translation process

Contributing

This project was developed as a completed academic capstone project and is no longer under active development.
At this time, we are not accepting contributions or pull requests.

Thank you for your interest and understanding.

Dataset Attribution

The sign language video dataset is sourced from the open-source "Text-Speech to Sign Language Generator" project by JS-Coderr (2024), available on GitHub.

License

This project is the intellectual property of Ahmad Ataba, Waseem Saleem, and Braude Engineering College.
It was developed as a capstone project for academic purposes.
All rights reserved. Redistribution or commercial use is not permitted without explicit permission from the authors or the institution.

Our Team

Support

For technical issues or questions about the application, please refer to the project documentation or contact the development team.


Note: This application is designed for educational and accessibility purposes. For critical communication needs, professional sign language interpretation is recommended.

About

A web-based tool that translates English text into International Sign Language using pre-rendered 3D sign videos and AI-generated gloss using Chatgpt.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published