SoundSigns is a comprehensive web-based application that translates spoken English into International Sign Language (ISL) in real-time. The system captures speech, converts it to text, translates it into ISL gloss, and displays the translation through a 3D animated avatar using pre-rendered video clips.
- Real-time Speech Recognition: Browser-based speech-to-text conversion using Web Speech API
- ISL Gloss Translation: Converts English text to International Sign Language gloss using ChatGPT API
- 3D Avatar Animation: Visual sign language representation through pre-rendered MP4 video clips
- Video Assembly: Seamless concatenation of individual sign videos into coherent sentences
- Interactive Interface: Clean, responsive UI with microphone controls and video playback
- Multi-format Support: Covers alphabet letters (A-Z), numbers (0-9), and common vocabulary
- Download Functionality: Save translated sign language videos for offline use
- Cross-browser Compatibility: Works on modern browsers supporting Web Speech API
The application follows a modular three-tier architecture:
- Frontend: React.js with Tailwind CSS handling user interaction and video processing
- Backend: Flask server managing API communications and text-to-gloss conversion
- Dataset: Curated collection of ~150 pre-rendered ISL sign videos
- Python 3.8+
- Node.js 14+
- OpenAI API Key
- Modern web browser with Web Speech API support (Chrome, Edge recommended)
- Install Python dependencies:
pip install sounddevice numpy openai flask flask-cors python-dotenv
- Create a
.env
file in thebackend/
directory:
OPENAI_API_KEY=your_openai_key_here
Security Note: Never commit the .env
file to version control.
- Navigate to the frontend directory and install dependencies:
cd frontend
npm install
- Start the frontend development server:
cd frontend
npm run dev
- In a separate terminal, start the backend server from the project root:
py backend/transcription.py
- Access the application at
http://localhost:3000
(or the port specified by your dev server)
- Voice Input: Click the microphone button and speak clearly in English
- Transcription: View the real-time speech-to-text conversion
- Translation: See the ISL gloss translation displayed
- Video Playback: Watch the 3D avatar perform the signed translation
- Controls: Use play, replay, and download buttons to control video playback
project-root/
├── backend/
│ ├── .env # Environment variables (not in version control)
│ └── transcription.py # Flask server and API logic
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ └── App.jsx # Main application file
│ └── assets/
│ └── videos/ # Pre-rendered sign language videos
│ ├── letters/ # A-Z alphabet signs
│ ├── numbers/ # 0-9 numerical signs
│ └── words/ # Common vocabulary signs
- Frontend: React.js, Tailwind CSS, Web Speech API
- Backend: Python, Flask, Flask-CORS
- Translation: OpenAI GPT-3.5-turbo API
- Video Processing: Browser-based video concatenation
- Dataset: Pre-rendered MP4 videos with 3D ISL avatar
- Browser: Chrome, Edge, or other browsers with Web Speech API support
- Microphone: Required for speech input
- Internet Connection: Required for OpenAI API access
- Limited vocabulary dataset (~150 signs)
- Words not in the dataset are finger-spelled letter by letter
- Translation accuracy depends on ChatGPT's ISL gloss generation
- Requires a quiet environment for optimal speech recognition
- System latency of 3-5 seconds for the complete translation process
This project was developed as a completed academic capstone project and is no longer under active development.
At this time, we are not accepting contributions or pull requests.
Thank you for your interest and understanding.
The sign language video dataset is sourced from the open-source "Text-Speech to Sign Language Generator" project by JS-Coderr (2024), available on GitHub.
This project is the intellectual property of Ahmad Ataba, Waseem Saleem, and Braude Engineering College.
It was developed as a capstone project for academic purposes.
All rights reserved. Redistribution or commercial use is not permitted without explicit permission from the authors or the institution.
For technical issues or questions about the application, please refer to the project documentation or contact the development team.
Note: This application is designed for educational and accessibility purposes. For critical communication needs, professional sign language interpretation is recommended.