This Python project implements the classic game of Rock, Paper, Scissors. The player's hand gesture is captured via a webcam, while the computer's choice is randomly generated. The results are displayed on the screen. The project demonstrates hand gesture recognition using computer vision and machine learning techniques.
The game pipeline consists of two main steps:
- Hand Detection: The Mediapipe library detects 21 key points of the user's hand from webcam input, leveraging deep learning techniques developed in TensorFlow.
- Gesture Recognition: A machine learning model (Random Forest) predicts hand gestures based on the coordinates of the detected key points.
This project highlights how computer vision and machine learning techniques enable machines to understand and respond to human gestures.
Webcam images are captured with the OpenCV Python library, which handles the hardware communication and returns the image data as NumPy arrays.
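A minimal sketch of this capture step is shown below; the device index and window name are illustrative choices, not the project's actual values:

```python
import cv2

# Open the default webcam (device index 0 is an assumption; it may differ per system).
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()  # frame is a NumPy array in BGR channel order
    if not ok:
        break
    cv2.imshow("Rock Paper Scissors", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```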
Captured video frames are preprocessed and fed into the Mediapipe Hand Landmark Model. This model, trained on a dataset of approximately 30,000 real-world and synthetic hand images, detects 21 hand-knuckle coordinates within the detected hand regions.
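The sketch below shows how the 21 landmarks can be read out with Mediapipe's Hands solution; grabbing a single frame and the hand-count/confidence parameters are illustrative, not the project's settings:

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)
ok, frame = cap.read()          # a single BGR frame from the webcam
cap.release()

if ok:
    # Mediapipe expects RGB input, so the BGR frame is converted first.
    with mp.solutions.hands.Hands(max_num_hands=1,
                                  min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    if results.multi_hand_landmarks:
        landmarks = results.multi_hand_landmarks[0].landmark   # 21 key points
        keypoints = [(lm.x, lm.y) for lm in landmarks]         # coordinates normalized to [0, 1]
```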
The detected hand points undergo preprocessing steps:
- Data Augmentation: Images are flipped and rotated to compensate for left/right-hand bias and varying hand movements.
- Normalization: Min-Max or Z-score normalization is applied to manage variations in hand positions relative to the screen (a minimal normalization sketch follows this list).
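A minimal sketch of the Min-Max variant, assuming the 21 (x, y) key points from Mediapipe; the helper name is hypothetical, not taken from the project code:

```python
import numpy as np

def normalize_keypoints(keypoints):
    """Min-Max normalize (x, y) hand key points into the [0, 1] range.

    Illustrative helper: it makes the features independent of where
    the hand sits on the screen before they are fed to the classifier.
    """
    pts = np.asarray(keypoints, dtype=np.float32)        # shape (21, 2)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scaled = (pts - mins) / (maxs - mins + 1e-8)          # avoid division by zero
    return scaled.flatten()                               # feature vector of length 42
```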
A supervised learning approach is used to classify hand gestures (rock, paper, scissors).
- Training: Two models (a Decision Tree Classifier and a Random Forest Classifier) were trained on the preprocessed hand key points and their corresponding labels (a minimal training sketch follows this list).
- Results: The Random Forest Classifier achieved superior performance based on accuracy and F1 scores.
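A minimal training and evaluation sketch with scikit-learn, using synthetic stand-in data in place of the project's prepared dataset; the hyperparameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in data: in the project, X would hold the normalized key-point
# feature vectors (42 values per sample) and y the gesture labels.
rng = np.random.default_rng(42)
X = rng.random((300, 42))
y = rng.choice(["rock", "paper", "scissors"], size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 (macro):", f1_score(y_test, pred, average="macro"))
```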
Real-time hand gesture predictions are made using the trained model. The game outcome is determined based on the predicted gesture and a computer-generated random choice.
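A sketch of that outcome logic, with a hypothetical helper name and the computer's choice drawn at random:

```python
import random

# Each gesture maps to the gesture it defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_round(player_gesture):
    """Decide one round given the gesture predicted from the webcam frame.

    Hypothetical helper: the real project wires this into its OpenCV game loop.
    """
    computer_choice = random.choice(list(BEATS))
    if player_gesture == computer_choice:
        return computer_choice, "draw"
    if BEATS[player_gesture] == computer_choice:
        return computer_choice, "player wins"
    return computer_choice, "computer wins"

computer, outcome = play_round("rock")   # e.g. the gesture predicted by the model
```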
The game logic is implemented in Python, and OpenCV handles image display. Because OpenCV's built-in text rendering does not support certain special characters (e.g., French accents), text images are generated with the Pillow (PIL) library instead. These images are displayed with transparent backgrounds for seamless integration.
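A minimal sketch of rendering text onto a transparent RGBA image with Pillow, assuming a TrueType font file that covers accented characters; the helper name and parameters are illustrative:

```python
from PIL import Image, ImageDraw, ImageFont

def text_to_image(text, font_path, size=48):
    """Render text (accented characters included) onto a transparent RGBA image.

    Illustrative only: the font path and sizing are assumptions, not project values.
    """
    font = ImageFont.truetype(font_path, size)
    left, top, right, bottom = font.getbbox(text)
    img = Image.new("RGBA", (right - left, bottom - top), (0, 0, 0, 0))  # fully transparent
    ImageDraw.Draw(img).text((-left, -top), text, font=font, fill=(255, 255, 255, 255))
    return img
```

The resulting RGBA image can then be alpha-blended onto the OpenCV frame before display.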
- English
- French
Languages can be selected in-game.
- Rock Paper Scissors Dataset 1
- RPS Augmented Dataset
- Rock Paper Scissors Images
- Webcam RPS Dataset
- Rock Paper Scissors Dataset
The gesture detection process uses key point coordinates and the Random Forest algorithm for classification. The Decision Tree algorithm was evaluated but yielded lower accuracy and F1 scores.
- Documentation folder: Contains instructions for installing Python, running the project, and creating an executable for easy distribution.
- Experiments folder: Includes test cases and supporting functions useful for project improvement.
- Utils folder: Contains supporting files for main.py.
  - languages folder: Houses the language-specific folders (English, French); additional languages can be added.
  - gui.py: Functions for game visuals, instructions, and language selection.
  - language.py: Functions for language selection and for generating the corresponding text images.
  - variables.py: Defines global variables used throughout the project.
- Images folder: Contains game images (non-language specific); language-ready images can be generated using functions in the Experiments folder.
- main.py: Activates OpenCV and manages the game logic.
Developed using Python 3.9.0. Detailed setup steps are available in the Documentation section.
- Francisco Perdigon Romero LinkedIn | GitHub
- Pierre Thibault LinkedIn
- Marie Nashed LinkedIn | GitHub
- Bhagya Chembakottu LinkedIn | GitHub
- Adapt image display to be relative to any screen size (current setup is 480x640).
- Optimize code to improve performance on low-end systems (e.g., skip detection for every frame).
- Ensure all game images are language-ready by default.
This project is licensed under the MIT License.