Skip to content

CaptiOCR - A real-time screen text extraction tool using Tesseract OCR. Capture, recognize, and log on-screen text dynamically. Future updates will include on-demand language installation, resizable selection areas, and live text overlays.

License

Notifications You must be signed in to change notification settings

carlosacchi/captiocr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🖥️ CaptiOCR - Real-Time Screen Text Extraction

CodeQL

CaptiOCR is an open-source real-time screen text extraction tool that allows users to capture on-screen text dynamically using Tesseract OCR. With an easy-to-use interface, you can select an area of the screen and extract text continuously, making it ideal for live captions, subtitles, or instant text recognition.

🚀 Planned future enhancements include:

  • On-demand language installation for Tesseract OCR
  • Dynamically resizable selection area instead of a fixed rectangle
  • Improved multi-monitor support
  • Live text and Translation overlay feature

📌 Features

Real-time OCR processing using Tesseract OCR
Multi-language support (English, Italian, more in the future)
Logs and saves extracted text to a file
Hotkey Support: Ctrl+Q to stop capture
Named profile saving for different capture configurations
Optimized capture area selection for reduced text duplication
Debug logging for troubleshooting


🛠️ Prerequisites

Before installing, ensure you have the following:

  • Python 3.x installed
  • Tesseract OCR installed (Download it here)
  • Required Python libraries

👥 Installation

1️⃣ Install Python & Dependencies

pip install pytesseract pillow keyboard pyautogui tkinter pytesseract

2️⃣ Install Tesseract OCR

📉 Windows users:
Download & install Tesseract from here.
Then, update the script with the correct Tesseract path:

TESSDATA_PREFIX = r'C:\Program Files\Tesseract-OCR\tessdata'
TESSERACT_CMD = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

▶️ Running the Application

After installing the dependencies, clone the repository and run the script:

git clone https://github.com/CarloSacchi/CaptiOCR.git
cd CaptiOCR
python CaptiOCR.py

🖥️ How to Use

1️⃣ Select a language from the dropdown menu (Italian or English). 2️⃣ Click "Start (Select Area)" to open the selection window.

LiveOCR Start Capturing

3️⃣ Drag and Drop over the area you want to capture.

LiveOCR Start Capturing

LiveOCR Start Capturing

4️⃣ Press ENTER to start real-time OCR.

LiveOCR Start Capturing

5️⃣ Press Ctrl+Qor STOP to stop the capture.

6️⃣ Name your capture when prompted after stopping.

LiveOCR Start Capturing

Repeat for your next meeting.

📂 Captured text is saved in a file named in the "captures" folder, capture_YYYY-MM-DD-HH-MM-SS.txt.

LiveOCR Text Files

LiveOCR Text Files


🔍 Tips & Tricks

Optimizing OCR Accuracy

  • Choose the correct language: While OCR may sometimes work with the wrong language selected, choosing the correct language model significantly improves accuracy, especially with text containing special characters, accents, and language-specific patterns.

  • Optimize capture area: For better results and fewer duplications, select a narrow rectangular area that focuses precisely on the caption/subtitle area. An aspect ratio that is wider than it is tall usually works best for captions.

  • Minimum area size: Both width and height must be at least 50 pixels for OCR to function properly.

  • Saved profiles: Save your optimized capture areas for different programs (Teams, Zoom, etc.) using the "Save Settings" feature.

Advanced Features

  • Moving the capture area: Click and drag the capture window to reposition it during a capture session.

  • Debug logging: Enable the "Debug Logging" checkbox to record detailed operation information for troubleshooting.

  • Custom naming: When stopping a capture, you can assign a custom name to the output file for better organization.


📊 Roadmap

This project is open-source and under active development. Upcoming improvements include:

Architecture and Code Improvements

  • 📉 Modularized code structure for better maintainability
  • 📉 Improved logging framework with proper log levels
  • 📉 Better error handling and recovery
  • 📉 Enhanced thread management for improved stability

UI and Usability Enhancements

  • 📉 Enhanced interface with results panel and preview
  • 📉 Dynamic area selection (resizable & movable during capture)
  • 📉 Dark mode and theme options
  • 📉 Better multi-monitor support

Functional Enhancements

  • 📉 Expanded language options with auto-detection capability
  • 📉 Image preprocessing options for better OCR results
  • 📉 Multiple capture areas simultaneously
  • 📉 Live translator with DeepL or Google Translate API
  • 📉 Various export formats (PDF, HTML, Word)
  • 📉 Direct clipboard integration

Performance Optimization

  • 📉 Optimized memory usage during image processing
  • 📉 Parallel OCR processing for faster results
  • 📉 Cloud integration for storage and processing

If you want to contribute, feel free to submit pull requests or feature requests! 🚀


📚 License

This project is licensed under the MIT License, allowing free use and modification.

👤 Author: Carlo Sacchi

About

CaptiOCR - A real-time screen text extraction tool using Tesseract OCR. Capture, recognize, and log on-screen text dynamically. Future updates will include on-demand language installation, resizable selection areas, and live text overlays.

Topics

Resources

License

Stars

Watchers

Forks

Languages