OCR-Docker

Extract text from images & PDF files

OCR-Docker is an easy-to-use system, built with Python and Flask, that extracts text from images and PDF files in multiple languages.

Features

Extract text from images (PNG, JPG, TIFF).
Extract text from PDF files (single or multiple pages).

Components and Frameworks used in OCR-Docker

tesseract-ocr - open source OCR engine
tessdata - trained language data for tesseract-ocr
ghostscript
imagemagick
pytesseract
Pillow
Image
Flask
Loguru
PyYAML

The OCR (Optical Character Recognition) feature is free thanks to tesseract-ocr, an open source OCR project.
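
OCR-Docker's own source code is not shown in this README, so the following is only a minimal sketch of how pytesseract and Pillow from the list above are commonly combined to read text from an image. The function name extract_text and its parameters are hypothetical and are not part of OCR-Docker. The lang value is a Tesseract language code such as "eng", and the matching trained data must be available in tessdata.

```python
def extract_text(image_path: str, lang: str = "eng") -> str:
    """Return the text that Tesseract recognizes in a single image file.

    Hypothetical helper for illustration; not taken from OCR-Docker.
    """
    # Imported inside the function so this sketch can be loaded on a
    # machine where the OCR dependencies are not installed.
    from PIL import Image
    import pytesseract

    # Pillow opens the image; pytesseract passes it to the tesseract
    # binary and returns the recognized text as a string.
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling extract_text("scan.png") would return the recognized text, assuming scan.png exists and both the tesseract binary and the English language data are installed.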

Installation

docker-compose from hub

version: "3.7"
services:
  ocr:
    image: techblog/ocr-docker:latest
    ports:
      - "8080:8080"
    container_name: tts-stt
    labels:
      - "com.ouroboros.enable=true"
    networks:
      - default
    restart: unless-stopped

Now run docker-compose up -d to pull and start the container. Then open your browser and navigate to the container's IP address on port 8080, where the OCR-Docker web page should load.
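
As an alternative to opening a browser, a short script can confirm that something is answering on the published port. This is a sketch using only the Python standard library. The function name is_service_up is hypothetical, and the default URL assumes the container runs on the local machine with the 8080:8080 port mapping shown above.

```python
import http.client
import urllib.error
import urllib.request


def is_service_up(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at base_url with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with an error status (4xx or 5xx).
        return False
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False
```

Calling is_service_up() shortly after docker-compose up -d may return False while the container is still starting, so retrying after a few seconds is reasonable.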
