new image setups + docs
khufkens committed Feb 11, 2025
1 parent 8946b6f commit 5612da0
Showing 4 changed files with 124 additions and 11 deletions.
40 changes: 40 additions & 0 deletions Dockerfile_kraken
@@ -0,0 +1,40 @@
# If NVIDIA (GPU) acceleration is desired, make sure the
# NVIDIA Container Toolkit is installed and enabled on the host.
# Base image options:
#   ubuntu:jammy     default plain Ubuntu 22.04 image
#   nvidia/cuda      NVIDIA CUDA + cuDNN runtime image (12.1.0 is the old tag)
#   pytorch/pytorch  newer PyTorch image, which might conflict
#                    with a TensorFlow install

#FROM ubuntu:jammy
#FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04
#FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel

# copy the conda environment specification
COPY environment_kraken.yml .

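# Install base utilities and the image/video libraries in a single layer;
# running update and install together avoids a stale package index, and
# DEBIAN_FRONTEND=noninteractive keeps apt from waiting on interactive prompts
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    build-essential wget software-properties-common git \
    libgl1 libavcodec-dev libavformat-dev libswscale-dev \
    libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev \
    libgtk2.0-dev libgtk-3-dev libpng-dev libjpeg-dev \
    libopenexr-dev libtiff-dev libwebp-dev && \
    rm -rf /var/lib/apt/lists/*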

# install Miniconda in batch mode and remove the installer afterwards
ENV CONDA_DIR=/opt/conda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p $CONDA_DIR && \
    rm ~/miniconda.sh

# recreate the conda environment and activate it
# for interactive bash shells
RUN $CONDA_DIR/bin/conda env create -f environment_kraken.yml
RUN echo "source activate weahtr" >> ~/.bashrc
ENV PATH=$CONDA_DIR/bin:$PATH

# Set the working directory on start; this assumes
# users follow the directions and mount their data at /data
WORKDIR /data
31 changes: 24 additions & 7 deletions README.md
@@ -70,16 +70,22 @@ involved. Once build locally no further downloads will be required.
```bash
docker build -f Dockerfile -t weahtr .
```

The default install above provides support for [PyLaia](https://github.com/jpuigcerver/PyLaia)
and [Tesseract](https://tesseract-ocr.github.io/). If you want support for
[Kraken](https://kraken.re/main/index.html), build the image from `Dockerfile_kraken` instead:

```bash
docker build -f Dockerfile_kraken -t weahtr .
```
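Once built, the image can be started with a host data directory mounted at `/data`, the `WORKDIR` set in the Dockerfile. The sketch below is a hypothetical helper (`weahtr_run`, not part of the repository); it assumes the NVIDIA Container Toolkit is installed whenever GPU passthrough (`--gpus all`) is used, and that the host data directory is passed as the first argument. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the weahtr repository.
# Starts an interactive container from the weahtr image with a host
# directory mounted at /data (the image's WORKDIR).
#   $1       host data directory (defaults to the current directory)
#   $2       image tag (defaults to weahtr)
#   USE_GPU  set to 0 to skip GPU passthrough (needs the NVIDIA Container Toolkit)
#   DRY_RUN  set to 1 to print the command instead of executing it
weahtr_run() {
  local data_dir="${1:-$PWD}"
  local image="${2:-weahtr}"
  local cmd=(docker run --rm -it)

  if [[ "${USE_GPU:-1}" == "1" ]]; then
    cmd+=(--gpus all)
  fi

  # With the Dockerfile_kraken setup, an interactive bash shell reads
  # ~/.bashrc, which activates the weahtr conda environment
  cmd+=(-v "${data_dir}:/data" "$image" bash)

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `weahtr_run "$HOME/scans"` starts a GPU-enabled shell, and `USE_GPU=0 weahtr_run "$HOME/scans"` starts one without GPU passthrough. Both build commands above tag the image as `weahtr`, so whichever image was built last is the one started under that name.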

> [!NOTE]
> Both the PyLaia and Kraken environments support various open-source models.
> You can list all available Kraken models from the command line:
> ```bash
> kraken list
> ```
> PyLaia models can be found on [Hugging Face](https://huggingface.co/Teklia).
Make sure the required interfacing libraries are installed when relying on a
different Docker base image.
@@ -101,6 +107,17 @@ For independent installs using conda
conda env create -f environment.yml
```
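A local install of the Kraken environment should presumably work the same way with `environment_kraken.yml`. Both environment files declare `name: weahtr`, so creating both would clash; conda's `-n` option overrides the name given in the file. The helper below (`weahtr_env_create`) and the environment name `weahtr_kraken` are hypothetical, not part of the repository; `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the weahtr repository.
# Creates a conda environment from an environment file, optionally
# under a different name than the file's `name:` field.
#   $1       path to the environment file
#   $2       optional environment name (passed to conda as -n)
#   DRY_RUN  set to 1 to print the command instead of executing it
weahtr_env_create() {
  local env_file="$1"
  local env_name="${2:-}"
  local cmd=(conda env create -f "$env_file")

  # -n overrides the name declared inside the environment file
  if [[ -n "$env_name" ]]; then
    cmd+=(-n "$env_name")
  fi

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `weahtr_env_create environment_kraken.yml weahtr_kraken` followed by `conda activate weahtr_kraken` would keep the Kraken setup next to the default `weahtr` environment.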

> [!NOTE]
> Repeatedly building a Docker image can create a large build cache, easily
> 10x the size of the image itself (which is already several GB).
>
> If you unexpectedly run out of storage space, check the size of the Docker
> build cache and prune it:
>
> ```bash
> docker system df
> docker buildx prune -f
> ```

### Loading the package locally
For now, no online `pip` based install is supported. You can install the package
10 changes: 6 additions & 4 deletions environment.yml
@@ -1,11 +1,12 @@
name: weahtr
channels:
- conda-forge
# - pytorch-nightly
- conda-forge # stay away from the defaults channel to remain fully open source

# functional PyLaia setup

dependencies:
# General python settings
- python==3.8
- python
- conda
- wheel
- pip
@@ -28,10 +29,11 @@ dependencies:
- datasets
- transformers
- jiwer
- shapely
- coremltools
# OCR components
- pytesseract
- tesseract
- mittagessen::kraken
- pip:
- evaluate
- torch
54 changes: 54 additions & 0 deletions environment_kraken.yml
@@ -0,0 +1,54 @@
name: weahtr
channels:
- conda-forge # stay away from the defaults channel to remain fully open source

# Dependencies taken from the kraken repo:
# https://github.com/mittagessen/kraken/blob/main/environment_cuda.yml

dependencies:
# General python settings
- python>=3.9
- wheel
- pip
- python-bidi~=0.6.0
- lxml
- regex
- requests
- pyyaml
- click>=8.1
- numpy~=1.23
- pillow>=9.2.0
- scipy~=1.13.0
- jinja2~=3.0
- conda-forge::torchvision>=0.5.0
- conda-forge::pytorch~=2.4.0
- cudatoolkit>=9.2
- jsonschema
- scikit-learn~=1.2.1
- scikit-image~=0.24.0
- shapely>=2.0.6
- pyvips
- imagemagick>=7.1.0
- pyarrow
- importlib-resources>=1.3.0
- conda-forge::lightning~=2.4.0
- conda-forge::torchmetrics>=1.1.0
- conda-forge::threadpoolctl~=3.5.0
- albumentations
- rich
- setuptools>=36.6.0,<70.0.0
- transformers
- jiwer
- datasets
- tiktoken
- opencv
- pandas
- matplotlib
- pytesseract
- tesseract
- pip:
- coremltools~=8.1
- htrmopo
- platformdirs
- kraken
