
Commit 0fa2633

committed: revamping the docker to not require different images for the different pipelines
1 parent 5e823f6

8 files changed, +137 −35 lines

Diff for: Dockerfile-baseenv (new file, +35 lines)

# are we building cpu or gpu image?
ARG hardware="cpu"

FROM tensorflow/tensorflow:1.12.0-py3 as base-env-cpu
SHELL ["/bin/bash", "-c"]
WORKDIR /app
COPY requirements-cpu.txt ./requirements.txt
ENV tnpp_hw="cpu"
# drop TF from requirements, it comes with the image
RUN sed -i '/tensorflow/d' ./requirements.txt

FROM tensorflow/tensorflow:1.12.0-gpu-py3 as base-env-gpu
SHELL ["/bin/bash", "-c"]
WORKDIR /app
COPY requirements-gpu.txt ./requirements.txt
ENV tnpp_hw="gpu"
# drop TF from requirements, it comes with the image
RUN sed -i '/tensorflow/d' ./requirements.txt

FROM base-env-${hardware}
SHELL ["/bin/bash", "-c"]
WORKDIR /app
RUN apt-get clean && apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
RUN pip3 install --no-cache-dir -r requirements.txt
COPY *.py ./
COPY Parser-v2 ./Parser-v2
COPY tokenizer ./tokenizer
COPY universal-lemmatizer ./universal-lemmatizer
COPY docker_entry_point.sh list_models.sh ./
EXPOSE 7689
ENTRYPOINT ["./docker_entry_point.sh"]
CMD ["stream","fi_tdt","parse_plaintext"]

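The `hardware` build-arg selects which of the two named stages the final `FROM base-env-${hardware}` stage builds on. For example, the two base images are built as follows (these are the same commands that appear in `docker_build.sh` below):

    docker build -f Dockerfile-baseenv --build-arg hardware=cpu -t turkunlp/turku-neural-parser:base-cpu .
    docker build -f Dockerfile-baseenv --build-arg hardware=gpu -t turkunlp/turku-neural-parser:base-gpu .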
Diff for: Dockerfile-lang (new file, +11 lines)

### GRAB THE MODELS WE WANT, based on build-arg
### the output image cannot be tagged using $hardware from inside the Dockerfile, so this needs to be built using a script

ARG hardware=cpu

FROM turkunlp/turku-neural-parser:base-${hardware}
ARG models
SHELL ["/bin/bash", "-c"]
WORKDIR /app
RUN echo "MODELS: $models"
RUN for m in ${models} ; do echo "DOWNLOADING $m" ; python3 fetch_models.py $m ; done

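To build a model image on top of the published base environment, pass the hardware flavor and a space-separated list of models. A minimal sketch, building a single-model cpu image (the `my_finnish_parser` tag is an illustrative name, not one published on Docker Hub):

    docker build -f Dockerfile-lang --build-arg hardware=cpu --build-arg models=fi_tdt -t my_finnish_parser .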
Diff for: docker_build.sh (new file, +9 lines)

#!/bin/bash

# build the cpu and gpu base environments
docker build -f Dockerfile-baseenv --build-arg hardware=gpu -t turkunlp/turku-neural-parser:base-gpu .
docker build -f Dockerfile-baseenv --build-arg hardware=cpu -t turkunlp/turku-neural-parser:base-cpu .

# which models to bake in, and how to tag the resulting images
models="fi_tdt en_ewt sv_talbanken"
container_tag="fi-en-sv"

docker build -f Dockerfile-lang --build-arg hardware=gpu --build-arg models="$models" -t turkunlp/turku-neural-parser:$container_tag-gpu .
docker build -f Dockerfile-lang --build-arg hardware=cpu --build-arg models="$models" -t turkunlp/turku-neural-parser:$container_tag-cpu .

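Building images for a different set of languages should only require editing the two variables. A sketch, assuming the named models are available for download (the `de_gsd` model and `fr-de` tag here are illustrative, not from this commit):

    models="fr_gsd de_gsd"
    container_tag="fr-de"

and then re-running the two `Dockerfile-lang` builds.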
Diff for: docker_entry_point.sh (new file, +28 lines)

#!/bin/bash

hw_environment=$tnpp_hw  # should be cpu or gpu
mode=$1                  # should be stream or server
modelname=$2             # should be one of the models installed into this docker, like fi_tdt
pipeline=$3              # should be the name of a pipeline defined for this model, like parse_plaintext

SERVER_PORT=7689  # docker-internal port, so the exact number doesn't matter

echo "DOCKER ENTRY HW" $tnpp_hw > /dev/stderr
echo "DOCKER ENTRY ARGS" $* > /dev/stderr

if [[ "$hw_environment" == "cpu" ]]
then
    gpu_arg="--gpu -1"
elif [[ "$hw_environment" == "gpu" ]]
then
    gpu_arg=" "
fi

if [[ "$mode" == "stream" ]]
then
    python3 full_pipeline_stream.py $gpu_arg --conf-yaml models_${modelname}/pipelines.yaml $pipeline
elif [[ "$mode" == "server" ]]
then
    python3 full_pipeline_server.py $gpu_arg --host 0.0.0.0 --port $SERVER_PORT --conf-yaml models_${modelname}/pipelines.yaml $pipeline
fi

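The three positional arguments come from the image CMD or from the `docker run` command line, so, for example,

    docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream fi_tdt parse_plaintext

ends up executing `python3 full_pipeline_stream.py --gpu -1 --conf-yaml models_fi_tdt/pipelines.yaml parse_plaintext` inside the container (`--gpu -1` because the cpu image sets `tnpp_hw=cpu`).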
Diff for: docs/docker.md (+36 −31 lines)

@@ -4,71 +4,76 @@ layout: default
 
 # Docker
 
-
-We provide two flavors of Docker images: one which parses text from standard input and exits. Since it loads the model every time you run it, it is not suitable for repeated parsing of small chunks of text. The other flavor is server mode, which starts, loads the model and listens on a port where you can feed in chunks of text without incurring the overhead of model reloading.
-
-
 <div class="alert" markdown="1">
 #### Docker on OSX and Windows
 
 Docker on OSX and Windows is configured with a default tight memory limit which needs to be increased. Reaching this limit manifests itself by the Docker container hanging indefinitely. See <a href="https://github.com/TurkuNLP/Turku-neural-parser-pipeline/issues/15">issue #15</a>.
 </div>
 
-# One-shot parser images
 
-For a quick test on the pre-made Finnish image:
+We provide Docker images for both cpu and gpu architectures. When launching a container based on these images, it is possible to select:
 
-    echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin
+* Whether to run the parser in a one-shot stdin-stdout streaming mode, or in a server mode which does not reload the model on every request
+* Which of the language models included with the image to use
+* Which of the model's pipelines to run
 
-And for English:
+# Ready-made images
 
-    echo "I don't have a goldfish." | docker run -i turkunlp/turku-neural-parser:english-cpu-plaintext-stdin
+Ready-made Docker images are published in the [TurkuNLP Docker Hub](https://hub.docker.com/r/turkunlp/turku-neural-parser/tags), where Docker can find them automatically. Currently there are images with the base parser environment for cpu and gpu, as well as an image with Finnish, Swedish, and English models, again for both cpu and gpu. To list the models and pipelines available in a particular image, you can run:
 
-## Ready-made images
+    docker run --entrypoint ./list_models.sh turkunlp/turku-neural-parser:fi-en-sv-cpu
 
-Several ready-made Docker images are published in the [TurkuNLP Docker Hub](https://hub.docker.com/r/turkunlp/turku-neural-parser/tags) where Docker can find them automatically. Currently the ready-made images exist for Finnish and English.
+# Streaming mode - one-off parsing of text
 
-* the `<language>-cpu-plaintext-stdin` images are most useful to one-off parse a text document on a standard computer without GPU acceleration. These are by far the easiest to use, but since the model is loaded every time you launch the parser, incurring a non-trivial startup delay, these images are not suitable for on-the-fly parsing
-* the `commonbase-cpu-latest` image contains the parser itself, but no models to save space, it is the basis for the language-specific images
+This is the simplest way to run the parser and is useful for one-off parsing of text. It is unsuitable for repeated requests, since every run pays a major startup cost while the parser loads the large model. To parse using one of the pre-made images with Finnish, Swedish and English models:
 
-### Running from a ready-made image
+    echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream fi_tdt parse_plaintext > parsed.conllu
 
-To simply test the parser:
+or, if you have NVidia-enabled Docker, you can run the gpu version:
 
-    echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin
+    echo "Minulla on koira." | docker run --runtime=nvidia -i turkunlp/turku-neural-parser:fi-en-sv-gpu stream fi_tdt parse_plaintext > parsed.conllu
 
-To one-off parse a single file:
+And for English (the only change being that we specify the `en_ewt` model instead of `fi_tdt`):
 
-    cat input.txt | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin > output.conllu
+    echo "I don't have a goldfish." | docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream en_ewt parse_plaintext > parsed.conllu
 
-## Images for other languages
 
-Building a language-specific image is straightforward. For this you need to choose one of the available language models from [here](http://bionlp-www.utu.fi/dep-parser-models/). These models refer to the various treebanks available at [UniversalDependencies](https://universaldependencies.org). Let us choose French and the GSD treebank model. That means the model name is `fr_gsd` and to parse plain text documents you would use the `parse_plaintext` pipeline.
+The general command to run the parser in this mode is:
 
-Build the Docker image like so:
+    docker run -i [image] stream [language_model] [pipeline]
 
-    docker build -t "my_french_parser_plaintext" --build-arg "MODEL=fr_gsd" --build-arg "PIPELINE=parse_plaintext" -f Dockerfile https://github.com/TurkuNLP/Turku-neural-parser-pipeline.git
+# Server mode
 
-And then you can parse French like so:
+In this mode, the parser loads the model once and can then respond to repeated HTTP requests. For example, using the gpu version:
 
-    echo "Les carottes sont cuites" | docker run -i my_french_parser_plaintext
+    docker run --runtime=nvidia -d -p 15000:7689 turkunlp/turku-neural-parser:fi-en-sv-gpu server en_ewt parse_plaintext
 
-# Server mode images
+and on cpu:
 
-These are built much like the one-shot images, and we provide the English and Finnish images on DockerHub. The started containers listen to POST requests on port number 7689. Run like such:
+    docker run -d -p 15000:7689 turkunlp/turku-neural-parser:fi-en-sv-cpu server en_ewt parse_plaintext
 
-```
-docker run -d -p 15000:7689 turkunlp/turku-neural-parser:finnish-cpu-plaintext-server
-```
+will start the parser in server mode, using the English `en_ewt` model and the `parse_plaintext` pipeline, listening on the local port 15000 for requests. Note: there is nothing magical about the port number 15000; any free port will do. You can query the running parser as follows:
 
-This maps the port at which the Docker image listens to your localhost port 15000 (any free port number will do of course) so you can parse as follows:
 
 ```
-curl --request POST --header 'Content-Type: text/plain; charset=utf-8' --data-binary "Tämä on esimerkkilause" http://localhost:15000 > parsed.conllu
+curl --request POST --header 'Content-Type: text/plain; charset=utf-8' --data-binary "This is an example sentence, nothing more, nothing less." http://localhost:15000 > parsed.conllu
 ```
 
 or
 
 ```
 curl --request POST --header 'Content-Type: text/plain; charset=utf-8' --data-binary @input_text.txt http://localhost:15000 > parsed.conllu
 ```
+
+# Images for other languages
+
+Building a language-specific image is straightforward. For this you need to choose one of the available language models from [here](http://bionlp-www.utu.fi/dep-parser-models/). These models refer to the various treebanks available at [UniversalDependencies](https://universaldependencies.org). Let us choose French and the GSD treebank model. That means the model name is `fr_gsd`, and to parse plain text documents you would use the `parse_plaintext` pipeline.
+
+Build the Docker image like so:
+
+    docker build -t "my_french_parser_plaintext" --build-arg models=fr_gsd --build-arg hardware=cpu -f Dockerfile-lang https://github.com/TurkuNLP/Turku-neural-parser-pipeline.git
+
+And then you can parse French like so:
+
+    echo "Les carottes sont cuites" | docker run -i my_french_parser_plaintext stream fr_gsd parse_plaintext

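Streaming mode also covers one-off parsing of a whole file; with the new images that is:

    cat input.txt | docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream fi_tdt parse_plaintext > output.conllu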
Diff for: list_models.sh (new file, +13 lines)

#!/bin/bash

# list every model installed in this image, together with its pipelines
for m in models_*
do
    echo "---------------------------------"
    echo "MODEL:" ${m#models_}
    echo
    echo "Pipelines:"
    python3 full_pipeline_stream.py --conf-yaml $m/pipelines.yaml list
    echo
    echo
    echo
done

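Since `Dockerfile-baseenv` copies this script into the image, it can be run against any published image by overriding the entrypoint, as the docs show:

    docker run --entrypoint ./list_models.sh turkunlp/turku-neural-parser:fi-en-sv-cpu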
Diff for: requirements-cpu.txt (+2 −2 lines)

@@ -6,13 +6,13 @@ h5py
 matplotlib
 flask
 numpy
+ufal.udpipe
 pyyaml
 
-ufal.udpipe
 configargparse
-flask
 
 tensorflow==1.12.2
+
 torch==0.4.1
 torchtext==0.3.1
 torchvision==0.2.1

Diff for: requirements-gpu.txt (+3 −2 lines)

@@ -5,13 +5,14 @@ keras
 h5py
 matplotlib
 flask
-
+numpy
 ufal.udpipe
 pyyaml
-numpy
+
 configargparse
 
 tensorflow-gpu==1.12.2
+
 torch==0.4.1
 torchtext==0.3.1
 torchvision==0.2.1
