# Docker
<div class="alert" markdown="1">
#### Docker on OSX and Windows
Docker on OSX and Windows is configured with a tight default memory limit which needs to be increased. Reaching this limit manifests itself as the Docker container hanging indefinitely. See <a href="https://github.com/TurkuNLP/Turku-neural-parser-pipeline/issues/15">issue #15</a>.
</div>
We provide docker images for both cpu and gpu architectures. When launching a container based on these images, it is possible to select (as sketched below):
* Whether to run the parser in a one-shot stdin-stdout streaming mode or a server mode which does not reload the model on every request
* Which of the language models included with the image to use
* Which pipeline from the model to run
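
A minimal sketch of how these three choices map onto the `docker run` command line, based on the generic invocation forms shown in the sections below; the bracketed names are placeholders for an image, a language model, and a pipeline, and `[host_port]` is a placeholder of ours (the server examples below use 15000):

```
# streaming mode: parse stdin, write the result to stdout, then exit
docker run -i [image] stream [language_model] [pipeline]

# server mode: load the model once, then answer HTTP POST requests;
# the container listens on port 7689, mapped here to a host port of your choice
docker run -d -p [host_port]:7689 [image] server [language_model] [pipeline]
```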
# Ready-made images
Ready-made Docker images are published in the [TurkuNLP Docker Hub](https://hub.docker.com/r/turkunlp/turku-neural-parser/tags) where Docker can find them automatically. Currently there are images with the base parser environment for cpu and gpu, as well as an image with Finnish, Swedish, and English models, again for both cpu and gpu. To list the models and pipelines available in a particular image, you can run:
```
docker run --entrypoint ./list_models.sh turkunlp/turku-neural-parser:fi-en-sv-cpu
```
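
The same listing should work for the gpu image, assuming it ships the same `list_models.sh` entrypoint script:

```
# assumes the gpu image includes the same list_models.sh script
docker run --entrypoint ./list_models.sh turkunlp/turku-neural-parser:fi-en-sv-gpu
```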
# Streaming mode - one-off parsing of text
This is the simplest way to run the parser and is useful for one-off parsing of text. It is unsuitable for repeated requests, since every run pays a substantial startup cost while the parser loads the large model. To parse using one of the pre-made images with Finnish, Swedish, and English models:
echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream fi_tdt parse_plaintext > parsed.conllu
or, if you have the NVIDIA-enabled Docker runtime, you can run the gpu version:
echo "Minulla on koira." | docker run --runtime=nvidia -i turkunlp/turku-neural-parser:fi-en-sv-gpu stream fi_tdt parse_plaintext > parsed.conllu
And for English (the only change being that we specify the `en_ewt` model instead of `fi_tdt`):
echo "I don't have a goldfish." | docker run -i turkunlp/turku-neural-parser:fi-en-sv-cpu stream en_ewt parse_plaintext > parsed.conllu
The general command to run the parser in this mode is:
```
docker run -i [image] stream [language_model] [pipeline]
```

# Server mode

In this mode, the parser loads the model once, and can subsequently respond to repeated HTTP requests. For example, using the gpu version:
```
docker run --runtime=nvidia -d -p 15000:7689 turkunlp/turku-neural-parser:fi-en-sv-gpu server en_ewt parse_plaintext
```
and on cpu:
```
docker run -d -p 15000:7689 turkunlp/turku-neural-parser:fi-en-sv-cpu server en_ewt parse_plaintext
```
will start the parser in server mode, using the English `en_ewt` model and the `parse_plaintext` pipeline, and will listen on local port 15000 for requests. Note: there is nothing magical about the port number 15000; you can map any suitable free port instead. You can query the running parser as follows:
```
curl --request POST --header 'Content-Type: text/plain; charset=utf-8' --data-binary "This is an example sentence, nothing more, nothing less." http://localhost:15000 > parsed.conllu
```
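
To parse a whole file against the running server, the same curl invocation can read its payload from disk (`input.txt` is a placeholder name):

```
# curl's @file syntax sends the file contents as the POST body
curl --request POST --header 'Content-Type: text/plain; charset=utf-8' --data-binary @input.txt http://localhost:15000 > parsed.conllu
```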

# Images for other languages

Building a language-specific image is straightforward. For this you need to choose one of the available language models from [here](http://bionlp-www.utu.fi/dep-parser-models/). These models correspond to the various treebanks available at [Universal Dependencies](https://universaldependencies.org). Let us choose French and the GSD treebank model: the model name is then `fr_gsd`, and to parse plain text documents you would use the `parse_plaintext` pipeline.
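
As a sketch of what running such an image might look like, assuming you have built an image containing the `fr_gsd` model (the image name `my_french_parser` is hypothetical):

```
# my_french_parser is a hypothetical image name; model and pipeline as chosen above
echo "Les carottes sont cuites." | docker run -i my_french_parser stream fr_gsd parse_plaintext > parsed.conllu
```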