A Python wrapper for determining interpolation weights with SRILM. This package provides two features:
- Determine interpolation weights of multiple language models.
- Mix the models together in a single file.
Note that you must build the individual language models you want to interpolate with SRILM before using this package. Please also note that these scripts do NOT work without SRILM, so you need to install SRILM on your machine first.
I have tested the scripts on Linux and OS X (10.8.2). OS X users may need to install gawk, since interpolator.py internally calls SRI's compute-best-mix, which depends on gawk.
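Because a missing gawk only shows up once compute-best-mix is invoked, it can help to check for it up front. The following is a minimal sketch using only the Python standard library; the helper name missing_tools is hypothetical and is not part of this package.

```python
import shutil


def missing_tools(names=("gawk",)):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("gawk is available.")
```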
- SRILM
- python setuptools (optional)
No installation is necessary. Just download this package to a location of your choice.
$ git clone git://github.com/tetsuok/py-srilm-interpolator.git
There are three steps to create an interpolated language model. Before running the Python scripts, you need to prepare a config file.
Here is the config file format:
[SRILM]
# This is a comment. You need to edit the following line.
home=/path/to/srilm
# You need to edit the order of language model according to
# the language models you want to mix together.
opt=-order 3 -unk
# Do not edit this option!
[perplexity]
opt=-debug 2
[devset]
# Edit the following path.
path=/path/to/devset
[language models]
# Edit the following path.
lm1=/path/to/lm1
lm2=/path/to/lm2
WARNING: In general, change only the values in the config file. The one exception is the "language models" section, where the option names may also be changed. Do not change the names of any other sections or options; the Python scripts look them up by name.
Lines beginning with '#' or ';' are ignored and treated as comments.
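To illustrate how a file in this format can be read, here is a minimal sketch using Python's standard configparser module, which by default also treats full lines starting with '#' or ';' as comments. The function load_settings and the keys of the dictionary it returns are hypothetical names chosen for this example; the package's own scripts may read the file differently.

```python
import configparser

# A config in the format described above, inlined for the example.
EXAMPLE = """\
[SRILM]
# This is a comment.
home=/path/to/srilm
opt=-order 3 -unk
[perplexity]
opt=-debug 2
[devset]
path=/path/to/devset
[language models]
; This is also a comment.
lm1=/path/to/lm1
lm2=/path/to/lm2
"""


def load_settings(text):
    """Parse the config text and collect the values by section and option name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "srilm_home": parser.get("SRILM", "home"),
        "srilm_opt": parser.get("SRILM", "opt").split(),
        "ppl_opt": parser.get("perplexity", "opt").split(),
        "devset": parser.get("devset", "path"),
        # Every option in this section is taken as one language model path,
        # in the order the options appear in the file.
        "models": [path for _, path in parser.items("language models")],
    }


settings = load_settings(EXAMPLE)
print(settings["models"])
```

Since the fixed sections are looked up by name, renaming [SRILM] or the home option in this sketch would raise a configparser error, which mirrors the warning above.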
Please see the example config file example/example.cfg.
In most cases, it will suffice to copy example/example.cfg and edit the path to SRILM, the order of the language models, and the paths to the language models.
I recommend using absolute paths for SRILM and the language models.
We assume that you have already prepared a config file.
$ ls
config py-srilm-interpolator
$ py-srilm-interpolator/srilm/interpolator.py -c config [--cpus 2]
Please note that you must specify -c config.
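For intuition about what "determining interpolation weights" means, the sketch below estimates mixture weights with a simple expectation-maximization loop over the probabilities that each model assigns to the words of a development set. It is an illustration only, not the code that interpolator.py or SRILM runs; the function name estimate_weights and the toy probabilities are assumptions made for this example.

```python
def estimate_weights(probs, iterations=50):
    """Estimate linear interpolation weights by EM.

    probs[t][i] is the probability that model i assigns to the t-th
    dev-set word. All probabilities are assumed to be positive.
    """
    n_models = len(probs[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_models
        for row in probs:
            # Probability of this word under the current mixture.
            mix = sum(w * p for w, p in zip(weights, row))
            # E-step: share of the mixture probability owed to each model.
            for i, (w, p) in enumerate(zip(weights, row)):
                totals[i] += w * p / mix
        # M-step: the new weight is the average share over all words.
        weights = [t / len(probs) for t in totals]
    return weights


# Toy data: the first model fits the dev set better than the second.
toy = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]]
print(estimate_weights(toy))
```

The weights always sum to one, and the model that predicts the development set better receives the larger weight.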
$ py-srilm-interpolator/srilm/combiner.py -c config best-mix.ppl
Please note that you must specify -c config. best-mix.ppl is the file generated by the previous step.
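If you prefer to drive both steps from one place, the commands above can be wrapped in a small script. This is a sketch under the assumption that you run it from the directory shown in the ls output above; build_commands and run_all are hypothetical helper names, and only the command lines already shown in this README are used.

```python
import subprocess

# Script directory as used in the commands above; adjust if your layout differs.
SCRIPTS = "py-srilm-interpolator/srilm"


def build_commands(config="config", cpus=None, ppl_file="best-mix.ppl"):
    """Return the two command lines (as argument lists) in the order to run them."""
    interpolate = [SCRIPTS + "/interpolator.py", "-c", config]
    if cpus is not None:
        interpolate += ["--cpus", str(cpus)]
    combine = [SCRIPTS + "/combiner.py", "-c", config, ppl_file]
    return [interpolate, combine]


def run_all(**kwargs):
    """Run both steps; stop with an exception if either step fails."""
    for command in build_commands(**kwargs):
        subprocess.run(command, check=True)
```

Calling run_all(cpus=2) would then run the interpolator followed by the combiner, and check=True keeps the combiner from running if the first step exits with an error.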
Joshua Building large LMs with SRILM
This code is distributed under the New BSD License. See the file LICENSE.