Training a model

The language model and lexicon can be created with the train utility:

$ ./citar-train corpus_train lexicon ngrams

This creates the lexicon and ngrams files. The trainer reads corpora in the Brown format: one sentence per line, with each word joined to its tag by a forward slash. You can then test the tagger with the command-line tag utility, which reads tokenized sentences from standard input and prints the most probable tag sequence:

$ echo "The cat is on the mat ." | ./tag lexicon ngrams
The/AT cat/NN is/BEZ on/IN the/AT mat/NN ./.
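
Before training, it can help to check that a corpus really is in Brown format. The following is a minimal Python sketch, not part of citar itself; the helper name `parse_brown_line` is hypothetical. It assumes that the tag follows the last slash in each token, so that a word containing a slash stays intact. That splitting rule is an assumption about the format, not confirmed behaviour of the trainer.

```python
def parse_brown_line(line):
    """Split one Brown-format sentence into (word, tag) pairs.

    Hypothetical helper for checking a corpus; not part of citar.
    """
    pairs = []
    for token in line.split():
        # Assumption: the tag follows the LAST slash, so a word that
        # itself contains a slash (e.g. "1/2/CD") keeps its slashes.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    sentence = "The/AT cat/NN is/BEZ on/IN the/AT mat/NN ./."
    for word, tag in parse_brown_line(sentence):
        print(word, tag)
```

Running each line of a training corpus through such a check flags tokens with a missing word or tag before they reach the trainer.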
