Training a model
danieldk edited this page Oct 30, 2010
The language model and lexicon can be created with the train utility:
$ ./citar-train corpus_train lexicon ngrams
This creates the lexicon and ngrams files. The trainer reads corpora in the Brown format: one sentence per line, with each word and its tag separated by a forward slash. You can then test the tagger with the command-line tag utility, which reads tokenized sentences from standard input and prints the most probable tag sequence:
$ echo "The cat is on the mat ." | ./tag lexicon ngrams
The/AT cat/NN is/BEZ on/IN the/AT mat/NN ./.
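To make the Brown format concrete, here is a minimal Python sketch that splits one line of such a corpus into word/tag pairs. It is an illustration only, not part of the tagger: the function name `parse_brown_line` is hypothetical, and splitting on the last slash of each token is an assumption made here so that words containing a slash stay intact.

```python
def parse_brown_line(line):
    """Split one Brown-format sentence into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Assumption: the tag follows the LAST slash in the token,
        # so a word such as "1/2" in "1/2/CD" is kept whole.
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    sentence = "The/AT cat/NN is/BEZ on/IN the/AT mat/NN ./."
    for word, tag in parse_brown_line(sentence):
        print(word, tag)
```

A check like this can be useful for validating a training corpus before running the trainer, since a token with no slash raises an error instead of being silently accepted.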