This is a Part of Speech tagger written in Python, utilizing the Viterbi algorithm (an instantiation of Hidden Markov Models). It uses the Natural Language Toolkit and trains on Penn Treebank-tagged text files. It will use ten-fold cross validation to generate accuracy statistics, comparing its tagged sentences with the gold standard.
python hmm-tagger.py [--clean]
Pass in the --clean option to clean a Treebank file before running the tagger. This can be time consuming, so you can leave it off during future runs.