PHP Frequency Distribution
Nick Escobedo edited this page Aug 14, 2017 · 3 revisions
A frequency distribution shows how often each word occurs in a body of text, which makes it easy to spot both the most common and the rarest terms. The FreqDist class expects the tokens to be normalized (for example, lowercased) before the object is instantiated.
$tokenizer = new GeneralTokenizer();
$tokens = $tokenizer->tokenize("time flies like an arrow and an arrow flies like time");
$freqDist = new FreqDist($tokens);
/*
 * Get the hapaxes: all the terms with a frequency count of 1
 */
$freqDist->getHapaxes();
/*
 * Get the corpus size (total number of tokens)
 */
$freqDist->getTotalTokens();
/*
 * Get the size of the vocabulary (number of distinct tokens)
 */
$freqDist->getTotalUniqueTokens();
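To make the three quantities above concrete, here is a minimal sketch in plain PHP that computes them by hand for the same sentence. It does not use the library: the whitespace split and the variable names (`$counts`, `$hapaxes`, and so on) are assumptions made for illustration, and FreqDist's actual internals and return formats may differ.

```php
<?php
// Illustrative only: plain PHP, not the library's FreqDist implementation.
$text = "time flies like an arrow and an arrow flies like time";

// Normalize (lowercase) and split on single spaces; a real tokenizer does more.
$tokens = explode(' ', strtolower($text));

// Map each distinct token to the number of times it occurs.
$counts = array_count_values($tokens);

// Hapaxes: tokens that occur exactly once.
$hapaxes = array_keys(array_filter($counts, function ($n) {
    return $n === 1;
}));

// Corpus size: every token, repeats included.
$totalTokens = count($tokens);

// Vocabulary size: distinct tokens only.
$totalUniqueTokens = count($counts);

echo implode(', ', $hapaxes), PHP_EOL;
echo $totalTokens, PHP_EOL;
echo $totalUniqueTokens, PHP_EOL;
```

For this sentence the sketch finds a single hapax ("and"), 11 tokens in total, and 6 distinct tokens, since every other word appears exactly twice.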