
PHP Frequency Distribution

Nick Escobedo edited this page Aug 14, 2017 · 3 revisions

Frequency Distributions with PHP Text Analysis

A frequency distribution is a useful way to find out how frequently or infrequently specific words are used in a body of text. The FreqDist class expects the tokens to be normalized (for example, lowercased) before the object is instantiated.
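To see why normalization matters, here is a small plain-PHP sketch that uses only built-in functions (`array_count_values`, `array_map`, `strtolower`) rather than the library. Lowercasing is assumed here as the normalization step purely for illustration. Without it, "Time" and "time" are counted as two different words.

```php
<?php
// Plain-PHP illustration (not the library's code).
$raw = ['Time', 'flies', 'like', 'an', 'arrow', 'and', 'time', 'flies'];

// Count the raw tokens as-is: "Time" and "time" get separate entries.
$rawCounts = array_count_values($raw);

// Normalize by lowercasing each token, then count again.
$normalized = array_map('strtolower', $raw);
$normalizedCounts = array_count_values($normalized);

print_r($rawCounts);        // contains 'Time' => 1 and 'time' => 1
print_r($normalizedCounts); // contains 'time' => 2
```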

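require_once 'vendor/autoload.php'; // Composer autoloader; adjust the path to your project

use TextAnalysis\Analysis\FreqDist;
use TextAnalysis\Tokenizers\GeneralTokenizer;

$tokenizer = new GeneralTokenizer();
$tokens = $tokenizer->tokenize("time flies like an arrow and an arrow flies like time");

/*
 * Normalize the tokens (here, by lowercasing them) before creating the FreqDist
 */
$tokens = array_map('strtolower', $tokens);
$freqDist = new FreqDist($tokens);

/*
 * Get the hapaxes, all the terms with a frequency count of 1
 */
$hapaxes = $freqDist->getHapaxes();

/*
 * Get the corpus size, the total number of tokens
 */
$totalTokens = $freqDist->getTotalTokens();

/*
 * Get the size of the vocabulary, the number of unique tokens
 */
$vocabularySize = $freqDist->getTotalUniqueTokens();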
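The three statistics above can be sketched in plain PHP with built-in array functions. This only illustrates what the values mean; it is not the library's implementation. A simple whitespace split (`explode`) is assumed in place of GeneralTokenizer, which is enough for this punctuation-free sentence.

```php
<?php
// Plain-PHP sketch of hapaxes, corpus size and vocabulary size.
// Assumption: a whitespace split stands in for GeneralTokenizer.
$text = "time flies like an arrow and an arrow flies like time";
$tokens = explode(' ', $text);

// Frequency distribution: token => number of occurrences.
$counts = array_count_values($tokens);

// Hapaxes: terms that occur exactly once.
$hapaxes = array_keys(array_filter($counts, function (int $c): bool {
    return $c === 1;
}));

// Corpus size: total number of tokens, repeats included.
$totalTokens = count($tokens);

// Vocabulary size: number of distinct tokens.
$totalUniqueTokens = count($counts);

echo implode(', ', $hapaxes), PHP_EOL; // and
echo $totalTokens, PHP_EOL;            // 11
echo $totalUniqueTokens, PHP_EOL;      // 6
```

Here every word except "and" appears twice, so "and" is the only hapax. The library's FreqDist may store, order, or return these values differently.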