Updated documentation.

yu1ming · Jul 1, 2014 · 696a7af
1 parent 920dab0
Showing 1 changed file with 64 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -3,6 +3,37 @@ Java Naive Bayes Classifier
 
 Nothing special. It works and is well documented, so you should get it running without wasting too much time searching for other alternatives on the net.
 
+Overview
+------------------
+
+I like talking about *features* and *categories*. Objects have features and may belong to a category. The classifier will try matching objects to their categories by looking at the objects' features. It does so by consulting its memory filled with knowledge gathered from training examples.
+
+Classifying a feature set returns the category with the highest product of 1) the probability of that category occurring and 2) the product of the probabilities of all features occurring in that category:
+
+```classify(feature1, ..., featureN) = argmax(P(category) * PROD(P(feature|category)))```
+
+This is a so-called maximum a posteriori estimation. Wikipedia actually does a good job explaining it: http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model
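As a worked illustration of the formula, take two categories, *positive* and *negative*, with equal prior probability, and the features `sunny` and `day`. The probabilities below are made up purely for the example. They are not values produced by the classifier.

```latex
\begin{aligned}
\text{score}(\text{positive}) &= P(\text{positive})\,P(\text{sunny}\mid\text{positive})\,P(\text{day}\mid\text{positive}) = 0.5 \cdot 0.6 \cdot 0.5 = 0.15\\
\text{score}(\text{negative}) &= P(\text{negative})\,P(\text{sunny}\mid\text{negative})\,P(\text{day}\mid\text{negative}) = 0.5 \cdot 0.1 \cdot 0.4 = 0.02\\
\text{classify}(\text{sunny}, \text{day}) &= \arg\max_{c}\ \text{score}(c) = \text{positive}
\end{aligned}
```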
+
+Learning from Examples
+------------------
+
+Add knowledge by telling the classifier that these features belong to a specific category:
+
+```java
+String[] positiveText = "I love sunny days".split("\\s");
+bayes.learn("positive", Arrays.asList(positiveText));
+```
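Conceptually, learning amounts to counting. The sketch below is a hypothetical, stripped-down illustration (the class `CountingSketch` is not part of this library): each learned example increments its category's count and the count of every one of its features within that category, roughly what `incrementCategory` and `incrementFeature` describe in the Interface section.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the counts a naive Bayes learner keeps;
// not this library's implementation.
class CountingSketch {
    // category -> (feature -> occurrences of the feature in that category)
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    // category -> number of examples learned for that category
    private final Map<String, Integer> categoryCounts = new HashMap<>();

    void learn(String category, List<String> features) {
        categoryCounts.merge(category, 1, Integer::sum); // cf. incrementCategory
        Map<String, Integer> counts =
                featureCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String feature : features) {
            counts.merge(feature, 1, Integer::sum);      // cf. incrementFeature
        }
    }

    int featureCount(String feature, String category) {
        return featureCounts.getOrDefault(category, Collections.emptyMap())
                .getOrDefault(feature, 0);
    }

    int categoryCount(String category) {
        return categoryCounts.getOrDefault(category, 0);
    }

    public static void main(String[] args) {
        CountingSketch memory = new CountingSketch();
        memory.learn("positive", Arrays.asList("I love sunny days".split("\\s")));
        memory.learn("positive", Arrays.asList("sunny again".split("\\s")));
        System.out.println(memory.categoryCount("positive"));         // 2
        System.out.println(memory.featureCount("sunny", "positive")); // 2
    }
}
```

Probabilities such as `P(feature|category)` can then be derived from these two counts.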
+
+Classify unknown objects
+------------------
+
+Use the gathered knowledge to classify unknown objects by their features. The classifier will return the category that the object most likely belongs to.
+
+```java
+String[] unknownText1 = "today is a sunny day".split("\\s");
+bayes.classify(Arrays.asList(unknownText1)).getCategory();
+```
+
 Example
 ------------------
 
@@ -37,16 +68,44 @@ System.out.println( // will output "negative"
     Arrays.asList(unknownText1));
 
 // Change the memory capacity. New learned classifications (using
-// learn method are stored in a queue with the size given here and
-// used to classify unknown sentences.
+// the learn method) are stored in a queue with the size given
+// here and used to classify unknown sentences.
 bayes.setMemoryCapacity(500);
 ```
 
 Forgetful learning
 ------------------
 
-This classifier is forgetful. This means, that the classifier will forget recent classifications it uses for future classifications after - defaulting to 200 - classifications learned.
-This will ensure, that the classifier can react to ongoing changes in the user's habbits.
+This classifier is forgetful. This means that it only uses the most recently learned classifications (by default the last 1,000) for future classifications and forgets older ones. This ensures that the classifier can react to ongoing changes in the user's habits.
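A minimal sketch of how such forgetting can work, assuming the memory is a first-in, first-out queue of learned examples. The class and method names are hypothetical, not this library's API. When the queue grows beyond its capacity, the oldest example is removed and its counts are decremented again; lowering the capacity truncates the memory the same way, as described for `setMemoryCapacity`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of forgetful learning; not this library's code.
// Only the most recent `capacity` examples contribute to the counts.
class ForgetfulMemory {
    // One learned example: a category plus the features that led to it.
    private static final class Example {
        final String category;
        final List<String> features;

        Example(String category, List<String> features) {
            this.category = category;
            this.features = features;
        }
    }

    private final Deque<Example> memory = new ArrayDeque<>();
    private final Map<String, Integer> categoryCounts = new HashMap<>();
    // Keyed by [category, feature]; List has value-based equals/hashCode.
    private final Map<List<String>, Integer> featureCounts = new HashMap<>();
    private int capacity;

    ForgetfulMemory(int capacity) {
        this.capacity = capacity;
    }

    void learn(String category, List<String> features) {
        Example example = new Example(category, features);
        count(example, +1); // remember the new example
        memory.addLast(example);
        forgetOverflow();
    }

    // Lowering the capacity forgets the oldest examples immediately.
    void setCapacity(int capacity) {
        this.capacity = capacity;
        forgetOverflow();
    }

    int size() {
        return memory.size();
    }

    int categoryCount(String category) {
        return categoryCounts.getOrDefault(category, 0);
    }

    int featureCount(String feature, String category) {
        return featureCounts.getOrDefault(Arrays.asList(category, feature), 0);
    }

    private void forgetOverflow() {
        while (memory.size() > capacity) {
            count(memory.removeFirst(), -1); // undo the oldest example's counts
        }
    }

    private void count(Example example, int delta) {
        categoryCounts.merge(example.category, delta, Integer::sum);
        for (String feature : example.features) {
            featureCounts.merge(Arrays.asList(example.category, feature), delta, Integer::sum);
        }
    }

    public static void main(String[] args) {
        ForgetfulMemory memory = new ForgetfulMemory(2);
        memory.learn("positive", Arrays.asList("sunny", "day"));
        memory.learn("negative", Arrays.asList("rainy", "day"));
        memory.learn("negative", Arrays.asList("cold"));
        // The first example no longer fits into a memory of size 2.
        System.out.println(memory.categoryCount("positive")); // 0
        System.out.println(memory.categoryCount("negative")); // 2
    }
}
```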
+
+
+Interface
+------------------
+The abstract ```Classifier<T, K>``` serves as a base for the concrete ```BayesClassifier<T, K>```. Here are its methods. Please also refer to the Javadoc.
+
+* ```void reset()``` Resets the learned feature and category counts.
+* ```Set<T> getFeatures()``` Returns a ```Set``` of features the classifier knows about.
+* ```Set<K> getCategories()``` Returns a ```Set``` of categories the classifier knows about.
+* ```int getCategoriesTotal()``` Retrieves the total number of categories the classifier knows about.
+* ```int getMemoryCapacity()``` Retrieves the memory's capacity.
+* ```void setMemoryCapacity(int memoryCapacity)``` Sets the memory's capacity. If the new value is less than the old value, the memory will be truncated accordingly.
+* ```void incrementFeature(T feature, K category)``` Increments the count of a given feature in the given category. This is equal to telling the classifier that this feature has occurred in this category.
+* ```void incrementCategory(K category)``` Increments the count of a given category. This is equal to telling the classifier that this category has occurred once more.
+* ```void decrementFeature(T feature, K category)``` Decrements the count of a given feature in the given category. This is equal to telling the classifier that this feature has occurred once less in this category.
+* ```void decrementCategory(K category)``` Decrements the count of a given category. This is equal to telling the classifier that this category has occurred once less.
+* ```int featureCount(T feature, K category)``` Retrieves the number of occurrences of the given feature in the given category.
+* ```int categoryCount(K category)``` Retrieves the number of occurrences of the given category.
+* ```float featureProbability(T feature, K category)``` (*implements* ```IFeatureProbability<T, K>.featureProbability```) Returns the probability that the given feature occurs in the given category.
+* ```float featureWeighedAverage(T feature, K category)``` Retrieves the weighted average ```P(feature|category)``` with an overall weight of ```1.0``` and an assumed probability of ```0.5```. The basic probability is calculated with the classifier's own ```featureProbability``` method.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator)``` Retrieves the weighted average ```P(feature|category)``` with an overall weight of ```1.0```, an assumed probability of ```0.5```, and the given object to use for probability calculation.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight)``` Retrieves the weighted average ```P(feature|category)``` with the given weight, an assumed probability of ```0.5```, and the given object to use for probability calculation.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight, float assumedProbability)``` Retrieves the weighted average ```P(feature|category)``` with the given weight, the given assumed probability, and the given object to use for probability calculation.
+* ```void learn(K category, Collection<T> features)``` Trains the classifier by telling it that the given features resulted in the given category.
+* ```void learn(Classification<T, K> classification)``` Trains the classifier with the given classification, i.e. by telling it that the classification's features resulted in its category.
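The list above does not spell out the formula behind `featureWeighedAverage`. A common choice, assumed in the hypothetical sketch below (names are illustrative and not checked against the library source), blends the assumed probability with the basic probability `featureCount / categoryCount`. The basic probability is weighted by how often the feature has been seen across all categories, so with no evidence the result equals the assumed probability, and with more evidence it moves toward the basic probability.

```java
// Hypothetical sketch of a weighted-average estimate of P(feature|category).
// The formula is an assumption, not copied from this library.
final class WeightedAverage {
    private WeightedAverage() {}

    // Basic estimate: occurrences of the feature in the category,
    // relative to occurrences of the category.
    static float featureProbability(int featureCount, int categoryCount) {
        return categoryCount == 0 ? 0f : (float) featureCount / categoryCount;
    }

    // Blends `assumedProbability` (with weight `weight`) and `basicProbability`
    // (weighted by how often the feature was seen across all categories).
    static float weightedAverage(float basicProbability, int totalFeatureCount,
                                 float weight, float assumedProbability) {
        return (weight * assumedProbability + totalFeatureCount * basicProbability)
                / (weight + totalFeatureCount);
    }

    public static void main(String[] args) {
        float basic = featureProbability(3, 4);                   // 0.75
        System.out.println(weightedAverage(basic, 3, 1.0f, 0.5f)); // 0.6875
        System.out.println(weightedAverage(0f, 0, 1.0f, 0.5f));    // 0.5
    }
}
```

This keeps rarely seen features from producing extreme probabilities such as `0` or `1`.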
+
+The ```BayesClassifier<T, K>``` class implements the following abstract method:
+
+* ```Classification<T, K> classify(Collection<T> features)``` Retrieves the most likely category for the given features. How it is computed depends on the concrete classifier implementation.
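For a concrete picture of what a naive Bayes `classify` does, here is a hypothetical sketch (not the `BayesClassifier` source) of the maximum a posteriori rule from the Overview: score every category by `P(category)` times the product of `P(feature|category)` and return the best one. The per-feature probability is passed in as a function, so one based on `featureWeighedAverage` could be plugged in.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of the naive Bayes classify step; not this library's code.
final class NaiveBayesSketch {
    private NaiveBayesSketch() {}

    // Returns the category with the highest P(category) * PROD P(feature|category),
    // or null if no categories are known.
    static String classify(Collection<String> features,
                           Map<String, Double> categoryProbability,
                           BiFunction<String, String, Double> featureProbability) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : categoryProbability.entrySet()) {
            double score = entry.getValue(); // P(category)
            for (String feature : features) {
                score *= featureProbability.apply(feature, entry.getKey()); // P(feature|category)
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up probabilities for illustration only.
        Map<String, Double> priors = new HashMap<>();
        priors.put("positive", 0.5);
        priors.put("negative", 0.5);
        Map<String, Double> likelihoods = new HashMap<>();
        likelihoods.put("sunny|positive", 0.6);
        likelihoods.put("day|positive", 0.5);
        likelihoods.put("sunny|negative", 0.1);
        likelihoods.put("day|negative", 0.4);

        String category = classify(Arrays.asList("sunny", "day"), priors,
                (feature, cat) -> likelihoods.getOrDefault(feature + "|" + cat, 0.01));
        System.out.println(category); // positive
    }
}
```

With many features the product of small probabilities can underflow to `0`; summing logarithms instead is a common remedy.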
 
 Possible Performance issues
 ------------------
@@ -58,7 +117,7 @@ Performance improvements, I am currently thinking of:
 The MIT License (MIT)
 ------------------
 
-Copyright (c) 2012 Philipp Nolte
+Copyright (c) 2012-2014 Philipp Nolte
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal