Warning: This project is under heavy development. Things will break!
First of all you will need a new classifier:
(require '[clj_naive_bayes.core :as nb])
(def my-classifier (nb/new-classifier {:name :ngram-nb :ngram-size 2 :ngram-type :multinomial}))
The following options are supported:
- :name : Currently :ngram-nb, :multinomial-nb and :binary-nb are supported. (Default: :multinomial-nb)
- :ngram-size : Sets the ngram size. (Default: 2)
- :ngram-type : Whether the ngram should be :binary or :multinomial.
- :boost-start : Boolean. (Default: false). This flag only has an effect with ngrams.
- :keep-sorted : Boolean. (Default: false). With this flag on, all tokens in ngram keys are stored in alphabetical order.
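To make the ngram options more concrete, here is a minimal sketch in plain Clojure of how ngram keys could be built and what sorting the tokens inside each key changes. It does not use or reproduce this library's internals: the function name ngram-keys and the whitespace tokenization are assumptions made purely for illustration.

```clojure
(require '[clojure.string :as str])

(defn ngram-keys
  "Hypothetical helper: splits text on whitespace and returns its ngrams of
  size n as vectors. When keep-sorted? is true, the tokens inside each ngram
  are put in alphabetical order."
  [text n keep-sorted?]
  (let [tokens (str/split (str/lower-case text) #"\s+")]
    (->> (partition n 1 tokens)                      ; sliding window of n tokens
         (map (fn [gram] (if keep-sorted? (sort gram) gram)))
         (map vec))))

(ngram-keys "new iphone case" 2 false)
;; => (["new" "iphone"] ["iphone" "case"])

(ngram-keys "new iphone case" 2 true)
;; => (["iphone" "new"] ["case" "iphone"])
```

With sorting on, "new iphone" and "iphone new" produce the same key, so word order within an ngram no longer matters.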
Suppose you have a training dataset. This should be a CSV file consisting of
lines with <document,class> or <document,class,count> elements. In the second
case, the count column should contain the number of occurrences of each sample.
This is purely for space-saving purposes: e.g. instead of using five lines of
the same <document,class> pair, a single <document,class,5> line can be used.
(require '[clj_naive_bayes.train :as train])
(train/parallel-train-from-file my-classifier "resources/train.csv" :limit 400000)
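The <document,class,count> form described above can be produced from raw <document,class> lines with a small preprocessing step. The following is only a sketch under stated assumptions: the helper names parse-sample and collapse-samples are hypothetical and not part of this library, the sample lines are made up, and the naive comma split assumes documents contain no commas or quoting.

```clojure
(require '[clojure.string :as str])

(defn parse-sample
  "Hypothetical helper: parses one CSV line into a map.
  A missing count column is treated as a count of 1."
  [line]
  (let [[document label cnt] (str/split line #",")]
    {:document document
     :label    label
     :count    (if cnt (Long/parseLong (str/trim cnt)) 1)}))

(defn collapse-samples
  "Hypothetical helper: merges repeated <document,class> lines into
  <document,class,count> lines, summing the counts. Output is sorted."
  [lines]
  (->> lines
       (map parse-sample)
       (group-by (juxt :document :label))
       (map (fn [[[document label] samples]]
              (str/join "," [document label (reduce + (map :count samples))])))
       sort))

(collapse-samples ["iphone 6s,40" "iphone 6s,40" "galaxy s7,40,3"])
;; => ("galaxy s7,40,3" "iphone 6s,40,2")
```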
Now we can try classifying a new document:
(nb/classify my-classifier "iphone 6s")
=> "40"
A classifier can be exported as a plain map. This could be useful for e.g. persisting the classifier:
(def out (nb/export my-classifier))
=> #'user/out
(keys out)
=> (:terms :cats)
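Because the exported value is a plain map, one way to store it is as EDN text. The sketch below uses only clojure.core and clojure.edn; the helper names save-edn! and load-edn are hypothetical, and the nested contents of example-export are made up for illustration (only the top-level keys :terms and :cats come from the output above). It also assumes the exported map contains only values that print and read back as EDN.

```clojure
(require '[clojure.edn :as edn])

(defn save-edn!
  "Hypothetical helper: writes data to path as EDN text."
  [path data]
  (spit path (pr-str data)))

(defn load-edn
  "Hypothetical helper: reads EDN text back from path."
  [path]
  (edn/read-string (slurp path)))

;; Made-up stand-in for an exported classifier map.
(def example-export {:terms {"iphone" {"40" 3}}
                     :cats  {"40" 3}})

;; Round trip through a temporary file.
(def tmp-file (java.io.File/createTempFile "nb-export" ".edn"))
(save-edn! tmp-file example-export)
(= example-export (load-edn tmp-file))
;; => true
```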
To run the classifier over a test set in parallel:
(use 'clj_naive_bayes.core)
(use 'clj_naive_bayes.eval)
(def logs (parallel-classifications my-classifier "resources/test.json"))
Currently only persistence to a file on disk is supported. Given a trained
classifier named my-classifier, you can write it to a file:
(use 'clj_naive_bayes.utils)
(persist-classifier my-classifier "resources/data.clj")
And later on load it:
(use 'clj_naive_bayes.utils)
(load-classifier my-classifier "resources/data.clj")
lein test will run all tests.
lein test [TEST] will run only the tests in the TEST namespace.
lein kibit will analyze the code.
lein marg will produce documentation under /docs.