clj-naive-bayes

Warning: This project is under heavy development. Things will break!

Usage

First, create a new classifier:

(require '[clj_naive_bayes.core :as nb])

(def my-classifier (nb/new-classifier {:name :ngram-nb :ngram-size 2 :ngram-type :multinomial}))

Available options are

:name : Currently :ngram-nb, :multinomial-nb and :binary-nb are supported. (Default :multinomial-nb)
:ngram-size : Sets ngram size. (Default 2)
:ngram-type : Whether the ngram should be :binary or :multinomial.
:boost-start : Boolean. (Default false). This flag only has an effect with ngrams.
:keep-sorted : Boolean. (Default false). With this flag on, all tokens in ngram keys are stored in alphabetical order.
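To make :ngram-size and :keep-sorted more concrete, here is a minimal sketch in plain Clojure of what bigram keys might look like with and without sorting. The ngrams function and its whitespace tokenization are assumptions made for illustration; this is not the library's implementation.

```clojure
(require '[clojure.string :as str])

(defn ngrams
  "Hypothetical helper, not part of clj-naive-bayes.
  Splits text on whitespace and returns its n-grams as vectors.
  When keep-sorted? is true, the tokens inside each n-gram are
  sorted alphabetically, so word order no longer matters."
  [text n keep-sorted?]
  (let [tokens (str/split (str/trim text) #"\s+")]
    (->> (partition n 1 tokens)                 ; sliding window of n tokens
         (map (fn [gram] (if keep-sorted? (sort gram) gram)))
         (map vec))))

(ngrams "iphone 6s case" 2 false)
;; => (["iphone" "6s"] ["6s" "case"])

(ngrams "iphone 6s case" 2 true)
;; => (["6s" "iphone"] ["6s" "case"])
```

Under this reading, "iphone 6s" and "6s iphone" would map to the same key when sorting is on.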

Train

Suppose you have a training dataset. This should be a CSV file in which each line is either <document,class> or <document,class,count>. In the second form, the count column holds the number of occurrences of that sample. This is purely a space-saving measure: instead of repeating the same <document,class> pair on five lines, you can use a single <document,class,5> line.

(require '[clj_naive_bayes.train :as train])

(train/parallel-train-from-file my-classifier "resources/train.csv" :limit 400000)
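The <document,class,count> form described above can be produced from repeated samples with a few lines of plain Clojure. This is only a sketch: collapse-samples is a hypothetical helper, the sample data is made up, and the naive comma join assumes documents contain no commas or quotes.

```clojure
(require '[clojure.string :as str])

(defn collapse-samples
  "Hypothetical helper, not part of clj-naive-bayes.
  Takes a seq of [document class] pairs and returns one
  \"document,class,count\" line per distinct pair.
  Assumes fields need no CSV escaping."
  [samples]
  (for [[[document class] n] (frequencies samples)]
    (str/join "," [document class n])))

;; Made-up samples: the first pair occurs twice.
(sort (collapse-samples [["iphone 6s" "40"]
                         ["iphone 6s" "40"]
                         ["galaxy s7" "40"]]))
;; => ("galaxy s7,40,1" "iphone 6s,40,2")
```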

Classify

Now we can try classifying a new document:

(nb/classify my-classifier "iphone 6s")
=> "40"

Export Probabilities to a Hashmap

This could be useful for e.g. persisting the classifier:

(def out (nb/export my-classifier))
=> #'user/out
(keys out)
=> (:terms :cats)
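As an illustration of persisting such a map by hand, the sketch below writes a map to a file as EDN and reads it back using only clojure.core and clojure.edn. The contents of exported are a made-up stand-in that only mirrors the :terms and :cats keys shown above; the real exported structure may differ, and this round trip only works if it contains plain EDN-readable values.

```clojure
(require '[clojure.edn :as edn])

;; Made-up stand-in for an exported map; only the top-level keys
;; (:terms, :cats) come from the README, the values are invented.
(def exported {:terms {"iphone" {"40" 3}}
               :cats  {"40" 3}})

;; Write to a temporary file so the example leaves no stray files behind.
(def tmp (java.io.File/createTempFile "classifier" ".edn"))

(spit tmp (pr-str exported))                      ; serialize as EDN text
(def restored (edn/read-string (slurp tmp)))      ; parse it back

(= exported restored)
;; => true
```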

Evaluate Performance

(use 'clj_naive_bayes.core)
(use 'clj_naive_bayes.eval)

(def logs (parallel-classifications my-classifier "resources/test.json"))

Persist classifiers

Currently only on-disk file persistence is supported. Suppose you have a trained classifier named my-classifier; you can write it to a file:

(use 'clj_naive_bayes.utils)

(persist-classifier my-classifier "resources/data.clj")

And later on load it:

(use 'clj_naive_bayes.utils)

(load-classifier my-classifier "resources/data.clj")

Testing

lein test will run all tests.

lein test [TEST] will run only the tests in the TEST namespace.

Tooling

Kibit

lein kibit will analyze the code.

Marginalia

lein marg will produce documentation under /docs.
