The Java projects in this repository were originally written for a course. We were free to choose what to implement, so I implemented Hunt's algorithm from a brief description in a book. The report, written in the style of a paper, can be found here:
There are several Java programs in the repository. The main class of the most interesting one, which uses Hunt's algorithm, can be found here:
It reads a CSV file with training data and produces a JSON structure as output. This JSON structure is a decision tree that can be read and used by other programs.
The implementation only handles boolean (true/false) values.
The main method of LearnerClient calls methods on the Learner class. Together, the client's main method and the Learner class are a good place to start reading to understand the implementation. Note that the Learner class resides in a separate library project, so it can easily be used in other Java applications.
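To give a feel for what Hunt's algorithm does with boolean data, here is a minimal sketch. It is not the repository's Learner class: the names `HuntSketch`, `Node`, `build` and `classify` are made up for illustration, rows are assumed to hold the attribute values followed by the class label in the last position, and the choice of split attribute by lowest weighted Gini impurity is an assumption (Hunt's algorithm itself leaves the selection criterion open).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HuntSketch {

    /** A tree node: either a leaf carrying a label, or a test on one boolean attribute. */
    static final class Node {
        final int attribute;   // index of the tested attribute, -1 for a leaf
        final boolean label;   // predicted class, only meaningful for a leaf
        final Node ifTrue;
        final Node ifFalse;

        Node(boolean label) {
            this.attribute = -1; this.label = label; this.ifTrue = null; this.ifFalse = null;
        }

        Node(int attribute, Node ifTrue, Node ifFalse) {
            this.attribute = attribute; this.label = false; this.ifTrue = ifTrue; this.ifFalse = ifFalse;
        }

        boolean isLeaf() { return attribute < 0; }
    }

    /** Recursively grows the tree; parentMajority labels a branch that receives no rows. */
    static Node build(List<boolean[]> rows, List<Integer> attributes, boolean parentMajority) {
        if (rows.isEmpty()) return new Node(parentMajority);
        int labelIndex = rows.get(0).length - 1;
        int positives = 0;
        for (boolean[] r : rows) if (r[labelIndex]) positives++;
        boolean majority = positives * 2 >= rows.size();
        // Stop when the node is pure or no attributes are left to test.
        if (positives == 0 || positives == rows.size() || attributes.isEmpty()) return new Node(majority);

        // Assumption: pick the attribute whose split has the lowest weighted Gini impurity.
        int best = -1;
        double bestImpurity = Double.MAX_VALUE;
        for (int a : attributes) {
            double impurity = splitImpurity(rows, a, labelIndex);
            if (impurity < bestImpurity) { bestImpurity = impurity; best = a; }
        }

        List<boolean[]> whenTrue = new ArrayList<>();
        List<boolean[]> whenFalse = new ArrayList<>();
        for (boolean[] r : rows) {
            if (r[best]) whenTrue.add(r); else whenFalse.add(r);
        }
        List<Integer> remaining = new ArrayList<>(attributes);
        remaining.remove(Integer.valueOf(best));
        return new Node(best, build(whenTrue, remaining, majority), build(whenFalse, remaining, majority));
    }

    /** Weighted Gini impurity of the two partitions produced by splitting on one attribute. */
    static double splitImpurity(List<boolean[]> rows, int attribute, int labelIndex) {
        int[] total = new int[2];     // index 0: attribute is false, index 1: attribute is true
        int[] positive = new int[2];
        for (boolean[] r : rows) {
            int side = r[attribute] ? 1 : 0;
            total[side]++;
            if (r[labelIndex]) positive[side]++;
        }
        double weighted = 0.0;
        for (int side = 0; side < 2; side++) {
            if (total[side] == 0) continue;
            double p = (double) positive[side] / total[side];
            weighted += ((double) total[side] / rows.size()) * 2.0 * p * (1.0 - p);
        }
        return weighted;
    }

    /** Follows the tests from the root down to a leaf and returns that leaf's label. */
    static boolean classify(Node node, boolean[] values) {
        while (!node.isLeaf()) node = values[node.attribute] ? node.ifTrue : node.ifFalse;
        return node.label;
    }

    public static void main(String[] args) {
        // Toy training set: the label is (attribute 0 AND attribute 1); attribute 2 is noise.
        List<boolean[]> rows = new ArrayList<>();
        boolean[] both = {false, true};
        for (boolean a : both)
            for (boolean b : both)
                for (boolean c : both)
                    rows.add(new boolean[] {a, b, c, a && b});
        Node tree = build(rows, Arrays.asList(0, 1, 2), false);
        System.out.println(classify(tree, new boolean[] {true, true, false}));  // prints true
        System.out.println(classify(tree, new boolean[] {true, false, true}));  // prints false
    }
}
```

On this toy data the sketch never tests the noise attribute, because splitting on attributes 0 and 1 already yields pure leaves.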
- Java 7 JDK must be installed.
- Maven must be installed.
  - Maven download: https://maven.apache.org/download.cgi
  - Maven install instructions: https://maven.apache.org/download.cgi#Installation_Instructions
  - Ubuntu users can install it with apt-get.
- Cd into the simplehunts directory
- Run mvn package
- Cd into the target directory
- In the directory you will now have 4 executable jar-files
- Dataset generator for training and evaluation data
  java -jar csvgenerate.jar 20000 0.5 training.csv
  java -jar csvgenerate.jar 20000 0.5 evaluation.csv
  The first parameter (20000) is the number of random records to generate. The second parameter (0.5) is the fraction of true records (matching ME criteria), here 50%. The third parameter is the output file name.
- Learning algorithm (Hunt's algorithm)
  java -jar learn.jar training.csv output.json
  Reads the file training.csv and writes a JSON file containing the decision tree. The JSON file name parameter is optional.
- Evaluation of accuracy
  java -jar evaluate.jar output.json evaluation.csv
  Reads the decision tree from the JSON file and uses the evaluation CSV file to calculate its accuracy.
- Using the decision tree for asking questions
  java -jar classify.jar output.json
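As a rough illustration of the accuracy evaluation step, the sketch below assumes that accuracy means the usual fraction of evaluation records whose predicted label matches the stored label. It does not reflect how evaluate.jar is actually written: `AccuracySketch` and the `Classifier` interface are hypothetical stand-ins for a loaded decision tree, and each record is assumed to hold the attribute values followed by the label in the last position.

```java
import java.util.Arrays;
import java.util.List;

class AccuracySketch {

    /** Hypothetical stand-in for a loaded decision tree. */
    interface Classifier {
        boolean predict(boolean[] attributes);
    }

    /** Fraction of records whose predicted label equals the label in the last column. */
    static double accuracy(Classifier tree, List<boolean[]> records) {
        if (records.isEmpty()) throw new IllegalArgumentException("no evaluation records");
        int correct = 0;
        for (boolean[] record : records) {
            boolean actual = record[record.length - 1];
            boolean[] attributes = Arrays.copyOf(record, record.length - 1);
            if (tree.predict(attributes) == actual) correct++;
        }
        return (double) correct / records.size();
    }

    public static void main(String[] args) {
        // A trivial "tree" that predicts the value of the first attribute.
        Classifier firstAttribute = new Classifier() {
            public boolean predict(boolean[] attributes) { return attributes[0]; }
        };
        List<boolean[]> evaluation = Arrays.asList(
                new boolean[] {true, true, true},
                new boolean[] {true, false, false},
                new boolean[] {false, true, false},
                new boolean[] {false, false, false});
        System.out.println(accuracy(firstAttribute, evaluation));  // prints 0.75
    }
}
```

Three of the four records are predicted correctly here, which gives 0.75.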