Classify Package Tutorial

linfrank edited this page Aug 16, 2012 · 1 revision

The classify package lets you use MinorThird for general classification and categorization tasks; the input does not have to be text data. All you need is a class label and a list of features for each data point. For example, consider this data for whether to play tennis or not:

The example above is a sample of simple (non-sequential) data. b indicates that class labels are binary (i.e., it can only be POS or NEG). For multi-class (k-ary) data, use k instead of b.
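As a rough illustration of the b and k markers, here is a minimal Python sketch that checks one dataset line. It is not part of MinorThird. The column order (type marker, identifier, class label, features) is an assumption inferred from the sample lines on this page, and check_line is a hypothetical helper name.

```python
# Labels allowed on lines marked "b" (binary), as described above.
BINARY_LABELS = {"POS", "NEG"}


def check_line(line):
    """Return (type marker, label) for one dataset line, or raise ValueError.

    Assumed column order, inferred from the samples on this page:
    type marker, identifier, class label, then zero or more features.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected type, identifier and label: {line!r}")
    kind, _ident, label = fields[:3]
    if kind not in ("b", "k"):
        raise ValueError(f"unknown type marker {kind!r}")
    if kind == "b" and label not in BINARY_LABELS:
        raise ValueError(f"binary label must be POS or NEG, got {label!r}")
    return kind, label


print(check_line("b week1 NEG sunny humid temp=85"))
```

A line marked k is accepted with any label, since k-ary data is not restricted to POS and NEG.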

The classify package can also evaluate sequential data. For example, to make the dataset above sequential, you may want to consider whether or not you played tennis the day before. In this case, you can list the dataset in sequences with a * between each sequence. For example:

b week1 NEG sunny humid temp=85
b week1 POS sunny humid temp=90
*
b week2 POS sunny dry temp=76
b week2 NEG sunny humid temp=80
...

IMPORTANT: to load a sequential dataset, you must specify the -type seq option before the -data option on the command line.
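To make the sequential layout concrete, the following Python sketch splits such a file into sequences at each * line. It only illustrates the file format and is not how MinorThird loads data. The field meanings (type marker, identifier, label, features) and the choice to give bare features such as sunny a weight of 1.0 are assumptions; parse_sequences is a hypothetical name.

```python
def parse_sequences(text):
    """Split dataset text into sequences of (identifier, label, features)."""
    sequences, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "*":
            # A lone "*" closes the current sequence.
            if current:
                sequences.append(current)
                current = []
            continue
        # Assumed columns: type marker, identifier, label, then features.
        _kind, ident, label, *tokens = line.split()
        features = {}
        for tok in tokens:
            name, sep, value = tok.partition("=")
            # "temp=85" becomes a numeric feature; a bare token such as
            # "sunny" is given weight 1.0 (an assumption of this sketch).
            features[name] = float(value) if sep else 1.0
        current.append((ident, label, features))
    if current:
        sequences.append(current)
    return sequences


SAMPLE = """\
b week1 NEG sunny humid temp=85
b week1 POS sunny humid temp=90
*
b week2 POS sunny dry temp=76
b week2 NEG sunny humid temp=80
"""

for seq in parse_sequences(SAMPLE):
    print([label for _ident, label, _features in seq])
```

Each printed list holds the labels of one sequence in order, which is the context a sequential learner can use when predicting the next day.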

Using the Command Line

Train

You can train a classifier on given data:

$ java edu.cmu.minorthird.classify.Train -data simpleClassifyData.train -saveAs simpleClassify.ann

Note: to see what other options are available, use the -help option.

The output should look like this:

Notice that there are no results since this is just training.

Test

To test the trained classifier on given data:

$ java edu.cmu.minorthird.classify.UI -type seq -op test -data sub_sig.test -classifierFile sub_sig_class.eval

Using the GUI

Train

To run an experiment in the GUI, first type the command:

$ java -Xmx500M edu.cmu.minorthird.classify.UI -gui

When the window appears, click the Edit button under Parameter Modification. This will make the Property Editor window appear. The _operation: field in the Property Editor will be set to trainTest by default, so change it to train.

Once again, if your data is sequential, make sure that you check the sequentialMode checkbox before specifying your dataset. Remember, the data will not load if you do not check this box before specifying your data. Next, specify the datasetFilename by clicking the Browse button next to the datasetFilename field and selecting your data file.

These next fields have defaults, but can be changed:

  • learner - You can specify the learner for this experiment by clicking on the pull down menu next to learnerInSequentialMode, NOT next to learner. Since this is a sequential experiment, only the learner from the learnerInSequentialMode menu will be used.
  • splitter - Specify the splitter by choosing one from the pull down menu next to splitter. You can change any of the splitter options by clicking the Edit button to the right of the pull down menu.

Since this is a training experiment, you will want to save the learned classifier so you can use it on future test data. To save your classifier, type the name you would like for the file, with a .eval extension, in the saveAsFilename text field. For example, you can save your classifier in a file named myClassifier.eval. You do NOT need to specify the testDatasetFilename since this is only a training experiment.

Click OK to save these parameters, and click the Start Task button to start the experiment. When the experiment finishes you will notice that there are no results since this is only a training experiment. Your learned classifier will be saved in your current directory.

Test

If a GUI window is not currently open, first type the command:

$ java -Xmx500M edu.cmu.minorthird.classify.UI -gui

When the window appears, click the Edit button under Parameter Modification. This will make the Property Editor window appear. The _operation: field in the Property Editor will be set to trainTest or train, so change it to test.

Next you need to specify your classifier file name. To do this, click the Browse button next to the classifierFilename text field and select the .eval file in which you saved your classifier.

Once again, make sure that you check the sequentialMode checkbox before specifying your dataset if your data is sequential. Remember, the data will not load if you do not check this box before specifying your data. Next specify the datasetFilename by clicking the Browse button next to the datasetFilename field and selecting your data file.

Click the OK button to save these parameters and click the Start Task button to run the experiment.

Training and Testing Using a Single Dataset

Using the Command Line

To learn what options are available from the command line type:

$ java -Xmx500M edu.cmu.minorthird.classify.UI -help

Let's first try running a classification experiment on a sequential dataset. When using a sequential dataset, it is important to use the -seq option so that the program can properly process the data. Make sure the -seq option is declared before the dataset.

To run an experiment, type:

$ java -Xmx500M edu.cmu.minorthird.classify.UI -op trainTest -seq -data sub_sig.data

Note: If you get a java.lang.reflect.InvocationTargetException, it probably means that you did not specify the -seq option before the dataset.
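Because the ordering of options matters here, one way to avoid the mistake when scripting experiments is to assemble the argument list in code. The Python sketch below only builds the command shown above and does not run it. build_ui_command is a hypothetical helper, and the sequential flag is passed in by the caller rather than hard-coded. Options are written with a plain ASCII hyphen.

```python
def build_ui_command(op, data_file, seq_flag=None, heap="500M"):
    """Build the argument list for edu.cmu.minorthird.classify.UI.

    seq_flag is a list of tokens such as ["-seq"]; when given, it is always
    placed before -data, as this tutorial requires for sequential datasets.
    """
    cmd = ["java", f"-Xmx{heap}", "edu.cmu.minorthird.classify.UI", "-op", op]
    if seq_flag:
        cmd.extend(seq_flag)
    # The dataset option is appended last so it can never precede seq_flag.
    cmd.extend(["-data", data_file])
    return cmd


cmd = build_ui_command("trainTest", "sub_sig.data", seq_flag=["-seq"])
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where MinorThird is on the Java classpath.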

The output of the program should look like this:

Using the GUI

To run an experiment in the GUI, first type the command:

$ java -Xmx500M edu.cmu.minorthird.classify.UI -gui

When the window appears, click the Edit button under Parameter Modification. This will make the Property Editor window appear. The _operation: field in the Property Editor should already be set to trainTest, which is what you want for this experiment. The next, very important step is to check the sequentialMode checkbox towards the bottom of the window. If this box is not checked, you will not be able to properly load your data.

Note: if the data is taking a long time to load, the sequentialMode box is most likely inappropriately checked. Try closing the window and trying again.

Next specify the datasetFilename by clicking the Browse button next to the text field. Find the directory where you saved the file, select the file, and click Open.

These next fields have defaults, but can be changed:

  • learner - You can specify the learner for this experiment by clicking on the pull down menu next to learnerInSequentialMode, NOT next to learner. Since this is a sequential experiment, only the learner from the learnerInSequentialMode menu will be used.
  • splitter - Specify the splitter by choosing one from the pull down menu next to splitter. You can change any of the splitter options by clicking the Edit button to the right of the pull down menu.

Click the OK button to close the Property Editor and press the Start Task button to begin the experiment.