linfrank edited this page Aug 16, 2012 · 27 revisions

Prerequisites

The only third-party software required to run MinorThird is Java (version 1.5.0 or later). If you only plan to conduct experiments with the provided tools, the JRE is sufficient. If you plan to compile MinorThird yourself, extend its APIs, or use them in your own software, you will also need the JDK and the Ant build utility (version 1.6.5 or later). Follow the installation and configuration instructions provided with Java and Ant.
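As a quick sanity check before going further, a minimal sketch like the following reports whether the two prerequisites are on your PATH. The helper name `have_tool` is hypothetical and only serves this illustration; the script checks presence, not version numbers.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is available on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# java is needed to run MinorThird; ant is only needed to build it from source.
for tool in java ant; do
  if have_tool "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

If both tools are found, `java -version` and `ant -version` print the installed versions, which you can compare against the minimums listed above.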

Getting a MinorThird Distribution

There are two ways to obtain the MinorThird distribution.

Option 1: Download one of the latest package releases from the MinorThird repo on GitHub:

  • Download a full release of the system, including the source code, applications, and all required libraries, from the tags page. The file names are of the form YYYYMMDD.zip.
  • Download a full binary distribution in a single jar file containing the pre-compiled MinorThird packages as well as required libraries from the downloads page. The file names are of the form minorthird_YYYYMMDD.jar.
  • Download a classify binary distribution containing only the pre-compiled classification packages and required libraries from the downloads page. The file names are of the form minorthird-classify_YYYYMMDD.jar.

Option 2: Clone the MinorThird repository to get the most up-to-date (unreleased) changes to the code base.

  1. Install git.
  2. If you are new to git and GitHub, the GitHub bootcamp provides tutorials.
  3. Clone the MinorThird repository at git@github.com:TeamCohen/MinorThird.git
  4. Once you have cloned the source, keep it updated with the git pull command.
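The clone-then-pull routine in the steps above can be sketched as a small script. This is only an illustration: the helper name `sync_command` and the default directory name `MinorThird` are hypothetical, and the repository address is assumed to be the standard GitHub SSH form of the TeamCohen/MinorThird repository. The script prints the git command rather than running it.

```shell
#!/bin/sh
# Assumed SSH address of the MinorThird repository on GitHub.
REPO_URL="git@github.com:TeamCohen/MinorThird.git"

# Hypothetical helper: print the git command that brings DIR up to date.
# Nothing is executed here; the command is only printed.
sync_command() {
  dir="$1"
  if [ -d "$dir/.git" ]; then
    # A clone already exists: pull the latest changes into it.
    echo "git -C $dir pull"
  else
    # No clone yet: create one in DIR.
    echo "git clone $REPO_URL $dir"
  fi
}

# Use the first script argument as the target directory, or a default name.
sync_command "${1:-MinorThird}"
```

Copy the printed command into your shell to run it. Note that cloning over SSH requires an SSH key registered with your GitHub account.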

Compiling the Source

If you have elected to download or check out the MinorThird source and compile it yourself, execute the following steps from a command shell:

  1. Change the current directory to where you checked out or unzipped the source (e.g., $MINORTHIRD). The rest of the steps assume the current directory is $MINORTHIRD.
  2. Run the setup script for your operating system to set up the CLASSPATH environment variable:
     • Using Windows command prompt execute:
> script\setup
     • Using Cygwin execute:
> source script/setup.sh
     • Using Linux execute:
$ source script/setup.linux
  3. To compile the code run the following command:
$ ant build-clean
  4. To generate the Javadoc for the API run:
$ ant javadoc
  5. To test that everything was compiled successfully run:
$ ant test
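For Unix-like shells, the choice of setup script in the steps above can be sketched as follows. The helper name `pick_setup_script` is hypothetical, and the mapping covers only the Cygwin and Linux cases named in this guide; the Windows command prompt case is outside the scope of a shell script.

```shell
#!/bin/sh
# Hypothetical helper: map a `uname -s` value to the setup script named in
# the steps above. With no argument, the current system's value is used.
pick_setup_script() {
  case "${1:-$(uname -s)}" in
    CYGWIN*) echo "script/setup.sh" ;;
    Linux*)  echo "script/setup.linux" ;;
    *)       return 1 ;;  # systems this guide does not list
  esac
}

if setup=$(pick_setup_script); then
  echo "From \$MINORTHIRD, run: source $setup"
else
  echo "No setup script is listed for $(uname -s)"
fi
```

The sketch prints the `source` command instead of running it because the setup script has to be sourced in your interactive shell: a CLASSPATH set inside a child script would be lost when that script exits.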

Using MinorThird

Now that you have MinorThird installed, you can begin using it to conduct classification and extraction experiments. The basic steps in conducting an experiment are:

  1. Train an annotator (classifier or extractor) on sample data.
  2. Run this annotator on test data and analyze its performance.
  3. Change the settings and repeat to find the optimal setting.
  4. Apply the best annotator on the "real" data of interest.

There are two ways to do the above steps in MinorThird: using its built-in UI tools or using it as a library via its API.

Using the MinorThird UI Tools

You can use the provided MinorThird UI tools to conduct an experiment directly. MinorThird provides tools for executing one or more steps of the experiment process, as well as utilities that combine several steps to make the process easier. All MinorThird tools are launched from the command line and can then be run in one of two ways: through a graphical interface or entirely from the command line.

To use the graphical version of any tool, simply supply the -gui argument to the command. Execute the following command for an example:

$ java edu.cmu.minorthird.ui.TrainExtractor -gui

A window should appear. This window is the main experiment control window for all GUI apps in MinorThird. The top section named Parameter Modification shows what program is being executed. Pressing the Edit button allows you to adjust the parameters of the program. The middle section contains the buttons that control the experiment. Once you have set all the options in the top section, press Start Task to execute the program you have chosen to run. Any output that the program generates will be printed to the bottom section labeled Error Messages and output. Finally, once the execution is complete the View Results button will be enabled. Clicking on this button will pop up a window that shows the results of your experiment. These controls are the same for virtually every program in the MinorThird suite.

Other arguments may be provided on the command line in addition to -gui. The invoked program will read the provided values and pre-populate the fields in the GUI. Execute the following command to see this in action:

$ java edu.cmu.minorthird.ui.TrainExtractor -labels sample1.train -gui

Once the window appears, click on the Edit button and notice that the field named labelsFilename in the baseParameters section is populated with the value supplied to the -labels argument on the command line.

To see the list of all possible arguments that a tool accepts simply provide the -help argument:

$ java edu.cmu.minorthird.ui.TrainExtractor -help

Once you are comfortable setting up experiments in the GUI, you will probably want to supply all the parameters on the command line and skip the GUI altogether. Keep in mind that each tool has a minimal set of parameters required for proper execution. In the output of -help, arguments enclosed in square brackets ([ and ]) are optional; the rest are required. Running from the command line is best demonstrated with the following examples:

$ java edu.cmu.minorthird.ui.TrainExtractor -labels sample1.train -spanType trueName -saveAs sample1.ann
$ java edu.cmu.minorthird.ui.TestExtractor -labels sample1.test -spanType trueName -loadFrom sample1.ann

The first command trains an annotator on the sample1.train dataset (this is a built-in dataset) and saves it in the current directory as sample1.ann. The -spanType argument tells the program to train the annotator to label spans of tokens that it thinks correspond to instances of trueName. The sample1.train dataset contains examples of these instances that are used to train the annotator. The second command tests this trained annotator (specified using the -loadFrom argument) against the sample1.test dataset (also a built-in dataset) and prints the performance to the screen. In this command the -spanType argument tells the program which labels to compare the annotator's predictions to.
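The train/test pair above follows a fixed pattern, which can be sketched as a small shell function. The helper name `experiment_commands` is hypothetical, and it assumes your datasets follow the same NAME.train / NAME.test naming as the built-in sample1 data. It uses only the arguments shown above and prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the train and test commands for one dataset.
#   $1 = dataset name (NAME.train and NAME.test are assumed to exist)
#   $2 = span type to learn and to evaluate against
experiment_commands() {
  name="$1"
  span_type="$2"
  # Train an annotator and save it as NAME.ann.
  echo "java edu.cmu.minorthird.ui.TrainExtractor -labels $name.train -spanType $span_type -saveAs $name.ann"
  # Load the saved annotator and test it on the held-out data.
  echo "java edu.cmu.minorthird.ui.TestExtractor -labels $name.test -spanType $span_type -loadFrom $name.ann"
}

# Reproduces the two example commands shown above.
experiment_commands sample1 trueName
```

Printing the commands first makes it easy to review them before running anything. Once your CLASSPATH is set up, the output can be pasted into the shell or piped to `sh`.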

Step-by-step examples of using each of these tools are available in the Tutorials section.

Using the MinorThird API

You can use the MinorThird libraries inside a custom Java application to create, run, and evaluate experiments and to analyze the results. This is the most powerful way to use MinorThird's capabilities.

Some of the specific advantages of using MinorThird in this way are:

  • It allows you to present the MinorThird tools to a user as an integrated part of your application, with an interface that makes sense in the context of your application.
  • It allows you to automatically run multiple experiments concurrently or in succession, using the results of a previous experiment to derive the parameters for the next.
  • You can automate the experiment process, eliminating the need for human intervention.
  • You can store the results of experiments (statistics or annotations) in any form you choose (e.g., a custom file format or a relational database) instead of just the supported MinorThird formats.

The MinorThird API is divided into four main packages:

  • edu.cmu.minorthird.classify
  • edu.cmu.minorthird.text
  • edu.cmu.minorthird.ui
  • edu.cmu.minorthird.util

See the Javadoc for a detailed description of the complete MinorThird API specification.

The basic steps in performing an experiment using the MinorThird API are:

  1. Load your data into a TextBase (extractor) or a DataSet (classifier).
  2. Instantiate an AnnotatorTeacher (extractor) or ClassifierTeacher (classifier).
  3. Configure the teacher.
  4. Instantiate an AnnotatorLearner (extractor) or ClassifierLearner (classifier) that represents the desired learning algorithm.
  5. Configure the learner.
  6. Call teacher.train(learner) to create a trained extractor or classifier.