ApplyAnnotator Tutorial
ApplyAnnotator adds labels to documents. These labels can apply to a whole document (such as spam) or to portions of documents, such as names or places. Accordingly, ApplyAnnotator accepts the saved result (an annotator) of either TrainClassifier or TrainExtractor. ApplyAnnotator may be used on unlabeled or labeled documents. Running it on labeled documents is helpful for comparing true labels to predicted labels.
This example uses the ClassifierAnnotator saved in the TrainClassifier Tutorial. For quick reference, here is how to create that annotator from the command line, as in the TrainClassifier Tutorial:
$ java -Xmx500M edu.cmu.minorthird.ui.TrainClassifier -labels sample3.train -spanType fun -saveAs sample3.ann
This classifier annotator labels documents that are "fun".
To run this type of task, start with:
$ java -Xmx500M edu.cmu.minorthird.ui.ApplyAnnotator
As with all UI tasks, every parameter for ApplyAnnotator may be specified either in the GUI or on the command line. To use the GUI, add -gui to the command line. You can also mix and match where the parameters are specified. For example, you can set two parameters on the command line and use the GUI to select the rest. For this reason, the step-by-step process below first explains how to select each parameter value in the GUI and then how to set the same parameter on the command line.
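The sketch below illustrates the mix-and-match approach. It sets the two required parameters (the documents and the annotator, described below) on the command line and leaves the rest to the GUI. The `m3` helper is hypothetical and not part of MinorThird. By default it only prints the assembled command so you can check it. With `RUN=1` it executes the command instead, assuming the MinorThird classes are already on your CLASSPATH as in the commands above.

```shell
#!/bin/sh
# m3 is a hypothetical convenience helper, not part of MinorThird.
# Usage: m3 ToolName [options...]
# Prints the full java command; set RUN=1 to execute it instead
# (assumes the MinorThird classes are on the CLASSPATH).
m3() {
    tool=$1
    shift
    set -- java -Xmx500M "edu.cmu.minorthird.ui.$tool" "$@"
    if [ "${RUN:-}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Documents and annotator on the command line; -gui opens the
# GUI so the remaining parameters can be set there.
m3 ApplyAnnotator -labels sample3.test -loadFrom sample3.ann -gui
```

Running the script as-is prints `java -Xmx500M edu.cmu.minorthird.ui.ApplyAnnotator -labels sample3.test -loadFrom sample3.ann -gui`. Prefixing it with `RUN=1` launches the command.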
To view a list of parameters and their functions, run:
$ java -Xmx500M edu.cmu.minorthird.ui.ApplyAnnotator -help
or
$ java -Xmx500M edu.cmu.minorthird.ui.ApplyAnnotator -gui
Then click the Parameters button next to Help, or click the ? button next to each field in the Property Editor, to see what each parameter is used for. If you are using the GUI, click the Edit button next to ApplyAnnotator. A Property Editor window will appear.
There are three groups of parameters to specify for this experiment. A collection of documents (labelsFilename) and an annotator (loadFrom) are required; all other fields are optional. For more information about any of these fields, click the ? (help) button next to the field.
- baseParameters contains the options for loading the collection of documents.
  - GUI: enter sample3.test in the labelsFilename text field. sample3.test contains labeled documents, which makes it useful for comparing true labels to predicted labels.
  - Command Line: use the -labels option followed by the repository key or the directory of files to load. For this tutorial, specify -labels sample3.test.
- saveParameters contains one parameter for specifying a file to save the result to. Saving is optional, but it lets you reuse the result in other experiments or keep it for reference. It is useful to save in the format LABELS_FILENAME.labels, where LABELS_FILENAME is the directory entered in the labelsFilename text field. This way MinorThird can automatically load the labels produced by this experiment in another MinorThird task.
  - GUI: type sample3.labels in the saveAs text field.
  - Command Line: -saveAs sample3.labels
- loadAnnotatorParams contains one parameter for specifying the annotator to load.
  - GUI: enter sample3.ann (or the file name you chose for your annotator) in the loadFrom text field.
  - Command Line: -loadFrom sample3.ann (or the file name you chose for your annotator)
- Feel free to try changing any of the other parameters, including the ones in advanced options.
  - GUI: click the help buttons to get a feel for what each parameter does and how changing it may affect your results. Once all the parameters are set, click the OK button in the Property Editor.
  - Command Line: add other parameters to the command line (use the -help option to see the available parameters). If an option can be set in the GUI but has no dedicated parameter in the -help listing, use the -other option instead. To see how to use this option, look at the Command Line Other Option Tutorial.
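The sketch below gathers the command-line settings from the list above into a single invocation. It uses a small hypothetical helper, `m3` (not part of MinorThird). The helper prints the assembled command unless `RUN=1` is set, in which case it runs it, assuming the MinorThird classes are on your CLASSPATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of MinorThird): print the command,
# or run it when RUN=1.
m3() {
    tool=$1
    shift
    set -- java -Xmx500M "edu.cmu.minorthird.ui.$tool" "$@"
    if [ "${RUN:-}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# baseParameters:      -labels   (documents to annotate)
# loadAnnotatorParams: -loadFrom (annotator saved by TrainClassifier)
# saveParameters:      -saveAs   (optional file for the produced labels)
m3 ApplyAnnotator \
    -labels sample3.test \
    -loadFrom sample3.ann \
    -saveAs sample3.labels
```

Keeping the flags in a script like this makes the run easy to repeat after changing a single parameter.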
== Show Labeled Data ==
If you would like to view the input data for the annotator task, specify the showLabels option. This pops up the same TextBaseViewer that you would see if you ran ViewLabels on the data.
- GUI: press the Show Labels button.
- Command Line: add -showLabels to the command line.
== Getting and Interpreting Results ==
- GUI: press Start Task under Execution Controls to run the experiment. How long the task takes depends on the size of the data set and the annotator. When the task is finished, click the View Results button to see how MinorThird has labeled the data. This is the same window that appears when you specify the -showResult option on the command line.
- Command Line: specify -showResult.
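On the command line, the two viewing options should combine with the parameters from the previous steps, as in the sketch below. The `m3` helper is hypothetical and not part of MinorThird. It prints the command unless `RUN=1` is set and assumes the MinorThird classes are on your CLASSPATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of MinorThird): print the command,
# or run it when RUN=1.
m3() {
    tool=$1
    shift
    set -- java -Xmx500M "edu.cmu.minorthird.ui.$tool" "$@"
    if [ "${RUN:-}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# -showLabels pops up the TextBaseViewer on the input data;
# -showResult opens the same window as the GUI's View Results button.
m3 ApplyAnnotator -labels sample3.test -loadFrom sample3.ann \
    -saveAs sample3.labels -showLabels -showResult
```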
Note: the annotator outputs the type that was chosen for output in the TrainClassifier experiment; the default is _prediction. To have the annotator output a more informative type (such as predicted_fun), go back to TrainClassifier and either change the output text field in the Property Editor or specify -output on the command line.
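For example, you could rerun the TrainClassifier command from the start of this tutorial with an explicit output type. This sketch assumes that -output takes the type name as its argument; check TrainClassifier's -help listing to confirm. The `m3` helper is hypothetical and not part of MinorThird. It prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of MinorThird): print the command,
# or run it when RUN=1.
m3() {
    tool=$1
    shift
    set -- java -Xmx500M "edu.cmu.minorthird.ui.$tool" "$@"
    if [ "${RUN:-}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Retrain with a named output type (assumed syntax: -output TYPE),
# overwriting the annotator that ApplyAnnotator loads.
m3 TrainClassifier -labels sample3.train -spanType fun \
    -output predicted_fun -saveAs sample3.ann
```

After retraining, applying sample3.ann should label documents with predicted_fun rather than _prediction. The comparison in the viewer then becomes fun versus predicted_fun.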
In this case, the documents carry both true labels and predicted labels, so it is useful to compare fun to _prediction to see how the classifier performed. This is a simple example, and the classifier predicted every document correctly, which is why each document is highlighted green. In general, a document is highlighted blue if it has the first type (here the true type fun) but not the second type (the predicted type _prediction). It is highlighted red if it has the second type but not the first. It is highlighted green if it has both types, as in this example.
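The highlighting rule above amounts to a small truth table. The sketch below restates it as a shell function purely for illustration. It is not MinorThird code, and the `compare_color` name is hypothetical. The tutorial does not say how a document with neither type is displayed, so the sketch assumes it is left unhighlighted.

```shell
#!/bin/sh
# Illustrative restatement of the viewer's color rule (not MinorThird code).
# $1: 1 if the document has the first type (true label, e.g. fun), else 0
# $2: 1 if it has the second type (predicted label, e.g. _prediction), else 0
compare_color() {
    case "$1$2" in
        11) echo green ;;  # both types: prediction agrees with the true label
        10) echo blue ;;   # true type only: the annotator missed it
        01) echo red ;;    # predicted type only: a false positive
        00) echo none ;;   # neither type: assumed to be unhighlighted
        *)  echo "usage: compare_color 0|1 0|1" >&2; return 1 ;;
    esac
}

# Print the whole table.
for t in 1 0; do
    for p in 1 0; do
        echo "first=$t second=$p -> $(compare_color "$t" "$p")"
    done
done
```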
If there are no true type labels (that is, the documents were unlabeled before running ApplyAnnotator), try highlighting the SpanTypes or SpanProps to see which documents or words were predicted.