TestExtractor Tutorial
Extraction means finding spans of a particular type, such as names or places, within documents. TestExtractor tasks take text data as input. For this example we will use sample1.test as the testing data. These samples are built into the code, so they require no additional setup. To see how to label and load your own data for this task, look at the Labeling and Loading Data Tutorial.
This experiment loads an extractor that was saved by running TrainExtractor and tests the extractor's performance on labeled test data. The experiment outputs statistics on token and span precision, recall, and error rates.
- To run this type of task using the GUI do:
$ java -Xmx500M edu.cmu.minorthird.ui.TestExtractor -gui
-
A window will appear. To view and change the parameters of the experiment, press the Edit button located next to TestExtractor. A Property Editor will appear:
-
To view what each parameter does and/or how to set it, click the ? button next to each field. The parameters that must be entered for the experiment to run are additionalParameters (-loadFrom), baseParameters (-labels) and signalParameters (-spanType or -spanProp). All other parameters either have defaults or are not required. There are four groups of parameters that can be modified for running a TestExtractor experiment:
-
Name the saved extractor in the loadFrom text field (if you do not have a saved extractor, look at the TrainExtractor Tutorial to see how to create one).
-
Specify the testing data for the experiment by entering a labelsFilename. Since the samples are built into the code, sample1.test can simply be typed into the text field under labelsFilename to load the data. Note: data from a directory can be loaded by using the Browse button.
-
To save the results from the experiment, enter a file to which to write the results in the saveAs text field. Note: this is optional, but useful for comparing results later.
-
Once labelsFilename is specified, click the Edit button next to signalParameters. Important: labelsFilename must be specified BEFORE clicking Edit. Another Property Editor will appear; select trueName from the pull-down menu. Then press the OK button to close the Property Editor for signalParameters:
-
Feel free to try changing any of the other parameters, including the ones in advanced options. Click on the help buttons to get a feeling for what each parameter does and how changing it may affect your results. Once all the parameters are set, click the OK button on the Property Editor.
-
Press the Show Labels button if you would like to view the input data for the extraction task. This will pop up the same TextBaseViewer that you would see if you ran ViewLabels on the train data.
-
Now press Start Task under Execution Controls. The time the task takes depends on the size of the data set, but extraction tasks usually take a minute or two. When the task is finished, the error rates will appear in the output text area along with the total time it took to run the experiment.
-
Now that the experiment has run, the results can be seen. To look at the details of your results, click the View Results button in the Execution Controls section. Click on the Evaluation tab to see the precision rates of the experiment. Unless showTestDetails has been deselected (in the advancedOptions menu of splitter parameters), there will be a Full Test Set tab. When this tab is selected, one can compare whatever is labeled (in this case name) to what the learner predicted. In the comparison, green means true positive, blue means false negative, and yellow means false positive. You can also click on the spanTypes tab and select a color and a span type to highlight. Make sure that you reset the controls before highlighting or comparing. After making a selection, click Apply to see the result.
-
To view the evaluation results for the experiment, click the Evaluation tab at the top of the window:
-
Precision - # of units predicted correctly / # of units predicted
-
Recall - # of units predicted correctly / # of units labeled in the test data
-
F1 - overall evaluation of performance, combining precision and recall
- Press the Clear Window button to clear all output from the output and error messages window. This is useful if you would like to run another experiment.
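The three figures on the Evaluation tab can be written out as follows. This is a sketch using the standard definitions. The symbols c, p and t are introduced here for illustration, and it assumes MinorThird reports F1 as the usual harmonic mean of precision and recall, which this tutorial does not state explicitly. The counts in the worked example are made up.

```latex
% c = units predicted correctly, p = units predicted, t = units labeled in the test data
\[
P = \frac{c}{p}, \qquad R = \frac{c}{t}, \qquad F_1 = \frac{2PR}{P + R}
\]
% Worked example with made-up counts: c = 8, p = 10, t = 16
\[
P = \frac{8}{10} = 0.8, \qquad R = \frac{8}{16} = 0.5, \qquad
F_1 = \frac{2 \cdot 0.8 \cdot 0.5}{0.8 + 0.5} \approx 0.615
\]
```

The same formulas apply whether a unit is a single token or a whole span, which is why the experiment reports both token and span statistics.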
- To get started using the command line for an extractor experiment do:
$ java -Xmx500M edu.cmu.minorthird.ui.TestExtractor -help
Note: You can enter as many command line arguments as you like along with the -gui argument. This way you can use the command line to specify the parameters that you would like and use the GUI to set any additional parameters or view the results.
2. Show options: specifying these options allows one to pop up informative windows from the command line:
-
-showData: interactively shows the dataset in a new window
-
-showLabels: shows the training data and its labels
-
-showResult: displays the experiment result in a new window
- The first thing you probably want to enter on the command line is the data you would like to test on. To do this, type -labels and the repository key of the dataset you would like to use. For this experiment you should use the option -labels sample1.test.
- Next, specify the saved extractor you would like to load with -loadFrom sample1.ann.
- The next necessary parameter to name is either spanProp or spanType. To specify this parameter, type -spanType TYPE. For this dataset TYPE is trueName, so use: -spanType trueName.
- Specify complex parameters on the command line using the -other option. See the Command Line Other Option Tutorial for details.
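Putting the command-line steps above together, a complete run might look like the sketch below. It uses only options named in this tutorial. The file name sample1.ann is taken from the -loadFrom example and assumes an extractor was saved under that name by TrainExtractor. The RUN variable and the print-before-run guard are illustrative additions, not part of MinorThird, and the MinorThird classes are assumed to be on the Java classpath.

```shell
#!/bin/sh
# Values taken from the steps above.
LABELS="sample1.test"    # built-in test data
EXTRACTOR="sample1.ann"  # assumed: extractor previously saved by TrainExtractor
SPAN_TYPE="trueName"     # span type to evaluate

# Assemble the command as positional parameters so each argument stays one word.
set -- java -Xmx500M edu.cmu.minorthird.ui.TestExtractor \
    -labels "$LABELS" \
    -loadFrom "$EXTRACTOR" \
    -spanType "$SPAN_TYPE" \
    -showResult

# Print the command for review.
echo "$*"

# RUN is a hypothetical switch for this sketch: run only when RUN=1 and java is installed.
if [ "${RUN:-0}" = "1" ] && command -v java >/dev/null 2>&1; then
    "$@"
fi
```

As noted earlier, -gui can be added to the same argument list if you prefer to set the remaining parameters or view the results in the GUI.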