Step 1: Create a new Maven project
- File -> New Project -> Maven Project
- Click "Next"
- Check "Create a simple project (skip archetype selection)"
- For Group Id, enter "edu.colorado.clear.tutorial"
- For Artifact Id, enter "ClearTkTutorial"
- Under Parent Project, enter Group Id: org.cleartk, Artifact Id: cleartk-release, Version: 1.2.0
Step 2: Configure project
- Edit pom.xml in the root directory of ClearTkTutorial
- Click on the Dependencies tab and add the following:
  - groupId: org.cleartk, artifactId: cleartk-util
  - groupId: org.cleartk, artifactId: cleartk-typesystem
  - groupId: org.cleartk, artifactId: cleartk-ml
  - groupId: org.cleartk, artifactId: cleartk-ml-opennlp-maxent
  - groupId: org.cleartk, artifactId: cleartk-ml-opennlp-eval
- Right click on your project -> Maven -> Update Project Configuration
- Right click on your project -> Maven -> Update Dependencies
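If you prefer to edit pom.xml by hand rather than through the Dependencies tab, the settings from Steps 1 and 2 correspond roughly to the fragment below. This is a sketch, not a complete pom.xml. It assumes that the cleartk-release parent manages the dependency versions, which is why no version elements are listed for the dependencies; if Maven reports a missing version, add one explicitly.

```xml
<!-- Parent project from Step 1 -->
<parent>
  <groupId>org.cleartk</groupId>
  <artifactId>cleartk-release</artifactId>
  <version>1.2.0</version>
</parent>

<!-- Dependencies from Step 2; versions assumed to come from the parent -->
<dependencies>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-util</artifactId>
  </dependency>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-typesystem</artifactId>
  </dependency>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml</artifactId>
  </dependency>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml-opennlp-maxent</artifactId>
  </dependency>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml-opennlp-eval</artifactId>
  </dependency>
</dependencies>
```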
Step 3: Download and copy necessary files:
- Create a package in src/main/java called edu.colorado.clear.tutorial.tweetpos
- Copy TweetPosAnnotatorReader to the edu.colorado.clear.tutorial.tweetpos package
- Download tweet-pos data from
- Unzip the contents into the src/main/resources directory of your ClearTkTutorial project
This lesson should teach you how to query the CAS for Annotations and their associated properties (known as Features)
Create a new class called DumpTweetPos
Make it inherit from org.uimafit.component.JCasAnnotator_ImplBase
Now create a method with the following signature:
@Override public void process(JCas jCas) throws AnalysisEngineProcessException
This method should iterate over and print out all of the "Sentences" in the CAS. Hint: Try playing with
Once you have dumped out the sentences, dump out all of the Tokens and their corresponding POS tags.
- Hint 1: Try playing with JCasUtil.selectCovered(...)
- Hint 2: Look at the methods within the Token class
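Before writing the annotator, it may help to see what a "covered" selection means. The sketch below uses plain Java stand-ins instead of the CAS: the Span class, the dump method, and the example text and tags are all hypothetical and are not part of uimaFIT or ClearTK. It only illustrates the idea that, for each sentence, you pick the tokens whose character span lies inside that sentence's span and print each one with its tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CoveredSelectionSketch {
    // Hypothetical stand-in for an annotation: a character span plus an optional tag.
    // In the real exercise these would be the Sentence and Token types in the CAS.
    static class Span {
        final int begin;
        final int end;
        final String pos; // POS tag for tokens, null for sentences

        Span(int begin, int end, String pos) {
            this.begin = begin;
            this.end = end;
            this.pos = pos;
        }

        String coveredText(String documentText) {
            return documentText.substring(begin, end);
        }
    }

    // Keeps the candidates whose span lies entirely inside the covering span.
    static List<Span> selectCovered(List<Span> candidates, Span covering) {
        List<Span> covered = new ArrayList<>();
        for (Span candidate : candidates) {
            if (candidate.begin >= covering.begin && candidate.end <= covering.end) {
                covered.add(candidate);
            }
        }
        return covered;
    }

    // Builds one line per sentence, followed by one "token/POS" line per covered token.
    static List<String> dump(String text, List<Span> sentences, List<Span> tokens) {
        List<String> lines = new ArrayList<>();
        for (Span sentence : sentences) {
            lines.add("Sentence: " + sentence.coveredText(text));
            for (Span token : selectCovered(tokens, sentence)) {
                lines.add("  " + token.coveredText(text) + "/" + token.pos);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up example text and tags, for illustration only.
        String text = "I run. You walk.";
        List<Span> sentences = Arrays.asList(new Span(0, 6, null), new Span(7, 16, null));
        List<Span> tokens = Arrays.asList(
            new Span(0, 1, "PRP"), new Span(2, 5, "VBP"), new Span(5, 6, "."),
            new Span(7, 10, "PRP"), new Span(11, 15, "VBP"), new Span(15, 16, "."));
        for (String line : dump(text, sentences, tokens)) {
            System.out.println(line);
        }
    }
}
```

In your DumpTweetPos class the same two nested loops apply, with the CAS queries taking the place of the hand-built lists.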
This lesson should teach you how to create a pipeline that reads in the tweet-pos data into the CAS and then subsequently calls your DumpTweetPos annotator.
Create a new class called RunDumpTweetPos; check "public static void main(String[] args)" before clicking Finish
Look at the signatures for SimplePipeline.runPipeline(...).
- What do you need to call this method?
Create a CollectionReaderDescription object
- Hint 1: UriCollectionReader.getDescriptionFromFiles(...)
- Hint 2: Arrays.asList
Build an aggregate analysis engine. There are several reasons to use aggregate engines, but they are beyond the scope of this lesson. For now, start by creating a new AggregateBuilder.
You will need to add descriptions of the following analysis engines (Annotators):
- UriToDocumentTextAnnotator
- TweetPosReaderAnnotator
- DumpTweetPos
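The order in which you add the engines matters, because each one relies on what the previous one put into the CAS. The sketch below models that idea with plain Java only: the Doc class, the in-memory "files" map, and the whitespace splitting are hypothetical placeholders and do not reflect the actual uimaFIT or ClearTK classes or the tweet-pos file format. It shows a reader supplying one document per file and three steps run in order on each document: load the text, add annotations, dump the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class PipelineSketch {
    // Hypothetical stand-in for a CAS: one document plus whatever the engines add to it.
    static class Doc {
        final String uri;
        String text;                                   // filled in by the text-loading step
        final List<String> tokens = new ArrayList<>(); // filled in by the annotation step

        Doc(String uri) {
            this.uri = uri;
        }
    }

    // Runs every engine, in the order given, on every document the reader supplies.
    static void runPipeline(List<Doc> reader, List<Consumer<Doc>> engines) {
        for (Doc doc : reader) {
            for (Consumer<Doc> engine : engines) {
                engine.accept(doc);
            }
        }
    }

    // Builds a three-step pipeline over in-memory "files" and returns the dumped lines.
    static List<String> run(Map<String, String> files) {
        List<Doc> reader = new ArrayList<>();
        for (String uri : files.keySet()) {
            reader.add(new Doc(uri)); // the reader only records where the document lives
        }
        List<String> output = new ArrayList<>();
        List<Consumer<Doc>> engines = new ArrayList<>();
        engines.add(doc -> doc.text = files.get(doc.uri));                            // load text
        engines.add(doc -> doc.tokens.addAll(Arrays.asList(doc.text.split("\\s+")))); // annotate (placeholder)
        engines.add(doc -> output.add(doc.uri + " -> " + doc.tokens));                // dump
        runPipeline(reader, engines);
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "hello world");
        for (String line : run(files)) {
            System.out.println(line);
        }
    }
}
```

Swapping the first two steps in this sketch would fail, since the annotation step needs the text to be present; the same reasoning applies to the order of the three annotators listed above.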
This lesson should teach you to build your own classifier-based annotators using CleartkAnnotator, Features, and Instances.
Create a new class called TweetPosTagger. Inherit from CleartkAnnotator
Implement the process method
Things to know:
- Instance instance = new Instance()
- instance.addAll(...)
- new Feature(name, value)
- this.isTraining()
- this.dataWriter.write(instance)
- this.classifier.classify(instance)
Advanced stuff:
- CleartkExtractor
- FeatureFunctionExtractor
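The overall shape of a classifier-based annotator is the same in training and in classification: build an instance from features, then either write it out with its gold outcome or ask the classifier for an outcome. The sketch below models that control flow in plain Java. The Feature and Instance classes here are hypothetical stand-ins, not the ClearTK classes, and the "classifier" is a simple word-to-tag lookup rather than a maxent model; the "NN" fallback and the two features are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TaggerSketch {
    // Hypothetical stand-ins for the Feature and Instance classes used in the lesson.
    static class Feature {
        final String name;
        final Object value;

        Feature(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    static class Instance {
        final List<Feature> features = new ArrayList<>();
        String outcome; // gold POS tag, set only while training
    }

    final boolean training;
    final List<Instance> written = new ArrayList<>(); // plays the role of the data writer
    final Map<String, String> model;                  // plays the role of the classifier

    TaggerSketch(boolean training, Map<String, String> model) {
        this.training = training;
        this.model = model;
    }

    // Two simple features per token: the lower-cased word and the previous word.
    static List<Feature> extract(String[] words, int i) {
        List<Feature> features = new ArrayList<>();
        features.add(new Feature("word", words[i].toLowerCase(Locale.ROOT)));
        features.add(new Feature("prev", i == 0 ? "<s>" : words[i - 1].toLowerCase(Locale.ROOT)));
        return features;
    }

    // Training mode records labelled instances; classification mode predicts one tag per token.
    String[] process(String[] words, String[] goldTags) {
        String[] tags = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            Instance instance = new Instance();
            instance.features.addAll(extract(words, i));
            if (training) {
                instance.outcome = goldTags[i];
                written.add(instance);
                tags[i] = goldTags[i];
            } else {
                tags[i] = classify(instance);
            }
        }
        return tags;
    }

    // Toy classifier: returns the tag remembered for the word, or "NN" if the word is unseen.
    String classify(Instance instance) {
        Object word = instance.features.get(0).value;
        return model.getOrDefault(word, "NN");
    }

    // Toy replacement for the real training step: remember the last tag seen for each word.
    static Map<String, String> train(List<Instance> data) {
        Map<String, String> model = new HashMap<>();
        for (Instance instance : data) {
            model.put((String) instance.features.get(0).value, instance.outcome);
        }
        return model;
    }
}
```

A typical use of the sketch is to run one TaggerSketch in training mode over labelled data, pass its written instances to train, and then create a second TaggerSketch in classification mode with the resulting model.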
This lesson will teach you how to organize your code and pipelines for evaluation of NLP components
Create a new class called TweetPosEval. Inherit from Evaluation_ImplBase<File, AnnotationStatistics>
Implement getCollectionReader
- Hint: Look at your RunDumpTweetPos pipeline and combine it with CollectionReaderFactory.createCollectionReader
Implement your training pipeline in the train() method
- Hint 1: This will look very similar to your RunDumpTweetPos pipeline
- Hint 2: Train.main() is what actually packages your extracted data into a model.jar file
- Things to know:
- MaxentStringOutcomeDataWriter.class
- DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY
Implement your testing/evaluation pipeline in the test() method
Implement a main method that calls your new evaluation class.
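The test() method is expected to return statistics that compare the predicted tags with the gold ones. As an illustration of the kind of bookkeeping involved, here is a plain-Java sketch that accumulates precision, recall, and F1 over several documents. It is not the AnnotationStatistics implementation: the TaggingStatsSketch class, the "begin-end:tag" string encoding, and the handling of empty denominators are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TaggingStatsSketch {
    private int referenceCount;
    private int predictedCount;
    private int correctCount;

    // Each annotation is encoded as "begin-end:tag". A prediction counts as correct
    // only if both its span and its tag match a reference annotation.
    void add(Set<String> reference, Set<String> predicted) {
        Set<String> correct = new HashSet<>(reference);
        correct.retainAll(predicted);
        referenceCount += reference.size();
        predictedCount += predicted.size();
        correctCount += correct.size();
    }

    // Fraction of predicted annotations that are correct (1.0 if nothing was predicted).
    double precision() {
        return predictedCount == 0 ? 1.0 : (double) correctCount / predictedCount;
    }

    // Fraction of reference annotations that were found (1.0 if there were none).
    double recall() {
        return referenceCount == 0 ? 1.0 : (double) correctCount / referenceCount;
    }

    // Harmonic mean of precision and recall.
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Made-up gold and predicted tags for a single three-token document.
        TaggingStatsSketch stats = new TaggingStatsSketch();
        stats.add(
            new HashSet<>(Arrays.asList("0-1:PRP", "2-5:VBP", "5-6:.")),
            new HashSet<>(Arrays.asList("0-1:PRP", "2-5:NN", "5-6:.")));
        System.out.println("P=" + stats.precision() + " R=" + stats.recall() + " F1=" + stats.f1());
    }
}
```

For POS tagging, where the gold and predicted tokens share the same spans, precision and recall coincide and both equal tagging accuracy.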