This project extracts features around the pronoun "it" from text. The features can then be fed to WEKA binary classification models to predict whether a specific instance of "it" is clause-anaphoric (referring to a clause) or nominal-anaphoric (referring to a specific noun phrase).
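
The features this project actually extracts are not listed here. As a rough illustration of what "features around it" can mean, the sketch below finds each "it" token in a sentence and collects the neighbouring words. The class name `ItContextSketch`, the method `contextsAroundIt`, the regex tokenizer, and the window size are all assumptions made for this example and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names, tokenization, and the notion of a
// "context window" are assumptions, not the project's implementation.
final class ItContextSketch {

    // For every token equal to "it" (case-insensitive), returns the tokens
    // within `window` positions on either side, including "it" itself.
    static List<List<String>> contextsAroundIt(String sentence, int window) {
        // Naive tokenizer: split on anything that is not a letter or apostrophe.
        List<String> tokens = new ArrayList<>();
        for (String token : sentence.split("[^\\p{L}']+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }

        List<List<String>> contexts = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equalsIgnoreCase("it")) {
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size(), i + window + 1);
                contexts.add(new ArrayList<>(tokens.subList(from, to)));
            }
        }
        return contexts;
    }
}
```

Calling `ItContextSketch.contextsAroundIt("She said it was late, and it surprised me.", 1)` yields one three-word window per occurrence of "it". A real pipeline would likely rely on a proper tokenizer and parser rather than a regex split.
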
- This project was built and run using Java™ SE Runtime Environment (build 13+33)
- Build `pom.xml` using Maven to get all dependencies.
- Run the `main` function in `src/main/java/ca/uottawa/csi5137b/pipelines/FeatureExtractionPipeline.java`.
- The output file will be placed in the same `io` folder as `input.txt`.
Output file: `src/main/resources/io/output.csv`
- Without labels, both classes merged into one file after removing duplicates and cleanup: `src/main/resources/io/input.txt`
- Split by label: `src/main/resources/data/clause.txt` and `src/main/resources/data/nominal.txt`
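
The exact cleanup used to produce the merged, unlabeled file is not documented here. The sketch below shows one plausible way to combine the two label-specific files into a single deduplicated list. It assumes that "cleanup" means trimming whitespace and dropping blank lines. `DatasetMerger`, `mergeUnique`, and `mergeFiles` are hypothetical names, not part of the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the project. The cleanup rules are assumed.
final class DatasetMerger {

    // Merges two lists of lines: trims each line, skips blank lines, and
    // removes exact duplicates while keeping first-seen order.
    static List<String> mergeUnique(List<String> clauseLines, List<String> nominalLines) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> lines : List.of(clauseLines, nominalLines)) {
            for (String line : lines) {
                String cleaned = line.trim();
                if (!cleaned.isEmpty()) {
                    unique.add(cleaned);
                }
            }
        }
        return new ArrayList<>(unique);
    }

    // Reads both label files as UTF-8 and writes the merged result to `output`.
    static void mergeFiles(Path clause, Path nominal, Path output) throws IOException {
        List<String> merged = mergeUnique(
                Files.readAllLines(clause, StandardCharsets.UTF_8),
                Files.readAllLines(nominal, StandardCharsets.UTF_8));
        Files.write(output, merged, StandardCharsets.UTF_8);
    }
}
```

With this sketch, `mergeFiles` could be pointed at `clause.txt` and `nominal.txt` with a scratch output path. Writing directly over `input.txt` is best avoided until the result has been checked.
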
CSI 5137B Course at U Ottawa