# InappropriateQueryDetection
The raw data is assumed to be in the data_full directory, with one query per line in a tab-separated (\t) format that includes a score. A query is considered offensive if its score is 0.2 or higher.
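
The labeling rule above can be sketched in a few lines of Java. This is only an illustration, not code from this project: the class name QueryLabeler is hypothetical, and the column order (query first, score second) is an assumption, since only the tab separator and the 0.2 threshold are stated here.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class QueryLabeler {
    // Threshold stated in this README: a score of 0.2 or higher means offensive.
    static final double OFFENSIVE_THRESHOLD = 0.2;

    // Assumption: each line is "<query>\t<score>". Adjust the index below
    // if the actual column order differs.
    static boolean isOffensive(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected two tab-separated fields: " + line);
        }
        double score = Double.parseDouble(fields[1].trim());
        return score >= OFFENSIVE_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isOffensive("example query\t0.35"));
        System.out.println(isOffensive("another query\t0.05"));
    }
}
```

The comparison uses >= so that a score of exactly 0.2 counts as offensive, matching the "0.2 or higher" wording.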
To run these programs you will need the following:
- Java >= 1.6
- Ant (to compile)
- LIBSVM, for the svm-train and svm-predict commands
- etools for sampling the TSV file
Step 1: Use the train file to extract features. The command is:
sh run.sh io.FormatData
Usage: sh run.sh -Doperation=extractFeatures io.FormatData
The usage message names the following arguments. The first two apply to feature extraction; the last two apply when the data is converted to SVM format:
- input-data: the raw data described above
- feature-map-output: the file the feature map is written to, in a tab-separated (\t) format
- input-feature-map: the same feature map file, now acting as input
- svm-data-output: the output file where the SVM data will be stored
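
To make the idea of a feature map concrete, here is a minimal sketch that assigns an integer id to each distinct token. It does not reproduce what io.FormatData does: the class name FeatureMapSketch, the choice of lowercase whitespace-separated tokens as features, and the "feature, tab, id" output layout are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the feature extractor shipped with this project.
final class FeatureMapSketch {
    // Assigns a 1-based id to each distinct lowercase token,
    // in order of first appearance across the queries.
    static Map<String, Integer> build(Iterable<String> queries) {
        Map<String, Integer> ids = new LinkedHashMap<String, Integer>();
        for (String query : queries) {
            for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty() && !ids.containsKey(token)) {
                    ids.put(token, ids.size() + 1);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = build(Arrays.asList("Buy cheap pills", "cheap flights"));
        // Print one "<feature>\t<id>" line per feature (assumed layout).
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Ids start at 1 because LIBSVM feature indices are 1-based, so the same ids can be reused directly when the data is written in SVM format.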
Step 2: Once the features have been extracted in Step 1 and the feature map file has been generated, convert the train and test files to SVM format by running the following command:
sh run.sh -Doperation=svmFormat eval.Eval
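
For orientation, LIBSVM reads one example per line in the form "label index:value index:value ...", with feature indices in ascending order. The sketch below formats a single example that way. The class name SvmLineFormatter and the label encoding (for example 1 for offensive, -1 otherwise) are assumptions, not taken from this project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of writing one example in LIBSVM's sparse text format.
final class SvmLineFormatter {
    // Produces "<label> <index>:<value> ..." for one example.
    // Copying into a TreeMap sorts the feature indices in ascending order,
    // which is the order LIBSVM expects.
    static String format(int label, Map<Integer, Double> features) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        for (Map.Entry<Integer, Double> e : new TreeMap<Integer, Double>(features).entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Double> features = new HashMap<Integer, Double>();
        features.put(3, 1.0);
        features.put(1, 2.0);
        System.out.println(format(1, features));
    }
}
```

Only non-zero features need to be listed, which keeps the files small for sparse text features.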
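Step 3: Use the generated train data to train an SVM with LIBSVM. Issue the following command (-t 0 selects the linear kernel, -c sets the cost):
./svm-train -c 0 -t 0
Note that LIBSVM's default C-SVC requires a cost greater than 0, so substitute a positive value for -c, such as those suggested at the end of this document.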
Step 4: Classify the test data with the trained model using the following command:
./svm-predict
Step 5: Issue the following command to generate the precision (P), recall (R), and F1 scores:
sh run.sh eval.Eval
The input files are the same files that were used in Step 4.
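
For reference, the three scores can be computed from gold and predicted labels as in the sketch below. It is not the code of eval.Eval: the class name PrfSketch, the integer label encoding, and the choice to score only the positive (offensive) class are assumptions.

```java
// Hypothetical sketch of precision, recall, and F1 for one positive class.
final class PrfSketch {
    // Returns {precision, recall, f1} for the given positive label.
    static double[] score(int[] gold, int[] predicted, int positiveLabel) {
        if (gold.length != predicted.length) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.length; i++) {
            boolean isGold = gold[i] == positiveLabel;
            boolean isPred = predicted[i] == positiveLabel;
            if (isGold && isPred) tp++;        // correctly flagged
            else if (!isGold && isPred) fp++;  // flagged but not offensive
            else if (isGold && !isPred) fn++;  // offensive but missed
        }
        // Guard against division by zero when a denominator is empty.
        double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall) == 0.0 ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        int[] gold = {1, 1, -1, -1};
        int[] predicted = {1, -1, 1, -1};
        double[] prf = score(gold, predicted, 1);
        System.out.println("P=" + prf[0] + " R=" + prf[1] + " F1=" + prf[2]);
    }
}
```

Here the predicted array stands for the labels written by svm-predict and the gold array for the labels in the test file, read in the same order.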
If interested, try different values of the cost c between 0.1 and 1.0, for example 0.1, 0.3, 0.5, 0.7, and 0.9.
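
The sweep over cost values can be scripted. The sketch below only builds the svm-train command line for each cost, following svm-train's "options, training file, model file" argument order. The class name CostSweepSketch and the file names train.svm and model_c followed by the cost are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; file names are placeholders, not files from this project.
final class CostSweepSketch {
    // Builds one svm-train command per cost value, using a linear kernel (-t 0).
    // Costs are kept as strings so they appear on the command line exactly as given.
    static List<List<String>> commands(String trainFile, String[] costs) {
        List<List<String>> result = new ArrayList<List<String>>();
        for (String cost : costs) {
            result.add(Arrays.asList(
                    "./svm-train", "-c", cost, "-t", "0", trainFile, "model_c" + cost));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] costs = {"0.1", "0.3", "0.5", "0.7", "0.9"};
        for (List<String> command : commands("train.svm", costs)) {
            StringBuilder line = new StringBuilder();
            for (String part : command) {
                if (line.length() > 0) line.append(' ');
                line.append(part);
            }
            System.out.println(line);
        }
    }
}
```

Each argument list could be passed to java.lang.ProcessBuilder to run the training, after which the prediction and evaluation steps can be repeated per model to compare F1 scores.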