This package serves as the basis for the paper "ORCAS-I: Queries Annotated with Intent using Weak Supervision".
DOI of the paper: https://doi.org/10.1145/3477495.3531737
Create a conda environment:
$ conda create --name intents_labelling python==3.8.12
Activate the environment:
$ source activate intents_labelling
Use pip to install requirements:
(intents_labelling) $ pip install -r requirements.txt
Install the intents_labelling package in development mode:
(intents_labelling) $ pip install -e .
Install the spaCy language model:
(intents_labelling) $ python -m spacy download en_core_web_lg
The list of movie titles can be found here.
Put all data files in the data/input/ directory.
Create a training set by sampling the ORCAS dataset and filtering out any examples that appear in the test set:
(intents_labelling) $ python intents_labelling/create_train_file.py
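For orientation, the sampling-and-filtering idea can be sketched in a few lines of plain Python. This is only an illustration and not the contents of create_train_file.py: the function name sample_training_queries, its parameters, and the case-insensitive matching are all assumptions made for the example.

```python
import random


def sample_training_queries(all_queries, test_queries, sample_size, seed=0):
    """Return a random sample of queries that do not occur in the test set.

    Hypothetical helper: the real script may read files, match queries
    differently, or sample in another way.
    """
    # Assumption: queries are compared after trimming and lowercasing.
    test_set = {q.strip().lower() for q in test_queries}
    candidates = [q for q in all_queries if q.strip().lower() not in test_set]

    # A seeded generator keeps the sample reproducible between runs.
    rng = random.Random(seed)

    # Never request more items than remain after filtering.
    return rng.sample(candidates, min(sample_size, len(candidates)))
```

Filtering before sampling ensures that no test query can leak into the training set, whatever the sample size.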
Create the Snorkel annotations:
(intents_labelling) $ python intents_labelling/main.py
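As a rough picture of what weak supervision means here, the sketch below applies a few keyword-based labeling functions to a query and combines their votes. It is deliberately library-free: the three rules, the label constants, and the weak_label helper are hypothetical, and a plain majority vote stands in for whatever label aggregation main.py actually performs with Snorkel.

```python
from collections import Counter

# Hypothetical label ids for this example only.
ABSTAIN = -1
NAVIGATIONAL, TRANSACTIONAL, INSTRUMENTAL = 0, 1, 2


def lf_domain_suffix(query):
    # A query that looks like a website address suggests navigational intent.
    return NAVIGATIONAL if query.endswith((".com", ".org", ".net")) else ABSTAIN


def lf_download(query):
    # The word "download" suggests transactional intent.
    return TRANSACTIONAL if "download" in query.split() else ABSTAIN


def lf_how_to(query):
    # "how to ..." suggests the user wants instructions.
    return INSTRUMENTAL if query.startswith("how to ") else ABSTAIN


LABELING_FUNCTIONS = [lf_domain_suffix, lf_download, lf_how_to]


def weak_label(query):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    normalized = query.lower().strip()
    votes = [lf(normalized) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # If the two most frequent labels are tied, do not guess.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]
```

With these illustrative rules, "how to tie a tie" receives the instrumental label, while "how to download skype" triggers two conflicting rules and is left unlabeled.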
Train the BERT model:
(intents_labelling) $ python intents_labelling/models/train_bert_classifier.py