A barebones (Distil)BERT pipeline for token classification tasks, driven by Catalyst.
- In your virtual environment, run:

```
pip install -e .
```
- Check `experiment.py` for loading train/test data. At the moment the pipeline assumes two JSON Lines files containing the columns `content` and `tagged_attributes`, where `tagged_attributes` is a list of substrings of `content`.
- Possibly modify `dataset.py` to suit your data preprocessing needs. The pipeline assumes there are two classes of tokens.
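To make the expected data shape concrete, here is a minimal sketch with a hypothetical record (the field values are illustrative, not from a real dataset) and a toy conversion of `tagged_attributes` into per-token binary labels. It uses a plain whitespace split for brevity; the actual pipeline would tokenize with the (Distil)BERT tokenizer, and `binary_token_labels` is a made-up helper name, not part of the repo:

```python
import json

# A hypothetical JSON Lines record in the expected shape
# (one such object per line in the train/test files):
record = json.loads(
    '{"content": "Acme Corp hired Jane Doe in 2020", '
    '"tagged_attributes": ["Acme Corp", "Jane Doe"]}'
)

def binary_token_labels(content, tagged_attributes):
    """Label each whitespace token 1 if it falls inside any tagged
    substring, else 0 -- a sketch of the two-class assumption.
    The real pipeline tokenizes with the BERT tokenizer instead."""
    # Character-level mask marking which positions lie inside a tagged span
    mask = [False] * len(content)
    for attr in tagged_attributes:
        start = content.find(attr)
        if start != -1:
            for i in range(start, start + len(attr)):
                mask[i] = True
    # Assign each whitespace token the label of its character span
    labels, pos = [], 0
    for token in content.split():
        start = content.find(token, pos)
        end = start + len(token)
        labels.append(int(any(mask[start:end])))
        pos = end
    return labels

print(binary_token_labels(record["content"], record["tagged_attributes"]))
# → [1, 1, 0, 1, 1, 0, 0]
```

Here "Acme Corp" and "Jane Doe" are the tagged substrings, so their tokens get label 1 and all others get label 0.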
- Start training your model:

```
catalyst-dl run -C bert_ner/config.yml
```
- Run the following command to see metrics in TensorBoard:

```
CUDA_VISIBLE_DEVICES="" tensorboard --logdir=./logs
```