TEES CNN BioNLP18
This page describes how to start using the Keras deep learning library with TEES. For more information please see the publication "Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing" (Björne and Salakoski, 2018) and the supplementary material.
The Keras based Detector classes are currently available in the latest TEES development branch. To get started, update your local copy of TEES to use this branch.
Keras models are specific for certain program versions, and although they can work with other versions, it's safer to use the exact Keras version for which a model was trained. You can install the version of Keras used by TEES with a command like:
pip install keras==2.0.8
The TEES CNN system relies on the biomedical word2vec embeddings dataset by Pyysalo et al. Please download the required word2vec file. Note that this is a very large file, so downloading may take a while. After you have downloaded the file, add the required local settings variable using the TEES configuration program:
python configure.py --key W2VFILE --value /path/to/wikipedia-pubmed-and-PMC-w2v.bin
Alternatively, you can manually add to your TEES local settings file the required variable:
W2VFILE = '/path/to/wikipedia-pubmed-and-PMC-w2v.bin'
Download the model(s) you want to use from the B2Share record. Classification with these models works the same as with all other TEES models. To classify e.g. the GE09 development set with the corresponding CNN model, use the command:
python classify.py -m /path/to/GE09-single.zip -i GE09-devel -o [OUTSTEM]
Please note that the "*-mixed_ensemble" models cannot be used to classify the development set, as their component models have been trained on randomly redivided sets of the corpus training and development documents.
Training a Keras CNN neural network model for the TEES system will effectively require hardware acceleration, although small test runs can also be performed on the CPU. Please see the Keras documentation for configuring the training process to use your GPU.
The interface for training TEES Keras CNN is the same as for regular TEES and can be used via train.py. However, the training options are of course different from the SVM ones. First of all, to train a Keras model, you need to use one of the available KerasDetectors (located in the Detectors subdirectory). You can define the detector class to import with the "--detector" switch, but in most cases you'll want to use a detector chosen automatically based on the type of the input corpus. To automatically use the most applicable Keras detector, you can use the generic "--detector keras" option.
When training a neural network it is not possible to refit the classifier to include the development dataset, so training separate development and test models is not possible. To train just a single model, you can use the options "--develModel model --testModel None".
The biggest change in using a Keras detector instead of the regular SVM ones is in the example options, which are used to configure the neural network structure instead of the SVM example generation. When you use train.py to train a Keras model, sensible default options are provided for supported corpora. To define just a few new parameters while using default options for the rest, you can use the "override" and "override_all" example styles. The "override" style will replace styles for a single stage in a multi-stage classifier (e.g. edge, modifier) while "override_all" will replace styles for all stages.
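To make the example style format concrete, the following sketch parses a colon-separated style string into its mode and per-parameter value lists. This is an illustrative stand-in written for this document, not the actual TEES parser.

```python
# Illustrative sketch (not the actual TEES parser): split an example
# style string like "override_all:nf=256,512:mods=20" into a mode and
# a dict mapping each parameter to its list of candidate values.
def parse_example_style(style):
    parts = style.split(":")
    mode, options = parts[0], {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        options[key] = value.split(",")
    return mode, options

mode, options = parse_example_style(
    "override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20")
# mode is "override_all"; options["nf"] is ["256", "512"], etc.
```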
As an example, you could train a Keras CNN model for the GE09 corpus with the following command:
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20
Here the TEES train.py is called in much the same way as when using the original SVM system but with some new parameters as explained above. Next, we will look at how to use the "--exampleStyle" parameter to configure the CNN training process in detail.
The TEES Keras system does not use an exhaustive grid search to optimize parameters as they are usually too numerous for that. Instead, multiple values can be passed for the example style parameters and during training each model will be trained with a random combination of these parameters. We can see from the above example that several of the parameters (such as 'path' or 'dense') are given multiple values in a comma-separated list. The final style option, 'mods', defines how many individual neural network models you want the system to train for each classification stage. All of these models will be initialized with a random combination from the defined parameter lists, along with a randomized initial state of the neural network. At the end of the training process, the model with the best performance on the development set (highest task-specific F-score) will be used as the final model.
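The random parameter sampling described above can be sketched as follows. This is an assumption-laden illustration of the idea, not TEES code; the parameter lists mirror the command example given earlier.

```python
import random

# Illustrative sketch of random parameter sampling (not TEES code):
# for each of the 'mods' models, draw one value from each parameter list.
param_lists = {
    "nf": [256, 512],
    "path": [0, 2, 4],
    "do": [0.1, 0.2, 0.5],
    "dense": [400, 800],
}

def sample_combination(param_lists, rng):
    # One random value per parameter defines one model's configuration.
    return {name: rng.choice(values) for name, values in param_lists.items()}

rng = random.Random(0)
combinations = [sample_combination(param_lists, rng) for _ in range(20)]  # mods=20
```

Each of the 20 models would then be trained with its own combination, and the one scoring the highest development-set F-score would be kept.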
There are many different parameters that can be used to configure the Keras CNN model trained by the TEES detectors. For a detailed discussion of the neural network used in the system please refer to Björne et al. (2018). The overall network structure is the same for all classification tasks and is shown in the image below:
The model parameters affect the construction and parameters of the neural network. The following options can be defined as example style parameters.
Common parameters
Parameter | Default | Description |
---|---|---|
override | | replace the classification task default values with the user-defined parameters |
override_all | | replace the default values for all classification tasks with the user-defined parameters |
kernels | 1,3,5,7 | kernel sizes for the convolutional layers |
cact | 'relu' | convolutional layer activation |
nf | 32 | number of filters in the convolutional layers |
path | 3 | path depth for path embedding features |
do | 0.1 | dropout layers' rate |
dense | 400 | units in the dense layer |
wv | The W2VFILE value from the TEES settings file | word vector file path |
wv_mem | 100000 | Number of vectors to read in memory |
wv_map | 10000000 | Number of vectors to access as a memory mapped file |
skip | | skip these embedding groups |
limit | | use only these embedding groups |
de | 8 | Embedding group (path, POS, etc.) dimensionality |
cm | 'auto' | classification mode, one of 'binary', 'multilabel' or 'multiclass', or 'auto' for corpus annotation based |
epochs | 100 | max epochs to train each model for |
patience | 10 | early stopping patience (number of epochs) |
batch | 64 | training batch size |
lr | 0.001 | learning rate |
opt | 'adam' | optimizer |
mods | 1 | number of individual randomized models to train |
ens | 1 | The n-best models to use for an ensemble classifier |
train | | The fraction of the combined training and development sets to use as the training set when building a mixed model ensemble |
Stage specific parameters
Detector | Parameter | Default | Description |
---|---|---|---|
entity, modifier | el | 21 | example length in tokens, centered on candidate token |
edge, unmerging | el | | limit for example length in tokens |
edge, unmerging | ol | 5 | outside length, number of tokens to include before and after edge or event |
unmerging | binary | False | Do not use separate event types in unmerging classification |
Model ensembles can be used to improve system performance and compensate for the randomness of the neural network training process. For a detailed discussion of the ensemble system design and purpose please see Björne et al. (2018). In this document we will show how to train non-ensemble (single) models, ensemble models and mixed ensemble models. To train a regular, non-ensemble TEES Keras model you need to define only the 'mods' (number of models) example style. For example, if you use the value 'mods=20' then the neural network training process will be repeated 20 times with different random initial weights and different random parameter combinations. After all of these models have been trained, the best one is kept as the final classifier.
To train an ensemble system instead, you can define a value larger than 1 (and smaller than or equal to 'mods') for the 'ens' parameter. For example, if you use the settings 'mods=20:ens=5' then the system will again train 20 randomized models, but keep the best five of these as a classifier ensemble. The classification predictions of the system will then be an average of the predictions of these five models.
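The averaging step can be sketched as below. This is a minimal illustration of averaging per-class predictions across ensemble members, assuming each model outputs a class probability vector; it is not the TEES implementation.

```python
# Illustrative sketch (not TEES code): average the class probability
# vectors produced by the kept ensemble models for one example.
def ensemble_average(model_predictions):
    # model_predictions: one probability list per ensemble member
    n = len(model_predictions)
    num_classes = len(model_predictions[0])
    return [sum(p[i] for p in model_predictions) / n for i in range(num_classes)]

# Three hypothetical models voting on a two-class example:
avg = ensemble_average([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
```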
Unlike classifiers such as SVMs, a trained neural network model cannot be "refit" to also include the parameter optimization (development) dataset, since some external data is always required to detect the optimal epoch for stopping the training. Since a relatively large portion of the data is required for the development set, this limits the potential of the system to learn from all available data. The TEES ensemble system can be used to work around this limitation by randomly redistributing documents between the training and development sets for each model to be trained. To train such a mixed ensemble, you can use for example the parameters 'mods=20:ens=5:train=0.9'. Here again 20 models are trained and the best five are used as a model ensemble, but for each model the training and development sets are randomly mixed by redistributing the documents in a train:devel=9:1 proportion.
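The redistribution described above can be sketched as follows. This is an illustrative stand-in, not TEES code: it pools the two document sets and resplits them at the 'train' fraction for each model.

```python
import random

# Illustrative sketch of mixed-ensemble redistribution (not TEES code):
# pool the training and development documents, shuffle, and resplit
# at the given fraction (train=0.9 gives a 9:1 proportion).
def redistribute(train_docs, devel_docs, train_fraction, rng):
    pool = list(train_docs) + list(devel_docs)
    rng.shuffle(pool)
    cut = int(round(train_fraction * len(pool)))
    return pool[:cut], pool[cut:]

rng = random.Random(1)
new_train, new_devel = redistribute([f"doc{i}" for i in range(80)],
                                    [f"doc{i}" for i in range(80, 100)],
                                    0.9, rng)
# 100 pooled documents resplit into 90 training and 10 development documents
```

A fresh split like this for each of the 20 models lets every document contribute to training in at least some ensemble members.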
The commands for training the single, ensemble and mixed ensemble models for the GE09 corpus are therefore:
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20:ens=5
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20:ens=5:train=0.9
For examples of how to train different Keras CNN models using the TEES system please see also the log files included in the supplementary material.
If you use the TEES CNN version, please cite the following publication:
@InProceedings{bjorne18event,
author = {Bj{\"o}rne, Jari and Salakoski, Tapio},
title = {Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing},
booktitle = {Proceedings of the BioNLP 2018 workshop},
year = {2018},
publisher = {Association for Computational Linguistics},
pages = {98--108},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/W18-2311}
}
Please also cite all the resources you use via TEES, e.g. the word2vec embeddings dataset by Pyysalo et al.