TEES CNN BioNLP18
This page describes how to start using the Keras deep learning library with TEES. For more information please see the publication "Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing" (Björne and Salakoski, 2018) and the supplementary material.
The Keras based Detector classes are currently available in the latest TEES development branch. To get started, update your local copy of TEES to use this branch.
Keras models are specific for certain program versions, and although they can work with other versions, it's safer to use the exact Keras version for which a model was trained. You can install the version of Keras used by TEES with a command like:
pip install keras==2.0.8
The TEES CNN system relies on the biomedical word2vec embeddings dataset by Pyysalo et al. Please download the required word2vec file. Note that this is a very large file, so downloading may take a while. After you have downloaded the file, add the required local settings variable using the TEES configuration program:
python configure.py --key W2VFILE --value /path/to/wikipedia-pubmed-and-PMC-w2v.bin
Alternatively, you can manually add to your TEES local settings file the required variable:
W2VFILE = '/path/to/wikipedia-pubmed-and-PMC-w2v.bin'
Download the model(s) you want to use from the B2Share record. Classification with these models works the same as with all other TEES models. To classify e.g. the GE09 development set with the corresponding CNN model, use the command:
python classify.py -m /path/to/GE09-single.zip -i GE09-devel -o [OUTSTEM]
Please note that the "*-mixed_ensemble" models cannot be used to classify the development set, as their component models have been trained on randomly redivided sets of the corpus training and development documents.
Training a Keras CNN neural network model for the TEES system will effectively require hardware acceleration, although small test runs can also be performed on the CPU. Please see the Keras documentation for configuring the training process to use your GPU.
The interface for training TEES Keras CNN is the same as for regular TEES and can be used via train.py. However, the training options are of course different from the SVM ones. First of all, to train a Keras model, you need to use one of the available KerasDetectors (located in the Detectors subdirectory). You can define the detector class to import with the "--detector" switch, but in most cases you'll want to use a detector chosen automatically based on the type of the input corpus. To automatically use the most applicable Keras detector, you can use the generic "--detector keras" option.
When training a neural network it is not possible to refit the classifier to include the development dataset, so training separate development and test models is not possible. To train just a single model, you can use the options "--develModel model --testModel None".
The biggest change in using a Keras detector instead of the regular SVM ones is in the example options, which are used to configure the neural network structure instead of the SVM example generation. When you use train.py to train a Keras model, sensible default options are provided for supported corpora. To define just a few new parameters while using default options for the rest, you can use the "override" and "override_all" example styles. The "override" style will replace styles for a single stage in a multi-stage classifier (e.g. edge, modifier) while "override_all" will replace styles for all stages.
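To make the example style format concrete, the following sketch parses a colon-separated style string into its mode and per-parameter value lists. This is an illustrative stand-in written for this document, not the actual TEES parser.

```python
# Illustrative sketch (not the actual TEES parser): split an example
# style string like "override_all:nf=256,512:mods=20" into a mode and
# a dict mapping each parameter to its list of candidate values.
def parse_example_style(style):
    parts = style.split(":")
    mode, options = parts[0], {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        options[key] = value.split(",")
    return mode, options

mode, options = parse_example_style(
    "override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20")
# mode is "override_all"; options["nf"] is ["256", "512"], etc.
```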
As an example, you could train a Keras CNN model for the GE09 corpus with the following command:
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20
Here the TEES train.py is called in much the same way as when using the original SVM system but with some new parameters as explained above. Next, we will look at how to use the "--exampleStyle" parameter to configure the CNN training process in detail.
The TEES Keras system does not use an exhaustive grid search to optimize parameters as they are usually too numerous for that. Instead, multiple values can be passed for the example style parameters and during training each model will be trained with a random combination of these parameters. We can see from the above example that several of the parameters (such as 'path' or 'dense') are given multiple values in a comma-separated list. The final style option, 'mods', defines how many individual neural network models you want the system to train for each classification stage. All of these models will be initialized with a random combination from the defined parameter lists, along with a randomized initial state of the neural network. At the end of the training process, the model with the best performance on the development set (highest task-specific F-score) will be used as the final model.
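The random parameter sampling described above can be sketched as follows. This is an assumption-laden illustration of the idea, not TEES code; the parameter lists mirror the command example given earlier.

```python
import random

# Illustrative sketch of random parameter sampling (not TEES code):
# for each of the 'mods' models, draw one value from each parameter list.
param_lists = {
    "nf": [256, 512],
    "path": [0, 2, 4],
    "do": [0.1, 0.2, 0.5],
    "dense": [400, 800],
}

def sample_combination(param_lists, rng):
    # One random value per parameter defines one model's configuration.
    return {name: rng.choice(values) for name, values in param_lists.items()}

rng = random.Random(0)
combinations = [sample_combination(param_lists, rng) for _ in range(20)]  # mods=20
```

Each of the 20 models would then be trained with its own combination, and the one scoring the highest development-set F-score would be kept.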
There are many different parameters that can be used to configure the Keras CNN model trained by the TEES detectors. For a detailed discussion of the neural network used in the system please refer to Björne et al. (2018). The overall network structure is the same for all classification tasks and is shown in the image below:
The model parameters affect the construction and parameters of the neural network. The following options can be defined as example style parameters.
Common parameters
Parameter | Default | Description |
---|---|---|
override | | replace the classification task default values with the user-defined parameters |
override_all | | replace the default values for all classification tasks with the user-defined parameters |
kernels | 1,3,5,7 | kernel sizes for the convolutional layers |
cact | 'relu' | convolutional layer activation |
nf | 32 | number of filters in the convolutional layers |
path | 3 | path depth for path embedding features |
do | 0.1 | dropout layers' rate |
dense | 400 | units in the dense layer |
wv | The W2VFILE value from the TEES settings file | word vector file path |
wv_mem | 100000 | Number of vectors to read in memory |
wv_map | 10000000 | Number of vectors to access as a memory mapped file |
skip | | skip these embedding groups |
limit | | use only these embedding groups |
de | 8 | Embedding group (path, POS, etc.) dimensionality |
cm | 'auto' | classification mode, one of 'binary', 'multilabel' or 'multiclass', or 'auto' for corpus annotation based |
epochs | 100 | max epochs to train each model for |
patience | 10 | early stopping patience (number of epochs) |
batch | 64 | training batch size |
lr | 0.001 | learning rate |
opt | 'adam' | optimizer |
mods | 1 | number of individual randomized models to train |
ens | 1 | The n-best models to use for an ensemble classifier |
train | | The fraction of the combined training and development sets to use as the training set when building a mixed model ensemble |
Stage specific parameters
Detector | Parameter | Default | Description |
---|---|---|---|
entity, modifier | el | 21 | example length in tokens, centered on candidate token |
edge, unmerging | el | | limit for example length in tokens |
edge, unmerging | ol | 5 | outside length, number of tokens to include before and after edge or event |
unmerging | binary | False | Do not use separate event types in unmerging classification |
Model ensembles can be used to improve system performance and compensate for the randomness of the neural network training process. For a detailed discussion of the ensemble system design and purpose please see Björne et al. (2018). In this document we will show how to train non-ensemble (single) models, ensemble models and mixed ensemble models. To train a regular, non-ensemble TEES Keras model you need to define only the 'mods' (number of models) example style. For example, if you use the value 'mods=20' then the neural network training process will be repeated 20 times with different random initial weights and different random parameter combinations. After all of these models have been trained, the best one is kept as the final classifier.
To train an ensemble system instead, you can define a value larger than 1 (and smaller than or equal to 'mods') for the 'ens' parameter. For example, if you use the settings 'mods=20:ens=5' then the system will again train 20 randomized models, but keep the best five of these as a classifier ensemble. The classification predictions of the system will then be an average of the predictions of these five models.
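The averaging step can be sketched as below. This is a minimal illustration of averaging per-class predictions across ensemble members, assuming each model outputs a class probability vector; it is not the TEES implementation.

```python
# Illustrative sketch (not TEES code): average the class probability
# vectors produced by the kept ensemble models for one example.
def ensemble_average(model_predictions):
    # model_predictions: one probability list per ensemble member
    n = len(model_predictions)
    num_classes = len(model_predictions[0])
    return [sum(p[i] for p in model_predictions) / n for i in range(num_classes)]

# Three hypothetical models voting on a two-class example:
avg = ensemble_average([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
```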
Unlike classifiers such as SVMs, a trained neural network model cannot be "refit" to also include the parameter optimization (development) dataset, since some external data is always required to detect the optimal epoch for stopping the training. Since a relatively large portion of the data is required for the development set, this limits the potential of the system to learn from all available data. The TEES ensemble system can be used to work around this limitation by randomly redistributing documents between the training and development sets for each model to be trained. To train such a mixed ensemble, you can use for example the parameters 'mods=20:ens=5:train=0.9'. Here again 20 models are trained and the best five are used as a model ensemble, but for each model the training and development sets are randomly mixed by redistributing the documents in a train:devel=9:1 proportion.
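The redistribution described above can be sketched as follows. This is an illustrative stand-in, not TEES code: it pools the two document sets and resplits them at the 'train' fraction for each model.

```python
import random

# Illustrative sketch of mixed-ensemble redistribution (not TEES code):
# pool the training and development documents, shuffle, and resplit
# at the given fraction (train=0.9 gives a 9:1 proportion).
def redistribute(train_docs, devel_docs, train_fraction, rng):
    pool = list(train_docs) + list(devel_docs)
    rng.shuffle(pool)
    cut = int(round(train_fraction * len(pool)))
    return pool[:cut], pool[cut:]

rng = random.Random(1)
new_train, new_devel = redistribute([f"doc{i}" for i in range(80)],
                                    [f"doc{i}" for i in range(80, 100)],
                                    0.9, rng)
# 100 pooled documents resplit into 90 training and 10 development documents
```

A fresh split like this for each of the 20 models lets every document contribute to training in at least some ensemble members.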
The commands for training the single, ensemble and mixed ensemble models for the GE09 corpus are therefore:
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20:ens=5
python train.py -t GE09 --detector keras --develModel model --testModel None -o [OUTDIR] --exampleStyle override_all:nf=256,512:path=0,2,4:do=0.1,0.2,0.5:dense=400,800:mods=20:ens=5:train=0.9
For examples of how to train different Keras CNN models using the TEES system please see also the log files included in the supplementary material.
If you use the TEES CNN version, please cite the following publication:
@InProceedings{bjorne18event,
author = {Bj{\"o}rne, Jari and Salakoski, Tapio},
title = {Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing},
booktitle = {Proceedings of the BioNLP 2018 workshop},
year = {2018},
publisher = {Association for Computational Linguistics},
pages = {98--108},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/W18-2311}
}
Please also cite all the resources you use via TEES, e.g. the word2vec embeddings dataset by Pyysalo et al.