This is a deep neural network model written using Keras in Python. The model predicts the next word in a sentence from the preceding 12 words which are maintained in a shift register and presented to the model in parallel. The model has 4 dense layers with interspersed droput and ELU layers. word2vec is used to code words as 300-element vectors which form the input to the network. The output of the model has a probabalistic selector so that it can be adjusted by "temperature" to be more or less inventive.
The model is shown diagramatically in Network_Model.vsd
:
The model has been trained firstly on contributions from a large number of MPs, continuing until the validation loss stopped decreasing and then specifically on the contributions of Theresa May. Training on Theresa May was continued beyond the point of minimum validation loss so the model acts more like a memory, which is better for this demonstration.
The training vectors cover each sentence in the training set separately. At the beginning of a sentence, where there are fewer than 12 preceding words, '.'s are used as placeholders for no preceding text. When the model is used to predict from a few starting words, '.'s should likewise be used.
There are three steps to training and using the model:
- Run
build_dictionary.py
to take the input text inall_contributions.json
and create codings for the words used. Two codings are produced. The first is a simple integer code which is output to a pickle filecodetables.pickle
. This contains two dicts which map words to their code and vice versa. The second is a word2vec model. Each word is represented as a 300 number vector. The model is saved invmodel
../build_dictionary.py
- Run
train.py
with a JSON file argument to train the model on that data. This reads incodetables.pickle
andvmodel
. The first timetrain.py
is run, the filetrained_model.h5
will be created containing the Keras model and weights. Astrain.py
is run subsequent times, the existing model is read in and the weights updated. It is saved every 5 epochs during training. This allows training to be restarted from where a previous run left off, or for the model to be trained on different data in succession. As well as updatingtrained_model.h5
, a second filetrained_model_<nnn>_<train_accuracy>.h5
is saved to record each step of training. You can go back to one of these files as a starting point if the model subsequenly veers off. The run stops after 200 blocks of 5-epochs. If you change the word model, you must manually delete the h5 file and start again../train.py other_contributions.json ./train.py theresa_may_contributions.json ...
- Run
create_sentence.py
to test the model and create a sentence. You can give an argument with the starting words. For instance:If no argument is given, it produces sentences for a couple of predefined examples../create_sentence.py "Surely, the right honourable gentleman cannot believe"
The maybot code in a higher level directory can import create_sentence.py
to call create_sentence()
. To make this work smoothly, a dummy __init__.py
file has been created.
https://keras.io/
https://radimrehurek.com/gensim/models/word2vec.html
https://github.com/oxford-cs-deepnlp-2017/lectures