Pointer Generator
As mentioned in the Seq2Seq page, the Pointer-Generator network attempts to fix a few issues with the Seq2Seq network, chiefly the handling of out-of-vocabulary words and the repetition of phrases.
- Built on top of a standard attentional Seq2Seq network.
- The model includes a Pgen value which decides whether a word should be copied from the source text or generated from the vocabulary.
- Keeps track of the attention paid to each part of the source text by previously generated words (the coverage vector).
- Penalizes the network for attending to the same parts of the source text again, hence preventing the repetition of phrases/words (a minimal sketch of both mechanisms follows this list).
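The two mechanisms above can be sketched in a few lines. The following is a minimal NumPy illustration, not the repository's TensorFlow code; the function names, array shapes, and variable names (p_vocab, attn, src_ids) are ours.

```python
import numpy as np

def final_distribution(p_vocab, attn, src_ids, p_gen, extended_vocab_size):
    """Mix generation and copying for one decoder step.

    p_vocab: (vocab_size,) softmax over the fixed vocabulary.
    attn:    (src_len,)   attention weights over the source tokens.
    src_ids: (src_len,)   id of each source token in the extended vocab
                          (fixed vocab plus per-article OOV words).
    p_gen:   scalar in [0, 1] produced by the decoder at this step.
    """
    dist = np.zeros(extended_vocab_size)
    dist[: len(p_vocab)] = p_gen * p_vocab           # generate from vocab
    np.add.at(dist, src_ids, (1.0 - p_gen) * attn)   # copy via attention
    return dist

def coverage_step(coverage, attn):
    """Update the coverage vector and compute the per-step coverage loss.

    coverage is the running sum of attention over all previous decoder
    steps; the loss term min(coverage, attn) penalizes re-attending to
    source positions that have already been covered.
    """
    step_loss = np.minimum(coverage, attn).sum()
    return coverage + attn, step_loss
```

A Pgen near 1 favours generating from the vocabulary, while a Pgen near 0 favours copying from the source; because copied words need not be in the fixed vocabulary, the pointer also lets the model reproduce rare words verbatim.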
The original GitHub code for the Pointer-Generator model has been modified by the authors of this repository for ease of training. The modified code may be retrieved from the following file.
Hyperparameter | Explanation | Optimal Value |
---|---|---|
max_batch | Stops training after the given number of training batches. | Refer to the Training Notebook.
data_path | Path to .bin files. | N.A. |
vocab_path | Path to vocab file. | N.A. |
mode | Alternate between train/eval/test modes. | N.A. |
single_pass | Only works in test mode. Runs the model once over the data, allowing it to generate an attention visualizer. | N.A.
log_root | Path to log file. | N.A. |
exp_name | Experiment Name. | N.A. |
max_enc_steps | Number of words the encoder will read. | Refer to the Training Notebook for training. 400 steps for test and eval.
max_dec_steps | Number of words the decoder will generate. | Refer to the Training Notebook for training. 120 steps for test and eval.
vocab_size | Vocabulary size of the model. | 50K for English. 150K for Chinese. |
pointer_gen | Enables the Pointer-Generator model. | N.A.
coverage | Enables coverage with the Pointer-Generator model. | N.A.
convert_to_coverage_model | Adds coverage to a previously trained Pointer-Generator model. | N.A.
While there are other hyperparameters, we found the above to be the most important.
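To make the flags concrete, here is a hedged sketch of a training launch. It assumes the entry point is run_summarization.py, as in the original Pointer-Generator repository (the modified code in this repository may differ), and every path and the experiment name below are placeholders.

```python
import subprocess

# Placeholder paths and experiment name; the entry-point name follows the
# original Pointer-Generator repo and may differ in this repository's
# modified code.
flags = [
    "--mode=train",
    "--data_path=finished_files/chunked/train_*",  # the .bin files
    "--vocab_path=finished_files/vocab",
    "--log_root=log",
    "--exp_name=demo",
    "--vocab_size=50000",   # 50K for English, 150K for Chinese
    "--pointer_gen=True",
]
subprocess.run(["python", "run_summarization.py", *flags], check=True)
```

In the original repository, coverage is added in a second phase: run once with --convert_to_coverage_model=True and --coverage=True to convert the trained checkpoint, then continue training with --coverage=True alone.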
Completed by Melvin and Joe