Pointer Generator

Joe Kawai edited this page Aug 1, 2018 · 7 revisions

How it Works

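As mentioned in the Seq2Seq page, the Pointer-Generator network attempts to fix a few issues with the Seq2Seq network, notably its inability to copy out-of-vocabulary words from the source text and its tendency to repeat itself.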

Pointer-Generator Network

[Figure: Pointer-Gen]

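  • Built on top of the Seq2Seq network.
  • At each decoding step, the model computes a Pgen value (a generation probability) which decides whether the next word should be generated from the vocabulary or copied from the source text.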
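To make the role of Pgen concrete, here is a minimal sketch of how a generation distribution and a copy distribution can be mixed, following the formulation in the original Pointer-Generator paper: the vocabulary distribution is weighted by Pgen, and the attention weights over the source tokens are weighted by (1 - Pgen) and added to the entries of the words they point at. The function name `final_distribution`, its arguments, and the toy numbers are illustrative assumptions and are not taken from this repository's code.

```python
def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_vocab_size):
    """Mix the generation and copy distributions into a single distribution.

    p_gen: float in [0, 1], probability of generating from the vocabulary.
    vocab_dist: probabilities over the fixed vocabulary (sums to 1).
    attention: attention weights over the source tokens (sums to 1).
    source_ids: id of each source token in the extended vocabulary;
        out-of-vocabulary source words get ids >= len(vocab_dist).
    extended_vocab_size: fixed vocabulary size plus the number of OOV source words.
    """
    final = [0.0] * extended_vocab_size
    # Generation part: scale the vocabulary distribution by p_gen.
    for word_id, prob in enumerate(vocab_dist):
        final[word_id] = p_gen * prob
    # Copy part: add (1 - p_gen) * attention to the word each source token holds.
    for weight, word_id in zip(attention, source_ids):
        final[word_id] += (1.0 - p_gen) * weight
    return final


# Toy example (assumed values): 3-word vocabulary, 3 source tokens,
# the second source token is out-of-vocabulary and gets the extended id 3.
vocab_dist = [0.5, 0.3, 0.2]
attention = [0.6, 0.3, 0.1]
source_ids = [1, 3, 1]
dist = final_distribution(0.8, vocab_dist, attention, source_ids, 4)
print(dist)
```

In this toy case the out-of-vocabulary word (id 3) has zero probability under the vocabulary distribution, yet it still receives probability mass through the copy part, which is what lets the model reproduce words it could never generate.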

Coverage Mechanism

[Figure: Coverage]

  • Keeps track of the attention already given to each part of the source text while generating previous words (as shown in yellow).
  • Penalizes the network for attending to the same parts of the source text again.
  • Hence, it discourages the repetition of phrases and words.
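The points above can be sketched in a few lines, following the coverage loss described in the original Pointer-Generator paper: the coverage vector is the running sum of all previous attention distributions, and the penalty at each step is the sum, over source positions, of the smaller of the current attention and the accumulated coverage. The function name `coverage_losses` and the toy attention values are assumptions made for illustration; the repository's TensorFlow implementation may differ in detail.

```python
def coverage_losses(attention_steps):
    """Return the coverage penalty for each decoding step.

    attention_steps: list of attention distributions, one per decoding step,
        each a list of weights over the same source positions.
    """
    if not attention_steps:
        return []
    # The coverage vector starts at zero: nothing has been attended to yet.
    coverage = [0.0] * len(attention_steps[0])
    losses = []
    for attn in attention_steps:
        # Penalty: overlap between the current attention and past attention.
        losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
        # Update coverage with the attention used at this step.
        coverage = [c + a for a, c in zip(attn, coverage)]
    return losses


# Attending to the same positions twice is penalized on the second step...
repeated = coverage_losses([[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]])
# ...while moving on to a new position is not.
fresh = coverage_losses([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(repeated, fresh)
```

The first step never incurs a penalty because the coverage vector is still zero. The penalty only grows when the decoder returns to source positions it has already used.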

Running the Model

The original GitHub code for the Pointer-Generator model has been modified by the authors of this repository for ease of training. The modified code may be retrieved from the following file.

Important Hyperparameters

| Hyperparameter | Explanation | Optimal Value |
| --- | --- | --- |
| max_batch | Stops training after a set number of batches. | Refer to the Training Notebook. |
| data_path | Path to .bin files. | N.A. |
| vocab_path | Path to vocab file. | N.A. |
| mode | Selects train, eval, or test mode. | N.A. |
| single_pass | Only works in test mode. Allows the model to generate an attention visualization. | N.A. |
| log_root | | |
| exp_name | | |
| max_enc_steps | Maximum number of words the encoder will read. | Refer to the Training Notebook for training. 120 steps for test and eval. |
| max_dec_steps | Maximum number of words the decoder will generate. | Refer to the Training Notebook for training. 400 steps for test and eval. |
| vocab_size | Vocabulary size of the model. | 50K for English. 150K for Chinese. |
| pointer-gen | Enables the Pointer-Generator Model. | N.A. |
| coverage | Enables Coverage with the Pointer-Generator Model. | N.A. |
| convert_to_coverage_model | Converts an existing Pointer-Generator Model without Coverage into one with Coverage. | N.A. |

While there are other hyperparameters, we found the above to be the most important.
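As a rough illustration of how these hyperparameters might be passed to the model, the sketch below turns a dictionary of settings into `--name=value` strings. It assumes the modified code accepts command-line flags in that form, as the original Pointer-Generator code does. The helper `build_flags` and every path and experiment name are hypothetical placeholders. The step counts and vocabulary size are the eval values from the table above.

```python
def build_flags(params):
    """Turn a dict of hyperparameters into a list of --name=value strings."""
    return ["--{}={}".format(name, value) for name, value in params.items()]


# Assumed settings for an eval run; paths and names are placeholders.
eval_params = {
    "mode": "eval",
    "data_path": "path/to/val_*.bin",   # hypothetical location of the .bin files
    "vocab_path": "path/to/vocab",      # hypothetical location of the vocab file
    "log_root": "path/to/logs",         # hypothetical
    "exp_name": "my_experiment",        # hypothetical
    "max_enc_steps": 120,               # table value for test and eval
    "max_dec_steps": 400,               # table value for test and eval
    "vocab_size": 50000,                # table value for English
}

flags = build_flags(eval_params)
print(" ".join(flags))
```

The resulting list can be appended to whatever command launches the modified script. Keeping the settings in one dictionary makes it easy to switch between train, eval, and test runs by changing only the entries that differ.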

Links to Jupyter Notebooks

Training Notebook

Evaluation Notebook

Testing Notebook

Other Resources & Dependencies