Pointer Generator
As mentioned in the Seq2Seq page, the Pointer-Generator network attempts to fix a few issues with the Seq2Seq network, chiefly the handling of out-of-vocabulary words and the repetition of phrases.
- Built on top of a standard attentional Seq2Seq network.
- The model includes a Pgen value which decides whether a word should be copied from the source text or generated from the vocabulary.
- Keeps track of the attention paid to each part of the source text by previously generated words (the coverage vector).
- Penalizes the network for attending to the same parts of the source text again, hence preventing the repetition of phrases/words (a minimal sketch of both mechanisms follows this list).
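The two mechanisms above can be sketched in a few lines. The following is a minimal NumPy illustration, not the repository's TensorFlow code; the function names, array shapes, and variable names (p_vocab, attn, src_ids) are ours.

```python
import numpy as np

def final_distribution(p_vocab, attn, src_ids, p_gen, extended_vocab_size):
    """Mix generation and copying for one decoder step.

    p_vocab: (vocab_size,) softmax over the fixed vocabulary.
    attn:    (src_len,)   attention weights over the source tokens.
    src_ids: (src_len,)   id of each source token in the extended vocab
                          (fixed vocab plus per-article OOV words).
    p_gen:   scalar in [0, 1] produced by the decoder at this step.
    """
    dist = np.zeros(extended_vocab_size)
    dist[: len(p_vocab)] = p_gen * p_vocab           # generate from vocab
    np.add.at(dist, src_ids, (1.0 - p_gen) * attn)   # copy via attention
    return dist

def coverage_step(coverage, attn):
    """Update the coverage vector and compute the per-step coverage loss.

    coverage is the running sum of attention over all previous decoder
    steps; the loss term min(coverage, attn) penalizes re-attending to
    source positions that have already been covered.
    """
    step_loss = np.minimum(coverage, attn).sum()
    return coverage + attn, step_loss
```

A Pgen near 1 favours generating from the vocabulary, while a Pgen near 0 favours copying from the source; because copied words need not be in the fixed vocabulary, the pointer also lets the model reproduce rare words verbatim.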
The original GitHub code for the Pointer-Generator model has been modified by the authors of this repository for ease of training. The modified code may be retrieved from the following file.
Hyperparameter | Explanation | Optimal Value |
---|---|---|
max_batch | Stops training after the given number of training batches. | Refer to the Training Notebook.
data_path | Path to .bin files. | N.A. |
vocab_path | Path to vocab file. | N.A. |
mode | Alternate between train/eval/test modes. | N.A. |
single_pass | Only works in test mode. Runs the model once over the data, allowing it to generate an attention visualizer. | N.A.
log_root | Path to log file. | N.A. |
exp_name | Experiment Name. | N.A. |
max_enc_steps | Number of words the encoder will read. | Refer to the Training Notebook for training. 400 steps for test and eval.
max_dec_steps | Number of words the decoder will generate. | Refer to the Training Notebook for training. 120 steps for test and eval.
vocab_size | Vocabulary size of the model. | 50K for English. 150K for Chinese. |
pointer_gen | Enables the Pointer-Generator model. | N.A.
coverage | Enables coverage with the Pointer-Generator model. | N.A.
convert_to_coverage_model | Adds coverage to a previously trained Pointer-Generator model. | N.A.
While there are other hyperparameters, we found the above to be the most important.
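To make the flags concrete, here is a hedged sketch of a training launch. It assumes the entry point is run_summarization.py, as in the original Pointer-Generator repository (the modified code in this repository may differ), and every path and the experiment name below are placeholders.

```python
import subprocess

# Placeholder paths and experiment name; the entry-point name follows the
# original Pointer-Generator repo and may differ in this repository's
# modified code.
flags = [
    "--mode=train",
    "--data_path=finished_files/chunked/train_*",  # the .bin files
    "--vocab_path=finished_files/vocab",
    "--log_root=log",
    "--exp_name=demo",
    "--vocab_size=50000",   # 50K for English, 150K for Chinese
    "--pointer_gen=True",
]
subprocess.run(["python", "run_summarization.py", *flags], check=True)
```

In the original repository, coverage is added in a second phase: run once with --convert_to_coverage_model=True and --coverage=True to convert the trained checkpoint, then continue training with --coverage=True alone.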
Completed by Melvin and Joe