Neural Machine Translation implemented in PyTorch

This is a PyTorch implementation of "Effective Approaches to Attention-based Neural Machine Translation" (Luong et al., 2015) that uses scheduled sampling to improve parameter estimation. The models are trained on tab-delimited bilingual sentence pairs acquired from here.
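
Scheduled sampling decides, at each decoding step during training, whether the decoder is fed the ground-truth token or its own previous prediction. The README does not say which schedule train.py uses, so the following is only a minimal sketch that assumes an inverse-sigmoid decay; the function names and the constant k are hypothetical.

```python
import math
import random


def teacher_forcing_prob(step, k=1000.0):
    """Probability of feeding the ground-truth token at a given training step.

    Assumed inverse-sigmoid decay: close to 1 early in training and
    approaching 0 as training progresses. k controls how fast it decays.
    """
    # Cap the exponent so very large step counts do not overflow math.exp.
    return k / (k + math.exp(min(step / k, 700.0)))


def choose_next_input(gold_token, predicted_token, step, rng=random):
    """Pick the decoder's next input token under scheduled sampling."""
    if rng.random() < teacher_forcing_prob(step):
        return gold_token       # teacher forcing: use the reference token
    return predicted_token      # otherwise feed back the model's own guess
```

Early in training the decoder almost always sees the reference token; later it increasingly sees its own output, which is closer to the conditions it faces at translation time.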

Implementation Architecture

The model is trained end-to-end, using stacked RNNs for both sequence encoding and decoding. To predict the next token in the sequence, the decoder is additionally conditioned on a context vector, which an attention mechanism recomputes at each time step. Intuitively, the decoder draws on the information gathered by the encoder by deciding how relevant each encoder output is at each step of the decoding process.
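
The context vector computation described above can be sketched as follows. This is an illustration only, assuming the simple dot-product score (one of several scoring functions in the paper); the function name and tensor layout are assumptions and are not taken from attention.py.

```python
import torch
import torch.nn.functional as F


def attention_context(decoder_state, encoder_outputs):
    """Compute a context vector for one decoding step.

    decoder_state:   (batch, hidden)           current decoder hidden state
    encoder_outputs: (batch, src_len, hidden)  one encoding per source token
    """
    # Alignment scores: dot product between the decoder state and each encoding.
    scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)
    # Normalize the scores over source positions so they sum to 1.
    weights = F.softmax(scores, dim=1)                       # (batch, src_len)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
    return context, weights                                  # (batch, hidden)


torch.manual_seed(0)
context, weights = attention_context(torch.randn(2, 8), torch.randn(2, 5, 8))
```

The returned weights show which source positions the decoder attended to at that step, which is useful for inspecting alignments.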

Results

Input Sequence (English)	Output Sequence (Spanish)
how are you doing	estas haciendo
i am going to the store	voy a la tienda
she is a scientist	ella es cientifico
he is an engineer	el es un ingeniero
i am going out to the city	voy al la de la ciudad
i am running out of ideas	me estoy quedando sin ideas

Prerequisites

Python 3+
PyTorch 0.4+
NumPy 1.10+

Usage

To train a new language model, invoke train.py with the abbreviation of the language you would like to translate English to. For instance, specifying 'afr' as input trains a model that translates English to Afrikaans, using 'afr.txt' in the data directory. Other languages can be acquired from here. (The input language defaults to English.)

./train.sh
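
For reference, a tab-delimited data file such as 'afr.txt' can be read into sentence pairs roughly as shown here. This is a sketch under the assumption that each line holds a source sentence and a target sentence separated by a tab; the function name is hypothetical and this is not the code in etl.py.

```python
import io


def read_pairs(lines):
    """Yield (source, target) sentence pairs from tab-delimited lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Only the first two columns are used; any extra columns are ignored.
        yield fields[0].strip(), fields[1].strip()


# In-memory stand-in for a data file, using a pair from the results table.
sample = io.StringIO("i am going to the store\tvoy a la tienda\nmalformed line\n")
pairs = list(read_pairs(sample))
```

With a real file, the same function can be applied to an open file handle, for example one opened with encoding="utf-8".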

To translate an English input sequence into another language, invoke eval.py and specify the desired language and sentence. The program exits if the language model parameters are not found in the data directory or if the language prefix is mistyped.

./eval.sh

Files

attention.py

Attention nn module responsible for computing the alignment scores.

attention_decoder.py

Recurrent neural network that uses gated recurrent units and attention to translate encoded inputs.

encoder.py

Recurrent neural network that encodes a given input sequence.

etl.py

Helper functions for data extraction, transformation, and loading.

eval.py

Script for evaluating the sequence-to-sequence model.

helpers.py

General helper functions.

language.py

Class that keeps a record of a corpus. Attributes such as vocabulary counts and tokens are stored in instances of this class.

train.py

Script for training a new sequence-to-sequence model.
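
To make the role of the corpus class concrete, here is a minimal sketch of a class that tracks vocabulary counts and token indices. The class name, attribute names, and the <sos>/<eos> markers are assumptions for illustration; the actual interface in language.py may differ.

```python
from collections import Counter


class Vocabulary:
    """Hypothetical stand-in for the corpus class described in language.py."""

    def __init__(self, name):
        self.name = name
        self.word_counts = Counter()
        # Reserve indices for start-of-sequence and end-of-sequence markers.
        self.word_to_index = {"<sos>": 0, "<eos>": 1}
        self.index_to_word = ["<sos>", "<eos>"]

    def add_sentence(self, sentence):
        """Count every word and assign an index to each new word."""
        for word in sentence.split():
            self.word_counts[word] += 1
            if word not in self.word_to_index:
                self.word_to_index[word] = len(self.index_to_word)
                self.index_to_word.append(word)

    def encode(self, sentence):
        """Map a sentence to token indices, ending with the <eos> index."""
        indices = [self.word_to_index[w] for w in sentence.split()]
        return indices + [self.word_to_index["<eos>"]]


vocab = Vocabulary("spa")
vocab.add_sentence("voy a la tienda")
vocab.add_sentence("voy al la de la ciudad")
encoded = vocab.encode("voy a la tienda")
```

The index sequences produced by encode are what an encoder or decoder embedding layer would consume.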
