NLP: English-Dutch Machine Translator

About

This project creates several models for translating English to Dutch and vice versa. After creation, all models are tested and compared with each other, with a particular focus on how their performance depends on the word embedding technique used. This project's models are:

a word-based neural network (for English to Dutch and for Dutch to English)
a word-based neural network with attention
a character-based machine translation model
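The difference between the word-based and character-based models starts with how a sentence is split into tokens. The following is a minimal sketch of that idea only, not the project's actual preprocessing: the helper names `word_tokens`, `char_tokens`, and `build_vocab` are hypothetical, and lowercasing plus whitespace splitting are assumptions.

```python
def word_tokens(sentence):
    """Split a sentence into lowercase word-level tokens on whitespace."""
    return sentence.lower().split()


def char_tokens(sentence):
    """Split a sentence into lowercase character-level tokens (spaces included)."""
    return list(sentence.lower())


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is left free for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


# Toy sentences for illustration only (not taken from the corpus).
sentences = ["Thank you very much", "Dank u wel"]
word_vocab = build_vocab(word_tokens(s) for s in sentences)
char_vocab = build_vocab(char_tokens(s) for s in sentences)

print(word_tokens(sentences[1]))  # word-level view of the sentence
print(char_tokens(sentences[1]))  # character-level view of the same sentence
```

A word-level vocabulary grows with the corpus, while a character-level vocabulary stays small at the cost of much longer input sequences.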

Requirements and installations

install Python 3
download the English-Dutch corpus from statmt.org
create a folder named data and place the downloaded files europarl-v7.nl-en.en and europarl-v7.nl-en.nl in it
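Before running anything, it can help to confirm that the corpus files are where the steps above put them. This is a small sketch, assuming the data folder sits in the current working directory; the function name `missing_corpus_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names as given in the setup steps above.
EXPECTED_FILES = ("europarl-v7.nl-en.en", "europarl-v7.nl-en.nl")


def missing_corpus_files(data_dir="data"):
    """Return the expected corpus file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_path / name).is_file()]


missing = missing_corpus_files()
if missing:
    print("Missing corpus files:", ", ".join(missing))
else:
    print("Both corpus files found.")
```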

Get started

create a virtual environment by entering in the terminal: py -3 -m venv .venv
activate the newly created environment by entering in the terminal: .venv\scripts\activate
install the packages by entering in the terminal with the virtual environment active: python -m pip install pandas numpy keras tensorflow dataframe_image gensim keras-self-attention seaborn
run a Python file by right-clicking the respective file and selecting "Run Python file in terminal", or alternatively by entering in the terminal with the virtual environment active: python file_name.py

Data

The data for this project is based on the “European Parliament Proceedings Parallel Corpus 1996-2011”. The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 21 European languages: Romanic (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek. All models in this project are based on the parallel data containing English and Dutch sentences from https://www.statmt.org/europarl/v7/nl-en.tgz. You can find more details about the data at https://www.statmt.org/europarl/. Please note that only 10% of the data is used to train and test the models, because the full dataset is too large. This 10% is randomly selected after import.
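The random 10% selection could look roughly like the sketch below. It is an illustration under assumptions, not the project's own code: the function names `read_lines` and `sample_parallel`, the `fraction` and `seed` parameters, and the fixed seed are all hypothetical. The key point is that English and Dutch lines are paired before sampling so the two sides stay aligned.

```python
import random


def read_lines(path):
    """Read a UTF-8 text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def sample_parallel(en_lines, nl_lines, fraction=0.1, seed=42):
    """Randomly keep a fraction of aligned (English, Dutch) sentence pairs."""
    if len(en_lines) != len(nl_lines):
        raise ValueError("Both sides of the parallel corpus must have the same length.")
    pairs = list(zip(en_lines, nl_lines))  # pair first so alignment survives sampling
    k = int(len(pairs) * fraction)
    rng = random.Random(seed)  # local generator, assumed seed for reproducibility
    return rng.sample(pairs, k)


# Toy stand-in for the corpus; with the real data one would instead call
# read_lines("data/europarl-v7.nl-en.en") and read_lines("data/europarl-v7.nl-en.nl").
en = [f"english sentence {i}" for i in range(100)]
nl = [f"nederlandse zin {i}" for i in range(100)]

subset = sample_parallel(en, nl)
print(len(subset))  # 10 pairs out of 100
```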
