q0o0p/Poem2Poem

Poem2Poem

Machine translation in poems domain preserving rhythm and rhyme

Inspired by the DeepSpeare project:

https://arxiv.org/abs/1807.03491

https://github.com/jhlau/deepspeare

And by the YSDA NLP course:

https://github.com/yandexdataschool/nlp_course

There is also a research paper by Marjan Ghazvininejad, Yejin Choi and Kevin Knight:

https://aclweb.org/anthology/N18-2011

However, I learned about it only after finishing this project.

Goal

The goal of this project is a program that automatically translates poems from one language to another (English -> Russian, for example) so that the translated text has rhythm and rhyme. Ideally, the program would capture the poetic meter and rhyme pattern of the original poem and reproduce them in the translation. However, such a perfect translation generally does not exist. We therefore have to find a trade-off between the rhythm, rhyme, fluency, adequacy, and other qualities of the translation.
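
One way to picture this trade-off is as a weighted score over candidate translations. The sketch below is only an illustration: the four criteria come from the paragraph above, but the `Candidate` class, the weights, and the numeric scores are hypothetical and are not taken from this project's code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate translation with hypothetical quality scores in [0, 1]."""
    text: str
    rhythm: float
    rhyme: float
    fluency: float
    adequacy: float


# Hypothetical weights: raising one criterion's weight costs the others.
WEIGHTS = {"rhythm": 0.2, "rhyme": 0.2, "fluency": 0.3, "adequacy": 0.3}


def combined_score(candidate, weights=WEIGHTS):
    """Weighted sum of the four criteria."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())


def pick_best(candidates, weights=WEIGHTS):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(c, weights))


# Made-up scores for two imaginary candidates.
literal = Candidate("literal", rhythm=0.1, rhyme=0.0, fluency=0.8, adequacy=0.9)
poetic = Candidate("poetic", rhythm=0.9, rhyme=0.8, fluency=0.6, adequacy=0.5)

# Ignoring rhythm and rhyme entirely favours the literal candidate.
meaning_only = {"rhythm": 0.0, "rhyme": 0.0, "fluency": 0.5, "adequacy": 0.5}

print(pick_best([literal, poetic]).text)
print(pick_best([literal, poetic], meaning_only).text)
```

With the default weights the "poetic" candidate wins. With the `meaning_only` weights the "literal" one does. This shows how the choice of weights decides which side of the trade-off a translation falls on.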

Examples of results

"The Road Not Taken" by Robert Frost:

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Automatically generated translation containing some rhyme and rhythm:

две дороги поднялись на желтые дрова
и жаль что я мог не путешествовать
и одним путником я был со мной
и взглянула вниз вниз так как я смог
туда где он погнулся под букетом

Method

The model consists of two parts: a translation model and a poetic meter model. They are trained simultaneously using a multitask approach. Both are implemented as encoder-decoder architectures with an attention mechanism, but they otherwise differ considerably. The idea of the meter model and most of the code for its loss function are borrowed from the DeepSpeare project. The inference function is highly flexible and supports several options that can be combined independently:

  • With or without rhythm
  • With or without rhyme
  • Type of rhyme
  • Type of sampling and number of samples
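
The options listed above could be grouped into a single settings object. The following is a minimal sketch under stated assumptions: the `InferenceOptions` class, its field names, the rhyme scheme label, and the sampling type names are all hypothetical, since this README does not document the project's actual inference interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceOptions:
    """Hypothetical container for the inference options listed above."""
    use_rhythm: bool = True
    use_rhyme: bool = True
    rhyme_type: str = "ABAB"   # assumed: a rhyme scheme label
    sampling: str = "beam"     # assumed values: "greedy", "beam", "random"
    num_samples: int = 1

    def __post_init__(self):
        # Reject values that this sketch does not know how to interpret.
        if self.sampling not in {"greedy", "beam", "random"}:
            raise ValueError(f"unknown sampling type: {self.sampling!r}")
        if self.num_samples < 1:
            raise ValueError("num_samples must be at least 1")

    def describe(self):
        """Human-readable summary of the selected options."""
        parts = [
            "rhythm" if self.use_rhythm else "no rhythm",
            f"rhyme {self.rhyme_type}" if self.use_rhyme else "no rhyme",
            f"{self.sampling} x{self.num_samples}",
        ]
        return ", ".join(parts)


# Each option is set independently of the others.
plain = InferenceOptions(use_rhythm=False, use_rhyme=False, sampling="greedy")
poetic = InferenceOptions(sampling="random", num_samples=10)

print(plain.describe())   # no rhythm, no rhyme, greedy x1
print(poetic.describe())  # rhythm, rhyme ABAB, random x10
```

Keeping each option as a separate field reflects the point that the options can be combined independently: switching off rhyme, for instance, does not affect the rhythm or sampling settings.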

Data

There is no parallel corpus of poems large enough to train such a model. Instead, four datasets were used to train and fine-tune it:

  • OpenSubtitles parallel English->Russian corpus
  • Parallel corpus of songs that I collected from the Internet specifically for this project (the source text is English and the target text is Russian; the Russian song translations do not preserve rhyme or rhythm)
  • Parallel corpus of Russian classic poems translated into English by a model with the same architecture, trained on OpenSubtitles in the Russian->English direction (the backtranslation approach)
  • Small corpus of Shakespeare's sonnets translated into Russian by Marshak
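
The backtranslation step in the third dataset can be sketched as follows. This is an illustration, not the project's code: `build_backtranslated_corpus` and `toy_reverse_translate` are hypothetical names, and the tiny glossary stands in for the trained Russian->English model.

```python
from typing import Callable, Iterable, List, Tuple


def build_backtranslated_corpus(
    target_poems: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each real target-language poem with a synthetic source text.

    The synthetic English side comes from the reverse (Russian -> English)
    model, while the human-written Russian poem stays on the target side.
    """
    pairs = []
    for poem in target_poems:
        synthetic_source = reverse_translate(poem)
        pairs.append((synthetic_source, poem))
    return pairs


def toy_reverse_translate(text: str) -> str:
    """Stand-in for a trained Russian -> English model (assumption)."""
    glossary = {"лес": "forest", "дорога": "road"}
    return " ".join(glossary.get(word, word) for word in text.split())


corpus = build_backtranslated_corpus(["дорога лес"], toy_reverse_translate)
print(corpus)  # [('road forest', 'дорога лес')]
```

The appeal of this setup is that the target side of every pair is a genuine poem with rhythm and rhyme, even though the source side is machine-generated and may be noisy.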

How to use

This section will be added soon

Acknowledgements

https://github.com/jhlau/deepspeare

https://github.com/yandexdataschool/nlp_course

https://github.com/IlyaGusev/rupo
