To Do

esalesky edited this page Mar 10, 2018 · 12 revisions

baseline:

attention decoder: CUDA- and minibatch-capable
cross-entropy loss
linear layer between embeddings and hidden state (personally I'd like to try this with and without, to compare)
maxibatches
beam search (beam size=5)
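
One common reading of "maxibatches" (the term appears as an option in toolkits such as Nematus and Marian) is: read several minibatches' worth of sentence pairs at once, sort that chunk by length, then cut it into minibatches so each one needs little padding. A minimal sketch under that assumption; the function name `maxibatches` and the parameters `maxi_factor` and `seed` are hypothetical and not taken from this project's code:

```python
import random

def maxibatches(pairs, batch_size, maxi_factor=20, seed=0):
    """Yield minibatches of (src, tgt) pairs with similar source lengths.

    pairs: list of (src_tokens, tgt_tokens) tuples.
    maxi_factor: how many minibatches make up one maxibatch (assumed value).
    """
    rng = random.Random(seed)
    maxi_size = batch_size * maxi_factor
    for start in range(0, len(pairs), maxi_size):
        # Take one maxibatch-sized chunk and sort it by source length.
        chunk = sorted(pairs[start:start + maxi_size], key=lambda p: len(p[0]))
        # Cut the sorted chunk into minibatches.
        batches = [chunk[i:i + batch_size] for i in range(0, len(chunk), batch_size)]
        # Shuffle minibatch order so training does not always see short sentences first.
        rng.shuffle(batches)
        yield from batches
```

Sorting only within a maxibatch, rather than over the whole corpus, keeps some randomness in which sentences end up together while still reducing padding.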

not baseline:

morph-tag the data, then BPE it