esalesky edited this page Mar 10, 2018 · 12 revisions

baseline:

  • attention decoder: CUDA + minibatch capable
  • cross-entropy loss
  • linear layer between embeddings and hidden state (personally I'd like to try this with and without to compare)
  • maxibatches
  • beam search (beam size=5)
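
For reference, a minimal sketch of what "maxibatches" could look like, assuming the usual meaning: read a chunk of several minibatches' worth of sentence pairs, sort the chunk by source length so each minibatch needs little padding, then shuffle the minibatch order. The function name `maxibatches` and the parameters `batch_size`, `maxi_size`, and `seed` are hypothetical and not taken from the actual code.

```python
import random


def maxibatches(pairs, batch_size, maxi_size=20, seed=0):
    """Yield minibatches of (src, tgt) pairs with similar source lengths.

    Assumption: `pairs` is a list of (source_tokens, target_tokens) tuples.
    `maxi_size` is the number of minibatches read and sorted together.
    """
    rng = random.Random(seed)
    chunk_len = batch_size * maxi_size
    for start in range(0, len(pairs), chunk_len):
        # Sort one maxibatch by source length to reduce padding per minibatch.
        chunk = sorted(pairs[start:start + chunk_len], key=lambda p: len(p[0]))
        # Cut the sorted maxibatch into minibatches.
        batches = [chunk[i:i + batch_size] for i in range(0, len(chunk), batch_size)]
        # Shuffle minibatch order so training does not always see short sentences first.
        rng.shuffle(batches)
        yield from batches
```

Sorting only within a maxibatch, rather than over the whole corpus, keeps some randomness in which sentences end up batched together.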

not baseline:

  • morph-tag data, BPE it