Andrew Runge edited this page Mar 8, 2018 · 12 revisions
  • prepare the actual IWSLT data (BPE)
  • attention decoder: make it CUDA-capable and add minibatching
  • decoder forward: use teacher forcing
  • decoder: split generate from forward, because forward uses teacher forcing
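
One way to picture the forward/generate split is the framework-free sketch below. It is only an illustration under stated assumptions, not the project's decoder: `toy_step`, the `Step` signature, the integer "state", and the `sos`/`eos` ids are all hypothetical stand-ins for a real attention-decoder step.

```python
from typing import Callable, List, Tuple

# Hypothetical step signature: (previous token id, state) -> (vocab scores, new state).
Step = Callable[[int, int], Tuple[List[float], int]]


def toy_step(prev_token: int, state: int) -> Tuple[List[float], int]:
    """Stand-in for one decoder step over a 4-token vocabulary (assumption)."""
    new_state = (state + prev_token + 1) % 4
    scores = [1.0 if i == new_state else 0.0 for i in range(4)]
    return scores, new_state


def forward(step: Step, targets: List[int], sos: int = 0, state: int = 0) -> List[List[float]]:
    """Training path: teacher forcing. The gold token is fed at every step."""
    all_scores = []
    prev = sos
    for gold in targets:
        scores, state = step(prev, state)
        all_scores.append(scores)
        prev = gold  # feed the reference token, not the model's own prediction
    return all_scores


def generate(step: Step, max_len: int, sos: int = 0, eos: int = 3, state: int = 0) -> List[int]:
    """Inference path: greedy decoding. The model's own prediction is fed back."""
    output = []
    prev = sos
    for _ in range(max_len):
        scores, state = step(prev, state)
        prev = max(range(len(scores)), key=scores.__getitem__)  # argmax over vocab
        output.append(prev)
        if prev == eos:
            break
    return output
```

Keeping the two loops separate means the training path always returns one score vector per target position, so it can be paired directly with a loss, while the inference path can stop early at the end-of-sequence token.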