A causal decoder based on convolutions only (no attention): can be applied to unbounded sequence lengths; the prediction of the next token depends on *all* previous tokens; allows autoregressive sampling; highly GPU-parallelizable; trained with teacher forcing.
sliorde/conv-decoder
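
To make the idea of a causal convolution concrete, here is a minimal pure-Python sketch. It is not taken from this repository: the function names (`causal_conv1d`, `stacked_causal`, `receptive_field`) and the dilation-doubling scheme are assumptions chosen for illustration, and how the repository itself makes each prediction depend on all previous tokens is not shown here. The sketch only illustrates the general building block: each output position reads the current and earlier inputs, never later ones, so a whole sequence can be processed in one pass during teacher-forced training.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal 1D convolution over a list of floats.

    out[t] is a weighted sum of x[t], x[t - dilation], x[t - 2*dilation], ...
    Positions before the start of the sequence are treated as zeros,
    so no output ever reads an input from the future.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out


def stacked_causal(x, kernel, num_layers):
    """Apply num_layers causal convolutions with dilations 1, 2, 4, ...

    Illustrative assumption: the same kernel is reused in every layer.
    """
    h = list(x)
    for layer in range(num_layers):
        h = causal_conv1d(h, kernel, dilation=2 ** layer)
    return h


def receptive_field(kernel_size, num_layers):
    """Number of input positions one output can see in stacked_causal."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)
```

With dilations that double per layer, the number of past positions visible to each output grows exponentially with depth, which is one common way for a convolution-only model to reach far back into the sequence. Because every position is computed independently within a layer, the inner loops map naturally onto parallel hardware.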