From 9695d466e98a41ed9e163717ff36159b9c6dff6f Mon Sep 17 00:00:00 2001
From: MassEast <72736286+MassEast@users.noreply.github.com>
Date: Wed, 19 Jun 2024 09:34:35 +0200
Subject: [PATCH] Transformer encoder -> Transformer decoder

In section 11.9.3 Decoder-Only, it should say "GPT pretraining with a
Transformer decoder" instead of "GPT pretraining with a Transformer
encoder", just as depicted in Fig. 11.9.6
---
 .../large-pretraining-transformers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.md b/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.md
index 2c4e05f29c..bda1aea38a 100644
--- a/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.md
+++ b/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.md
@@ -270,7 +270,7 @@ as its backbone :cite:`Radford.Narasimhan.Salimans.ea.2018`.
 Following the autoregressive language model training
 as described in :numref:`subsec_partitioning-seqs`,
 :numref:`fig_gpt-decoder-only` illustrates
-GPT pretraining with a Transformer encoder,
+GPT pretraining with a Transformer decoder,
 where the target sequence is the input sequence
 shifted by one token.
 Note that the attention pattern in the Transformer decoder
 enforces that each token can only attend to its past tokens
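
For context, below is a minimal sketch (not taken from the book's code) of the two ideas the corrected sentence refers to: in decoder-only (GPT-style) pretraining the target sequence is the input sequence shifted by one token, and a causal mask ensures each token can only attend to its past tokens. The tensor names and toy values here are illustrative assumptions, not d2l APIs.

```python
# Illustrative sketch, assuming a toy batch of token IDs; not part of the patch.
import torch

tokens = torch.tensor([[5, 17, 42, 8, 99]])      # toy token-id sequence, shape (1, 5)
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets are the inputs shifted by one token

seq_len = inputs.shape[1]
# Causal (lower-triangular) attention mask: position i may attend to position j only if j <= i,
# i.e., each token attends only to itself and its past tokens.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(causal_mask)
```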