
OpenAI's GPT Series Models

Key points of the GPT models:

  • Uses the paradigm of unsupervised pre-training followed by downstream-task fine-tuning (the entire model is fine-tuned). The fine-tuning setup differs from task to task.
  • Uses the Transformer architecture to build the language model, which captures long-range dependencies better than a bi-LSTM.
  • During fine-tuning, the LM (language-modeling) loss is kept as an auxiliary objective.
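The last point above can be sketched as a weighted sum of the supervised task loss and the language-modeling loss. This is a minimal illustration, not the original training code; the weight 0.5 follows the GPT paper, and the loss values below are made up.

```python
def combined_loss(task_loss: float, lm_loss: float, lam: float = 0.5) -> float:
    """GPT-1 fine-tuning objective: supervised task loss plus an
    auxiliary language-modeling loss weighted by lam."""
    return task_loss + lam * lm_loss

# Illustrative numbers only: task loss 0.8, LM loss 2.0.
total = combined_loss(0.8, 2.0)
print(total)  # 0.8 + 0.5 * 2.0 = 1.8
```

The auxiliary LM term keeps the pre-trained language-modeling ability from being washed out during task-specific fine-tuning.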

References:

GPT: Improving Language Understanding by Generative Pre-Training

GPT-2: Language Models are Unsupervised Multitask Learners

GPT-3: Language Models are Few-Shot Learners

Basic structure of the GPT model:

(Figure source: https://www.cnblogs.com/robert-dlut/p/9824346.html)

(Architecture figures omitted; see the source link above.)
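In place of the missing figures, here is a minimal sketch of the core of each GPT decoder block: masked (causal) self-attention, where every position attends only to itself and earlier positions. This is an illustrative toy with random weights, not the actual GPT implementation; a real block adds multiple heads, residual connections, layer normalization, and a feed-forward sublayer.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head masked self-attention.
    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: block attention to future positions.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example with random weights (assumed shapes, not trained values).
rng = np.random.default_rng(0)
d_model, seq_len = 8, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector; this is the property that lets GPT be trained as a left-to-right language model.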