longcontext
Łukasz Kaiser
https://scholar.google.com/citations?hl=en&user=JWmiQR0AAAAJ&view_op=list_works&sortby=pubdate
Generating Wikipedia by Summarizing Long Sequences
https://arxiv.org/abs/1801.10198
Notes on "Generating Wikipedia by Summarizing Long Sequences"
https://blog.csdn.net/Tuka2000/article/details/107127304/
Academia | Google Brain proposes generating Wikipedia via multi-document summarization, handling longer sequences
https://www.sohu.com/a/222192603_129720
Learning to Remember Rare Events
https://arxiv.org/abs/1703.03129
Reading notes: Learning to Remember Rare Events
https://blog.csdn.net/weixin_43874380/article/details/111211712
A survey of summarization model development, 2015-2019 (Part 2)
https://zhuanlan.zhihu.com/p/138282654
Mohammad Bavarian
https://scholar.google.com/citations?hl=en&user=uMg7CEAAAAAJ&view_op=list_works&sortby=pubdate
Efficient Training of Language Models to Fill in the Middle
https://arxiv.org/abs/2207.14255
Evaluating Large Language Models Trained on Code
https://arxiv.org/abs/2107.03374
Copilot: Evaluating Large Language Models Trained on Code
https://zhuanlan.zhihu.com/p/571373422
https://github.com/openai/human-eval
[AI4Code] Codex: "Evaluating Large Language Models Trained on Code" (OpenAI)
https://blog.csdn.net/yanguang1470/article/details/125862215
Heewoo Jun
https://www.semanticscholar.org/author/Heewoo-Jun/35450887
https://deepai.org/publication/scaling-laws-for-autoregressive-generative-modeling
Deep Learning Scaling is Predictable, Empirically
https://www.semanticscholar.org/paper/Deep-Learning-Scaling-is-Predictable%2C-Empirically-Hestness-Narang/a1c922be467d1c0c64b963e65dae41778b81b2a0
Scaling Laws for Autoregressive Generative Modeling
https://arxiv.org/abs/2010.14701
Distribution augmentation in generative models
https://zhuanlan.zhihu.com/p/553430457
Sparse Transformer
https://zhuanlan.zhihu.com/p/504609631
Generating Long Sequences with Sparse Transformers
https://arxiv.org/abs/1904.10509
A brief analysis of the Sparse Transformer
https://zhuanlan.zhihu.com/p/259591644
New OpenAI research fixes a Transformer shortcoming, extending predictable sequence length 30x
https://baijiahao.baidu.com/s?id=1631672936541592253&wfr=spider&for=pc
Coherence boosting: When your pretrained language model is not paying enough attention
https://arxiv.org/abs/2110.08294
Memorizing Transformers
https://arxiv.org/abs/2203.08913
Google proposes an RNN-style Transformer, possibly the current best option for long-text modeling
https://blog.csdn.net/xixiaoyaoww/article/details/123911465
From a Google team: a one-article overview of 17 efficient Transformer variants
https://mp.weixin.qq.com/s?__biz=MzI1MjQ2OTQ3Ng==&mid=2247529437&idx=1&sn=cba6ebc4025c2c92244f35c4a9577609&chksm=e9e17a56de96f340bb5e2b392b6d7bd1eb70fe9000ea7fab4472b538b67ccb01700f50a06a97&scene=27
Transformer variants (Routing Transformer, Linformer, Big Bird)
https://blog.csdn.net/qq_39388410/article/details/113528697
Efficient Content-Based Sparse Attention with Routing Transformers
https://arxiv.org/abs/2003.05997
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
https://arxiv.org/abs/2302.06218
RWKV
https://github.com/BlinkDL/RWKV-LM
An ambitious RNN: the RWKV language model and a minimal 100-line implementation
https://zhuanlan.zhihu.com/p/620469303
RWKV: a linear Transformer that gets the best of both worlds
https://zhuanlan.zhihu.com/p/437714049
How RWKV-v2-RNN works: beyond the Transformer, O(T) language modeling
https://zhuanlan.zhihu.com/p/514840332
New research: Transformer context extended beyond 1 million tokens, with high accuracy and good compatibility (source code included)
https://zhuanlan.zhihu.com/p/624892711
https://arxiv.org/abs/2304.11062
https://github.com/booydar/t5-experiments/tree/scaling-report
Understand rotary position embedding (RoPE) in ten minutes
https://zhuanlan.zhihu.com/p/647109286
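
To make the RoPE entry above concrete, here is a minimal NumPy sketch of rotary position embedding as described in the linked explainer; the function name, shapes, and usage snippet are my own illustrative choices, not code taken from any of the referenced posts.

    import numpy as np

    def apply_rope(x, base=10000.0):
        # x: (seq_len, dim) queries or keys; dim must be even.
        seq_len, dim = x.shape
        # one rotation frequency per channel pair, decaying geometrically
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x_even, x_odd = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x_even * cos - x_odd * sin          # rotate each channel pair
        out[:, 1::2] = x_even * sin + x_odd * cos
        return out

    # Usage: rotate queries and keys before the attention dot product,
    # so relative position enters the attention scores as a phase difference.
    q = apply_rope(np.random.randn(16, 64))
    k = apply_rope(np.random.randn(16, 64))
    scores = q @ k.T
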
Paper walkthrough: FlashAttention, a new attention algorithm
https://zhuanlan.zhihu.com/p/618533434
Worried your Transformer won't fit in memory? OpenAI's head of applied AI research wrote a guide
https://new.qq.com/rain/a/20230205A02SY000
Paper reading: Long-range Language Modeling with Self-retrieval
https://zhuanlan.zhihu.com/p/640710751
https://arxiv.org/abs/2306.13421
Handling ultra-long contexts: a survey of common Transformer-based approaches
https://blog.csdn.net/wwlsm_zql/article/details/131646700
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
https://arxiv.org/abs/2308.16137
LM-Infinite: simple and effective on-the-fly length generalization for large models, enabling inference over longer texts
https://blog.csdn.net/qq_27590277/article/details/132680459
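
As a rough illustration of the Λ-shaped attention pattern the LM-Infinite posts above describe (keep the first few "sink" tokens plus a recent sliding window), here is a small mask-construction sketch; n_global and window are illustrative parameter names of my own, and the paper's distance-ceiling details are omitted.

    import numpy as np

    def lambda_shaped_mask(seq_len, n_global=4, window=2048):
        # True where query position i may attend to key position j.
        i = np.arange(seq_len)[:, None]   # query index
        j = np.arange(seq_len)[None, :]   # key index
        causal = j <= i                   # never attend to the future
        recent = (i - j) < window         # sliding local window
        sinks = j < n_global              # always keep the earliest tokens
        return causal & (recent | sinks)

    mask = lambda_shaped_mask(10, n_global=2, window=4)
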
[Xinzhiyuan digest] Large-model context is no longer limited: Jiaya Jia's team at CUHK, together with MIT, released LongLoRA, a new ultra-long-text extension technique that needs only 2 lines of code and lets LLMs read novels and papers with ease.
https://zhuanlan.zhihu.com/p/660284802
New work from Microsoft: LongNet extends Transformer context length to 1 billion tokens
https://baijiahao.baidu.com/s?id=1770804050075385670&wfr=spider&for=pc
A new approach to long-text generation without window-length limits: storing preceding context in the model parameters
https://zhuanlan.zhihu.com/p/679713147
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation
https://arxiv.org/abs/2401.11504
Tsinghua NLP group releases InfLLM: 100% recall on 1024K ultra-long contexts, no extra training required
https://www.163.com/dy/article/IT0J17JK0511ABV6.html
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
https://arxiv.org/abs/2402.04617
https://github.com/thunlp/InfLLM
Upgrading the Transformer, Part 16: a retrospective on length extrapolation techniques
https://kexue.fm/archives/9948
Problems with long context in LLMs
https://zhuanlan.zhihu.com/p/684924585
A hands-on guide to efficiently training a 256K long-context model (code included)
https://zhuanlan.zhihu.com/p/678107461
Ring attention + flash attention: the road to ultra-long context
https://zhuanlan.zhihu.com/p/683714620
Berkeley | Ring Attention: blockwise Transformers supporting contexts up to 100M
https://zhuanlan.zhihu.com/p/660354607
Today's hottest arXiv NLP LLM paper: a training-free method from HKU that extends Llama-2's context 48x
https://zhuanlan.zhihu.com/p/684990340
LongLoRA: efficiently extending pretrained large language models to 100k context at 16x lower compute cost
https://zhuanlan.zhihu.com/p/663720038
DeepSpeed Ulysses: system optimizations for training Transformers on extremely long sequences
https://zhuanlan.zhihu.com/p/652206513
New work from Danqi Chen's group: Llama-2 context extended to 128k, 10x throughput with only 1/6 the memory
https://zhuanlan.zhihu.com/p/684546559
Building a small encoder to improve large models' ability to handle long texts
https://zhuanlan.zhihu.com/p/685556231
Long-Context Language Modeling with Parallel Context Encoding
https://arxiv.org/abs/2402.16617
https://github.com/princeton-nlp/cepe
Improving LLMs' long-form reading ability: LongAlign boosts long-text understanding
https://zhuanlan.zhihu.com/p/682324807
LongAlign: A Recipe for Long Context Alignment of Large Language Models
https://arxiv.org/abs/2401.18058
https://github.com/thudm/longalign
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
https://arxiv.org/abs/2404.07143
Dr. Yao Fu - Full-stack Transformer inference optimization, season 2: deploying long-context models (translated)
https://zhuanlan.zhihu.com/p/697244539
Notes on the paper "YaRN: Efficient Context Window Extension of Large Language Models"
https://zhuanlan.zhihu.com/p/683863159
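
YaRN itself combines several frequency-dependent scaling rules, so rather than guess at those details, here is a sketch of plain position interpolation, the simpler baseline that YaRN builds on: positions beyond the trained window are rescaled back into it before the RoPE angles are computed. All names and lengths below are illustrative assumptions.

    import numpy as np

    def rope_angles(positions, dim, base=10000.0):
        # Rotation angles RoPE would use for the given (possibly rescaled) positions.
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(positions, inv_freq)

    train_len, target_len = 4096, 16384
    scale = train_len / target_len              # 0.25: squeeze 16k positions into the trained 4k range
    positions = np.arange(target_len) * scale
    angles = rope_angles(positions, dim=128)    # feed these into the usual cos/sin rotation
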
An analysis of long-context techniques in LLMs
https://zhuanlan.zhihu.com/p/689394585
Unlocking long-context capability in large models
https://zhuanlan.zhihu.com/p/696226537
New work led by academician Weinan E: beyond RAG and parameter storage, large models get a third kind of memory
https://zhuanlan.zhihu.com/p/707921545
Memory3: Language Modeling with Explicit Memory
https://arxiv.org/abs/2407.01178