<title>TTIC 31230: Fundamentals of Deep Learning</title>
<header>TTIC 31230: Fundamentals of Deep Learning</header>
<p> David McAllester</p>
<p> Revised from winter 2020</p>
<!-- <p style="color:red"> Last Lecture Canceled. Prof. McAllester appears to have some mild illness and it seems best to err on the side of the safety of the students.</p> -->
<p>Lectures Slides and Problems:</p>
<ol>
<li> Introduction</li>
<ol type = "A">
<li><a href = 01intro/history.pdf> The History of Deep Learning and Moore's Law of AI</a></li>
<li><a href = 01intro/fundamentals.pdf> The Fundamental Equations of Deep Learning</a></li>
<li><a href = 01intro/problems.pdf>Problems</a></li>
</ol>
<li> Frameworks and Back-Propagation</li>
<ol type = "A">
<li><a href = 02MLP/frameworks.pdf> Deep Learning Frameworks</a></li>
<li><a href = 02MLP/Backprop.pdf> Backpropagation for Scalar Source Code</a> (a minimal sketch appears after this list)</li>
<li><a href = 02MLP/backprop2.pdf> Backpropagation for Tensor Source Code</a></li>
<li><a href = 02MLP/minibatching.pdf> Minibatching: The Batch Index</a></li>
<li><a href = 02MLP/EDFslides.pdf> The Educational Framework (EDF)</a></li>
<li><a href = 02MLP/problems.pdf> Problems</a></li>
<li><a href = 02MLP/edf.py> EDF source code</a> (150 lines of Python/NumPy)</li>
<li><a href = 02MLP/PS1.zip> MNIST in EDF problem set</a></li>
<li><a href = https://pytorch.org/tutorials/ > PyTorch tutorial</a></li>
</ol>
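<p>As a companion to the scalar backpropagation and EDF slides above, here is a minimal sketch of scalar reverse-mode automatic differentiation in plain Python. The Value class and its methods are illustrative assumptions, not the EDF API; see the EDF source above for the real framework.</p>
<pre>
# A minimal sketch of scalar reverse-mode autodiff (illustrative only,
# not the EDF API). Each Value node stores its number, its gradient,
# and a closure that pushes the gradient back to its parents.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.backprop = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backprop():
            self.grad += out.grad    # d(x+y)/dx = 1
            other.grad += out.grad   # d(x+y)/dy = 1
        out.backprop = backprop
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backprop():
            self.grad += other.data * out.grad  # d(x*y)/dx = y
            other.grad += self.data * out.grad  # d(x*y)/dy = x
        out.backprop = backprop
        return out

    def backward(self):
        # Topologically sort the compute graph, then run each node's
        # backprop closure in reverse order.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v.backprop()

# y = x*x + x at x = 3 gives dy/dx = 2*3 + 1 = 7.
x = Value(3.0)
y = x * x + x
y.backward()
print(y.data, x.grad)  # prints 12.0 7.0
</pre>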
<li>Vision: Convolutional Neural Networks (CNNs)</li>
<ol type = "A">
<li><a href = 03CNNs/Einstein.pdf> Einstein Notation</a></li>
<li><a href = 03CNNs/CNNs.pdf> CNNs</a></li>
<li><a href = 03CNNs/trainability.pdf> Trainability: ReLU, Initialization, Batch Normalization and Residual Connections (ResNet)</a></li>
<li><a href = 03CNNs/CNNb.html> Invariant Theory (optional)</a></li>
<li><a href = 03CNNs/problems.pdf> Problems</a></li>
<li><a href = https://pytorch.org/docs/stable/nn.functional.html?highlight=convolution>PyTorch Convolution Functions</a></li>
</ol>
<li> Natural Language Processing</li>
<ol type = 'A'>
<li><a href = 05Rnns/LangModels.pdf> Language Modeling</a></li>
<li><a href = 05RNNs/RNNs.pdf> Recurrent Neural Networks (RNNs)</a></li>
<li><a href = 05RNNs/Translation.pdf> Machine Translation and Attention</a> (a scaled dot-product attention sketch appears after this list)</li>
<li><a href = 05RNNs/Transformer.pdf> The Transformer</a></li>
<li><a href = 05RNNs/Phrases.pdf> Statistical Machine Translation (optional)</a></li>
<li><a href = 05RNNs/problems.pdf>Problems</a></li>
<!--
<li>References</li>
<ol type = "i">
<li><a href = https://arxiv.org/abs/1409.3215> Original sequence to sequence paper </a></li>
<li><a href = https://arxiv.org/abs/1409.0473> Original attention paper </a></li>
<li><a href = https://arxiv.org/abs/1611.04558> Google's Revolution in Machine Translation </a></li>
<li><a href = https://arxiv.org/abs/1706.03762> Attention is all you need</a></li>
</ol>
-->
</ol>
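<p>As a companion to the attention and Transformer slides above, here is a minimal sketch of scaled dot-product attention in Python/NumPy. The shapes and names are illustrative assumptions; the slides develop the full architecture.</p>
<pre>
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention (a sketch): each query attends to
    # all keys, and the output is the attention-weighted average of
    # the values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax over the key dimension, with the usual max-subtraction
    # for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Three queries attending over five key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
print(attention(Q, K, V).shape)  # prints (3, 8)
</pre>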
<li>Stochastic Gradient Descent</li>
<ol type = "A">
<li><a href = 06SGD/Classical.pdf> The Classical Convergence Theorem</a></li>
<li><a href = 06SGD/Decoupling1.pdf> Decoupling the Learning Rate from the Batch Size</a></li>
<li><a href = 06SGD/Momentum.pdf> Momentum as a Running Average and Decoupled Momentum</a> (see the sketch after this list)</li>
<li><a href = 06SGD/RMS.pdf> RMSProp, Adam, and Decoupled Versions</a></li>
<li><a href = 06SGD/flow.pdf> Gradient Flow</a></li>
<li><a href = 06SGD/Heat.pdf> Heat Capacity with Loss as Energy and Learning Rate as Temperature</a></li>
<li><a href = 06SGD/Langevin.pdf> Continuous Time Noise and Stationary Parameter Densities</a></li>
<li><a href = 06SGD/SGDproblems.pdf> Problems</a></li>
<!-- <li><a href = 06SGD/safe.pdf> Slides on a Quenching Algorithm</a></li> -->
<!--
<li> References: </li>
<ol type = "i">
<li><a href = http://ruder.io/optimizing-gradient-descent/ > Blog post on SGD variants</a></li>
<li><a href = https://arxiv.org/abs/1706.02677> Training ResNet-50 on ImageNet in one hour</a></li>
<li><a href = https://openreview.net/pdf?id=B1Yy1BxCZ > Paper on batch size scaling of the learning rate and momentum parameter</a></li>
<li><a href = https://arxiv.org/abs/1511.06807> Adding Gradient Noise</a></li>
<li><a href = https://arxiv.org/abs/1704.00109> Temperature Cycling in SGD </a></li>
<li><a href = https://arxiv.org/abs/1206.1901> MCMC with momentum</a></li>
</ol> -->
</ol>
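<p>As a companion to the momentum slides above, here is a minimal sketch of momentum written as a running average of gradients, in Python/NumPy. The function and its hyperparameter values are illustrative assumptions.</p>
<pre>
import numpy as np

def sgd_momentum(grad, w, lr=0.1, mu=0.9, steps=100):
    # A sketch of SGD with momentum in its running-average form:
    # v is an exponential moving average of past gradients and the
    # parameter update applies lr * v. (The classical form uses
    # v = mu * v + g; the two differ only by a rescaling of the
    # learning rate.)
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        v = mu * v + (1 - mu) * g   # running average of gradients
        w = w - lr * v
    return w

# Minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
print(sgd_momentum(lambda w: w, np.array([5.0, -3.0]), steps=200))  # near zero
</pre>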
<li>Generalization and Regularization</li>
<ol type= 'A'>
<li><a href = 07regularization/Early.pdf>Early Stopping, Shrinkage and Decoupled Shrinkage</a> (see the sketch after this list)</li>
<li><a href = 07regularization/PCABayes.pdf>PAC-Bayes Generalization Theory</a></li>
<li><a href = 07regularization/Implicit.pdf>Implicit Regularization</a></li>
<li><a href = 07regularization/Double.pdf>Double Descent</a></li>
<li><a href = 07regularization/REGproblems.pdf> Problems</a></li>
<li><a href = https://arxiv.org/abs/1307.2118> PAC-Bayes Tutorial </a></li>
</ol>
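<p>As a companion to the shrinkage slides above, here is a minimal sketch of shrinkage (L2 regularization) applied in decoupled form with plain SGD, in Python/NumPy. The function and its hyperparameters are illustrative assumptions.</p>
<pre>
import numpy as np

def sgd_decoupled_shrinkage(grad, w, lr=0.1, decay=0.01, steps=100):
    # Decoupled shrinkage (a sketch): the weights are shrunk directly
    # toward zero each step rather than folding an L2 term into the
    # gradient. With plain SGD the two forms coincide; they differ
    # for adaptive methods such as Adam, which is the point of
    # decoupling.
    for _ in range(steps):
        w = (1.0 - lr * decay) * w - lr * grad(w)
    return w

# Minimize f(w) = ||w||^2 / 2 with shrinkage toward zero.
print(sgd_decoupled_shrinkage(lambda w: w, np.array([5.0, -3.0])))
</pre>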
<li>Deep Graphical Models</li>
<ol type = 'A'>
<li><a href = 09GraphicalModels/DGMs1.pdf> Exponential Softmax</a></li>
<li><a href = 09GraphicalModels/CTC.pdf> Speech Recognition: Connectionist Temporal Classification (CTC)</a></li>
<li><a href = 09GraphicalModels/DGMs2.pdf> Backpropagation for Exponential Softmax: The Model Marginals</a></li>
<li><a href = 09GraphicalModels/MCMC.pdf> Markov Chain Monte Carlo (MCMC) Sampling</a> (a Gibbs sampling sketch appears after this list)</li>
<li><a href = 09GraphicalModels/MCMC.pdf> Pseudo-Likelihood and Contrastive Divergence</a></li>
<li><a href = 09GraphicalModels/Loopy.pdf> Loopy Belief Propagation (Loopy BP)</a></li>
<li><a href = 09GraphicalModels/Contrastive.pdf> Noise Contrastive Estimation</a></li>
<li><a href = 09GraphicalModels/DGMproblems.pdf> Problems</a></li>
</ol>
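<p>As a companion to the MCMC slides above, here is a minimal sketch of Gibbs sampling for an Ising-style model in Python/NumPy. The energy convention and the function interface are illustrative assumptions.</p>
<pre>
import numpy as np

def gibbs_ising(J, n_steps=1000, beta=1.0, seed=0):
    # Gibbs sampling (a basic MCMC method) for the distribution
    # P(s) proportional to exp((beta/2) * sum_ij J[i,j]*s[i]*s[j])
    # over +/-1 spins, with J symmetric and zero-diagonal (an
    # illustrative assumption). Each sweep resamples one spin at a
    # time from its conditional distribution given all the others.
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_steps):
        for i in range(n):
            field = J[i] @ s            # local field felt by spin i
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
            s[i] = 1.0 if rng.random() &lt; p_up else -1.0
    return s

# Two ferromagnetically coupled spins tend to align.
J = np.array([[0.0, 1.0], [1.0, 0.0]])
print(gibbs_ising(J))
</pre>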
<li>Generative Adversarial Networks (GANs)</li>
<ol type = 'A'>
<li><a href = 08InfoTheory/information.pdf>Perils of Differential Entropy</a></li>
<li><a href = 14Gans/Gans.pdf> Overview and Timeline of GAN Development</a></li>
<li><a href = 14Gans/Patch.pdf>Replacing the Loss Gradient with the Margin Gradient</a></li>
<li><a href = 14Gans/Jensen.pdf>Optimal Discrimination and Jensen-Shannon Divergence</a></li>
<li><a href = 14Gans/Contrastive.pdf>Contrastive GANs</a></li>
<li><a href = 14GANs/GANproblems.pdf> Problems</a></li>
</ol>
<li>Autoencoders</li>
<ol type = 'A'>
<!-- <li><a href = 08InfoTheory/info2problems.pdf> Problems</a></li> -->
<li><a href = 11AutoEncoders/Rate.pdf> Rate-Distortion Autoencoders (RDAs) </a></li>
<li><a href = 11AutoEncoders/Noisy.pdf> Noisy Channel RDAs </a></li>
<li><a href = 11AutoEncoders/GaussianRDAs.pdf> Gaussian Noisy Channel RDAs </a></li>
<li><a href = 11AutoEncoders/Latent.pdf> Interpretability of Latent Variables</a></li>
<li><a href = 11AutoEncoders/ELBO.pdf> The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs)</a></li>
<li><a href = 11AutoEncoders/GaussianVAEs.pdf> Gaussian VAEs</a> (see the sketch after this list)</li>
<li><a href = 11AutoEncoders/Collapse.pdf> Posterior Collapse, VAE Non-Identifiability, and beta-VAEs </a></li>
<li><a href = 11AutoEncoders/VQVAE.pdf> Vector Quantized VAEs </a></li>
<li><a href = 11AutoEncoders/Rateproblems.pdf> Problems</a></li>
</ol>
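<p>As a companion to the ELBO and Gaussian VAE slides above, here is a minimal sketch of a one-sample ELBO estimate using the reparameterization trick, in Python/NumPy. The encode/decode interface and the unit-variance likelihood are illustrative assumptions.</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)

def gaussian_vae_elbo(x, encode, decode):
    # One-sample Monte Carlo estimate of the ELBO
    #   E_q[log p(x|z)] - KL(q(z|x) || N(0, I))
    # where encode(x) returns (mu, log_sigma) for the Gaussian
    # posterior q(z|x), and decode(z) returns the reconstruction mean
    # of a unit-variance Gaussian likelihood (assumptions for this
    # sketch).
    mu, log_sigma = encode(x)
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.standard_normal(mu.shape)  # reparameterization
    x_hat = decode(z)
    rec = -0.5 * np.sum((x - x_hat) ** 2)           # log p(x|z) + const
    # closed-form KL between N(mu, sigma^2) and N(0, 1), per coordinate
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)
    return rec - kl

# Toy linear encoder and decoder, purely for illustration.
encode = lambda x: (0.5 * x, np.zeros_like(x))
decode = lambda z: 2.0 * z
print(gaussian_vae_elbo(rng.standard_normal(4), encode, decode))
</pre>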
<li>Pretraining</li>
<ol type = 'A'>
<li><a href = pretraining/NLPpretraining.pdf> Pretraining for NLP</a></li>
<li><a href = pretraining/supervised.pdf> Supervised ImageNet Pretraining</a></li>
<li><a href = pretraining/self.pdf>Self-Supervised Pretraining for Vision</a></li>
<li><a href = pretraining/CPC.pdf>Contrastive Predictive Coding</a></li>
<li><a href = pretraining/MI.pdf>Mutual Information Coding</a></li>
<li><a href = pretraining/PREproblems.pdf> Problems</a></li>
</ol>
<li> Reinforcement Learning (RL)</li>
<ol type = 'A'>
<li><a href = 15RL/RL.pdf> Basic Definitions, Q-learning, Deep Q Networks (DQN) for Atari</a> (a tabular Q-learning sketch appears after this list)</li>
<li><a href = 15RL/REINFORCE.pdf> The REINFORCE algorithm, Actor-Critic algorithms, A3C for Atari </a></li>
<li><a href = 15RL/RLproblems.pdf> Problems</a></li>
</ol>
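<p>As a companion to the Q-learning slides above, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration in Python/NumPy. The gym-style environment interface is an illustrative assumption.</p>
<pre>
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               lr=0.1, gamma=0.99, eps=0.1):
    # Tabular Q-learning. env is assumed to expose reset() returning
    # a state and step(action) returning (state, reward, done), in
    # the style of gym environments (an assumption for this sketch).
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() &lt; eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # bootstrap from the greedy value of the next state
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += lr * (target - Q[s, a])
            s = s2
    return Q
</pre>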
<li> AlphaZero and AlphaStar</li>
<ol type = 'A'>
<li><a href = 16alpha/alphago.pdf> Background Algorithms</a></li>
<li><a href = 16alpha/algorithm.pdf> The AlphaZero Training Algorithm</a></li>
<li><a href = 16alpha/results.pdf> Some Quantitative Empirical Results</a></li>
<li><a href = 16alpha/analysis.pdf> The Policy as a Q-Function</a></li>
<li><a href = 16alpha/alphabeta.pdf> What Happened to alpha-beta?</a></li>
<li><a href = 16alpha/alphastar.pdf> AlphaStar</a></li>
<li><a href = 16alpha/alphaproblems.pdf> Problems</a></li>
</ol>
<!-- <li><a href = 13SGD2/SGD2.html> Gradients as Dual Vectors, Hessian-Vector Products, and Information Geometry </a></li> -->
<!-- <li><a href = 17Interpretation/Interp.html> The Black Box Problem</a></li> -->
<li>The Quest for Artificial General Intelligence (AGI)</li>
<ol type = 'A'>
<li><a href = 18AGI/arch.pdf> The Free Lunch Theorem and The Intelligence Explosion</a></li>
<li><a href = 18AGI/classical.pdf> Representing Functions with Shallow Circuits: The Classical Universality Theorems </a></li>
<li><a href = 18AGI/circuits.pdf> Representing Functions with Deep Circuits: Circuit Complexity Theory </a></li>
<li><a href = 18AGI/programs.pdf> Representing Functions with Programs: Python, Assembler and the Turing Tarpit </a></li>
<li><a href = 18AGI/logic.pdf> Representing Functions and Knowledge with Logic </a></li>
<li><a href = 18AGI/NLP.pdf> Representing Choices and Knowledge with Natural Language </a></li>
</ol>
</ol>