@@ -1197,9 +1197,10 @@ <h2>Lecture Monday September 30, 2024<a class="headerlink" href="#lecture-monday
  <ol class="simple">
  <li><p>Stochastic Gradient descent with examples and automatic differentiation</p></li>
  <li><p>If we get time, we start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model</p></li>
+ <li><p><a class="reference external" href="https://youtu.be/jdJoOrCIdII">Video of lecture</a></p></li>
+ <li><p>Whiteboard notes at <a class="reference external" href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesSeptember30.pdf">https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesSeptember30.pdf</a></p></li>
  </ol>
- <!-- * [Video of lecture](https://youtu.be/75pr3hKY20U) -->
- <!-- * "Whiteboard notes at <https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2023/NotesOct5.pdf> --></div>
+ </div>
  <div class="section" id="suggested-readings-and-videos">
  <h2>Suggested readings and videos<a class="headerlink" href="#suggested-readings-and-videos" title="Permalink to this headline">¶</a></h2>
  <p><strong>Readings and Videos:</strong></p>
@@ -1288,17 +1289,17 @@ <h2>Simple implementation of GD for OLS, Ridge and Lasso<a class="headerlink" hr
  </div>
  <div class="cell_output docutils container">
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Parameters for OLS using gradient descent
- [[3.870337]
- [3.34871778]
- [4.83363025]]
+ [[3.53732988]
+ [4.06277313]
+ [4.51728978]]
  Parameters for Ridge using gradient descent
- [[3.66951679]
- [3.76942272]
- [4.63546236]]
+ [[3.62671667]
+ [3.79849321]
+ [4.63207408]]
  Parameters for Lasso using gradient descent
- [[3.65759901]
- [3.87732748]
- [4.58735893]]
+ [[4.1547556]
+ [2.58013829]
+ [5.20288581]]
  </pre></div>
  </div>
  </div>
@@ -1350,11 +1351,11 @@ <h2>But none of these can compete with Newton’s method<a class="headerlink" hr
  [[4.]
  [3.]
  [5.]]
- 0 [-23.78254537] [-30.51512235]
- 1 [1.30917499e-14] [-1.01327148e-14]
- 2 [9.76996262e-16] [1.28164443e-15]
- 3 [-7.28306304e-16] [-8.17359583e-16]
- 4 [-7.63833441e-16] [-1.32422234e-15]
+ 0 [-27.10587277] [-37.81035098]
+ 1 [1.16209264e-13] [2.08824304e-13]
+ 2 [8.8817842e-17] [2.62242138e-16]
+ 3 [-1.77635684e-17] [-1.50600514e-16]
+ 4 [-3.37507799e-16] [-2.280464e-16]
  beta from own Newton code
  [[4.]
  [3.]
@@ -1696,15 +1697,15 @@ <h2>Code with a Number of Minibatches which varies<a class="headerlink" href="#c
  </div>
  <div class="cell_output docutils container">
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Own inversion
- [[4.12220276]
- [2.95115528]]
- Eigenvalues of Hessian Matrix:[0.31601423 3.93074023]
+ [[4.30637636]
+ [2.56947078]]
+ Eigenvalues of Hessian Matrix:[0.30000613 4.12600095]
  theta from own gd
- [[4.12220276]
- [2.95115528]]
+ [[4.30637636]
+ [2.56947078]]
  theta from own sdg
- [[4.12206013]
- [2.92370959]]
+ [[4.34759233]
+ [2.63142731]]
  </pre></div>
  </div>
  <img alt="_images/week40_34_1.png" src="_images/week40_34_1.png" />
@@ -2418,12 +2419,12 @@ <h2>Using Autograd with OLS<a class="headerlink" href="#using-autograd-with-ols"
  </div>
  <div class="cell_output docutils container">
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Own inversion
- [[3.62286539]
- [3.36754671]]
- Eigenvalues of Hessian Matrix:[0.30134885 4.08702578]
+ [[3.88064505]
+ [2.99072374]]
+ Eigenvalues of Hessian Matrix:[0.30141906 4.73853838]
  theta from own gd
- [[3.62286539]
- [3.36754671]]
+ [[3.88064505]
+ [2.99072374]]
  </pre></div>
  </div>
  <img alt="_images/week40_100_1.png" src="_images/week40_100_1.png" />
@@ -2494,73 +2495,73 @@ <h2>Same code but now with momentum gradient descent<a class="headerlink" href="
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Own inversion
  [[4.]
  [3.]]
- Eigenvalues of Hessian Matrix:[0.27124938 4.56804783]
- 0 [-8.64009263] [-10.94092702]
- 1 [0.18925625] [-0.15527976]
- 2 [0.17801827] [-0.14605929]
- 3 [0.16744759] [-0.13738633]
- 4 [0.1575046] [-0.12922837]
- 5 [0.14815202] [-0.12155483]
- 6 [0.1393548] [-0.11433693]
- 7 [0.13107995] [-0.10754764]
- 8 [0.12329646] [-0.10116149]
- 9 [0.11597515] [-0.09515455]
- 10 [0.10908858] [-0.0895043]
- 11 [0.10261093] [-0.08418956]
- 12 [0.09651792] [-0.07919041]
- 13 [0.09078672] [-0.0744881]
- 14 [0.08539583] [-0.07006502]
- 15 [0.08032505] [-0.06590458]
- 16 [0.07555537] [-0.06199119]
- 17 [0.07106891] [-0.05831017]
- 18 [0.06684886] [-0.05484773]
- 19 [0.06287939] [-0.05159088]
- 20 [0.05914563] [-0.04852743]
- 21 [0.05563358] [-0.04564589]
- 22 [0.05233008] [-0.04293545]
- 23 [0.04922273] [-0.04038595]
- 24 [0.0462999] [-0.03798785]
- 25 [0.04355062] [-0.03573214]
- 26 [0.0409646] [-0.03361037]
- 27 [0.03853213] [-0.0316146]
- 28 [0.03624411] [-0.02973733]
- 29 [0.03409194] [-0.02797154]
+ Eigenvalues of Hessian Matrix:[0.31702609 3.84351715]
+ 0 [-13.26083712] [-12.67834752]
+ 1 [-0.55020405] [0.5257011]
+ 2 [-0.50482139] [0.48233952]
+ 3 [-0.46318204] [0.44255455]
+ 4 [-0.42497724] [0.40605118]
+ 5 [-0.38992371] [0.37255873]
+ 6 [-0.3577615] [0.34182884]
+ 7 [-0.32825214] [0.31363366]
+ 8 [-0.30117681] [0.28776411]
+ 9 [-0.27633474] [0.26402837]
+ 10 [-0.25354173] [0.24225043]
+ 11 [-0.23262877] [0.22226881]
+ 12 [-0.21344077] [0.20393534]
+ 13 [-0.19583547] [0.18711407]
+ 14 [-0.17968231] [0.17168028]
+ 15 [-0.16486151] [0.15751952]
+ 16 [-0.15126318] [0.14452678]
+ 17 [-0.13878649] [0.13260573]
+ 18 [-0.12733892] [0.12166797]
+ 19 [-0.11683558] [0.11163239]
+ 20 [-0.10719859] [0.10242458]
+ 21 [-0.0983565] [0.09397626]
+ 22 [-0.09024373] [0.08622478]
+ 23 [-0.08280012] [0.07911268]
+ 24 [-0.07597049] [0.0725872]
+ 25 [-0.06970419] [0.06659997]
+ 26 [-0.06395476] [0.06110658]
+ 27 [-0.05867956] [0.0560663]
+ 28 [-0.05383947] [0.05144177]
+ 29 [-0.04939861] [0.04719868]
  theta from own gd
- [[4.11822173]
- [2.90300219]]
- 0 [0.03206757] [-0.0263106]
- 1 [0.03016341] [-0.02474828]
- 2 [0.02780107] [-0.02281004]
- 3 [0.02544154] [-0.02087411]
- 4 [0.02322297] [-0.01905384]
- 5 [0.02117843] [-0.01737634]
- 6 [0.0193075] [-0.01584129]
- 7 [0.01759974] [-0.01444013]
- 8 [0.01604235] [-0.01316233]
- 9 [0.01462254] [-0.01199741]
- 10 [0.01332832] [-0.01093553]
- 11 [0.01214862] [-0.00996762]
- 12 [0.01107333] [-0.00908537]
- 13 [0.01009321] [-0.00828121]
- 14 [0.00919984] [-0.00754823]
- 15 [0.00838555] [-0.00688012]
- 16 [0.00764333] [-0.00627115]
- 17 [0.0069668] [-0.00571608]
- 18 [0.00635016] [-0.00521014]
- 19 [0.00578809] [-0.00474898]
- 20 [0.00527578] [-0.00432864]
- 21 [0.00480881] [-0.0039455]
- 22 [0.00438318] [-0.00359628]
- 23 [0.00399521] [-0.00327797]
- 24 [0.00364159] [-0.00298783]
- 25 [0.00331927] [-0.00272337]
- 26 [0.00302547] [-0.00248232]
- 27 [0.00275768] [-0.0022626]
- 28 [0.00251359] [-0.00206234]
- 29 [0.00229111] [-0.0018798]
+ [[3.85703369]
+ [3.13659941]]
+ 0 [-0.04532405] [0.04330558]
+ 1 [-0.04158557] [0.03973359]
+ 2 [-0.03703391] [0.03538463]
+ 3 [-0.03261373] [0.0311613]
+ 4 [-0.02859759] [0.02732402]
+ 5 [-0.02503392] [0.02391906]
+ 6 [-0.02189994] [0.02092464]
+ 7 [-0.01915337] [0.01830039]
+ 8 [-0.01674956] [0.01600363]
+ 9 [-0.01464686] [0.01399457]
+ 10 [-0.01280793] [0.01223754]
+ 11 [-0.01119981] [0.01070103]
+ 12 [-0.00979357] [0.00935742]
+ 13 [-0.0085639] [0.00818251]
+ 14 [-0.00748862] [0.00715512]
+ 15 [-0.00654835] [0.00625672]
+ 16 [-0.00572613] [0.00547113]
+ 17 [-0.00500716] [0.00478417]
+ 18 [-0.00437846] [0.00418347]
+ 19 [-0.0038287] [0.00365819]
+ 20 [-0.00334797] [0.00319887]
+ 21 [-0.0029276] [0.00279722]
+ 22 [-0.00256001] [0.002446]
+ 23 [-0.00223857] [0.00213888]
+ 24 [-0.0019575] [0.00187032]
+ 25 [-0.00171171] [0.00163548]
+ 26 [-0.00149679] [0.00143013]
+ 27 [-0.00130885] [0.00125057]
+ 28 [-0.00114451] [0.00109354]
+ 29 [-0.00100081] [0.00095624]
  theta from own gd wth momentum
- [[4.0076989]
- [2.99368326]]
+ [[3.99723951]
+ [3.00263755]]
  </pre></div>
  </div>
  </div>
@@ -2649,18 +2650,18 @@ <h2>Including Stochastic Gradient Descent with Autograd<a class="headerlink" hre
  </div>
  <div class="cell_output docutils container">
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Own inversion
- [[3.4870934]
- [3.55042779]]
- Eigenvalues of Hessian Matrix:[0.28793787 4.56453869]
+ [[3.95935439]
+ [3.09360113]]
+ Eigenvalues of Hessian Matrix:[0.32606139 4.1033455]
  theta from own gd
- [[3.4870934]
- [3.55042779]]
+ [[3.95935439]
+ [3.09360113]]
  </pre></div>
  </div>
  <img alt="_images/week40_104_1.png" src="_images/week40_104_1.png" />
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>theta from own sdg
- [[3.48900413]
- [3.56763673]]
+ [[3.96916206]
+ [3.10276112]]
  </pre></div>
  </div>
  </div>
@@ -2742,15 +2743,15 @@ <h2>Same code but now with momentum gradient descent<a class="headerlink" href="
  </div>
  <div class="cell_output docutils container">
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Own inversion
- [[4.11571507]
- [2.86067818]]
- Eigenvalues of Hessian Matrix:[0.25849623 4.84605507]
+ [[4.16532977]
+ [2.86943859]]
+ Eigenvalues of Hessian Matrix:[0.29633608 4.22128142]
  theta from own gd
- [[4.10888197]
- [2.86602331]]
+ [[4.16445858]
+ [2.87020155]]
  theta from own sdg with momentum
- [[4.15361856]
- [2.92376655]]
+ [[4.149433]
+ [2.89654756]]
  </pre></div>
  </div>
  </div>
@@ -2819,9 +2820,9 @@ <h2>Similar (second order function now) problem but now with AdaGrad<a class="he
  </pre></div>
  </div>
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>theta from own AdaGrad
- [[1.99968451]
- [3.001795]
- [3.99848215]]
+ [[1.99415296]
+ [3.03274273]
+ [3.96850191]]
  </pre></div>
  </div>
  </div>
@@ -2897,9 +2898,9 @@ <h2>RMSprop for adaptive learning rate with Stochastic Gradient Descent<a class=
  </pre></div>
  </div>
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>theta from own RMSprop
- [[1.99887767]
- [3.01261048]
- [3.99294056]]
+ [[2.00022969]
+ [2.99954278]
+ [4.00046631]]
  </pre></div>
  </div>
  </div>
@@ -2979,9 +2980,9 @@ <h2>And finally <a class="reference external" href="https://arxiv.org/pdf/1412.6
  </pre></div>
  </div>
  <div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>theta from own ADAM
- [[1.99996755]
- [3.00016535]
- [3.99977375]]
+ [[1.99991245]
+ [3.00042721]
+ [3.99954434]]
  </pre></div>
  </div>
  </div>
@@ -3102,7 +3103,7 @@ <h3>A warm-up example<a class="headerlink" href="#a-warm-up-example" title="Perm
  return asarray(x, dtype=self.dtype)
  </pre></div>
  </div>
- <div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>[&lt;matplotlib.lines.Line2D at 0x117c17be0&gt;]
+ <div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>[&lt;matplotlib.lines.Line2D at 0x11975e550&gt;]
  </pre></div>
  </div>
  <img alt="_images/week40_120_2.png" src="_images/week40_120_2.png" />
@@ -3137,7 +3138,7 @@ <h3>A more advanced example<a class="headerlink" href="#a-more-advanced-example"
  </div>
  </div>
  <div class="cell_output docutils container">
- <div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;matplotlib.collections.PathCollection at 0x117893340&gt;
+ <div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;matplotlib.collections.PathCollection at 0x11b160a60&gt;
  </pre></div>
  </div>
  <img alt="_images/week40_122_1.png" src="_images/week40_122_1.png" />