<!DOCTYPE html>
<!-- START: inst/pkgdown/templates/layout.html --><!-- Generated by pkgdown: do not edit by hand --><html lang="en" data-bs-theme="auto"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="utf-8"><title>Introduction to deep learning: Monitor the training process</title><meta name="viewport" content="width=device-width, initial-scale=1"><script src="assets/themetoggle.js"></script><link rel="stylesheet" type="text/css" href="assets/styles.css"><script src="assets/scripts.js" type="text/javascript"></script><!-- mathjax --><script type="text/x-mathjax-config">
MathJax.Hub.Config({
config: ["MMLorHTML.js"],
jax: ["input/TeX","input/MathML","output/HTML-CSS","output/NativeMML", "output/PreviewHTML"],
extensions: ["tex2jax.js","mml2jax.js","MathMenu.js","MathZoom.js", "fast-preview.js", "AssistiveMML.js", "a11y/accessibility-menu.js"],
TeX: {
extensions: ["AMSmath.js","AMSsymbols.js","noErrors.js","noUndefined.js"]
},
tex2jax: {
inlineMath: [['\\(', '\\)']],
displayMath: [ ['$$','$$'], ['\\[', '\\]'] ],
processEscapes: true
}
});
</script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><!-- Responsive Favicon for The Carpentries --><link rel="apple-touch-icon" sizes="180x180" href="favicons/incubator/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="favicons/incubator/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="favicons/incubator/favicon-16x16.png"><link rel="manifest" href="favicons/incubator/site.webmanifest"><link rel="mask-icon" href="favicons/incubator/safari-pinned-tab.svg" color="#5bbad5"><meta name="msapplication-TileColor" content="#da532c"><meta name="theme-color" media="(prefers-color-scheme: light)" content="white"><meta name="theme-color" media="(prefers-color-scheme: dark)" content="black"></head><body>
<header id="top" class="navbar navbar-expand-md top-nav incubator"><svg xmlns="http://www.w3.org/2000/svg" class="d-none"><symbol id="check2" viewbox="0 0 16 16"><path d="M13.854 3.646a.5.5 0 0 1 0 .708l-7 7a.5.5 0 0 1-.708 0l-3.5-3.5a.5.5 0 1 1 .708-.708L6.5 10.293l6.646-6.647a.5.5 0 0 1 .708 0z"></path></symbol><symbol id="circle-half" viewbox="0 0 16 16"><path d="M8 15A7 7 0 1 0 8 1v14zm0 1A8 8 0 1 1 8 0a8 8 0 0 1 0 16z"></path></symbol><symbol id="moon-stars-fill" viewbox="0 0 16 16"><path d="M6 .278a.768.768 0 0 1 .08.858 7.208 7.208 0 0 0-.878 3.46c0 4.021 3.278 7.277 7.318 7.277.527 0 1.04-.055 1.533-.16a.787.787 0 0 1 .81.316.733.733 0 0 1-.031.893A8.349 8.349 0 0 1 8.344 16C3.734 16 0 12.286 0 7.71 0 4.266 2.114 1.312 5.124.06A.752.752 0 0 1 6 .278z"></path><path d="M10.794 3.148a.217.217 0 0 1 .412 0l.387 1.162c.173.518.579.924 1.097 1.097l1.162.387a.217.217 0 0 1 0 .412l-1.162.387a1.734 1.734 0 0 0-1.097 1.097l-.387 1.162a.217.217 0 0 1-.412 0l-.387-1.162A1.734 1.734 0 0 0 9.31 6.593l-1.162-.387a.217.217 0 0 1 0-.412l1.162-.387a1.734 1.734 0 0 0 1.097-1.097l.387-1.162zM13.863.099a.145.145 0 0 1 .274 0l.258.774c.115.346.386.617.732.732l.774.258a.145.145 0 0 1 0 .274l-.774.258a1.156 1.156 0 0 0-.732.732l-.258.774a.145.145 0 0 1-.274 0l-.258-.774a1.156 1.156 0 0 0-.732-.732l-.774-.258a.145.145 0 0 1 0-.274l.774-.258c.346-.115.617-.386.732-.732L13.863.1z"></path></symbol><symbol id="sun-fill" viewbox="0 0 16 16"><path d="M8 12a4 4 0 1 0 0-8 4 4 0 0 0 0 8zM8 0a.5.5 0 0 1 .5.5v2a.5.5 0 0 1-1 0v-2A.5.5 0 0 1 8 0zm0 13a.5.5 0 0 1 .5.5v2a.5.5 0 0 1-1 0v-2A.5.5 0 0 1 8 13zm8-5a.5.5 0 0 1-.5.5h-2a.5.5 0 0 1 0-1h2a.5.5 0 0 1 .5.5zM3 8a.5.5 0 0 1-.5.5h-2a.5.5 0 0 1 0-1h2A.5.5 0 0 1 3 8zm10.657-5.657a.5.5 0 0 1 0 .707l-1.414 1.415a.5.5 0 1 1-.707-.708l1.414-1.414a.5.5 0 0 1 .707 0zm-9.193 9.193a.5.5 0 0 1 0 .707L3.05 13.657a.5.5 0 0 1-.707-.707l1.414-1.414a.5.5 0 0 1 .707 0zm9.193 2.121a.5.5 0 0 1-.707 0l-1.414-1.414a.5.5 0 0 1 .707-.707l1.414 1.414a.5.5 0 0 1 0 .707zM4.464 4.465a.5.5 0 0 1-.707 0L2.343 3.05a.5.5 0 1 1 .707-.707l1.414 1.414a.5.5 0 0 1 0 .708z"></path></symbol></svg><a class="visually-hidden-focusable skip-link" href="#main-content">Skip to main content</a>
<div class="container-fluid top-nav-container">
<div class="col-md-8">
<div class="large-logo">
<img id="incubator-logo" alt="Carpentries Incubator" src="assets/images/incubator-logo.svg"></div>
</div>
<div class="selector-container">
<div id="theme-selector">
<li class="nav-item dropdown" id="theme-button-list">
<button class="btn btn-link nav-link px-0 px-lg-2 dropdown-toggle d-flex align-items-center" id="bd-theme" type="button" aria-expanded="false" data-bs-toggle="dropdown" data-bs-display="static" aria-label="Toggle theme (auto)">
<svg class="bi my-1 theme-icon-active"><use href="#circle-half"></use></svg><i data-feather="chevron-down"></i>
</button>
<ul class="dropdown-menu dropdown-menu-end" aria-labelledby="bd-theme-text"><li>
<button type="button" class="btn dropdown-item d-flex align-items-center" data-bs-theme-value="light" aria-pressed="false">
<svg class="bi me-2 theme-icon"><use href="#sun-fill"></use></svg>
Light
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
<li>
<button type="button" class="btn dropdown-item d-flex align-items-center" data-bs-theme-value="dark" aria-pressed="false">
<svg class="bi me-2 theme-icon"><use href="#moon-stars-fill"></use></svg>
Dark
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
<li>
<button type="button" class="btn dropdown-item d-flex align-items-center active" data-bs-theme-value="auto" aria-pressed="true">
<svg class="bi me-2 theme-icon"><use href="#circle-half"></use></svg>
Auto
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
</ul></li>
</div>
<div class="dropdown" id="instructor-dropdown">
<button class="btn btn-secondary dropdown-toggle bordered-button" type="button" id="dropdownMenu1" data-bs-toggle="dropdown" aria-expanded="false">
<i aria-hidden="true" class="icon" data-feather="eye"></i> Learner View <i data-feather="chevron-down"></i>
</button>
<ul class="dropdown-menu" aria-labelledby="dropdownMenu1"><li><button class="dropdown-item" type="button" onclick="window.location.href='instructor/3-monitor-the-model.html';">Instructor View</button></li>
</ul></div>
</div>
</div>
<hr></header><nav class="navbar navbar-expand-xl bottom-nav incubator" aria-label="Main Navigation"><div class="container-fluid nav-container">
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle Navigation">
<span class="navbar-toggler-icon"></span>
<span class="menu-title">Menu</span>
</button>
<div class="nav-logo">
<img class="small-logo" alt="Carpentries Incubator" src="assets/images/incubator-logo-sm.svg"></div>
<div class="lesson-title-md">
Introduction to deep learning
</div>
<div class="search-icon-sm">
<!-- TODO: do not show until we have search
<i role="img" aria-label="Search the All In One page" data-feather="search"></i>
-->
</div>
<div class="desktop-nav">
<ul class="navbar-nav me-auto mb-2 mb-lg-0"><li class="nav-item">
<span class="lesson-title">
Introduction to deep learning
</span>
</li>
<li class="nav-item">
<a class="nav-link" href="key-points.html">Key Points</a>
</li>
<li class="nav-item">
<a class="nav-link" href="reference.html#glossary">Glossary</a>
</li>
<li class="nav-item">
<a class="nav-link" href="profiles.html">Learner Profiles</a>
</li>
<li class="nav-item dropdown">
<button class="nav-link dropdown-toggle" id="navbarDropdown" data-bs-toggle="dropdown" aria-expanded="false">
More <i data-feather="chevron-down"></i>
</button>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown"><li><a class="dropdown-item" href="reference.html">Reference</a></li>
</ul></li>
</ul></div>
<!--
<form class="d-flex col-md-2 search-form">
<fieldset disabled>
<input class="form-control me-2 searchbox" type="search" placeholder="" aria-label="">
<button class="btn btn-outline-success tablet-search-button" type="submit">
<i class="search-icon" data-feather="search" role="img" aria-label="Search the All In One page"></i>
</button>
</fieldset>
</form>
-->
<a id="search-button" class="btn btn-primary" href="aio.html" role="button" aria-label="Search the All In One page">Search the All In One page</a>
</div><!--/div.container-fluid -->
</nav><div class="col-md-12 mobile-title">
Introduction to deep learning
</div>
<aside class="col-md-12 lesson-progress"><div style="width: 31%" class="percentage">
31%
</div>
<div class="progress incubator">
<div class="progress-bar incubator" role="progressbar" style="width: 31%" aria-valuenow="31" aria-label="Lesson Progress" aria-valuemin="0" aria-valuemax="100">
</div>
</div>
</aside><div class="container">
<div class="row">
<!-- START: inst/pkgdown/templates/navbar.html -->
<div id="sidebar-col" class="col-lg-4">
<div id="sidebar" class="sidebar">
<nav aria-labelledby="flush-headingEleven"><button role="button" aria-label="close menu" alt="close menu" aria-expanded="true" aria-controls="sidebar" class="collapse-toggle" data-collapse="Collapse " data-episodes="Episodes ">
<i class="search-icon" data-feather="x" role="img"></i>
</button>
<div class="sidebar-inner">
<div class="row mobile-row" id="theme-row-mobile">
<div class="col" id="theme-selector">
<li class="nav-item dropdown" id="theme-button-list">
<button class="btn btn-link nav-link px-0 px-lg-2 dropdown-toggle d-flex align-items-center" id="bd-theme" type="button" aria-expanded="false" data-bs-toggle="dropdown" data-bs-display="static" aria-label="Toggle theme (auto)">
<svg class="bi my-1 theme-icon-active"><use href="#circle-half"></use></svg><span class="d-lg-none ms-1" id="bd-theme-text">Toggle Theme</span>
</button>
<ul class="dropdown-menu dropdown-menu-right" aria-labelledby="bd-theme-text"><li>
<button type="button" class="btn dropdown-item d-flex align-items-center" data-bs-theme-value="light" aria-pressed="false">
<svg class="bi me-2 theme-icon"><use href="#sun-fill"></use></svg>
Light
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
<li>
<button type="button" class="btn dropdown-item d-flex align-items-center" data-bs-theme-value="dark" aria-pressed="false">
<svg class="bi me-2 theme-icon"><use href="#moon-stars-fill"></use></svg>
Dark
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
<li>
<button type="button" class="btn dropdown-item d-flex align-items-center active" data-bs-theme-value="auto" aria-pressed="true">
<svg class="bi me-2 theme-icon"><use href="#circle-half"></use></svg>
Auto
<svg class="bi ms-auto d-none"><use href="#check2"></use></svg></button>
</li>
</ul></li>
</div>
</div>
<div class="row mobile-row">
<div class="col">
<div class="sidenav-view-selector">
<div class="accordion accordion-flush" id="accordionFlush9">
<div class="accordion-item">
<h2 class="accordion-header" id="flush-headingNine">
<button class="accordion-button collapsed" id="instructor" type="button" data-bs-toggle="collapse" data-bs-target="#flush-collapseNine" aria-expanded="false" aria-controls="flush-collapseNine">
<i id="eye" aria-hidden="true" class="icon" data-feather="eye"></i> Learner View
</button>
</h2>
<div id="flush-collapseNine" class="accordion-collapse collapse" aria-labelledby="flush-headingNine" data-bs-parent="#accordionFlush2">
<div class="accordion-body">
<a href="instructor/3-monitor-the-model.html">Instructor View</a>
</div>
</div>
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
</div><!--div.sidenav-view-selector -->
</div><!--/div.col -->
<hr></div><!--/div.mobile-row -->
<div class="accordion accordion-flush" id="accordionFlush11">
<div class="accordion-item">
<button id="chapters" class="accordion-button show" type="button" data-bs-toggle="collapse" data-bs-target="#flush-collapseEleven" aria-expanded="false" aria-controls="flush-collapseEleven">
<h2 class="accordion-header chapters" id="flush-headingEleven">
EPISODES
</h2>
</button>
<div id="flush-collapseEleven" class="accordion-collapse show collapse" aria-labelledby="flush-headingEleven" data-bs-parent="#accordionFlush11">
<div class="accordion-body">
<div class="accordion accordion-flush" id="accordionFlush1">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading1">
<a href="index.html">Summary and Setup</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlush2">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading2">
<a href="1-introduction.html">1. Introduction</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlush3">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading3">
<a href="2-keras.html">2. Classification by a neural network using Keras</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlushcurrent">
<div class="accordion-item">
<div class="accordion-header" id="flush-headingcurrent">
<button class="accordion-button" type="button" data-bs-toggle="collapse" data-bs-target="#flush-collapsecurrent" aria-expanded="true" aria-controls="flush-collapsecurrent">
<span class="visually-hidden">Current Chapter</span>
<span class="current-chapter">
3. Monitor the training process
</span>
</button>
</div><!--/div.accordion-header-->
<div id="flush-collapsecurrent" class="accordion-collapse collapse show" aria-labelledby="flush-headingcurrent" data-bs-parent="#accordionFlushcurrent">
<div class="accordion-body">
<ul><li><a href="#formulate-outline-the-problem-weather-prediction">1. Formulate / Outline the problem: weather prediction</a></li>
<li><a href="#identify-inputs-and-outputs">2. Identify inputs and outputs</a></li>
<li><a href="#prepare-data">3. Prepare data</a></li>
<li><a href="#choose-a-pretrained-model-or-start-building-architecture-from-scratch">4. Choose a pretrained model or start building architecture from
scratch</a></li>
<li><a href="#intermezzo-how-do-neural-networks-learn">Intermezzo: How do neural networks learn?</a></li>
<li><a href="#choose-a-loss-function-and-optimizer">5. Choose a loss function and optimizer</a></li>
<li><a href="#train-the-model">6. Train the model</a></li>
<li><a href="#perform-a-predictionclassification">7. Perform a Prediction/Classification</a></li>
<li><a href="#measure-performance">8. Measure performance</a></li>
<li><a href="#refine-the-model">9. Refine the model</a></li>
<li><a href="#save-model">10. Save model</a></li>
<li><a href="#outlook">Outlook</a></li>
</ul></div><!--/div.accordion-body-->
</div><!--/div.accordion-collapse-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlush5">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading5">
<a href="4-advanced-layer-types.html">4. Advanced layer types</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlush6">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading6">
<a href="5-transfer-learning.html">5. Transfer learning</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
<div class="accordion accordion-flush" id="accordionFlush7">
<div class="accordion-item">
<div class="accordion-header" id="flush-heading7">
<a href="6-outlook.html">6. Outlook</a>
</div><!--/div.accordion-header-->
</div><!--/div.accordion-item-->
</div><!--/div.accordion-flush-->
</div>
</div>
</div>
<hr class="half-width"><div class="accordion accordion-flush lesson-resources" id="accordionFlush12">
<div class="accordion-item">
<h2 class="accordion-header" id="flush-headingTwelve">
<button class="accordion-button collapsed" id="lesson-resources" type="button" data-bs-toggle="collapse" data-bs-target="#flush-collapseTwelve" aria-expanded="false" aria-controls="flush-collapseTwelve">
RESOURCES
</button>
</h2>
<div id="flush-collapseTwelve" class="accordion-collapse collapse" aria-labelledby="flush-headingTwelve" data-bs-parent="#accordionFlush12">
<div class="accordion-body">
<ul><li>
<a href="key-points.html">Key Points</a>
</li>
<li>
<a href="reference.html#glossary">Glossary</a>
</li>
<li>
<a href="profiles.html">Learner Profiles</a>
</li>
<li><a href="reference.html">Reference</a></li>
</ul></div>
</div>
</div>
</div>
<hr class="half-width lesson-resources"><a href="aio.html">See all in one page</a>
<hr class="d-none d-sm-block d-md-none"><div class="d-grid gap-1">
</div>
</div><!-- /div.accordion -->
</div><!-- /div.sidebar-inner -->
</nav></div><!-- /div.sidebar -->
</div><!-- /div.sidebar-col -->
<!-- END: inst/pkgdown/templates/navbar.html-->
<!-- START: inst/pkgdown/templates/content-instructor.html -->
<div class="col-xl-8 col-lg-12 primary-content">
<nav class="lesson-content mx-md-4" aria-label="Previous and Next Chapter"><!-- content for small screens --><div class="d-block d-sm-block d-md-none">
<a class="chapter-link" href="2-keras.html"><i aria-hidden="true" class="small-arrow" data-feather="arrow-left"></i>Previous</a>
<a class="chapter-link float-end" href="4-advanced-layer-types.html">Next<i aria-hidden="true" class="small-arrow" data-feather="arrow-right"></i></a>
</div>
<!-- content for large screens -->
<div class="d-none d-sm-none d-md-block">
<a class="chapter-link" href="2-keras.html" rel="prev">
<i aria-hidden="true" class="small-arrow" data-feather="arrow-left"></i>
Previous: Classification by a
</a>
<a class="chapter-link float-end" href="4-advanced-layer-types.html" rel="next">
Next: Advanced layer types...
<i aria-hidden="true" class="small-arrow" data-feather="arrow-right"></i>
</a>
</div>
<hr></nav><main id="main-content" class="main-content"><div class="container lesson-content">
<h1>Monitor the training process</h1>
<p>Last updated on 2024-12-03 |
<a href="https://github.com/carpentries-incubator/deep-learning-intro/edit/main/episodes/3-monitor-the-model.Rmd" class="external-link">Edit this page <i aria-hidden="true" data-feather="edit"></i></a></p>
<div class="text-end">
<button role="button" aria-pressed="false" tabindex="0" id="expand-code" class="pull-right" data-expand="Expand All Solutions " data-collapse="Collapse All Solutions "> Expand All Solutions <i aria-hidden="true" data-feather="plus"></i></button>
</div>
<div class="overview card">
<h2 class="card-header">Overview</h2>
<div class="row g-0">
<div class="col-md-4">
<div class="card-body">
<div class="inner">
<h3 class="card-title">Questions</h3>
<ul><li>How do I create a neural network for a regression task?</li>
<li>How does optimization work?</li>
<li>How do I monitor the training process?</li>
<li>How do I detect (and avoid) overfitting?</li>
<li>What are common options to improve the model performance?</li>
</ul></div>
</div>
</div>
<div class="col-md-8">
<div class="card-body">
<div class="inner bordered">
<h3 class="card-title">Objectives</h3>
<ul><li>Explain the importance of keeping your test set clean, by validating
on the validation set instead of the test set</li>
<li>Use the data splits to plot the training process</li>
<li>Explain how optimization works</li>
<li>Design a neural network for a regression task</li>
<li>Measure the performance of your deep neural network</li>
<li>Interpret the training plots to recognize overfitting</li>
<li>Use normalization as preparation step for deep learning</li>
<li>Implement basic strategies to prevent overfitting</li>
</ul></div>
</div>
</div>
</div>
</div>
<p>In this episode we will explore how to monitor the training progress,
evaluate the model predictions, and fine-tune the model to avoid
overfitting. For that we will use a more complex weather
dataset.</p>
<section><h2 class="section-heading" id="formulate-outline-the-problem-weather-prediction">1. Formulate / Outline the problem: weather prediction<a class="anchor" aria-label="anchor" href="#formulate-outline-the-problem-weather-prediction"></a></h2>
<hr class="half-width"><p>Here we want to work with the <em>weather prediction dataset</em>
(the light version) which can be <a href="https://doi.org/10.5281/zenodo.5071376" class="external-link">downloaded from
Zenodo</a>. It contains daily weather observations from 11 different
European cities or places through the years 2000 to 2010. For all
locations the data contains the variables ‘mean temperature’, ‘max
temperature’, and ‘min temperature’. In addition, for multiple
locations, the following variables are provided: ‘cloud_cover’,
‘wind_speed’, ‘wind_gust’, ‘humidity’, ‘pressure’, ‘global_radiation’,
‘precipitation’, ‘sunshine’, but not all of them are provided for every
location. A more extensive description of the dataset including the
different physical units is given in accompanying metadata file. The
full dataset comprises of 10 years (3654 days) of collected weather data
across Europe.</p>
<figure><img src="fig/03_weather_prediction_dataset_map.png" alt="18 European locations in the weather prediction dataset" class="figure mx-auto d-block"><div class="figcaption">European locations in the weather prediction
dataset</div>
</figure><p>A very common task with weather data is to make a prediction about
the weather sometime in the future, say the next day. In this episode,
we will try to predict tomorrow’s sunshine hours, a
challenging-to-predict feature, using a neural network with the
available weather data for one location: BASEL.</p>
</section><section><h2 class="section-heading" id="identify-inputs-and-outputs">2. Identify inputs and outputs<a class="anchor" aria-label="anchor" href="#identify-inputs-and-outputs"></a></h2>
<hr class="half-width"><div class="section level3">
<h3 id="import-dataset">Import Dataset<a class="anchor" aria-label="anchor" href="#import-dataset"></a></h3>
<p>We will now import and explore the weather data-set:</p>
<div id="load-the-data" class="callout">
<div class="callout-square">
<i class="callout-icon" data-feather="bell"></i>
</div>
<div id="load-the-data" class="callout-inner">
<h3 class="callout-title">Load the data</h3>
<div class="callout-content">
<p>If you have not downloaded the data yet, you can also load it
directly from Zenodo:</p>
<div class="codewrapper sourceCode" id="cb1">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a>data <span class="op">=</span> pd.read_csv(<span class="st">"https://zenodo.org/record/5071376/files/weather_prediction_dataset_light.csv?download=1"</span>)</span></code></pre>
</div>
</div>
</div>
</div>
<div class="codewrapper sourceCode" id="cb2">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" tabindex="-1"></a>filename_data <span class="op">=</span> <span class="st">"weather_prediction_dataset_light.csv"</span></span>
<span id="cb2-4"><a href="#cb2-4" tabindex="-1"></a>data <span class="op">=</span> pd.read_csv(filename_data)</span>
<span id="cb2-5"><a href="#cb2-5" tabindex="-1"></a>data.head()</span></code></pre>
</div>
<table class="table"><colgroup><col width="7%"><col width="7%"><col width="17%"><col width="16%"><col width="21%"><col width="14%"><col width="14%"></colgroup><thead><tr class="header"><th align="right"></th>
<th align="right">DATE</th>
<th align="right">MONTH</th>
<th align="right">BASEL_cloud_cover</th>
<th align="right">BASEL_humidity</th>
<th align="right">BASEL_pressure</th>
<th align="right">…</th>
</tr></thead><tbody><tr class="odd"><td align="right">0</td>
<td align="right">20000101</td>
<td align="right">1</td>
<td align="right">8</td>
<td align="right">0.89</td>
<td align="right">1.0286</td>
<td align="right">…</td>
</tr><tr class="even"><td align="right">1</td>
<td align="right">20000102</td>
<td align="right">1</td>
<td align="right">8</td>
<td align="right">0.87</td>
<td align="right">1.0318</td>
<td align="right">…</td>
</tr><tr class="odd"><td align="right">2</td>
<td align="right">20000103</td>
<td align="right">1</td>
<td align="right">5</td>
<td align="right">0.81</td>
<td align="right">1.0314</td>
<td align="right">…</td>
</tr><tr class="even"><td align="right">3</td>
<td align="right">20000104</td>
<td align="right">1</td>
<td align="right">7</td>
<td align="right">0.79</td>
<td align="right">1.0262</td>
<td align="right">…</td>
</tr><tr class="odd"><td align="right">4</td>
<td align="right">20000105</td>
<td align="right">1</td>
<td align="right">5</td>
<td align="right">0.90</td>
<td align="right">1.0246</td>
<td align="right">…</td>
</tr></tbody></table></div>
<div class="section level3">
<h3 id="brief-exploration-of-the-data">Brief exploration of the data<a class="anchor" aria-label="anchor" href="#brief-exploration-of-the-data"></a></h3>
<p>Let us start with a quick look at the type of features that we find
in the data.</p>
<div class="codewrapper sourceCode" id="cb3">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>data.columns</span></code></pre>
</div>
<div class="codewrapper">
<h3 class="code-label">OUTPUT<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="output" tabindex="0"><code>Index(['DATE', 'MONTH', 'BASEL_cloud_cover', 'BASEL_humidity',
'BASEL_pressure', 'BASEL_global_radiation', 'BASEL_precipitation',
'BASEL_sunshine', 'BASEL_temp_mean', 'BASEL_temp_min', 'BASEL_temp_max',
...
'SONNBLICK_temp_min', 'SONNBLICK_temp_max', 'TOURS_humidity',
'TOURS_pressure', 'TOURS_global_radiation', 'TOURS_precipitation',
'TOURS_temp_mean', 'TOURS_temp_min', 'TOURS_temp_max'],
dtype='object')</code></pre>
</div>
<p>There are a total of 9 different measured variables (global_radiation,
humidity, etcetera).</p>
<p>Let’s have a look at the shape of the dataset:</p>
<div class="codewrapper sourceCode" id="cb5">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a>data.shape</span></code></pre>
</div>
<div class="codewrapper">
<h3 class="code-label">OUTPUT<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="output" tabindex="0"><code>(3654, 91)</code></pre>
</div>
<p>This will give both the number of samples (3654) and the number of
features (89 + month + date).</p>
</div>
</section><section><h2 class="section-heading" id="prepare-data">3. Prepare data<a class="anchor" aria-label="anchor" href="#prepare-data"></a></h2>
<hr class="half-width"><div class="section level3">
<h3 id="select-a-subset-and-split-into-data-x-and-labels-y">Select a subset and split into data (X) and labels (y)<a class="anchor" aria-label="anchor" href="#select-a-subset-and-split-into-data-x-and-labels-y"></a></h3>
<p>The full dataset comprises 10 years (3654 days), from which we will
select only the first 3 years. The present dataset is sorted by “DATE”,
so for each row <code>i</code> in the table we can pick, from row
<code>i+1</code>, the value of the feature and location that we later
want our model to predict. As outlined in step 1, we would like to
predict the sunshine hours for the location: BASEL.</p>
<div class="codewrapper sourceCode" id="cb7">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" tabindex="-1"></a>nr_rows <span class="op">=</span> <span class="dv">365</span><span class="op">*</span><span class="dv">3</span> <span class="co"># 3 years</span></span>
<span id="cb7-2"><a href="#cb7-2" tabindex="-1"></a><span class="co"># data</span></span>
<span id="cb7-3"><a href="#cb7-3" tabindex="-1"></a>X_data <span class="op">=</span> data.loc[:nr_rows] <span class="co"># Select first 3 years</span></span>
<span id="cb7-4"><a href="#cb7-4" tabindex="-1"></a>X_data <span class="op">=</span> X_data.drop(columns<span class="op">=</span>[<span class="st">'DATE'</span>, <span class="st">'MONTH'</span>]) <span class="co"># Drop date and month column</span></span>
<span id="cb7-5"><a href="#cb7-5" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" tabindex="-1"></a><span class="co"># labels (sunshine hours the next day)</span></span>
<span id="cb7-7"><a href="#cb7-7" tabindex="-1"></a>y_data <span class="op">=</span> data.loc[<span class="dv">1</span>:(nr_rows <span class="op">+</span> <span class="dv">1</span>)][<span class="st">"BASEL_sunshine"</span>]</span></code></pre>
</div>
<p>In general, it is important to check if the data contains any
unexpected values such as <code>9999</code>, <code>NaN</code> or
<code>NoneType</code>. You can use the pandas
<code>data.describe()</code> or <code>data.isnull()</code> functions for
this. If present, such values must be removed or replaced. In the present
case the data is luckily well prepared and should not contain such
values, so this step can be omitted.</p>
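<p>As a minimal sketch (not part of the lesson code, but using the
<code>data</code> DataFrame loaded above), such a check could look like
this:</p>
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"># Count missing values over all columns
print(data.isnull().sum().sum())

# Summary statistics help to spot implausible values such as 9999
print(data.describe())</code></pre>
</div>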
</div>
<div class="section level3">
<h3 id="split-data-and-labels-into-training-validation-and-test-set">Split data and labels into training, validation, and test set<a class="anchor" aria-label="anchor" href="#split-data-and-labels-into-training-validation-and-test-set"></a></h3>
<p>As with classical machine learning techniques, it is required in deep
learning to split off a hold-out <em>test set</em> which remains
untouched during model training and tuning. It is later used to evaluate
the model performance. In addition, we will also split off an extra
<em>validation set</em>, the reason for which will hopefully become
clearer later in this lesson.</p>
<p>To make our lives a bit easier, we employ a trick to create these 3
datasets, <code>training set</code>, <code>test set</code> and
<code>validation set</code>, by calling the
<code>train_test_split</code> method of <code>scikit-learn</code>
twice.</p>
<p>First we create the training set and leave the remaining 30% of
the data for the two hold-out sets.</p>
<div class="codewrapper sourceCode" id="cb8">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> train_test_split</span>
<span id="cb8-2"><a href="#cb8-2" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" tabindex="-1"></a>X_train, X_holdout, y_train, y_holdout <span class="op">=</span> train_test_split(X_data, y_data, test_size<span class="op">=</span><span class="fl">0.3</span>, random_state<span class="op">=</span><span class="dv">0</span>)</span></code></pre>
</div>
<p>Now we split this 30% of the data into two equally sized parts.</p>
<div class="codewrapper sourceCode" id="cb9">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" tabindex="-1"></a>X_val, X_test, y_val, y_test <span class="op">=</span> train_test_split(X_holdout, y_holdout, test_size<span class="op">=</span><span class="fl">0.5</span>, random_state<span class="op">=</span><span class="dv">0</span>)</span></code></pre>
</div>
<p>Setting the <code>random_state</code> to <code>0</code> is a
shorthand at this point. Note, however, that changing this seed of the
pseudo-random number generator will also change the composition of your
data sets. For the sake of reproducibility, this is one example of a
parameter that should not be changed at all.</p>
</div>
</section><section><h2 class="section-heading" id="choose-a-pretrained-model-or-start-building-architecture-from-scratch">4. Choose a pretrained model or start building architecture from
scratch<a class="anchor" aria-label="anchor" href="#choose-a-pretrained-model-or-start-building-architecture-from-scratch"></a></h2>
<hr class="half-width"><div class="section level3">
<h3 id="regression-and-classification">Regression and classification<a class="anchor" aria-label="anchor" href="#regression-and-classification"></a></h3>
<p>In episode 2 we trained a dense neural network on a
<em>classification task</em>. For this, one-hot encoding was used
together with a <code>Categorical Crossentropy</code> loss function.
This measured how closely the distribution of the neural network outputs
corresponded to the distribution of the three values in the one-hot
encoding. Now we want to work on a <em>regression task</em>, thus not
predicting a class label (or integer number) for a datapoint. In
regression, we predict one (and sometimes more) values of a feature.
This is typically a floating point number.</p>
<div id="exercise-architecture-of-the-network" class="callout challenge">
<div class="callout-square">
<i class="callout-icon" data-feather="zap"></i>
</div>
<div id="exercise-architecture-of-the-network" class="callout-inner">
<h3 class="callout-title">Exercise: Architecture of the network</h3>
<div class="callout-content">
<p>As we want to design a neural network architecture for a regression
task, see if you can first come up with the answers to the following
questions:</p>
<ol style="list-style-type: decimal"><li>What must be the dimension of our input layer?</li>
<li>We want to output the prediction of a single number. The output
layer of the NN hence cannot be the same as for the classification task
earlier. This is because the <code>softmax</code> activation being used
had a concrete meaning with respect to the class labels which is not
needed here. What output layer design would you choose for regression?
Hint: A layer with <code>relu</code> activation, with
<code>sigmoid</code> activation or no activation at all?</li>
<li>(Optional) How would we change the model if we would like to output
a prediction of the precipitation in Basel in <em>addition</em> to the
sunshine hours?</li>
</ol></div>
</div>
</div>
<div id="accordionSolution1" class="accordion challenge-accordion accordion-flush">
<div class="accordion-item">
<button class="accordion-button solution-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#collapseSolution1" aria-expanded="false" aria-controls="collapseSolution1">
<h4 class="accordion-header" id="headingSolution1"> Show me the solution </h4>
</button>
<div id="collapseSolution1" class="accordion-collapse collapse" aria-labelledby="headingSolution1" data-bs-parent="#accordionSolution1">
<div class="accordion-body">
<ol style="list-style-type: decimal"><li>The shape of the input layer has to correspond to the number of
features in our data: 89</li>
<li>The output is a single value per prediction, so the output layer can
consist of a dense layer with only one node. The <em>softmax</em>
activiation function works well for a classification task, but here we
do not want to restrict the possible outcomes to the range of zero and
one. In fact, we can omit the activation in the output layer.</li>
<li>The output layer should have 2 neurons, one for each number that we
try to predict. Our y_train (and val and test) then becomes a
(n_samples, 2) matrix.</li>
</ol></div>
</div>
</div>
</div>
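<p>For the optional question 3 above, a minimal sketch of such a
two-output network could look like the following (the function name
<code>create_nn_two_outputs</code> is only used here for illustration,
and we assume the second target is <code>BASEL_precipitation</code>):</p>
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python">from tensorflow import keras

def create_nn_two_outputs():
    inputs = keras.Input(shape=(X_data.shape[1],), name='input')
    layers_dense = keras.layers.Dense(100, 'relu')(inputs)
    layers_dense = keras.layers.Dense(50, 'relu')(layers_dense)
    # Two output nodes: one for sunshine hours, one for precipitation
    outputs = keras.layers.Dense(2)(layers_dense)
    return keras.Model(inputs=inputs, outputs=outputs)

# The labels would then need two columns, for example:
# y_data = data.loc[1:(nr_rows + 1)][["BASEL_sunshine", "BASEL_precipitation"]]</code></pre>
</div>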
<p>In our example we want to predict the sunshine hours in Basel (or any
other place in the dataset) for tomorrow, based on the weather data of
all locations in the dataset today. <code>BASEL_sunshine</code> is a
floating point value (i.e. <code>float64</code>). The network should
hence output a single float value, which is why the last layer of our
network will only consist of a single node.</p>
<p>We compose a network of two hidden layers to start off with
something. We go by a scheme with 100 neurons in the first hidden layer
and 50 neurons in the second layer. As activation function we settle on
the <code>relu</code> function, as it has proved very robust and is
widely used. To make our lives easier later, we wrap the definition of
the network in a function called <code>create_nn</code>.</p>
<div class="codewrapper sourceCode" id="cb10">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" tabindex="-1"></a><span class="im">from</span> tensorflow <span class="im">import</span> keras</span>
<span id="cb10-2"><a href="#cb10-2" tabindex="-1"></a></span>
<span id="cb10-3"><a href="#cb10-3" tabindex="-1"></a><span class="kw">def</span> create_nn():</span>
<span id="cb10-4"><a href="#cb10-4" tabindex="-1"></a> <span class="co"># Input layer</span></span>
<span id="cb10-5"><a href="#cb10-5" tabindex="-1"></a> inputs <span class="op">=</span> keras.Input(shape<span class="op">=</span>(X_data.shape[<span class="dv">1</span>],), name<span class="op">=</span><span class="st">'input'</span>)</span>
<span id="cb10-6"><a href="#cb10-6" tabindex="-1"></a></span>
<span id="cb10-7"><a href="#cb10-7" tabindex="-1"></a> <span class="co"># Dense layers</span></span>
<span id="cb10-8"><a href="#cb10-8" tabindex="-1"></a> layers_dense <span class="op">=</span> keras.layers.Dense(<span class="dv">100</span>, <span class="st">'relu'</span>)(inputs)</span>
<span id="cb10-9"><a href="#cb10-9" tabindex="-1"></a> layers_dense <span class="op">=</span> keras.layers.Dense(<span class="dv">50</span>, <span class="st">'relu'</span>)(layers_dense)</span>
<span id="cb10-10"><a href="#cb10-10" tabindex="-1"></a></span>
<span id="cb10-11"><a href="#cb10-11" tabindex="-1"></a> <span class="co"># Output layer</span></span>
<span id="cb10-12"><a href="#cb10-12" tabindex="-1"></a> outputs <span class="op">=</span> keras.layers.Dense(<span class="dv">1</span>)(layers_dense)</span>
<span id="cb10-13"><a href="#cb10-13" tabindex="-1"></a></span>
<span id="cb10-14"><a href="#cb10-14" tabindex="-1"></a> <span class="cf">return</span> keras.Model(inputs<span class="op">=</span>inputs, outputs<span class="op">=</span>outputs, name<span class="op">=</span><span class="st">"weather_prediction_model"</span>)</span>
<span id="cb10-15"><a href="#cb10-15" tabindex="-1"></a></span>
<span id="cb10-16"><a href="#cb10-16" tabindex="-1"></a>model <span class="op">=</span> create_nn()</span></code></pre>
</div>
<p>The shape of the input layer has to correspond to the number of
features in our data: <code>89</code>. We use
<code>X_data.shape[1]</code> to obtain this value dynamically.</p>
<p>The output layer here is a dense layer with only 1 node, and we have
chosen to use <em>no activation function</em>. While we might use
<em>softmax</em> for a classification task, here we do not want to
restrict the possible outcomes for a start.</p>
<p>In addition, we have chosen to write the network creation as a
function so that we can use it again later to initialize new models.</p>
<p>Let us check what our model looks like by calling the
<code>summary</code> method.</p>
<div class="codewrapper sourceCode" id="cb11">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" tabindex="-1"></a>model.summary()</span></code></pre>
</div>
<div class="codewrapper">
<h3 class="code-label">OUTPUT<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="output" tabindex="0"><code>Model: "weather_prediction_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 89)] 0
_________________________________________________________________
dense (Dense) (None, 100) 9000
_________________________________________________________________
dense_1 (Dense) (None, 50) 5050
_________________________________________________________________
dense_2 (Dense) (None, 1) 51
=================================================================
Total params: 14,101
Trainable params: 14,101
Non-trainable params: 0</code></pre>
</div>
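<p>As a quick sanity check of these numbers: a dense layer has one
weight per input-output connection plus one bias per output node, so the
parameter counts follow directly from the layer sizes.</p>
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"># parameters of a dense layer = inputs * outputs + outputs (biases)
print(89 * 100 + 100)    # dense:   9000
print(100 * 50 + 50)     # dense_1: 5050
print(50 * 1 + 1)        # dense_2:   51
print(9000 + 5050 + 51)  # total:  14101</code></pre>
</div>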
<p>When compiling the model we can define a few very important aspects.
We will discuss them now in more detail.</p>
</div>
</section><section><h2 class="section-heading" id="intermezzo-how-do-neural-networks-learn">Intermezzo: How do neural networks learn?<a class="anchor" aria-label="anchor" href="#intermezzo-how-do-neural-networks-learn"></a></h2>
<hr class="half-width"><p>In the introduction we learned about the loss function: it quantifies
the total error of the predictions made by the model. During model
training we aim to find the model parameters that minimize the loss.
This is called optimization, but how does optimization actually
work?</p>
<div class="section level3">
<h3 id="gradient-descent">Gradient descent<a class="anchor" aria-label="anchor" href="#gradient-descent"></a></h3>
<p>Gradient descent is a widely used optimization algorithm; most other
optimization algorithms are based on it. It works as follows: Imagine a
neural network with only one neuron. Take a look at the figure below.
The plot shows the loss as a function of the weight of the neuron. As
you can see, there is a global loss minimum; we would like to find the
weight at this point of the parabola. To do this, we initialize the
model weight with some random value. Then we compute the gradient of the
loss function with respect to the weight. This tells us how much the
loss function will change if we change the weight by a small amount.
Then, we update the weight by taking a small step in the direction of
the negative gradient, so down the slope. This will slightly decrease
the loss. This process is repeated until the loss function reaches a
minimum. The size of the step that is taken in each iteration is called
the ‘learning rate’.</p>
<figure><img src="fig/03_gradient_descent.png" alt="Plot of the loss as a function of the weights. Through gradient descent the global loss minimum is found" class="figure mx-auto d-block"></figure></div>
<div class="section level3">
<h3 id="batch-gradient-descent">Batch gradient descent<a class="anchor" aria-label="anchor" href="#batch-gradient-descent"></a></h3>
<p>You could use the entire training dataset to perform one learning
step in gradient descent, which would mean that one epoch equals one
learning step. In practice, in each learning step we only use a subset
of the training data to compute the loss and the gradients. This subset
is called a ‘batch’; the number of samples in one batch is called the
‘batch size’.</p>
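<p>As an illustration of how the batch size relates to the number of
update steps (assuming the <code>X_train</code> split created earlier),
the number of batches per epoch could be computed like this:</p>
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python">import math

batch_size = 32
# one epoch = one pass over the training data, split into batches
n_batches = math.ceil(len(X_train) / batch_size)
print(len(X_train), n_batches)</code></pre>
</div>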
<div id="exercise-gradient-descent" class="callout challenge">
<div class="callout-square">
<i class="callout-icon" data-feather="zap"></i>
</div>
<div id="exercise-gradient-descent" class="callout-inner">
<h3 class="callout-title">Exercise: Gradient descent</h3>
<div class="callout-content">
<p>Answer the following questions:</p>
<div class="section level3">
<h3 id="what-is-the-goal-of-optimization">1. What is the goal of optimization?<a class="anchor" aria-label="anchor" href="#what-is-the-goal-of-optimization"></a></h3>
<ul><li>A. To find the weights that maximize the loss function</li>
<li>B. To find the weights that minimize the loss function</li>
</ul></div>
<div class="section level3">
<h3 id="what-happens-in-one-gradient-descent-step">2. What happens in one gradient descent step?<a class="anchor" aria-label="anchor" href="#what-happens-in-one-gradient-descent-step"></a></h3>
<ul><li>A. The weights are adjusted so that we move in the direction of the
gradient, so up the slope of the loss function</li>
<li>B. The weights are adjusted so that we move in the direction of the
gradient, so down the slope of the loss function</li>
<li>C. The weights are adjusted so that we move in the direction of the
negative gradient, so up the slope of the loss function</li>
<li>D. The weights are adjusted so that we move in the direction of the
negative gradient, so down the slope of the loss function</li>
</ul></div>
<div class="section level3">
<h3 id="when-the-batch-size-is-increased">3. When the batch size is increased:<a class="anchor" aria-label="anchor" href="#when-the-batch-size-is-increased"></a></h3>
<p>(multiple answers might apply)</p>
<ul><li>A. The number of samples in an epoch also increases</li>
<li>B. The number of batches in an epoch goes down</li>
<li>C. The training progress is more jumpy, because more samples are
consulted in each update step (one batch).</li>
<li>D. The memory load (memory as in computer hardware) of the training
process is increased</li>
</ul></div>
</div>
</div>
</div>
<div id="accordionSolution2" class="accordion challenge-accordion accordion-flush">
<div class="accordion-item">
<button class="accordion-button solution-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#collapseSolution2" aria-expanded="false" aria-controls="collapseSolution2">
<h4 class="accordion-header" id="headingSolution2"> Show me the solution </h4>
</button>
<div id="collapseSolution2" class="accordion-collapse collapse" aria-labelledby="headingSolution2" data-bs-parent="#accordionSolution2">
<div class="accordion-body">
<ol style="list-style-type: decimal"><li><p>Correct answer: B. To find the weights that minimize the loss
function. The loss function quantifies the total error of the network,
we want to have the smallest error as possible, hence we minimize the
loss.</p></li>
<li><p>Correct answer: D. The weights are adjusted so that we move in the
direction of the negative gradient, so down the slope of the loss
function. We want to move towards the global minimum, so in the opposite
direction of the gradient.</p></li>
<li>
<p>Correct answer: B & D</p>
<ul><li>A. The number of samples in an epoch also increases
(<strong>incorrect</strong>, an epoch is always defined as passing
through the training data for one cycle)</li>
<li>B. The number of batches in an epoch goes down
(<strong>correct</strong>, the number of batches is the samples in an
epoch divided by the batch size)</li>
<li>C. The training progress is more jumpy, because more samples are
consulted in each update step (one batch). (<strong>incorrect</strong>,
more samples are consulted in each update step, but this makes the
progress less jumpy since you get a more accurate estimate of the loss
in the entire dataset)</li>
<li>D. The memory load (memory as in computer hardware) of the training
process is increased (<strong>correct</strong>, the data is being loaded
one batch at a time, so more samples per batch means more memory usage)</li>
</ul></li>
</ol></div>
</div>
</div>
</div>
</div>
</section><section><h2 class="section-heading" id="choose-a-loss-function-and-optimizer">5. Choose a loss function and optimizer<a class="anchor" aria-label="anchor" href="#choose-a-loss-function-and-optimizer"></a></h2>
<hr class="half-width"><div class="section level3">
<h3 id="loss-function">Loss function<a class="anchor" aria-label="anchor" href="#loss-function"></a></h3>
<p>The loss is what the neural network will be optimized on during
training, so choosing a suitable loss function is crucial for training
neural networks. In the given case we want to encourage the predicted
values to be as close as possible to the true values. This is
commonly done by using the <em>mean squared error</em> (mse) or the
<em>mean absolute error</em> (mae), both of which should work OK in this
case. Often, mse is preferred over mae because it “punishes” large
prediction errors more severely. In Keras this is implemented in the
<code>keras.losses.MeanSquaredError</code> class (see the Keras
documentation: <a href="https://keras.io/api/losses/" class="external-link uri">https://keras.io/api/losses/</a>). This can be passed to
the <code>model.compile</code> method via the <code>loss</code>
parameter by setting it to <code>mse</code>, e.g.</p>
<!--cce:skip-->
<div class="codewrapper sourceCode" id="cb13">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" tabindex="-1"></a>model.<span class="bu">compile</span>(loss<span class="op">=</span><span class="st">'mse'</span>)</span></code></pre>
</div>
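<p>To see why <em>mse</em> “punishes” large errors more than
<em>mae</em>, compare both on a small, standalone example with one large
outlier (using numpy, not part of the lesson code):</p>
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python">import numpy as np

y_true = np.array([2.0, 3.0, 4.0, 5.0])
y_pred = np.array([2.5, 3.5, 4.5, 15.0])  # last prediction is far off

errors = y_pred - y_true
print("mae:", np.mean(np.abs(errors)))  # 2.875
print("mse:", np.mean(errors ** 2))     # 25.1875</code></pre>
</div>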
</div>
<div class="section level3">
<h3 id="optimizer">Optimizer<a class="anchor" aria-label="anchor" href="#optimizer"></a></h3>
<p>Somewhat coupled to the loss function is the <em>optimizer</em> that
we want to use. The <em>optimizer</em> here refers to the algorithm with
which the model learns to optimize the provided loss function. A
basic example for such an optimizer would be <em>stochastic gradient
descent</em>. For now, we can largely skip this step and pick one of the
most common optimizers that works well for most tasks: the <em>Adam
optimizer</em>. Similar to activation functions, the choice of optimizer
depends on the problem you are trying to solve, your model architecture
and your data. <em>Adam</em> is a good starting point though, which is
why we chose it.</p>
<!--cce:skip-->
<div class="codewrapper sourceCode" id="cb14">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" tabindex="-1"></a>model.<span class="bu">compile</span>(optimizer<span class="op">=</span><span class="st">'adam'</span>,</span>
<span id="cb14-2"><a href="#cb14-2" tabindex="-1"></a> loss<span class="op">=</span><span class="st">'mse'</span>)</span></code></pre>
</div>
</div>
<div class="section level3">
<h3 id="metrics">Metrics<a class="anchor" aria-label="anchor" href="#metrics"></a></h3>
<p>In our first example (episode 2) we plotted the progression of the
loss during training. That is indeed a good first indicator of whether
things are working alright, i.e. whether the loss is indeed decreasing
with the number of epochs as it should. However, when models become more
complicated, the loss functions often also become less intuitive. That
is why it is good practice to monitor the training process with
additional, more intuitive metrics. They are not used to optimize the
model, but are simply recorded during training.</p>
<p>With Keras, such additional metrics can be added via the
<code>metrics=[...]</code> parameter and can contain one or multiple
metrics of interest. Here we could for instance choose <code>mae</code>
(<a href="https://glosario.carpentries.org/en/#mean_absolute_error" class="external-link">mean
absolute error</a>), or the <a href="https://glosario.carpentries.org/en/#root_mean_squared_error" class="external-link"><em>root
mean squared error</em> (RMSE)</a>, which unlike the <em>mse</em> has the
same units as the predicted values. For the sake of units, we choose the
latter.</p>
<div class="codewrapper sourceCode" id="cb15">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" tabindex="-1"></a>model.<span class="bu">compile</span>(optimizer<span class="op">=</span><span class="st">'adam'</span>,</span>
<span id="cb15-2"><a href="#cb15-2" tabindex="-1"></a> loss<span class="op">=</span><span class="st">'mse'</span>,</span>
<span id="cb15-3"><a href="#cb15-3" tabindex="-1"></a> metrics<span class="op">=</span>[keras.metrics.RootMeanSquaredError()])</span></code></pre>
</div>
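<p>Note that <code>metrics</code> takes a list, so several metrics can be recorded at once. The snippet below is just an illustration of that, tracking both the RMSE and the mean absolute error; in the rest of this lesson we stick with the RMSE alone:</p>
<!--cce:skip-->
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"># Illustration only: record both RMSE and MAE during training
model.compile(optimizer='adam',
              loss='mse',
              metrics=[keras.metrics.RootMeanSquaredError(), 'mae'])</code></pre>
</div>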
<p>Let’s create a <code>compile_model</code> function to easily compile
the model throughout this lesson:</p>
<div class="codewrapper sourceCode" id="cb16">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" tabindex="-1"></a><span class="kw">def</span> compile_model(model):</span>
<span id="cb16-2"><a href="#cb16-2" tabindex="-1"></a> model.<span class="bu">compile</span>(optimizer<span class="op">=</span><span class="st">'adam'</span>,</span>
<span id="cb16-3"><a href="#cb16-3" tabindex="-1"></a> loss<span class="op">=</span><span class="st">'mse'</span>,</span>
<span id="cb16-4"><a href="#cb16-4" tabindex="-1"></a> metrics<span class="op">=</span>[keras.metrics.RootMeanSquaredError()])</span>
<span id="cb16-5"><a href="#cb16-5" tabindex="-1"></a>compile_model(model)</span></code></pre>
</div>
<p>With this, we complete the compilation of our network and are ready
to start training.</p>
</div>
</section><section><h2 class="section-heading" id="train-the-model">6. Train the model<a class="anchor" aria-label="anchor" href="#train-the-model"></a></h2>
<hr class="half-width"><p>Now that we created and compiled our dense neural network, we can
start training it. One additional concept we need to introduce though,
is the <code>batch_size</code>. This defines how many samples from the
training data will be used to estimate the error gradient before the
model weights are updated. Larger batches will produce better, more
accurate gradient estimates but also less frequent updates of the
weights. Here we are going to use a batch size of 32 which is a common
starting point.</p>
<div class="codewrapper sourceCode" id="cb17">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" tabindex="-1"></a>history <span class="op">=</span> model.fit(X_train, y_train,</span>
<span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a> batch_size<span class="op">=</span><span class="dv">32</span>,</span>
<span id="cb17-3"><a href="#cb17-3" tabindex="-1"></a> epochs<span class="op">=</span><span class="dv">200</span>,</span>
<span id="cb17-4"><a href="#cb17-4" tabindex="-1"></a> verbose<span class="op">=</span><span class="dv">2</span>)</span></code></pre>
</div>
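<p>To make the trade-off concrete: the batch size determines how many weight updates happen per epoch. A quick back-of-the-envelope sketch, assuming <code>X_train</code> as prepared earlier in this episode:</p>
<!--cce:skip-->
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python">import math

batch_size = 32
# Number of weight updates performed in one pass over the training data
updates_per_epoch = math.ceil(len(X_train) / batch_size)
print(updates_per_epoch)</code></pre>
</div>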
<p>We can plot the training process using the <code>history</code>
object returned from the model training. We will create a function for
it, because we will make use of this more often in this lesson!</p>
<div class="codewrapper sourceCode" id="cb18">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb18-2"><a href="#cb18-2" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb18-3"><a href="#cb18-3" tabindex="-1"></a></span>
<span id="cb18-4"><a href="#cb18-4" tabindex="-1"></a><span class="kw">def</span> plot_history(history, metrics):</span>
<span id="cb18-5"><a href="#cb18-5" tabindex="-1"></a> <span class="co">"""</span></span>
<span id="cb18-6"><a href="#cb18-6" tabindex="-1"></a><span class="co"> Plot the training history</span></span>
<span id="cb18-7"><a href="#cb18-7" tabindex="-1"></a></span>
<span id="cb18-8"><a href="#cb18-8" tabindex="-1"></a><span class="co"> Args:</span></span>
<span id="cb18-9"><a href="#cb18-9" tabindex="-1"></a><span class="co"> history (keras History object that is returned by model.fit())</span></span>
<span id="cb18-10"><a href="#cb18-10" tabindex="-1"></a><span class="co"> metrics (str, list): Metric or a list of metrics to plot</span></span>
<span id="cb18-11"><a href="#cb18-11" tabindex="-1"></a><span class="co"> """</span></span>
<span id="cb18-12"><a href="#cb18-12" tabindex="-1"></a> history_df <span class="op">=</span> pd.DataFrame.from_dict(history.history)</span>
<span id="cb18-13"><a href="#cb18-13" tabindex="-1"></a> sns.lineplot(data<span class="op">=</span>history_df[metrics])</span>
<span id="cb18-14"><a href="#cb18-14" tabindex="-1"></a> plt.xlabel(<span class="st">"epochs"</span>)</span>
<span id="cb18-15"><a href="#cb18-15" tabindex="-1"></a> plt.ylabel(<span class="st">"metric"</span>)</span>
<span id="cb18-16"><a href="#cb18-16" tabindex="-1"></a></span>
<span id="cb18-17"><a href="#cb18-17" tabindex="-1"></a>plot_history(history, <span class="st">'root_mean_squared_error'</span>)</span></code></pre>
</div>
<figure><img src="fig/03_training_history_1_rmse.png" alt="Plot of the RMSE over epochs for the trained model that shows a decreasing error metric" class="figure mx-auto d-block"></figure><p>This looks very promising! Our metric (“RMSE”) is dropping nicely and
while it maybe keeps fluctuating a bit it does end up at fairly low
<em>RMSE</em> values. But the <em>RMSE</em> is just the root
<em>mean</em> squared error, so we might want to look a bit more in
detail how well our just trained model does in predicting the sunshine
hours.</p>
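<p>As an optional aside, Keras can also report the compiled loss and metrics on a held-out set in a single call with <code>model.evaluate</code>. This is only a sketch, assuming <code>X_test</code> and <code>y_test</code> as prepared earlier in this episode:</p>
<!--cce:skip-->
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"># Optional: report the mse loss and RMSE metric on the test set in one call
test_loss, test_rmse = model.evaluate(X_test, y_test, verbose=0)
print(f"Test RMSE: {test_rmse:.2f}")</code></pre>
</div>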
</section><section><h2 class="section-heading" id="perform-a-predictionclassification">7. Perform a Prediction/Classification<a class="anchor" aria-label="anchor" href="#perform-a-predictionclassification"></a></h2>
<hr class="half-width"><p>Now that we have our model trained, we can make a prediction with the
model before measuring the performance of our neural network.</p>
<div class="codewrapper sourceCode" id="cb19">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb19-1"><a href="#cb19-1" tabindex="-1"></a>y_train_predicted <span class="op">=</span> model.predict(X_train)</span>
<span id="cb19-2"><a href="#cb19-2" tabindex="-1"></a>y_test_predicted <span class="op">=</span> model.predict(X_test)</span></code></pre>
</div>
</section><section><h2 class="section-heading" id="measure-performance">8. Measure performance<a class="anchor" aria-label="anchor" href="#measure-performance"></a></h2>
<hr class="half-width"><p>There is not a single way to evaluate how a model performs. But there
are at least two very common approaches. For a <em>classification
task</em> that is to compute a <em>confusion matrix</em> for the test
set which shows how often particular classes were predicted correctly or
incorrectly.</p>
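<p>Just for reference (we will not need it for this regression task), such a confusion matrix could be computed with scikit-learn roughly as follows; the class labels below are made-up placeholder values:</p>
<!--cce:skip-->
<div class="codewrapper sourceCode">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python">from sklearn.metrics import confusion_matrix

# Made-up class labels, only to show the call
y_true_classes = [0, 1, 1, 0, 1]
y_pred_classes = [0, 1, 0, 0, 1]
print(confusion_matrix(y_true_classes, y_pred_classes))</code></pre>
</div>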
<p>For the present <em>regression task</em>, it makes more sense to
compare true and predicted values in a scatter plot.</p>
<p>So, let’s look at how the predicted sunshine hours compare to
their ground truth values.</p>
<div class="codewrapper sourceCode" id="cb20">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb20-1"><a href="#cb20-1" tabindex="-1"></a><span class="co"># We define a function that we will reuse in this lesson</span></span>
<span id="cb20-2"><a href="#cb20-2" tabindex="-1"></a><span class="kw">def</span> plot_predictions(y_pred, y_true, title):</span>
<span id="cb20-3"><a href="#cb20-3" tabindex="-1"></a> plt.style.use(<span class="st">'ggplot'</span>) <span class="co"># optional, that's only to define a visual style</span></span>
<span id="cb20-4"><a href="#cb20-4" tabindex="-1"></a> plt.scatter(y_pred, y_true, s<span class="op">=</span><span class="dv">10</span>, alpha<span class="op">=</span><span class="fl">0.5</span>)</span>
<span id="cb20-5"><a href="#cb20-5" tabindex="-1"></a> plt.xlabel(<span class="st">"predicted sunshine hours"</span>)</span>
<span id="cb20-6"><a href="#cb20-6" tabindex="-1"></a> plt.ylabel(<span class="st">"true sunshine hours"</span>)</span>
<span id="cb20-7"><a href="#cb20-7" tabindex="-1"></a> plt.title(title)</span>
<span id="cb20-8"><a href="#cb20-8" tabindex="-1"></a></span>
<span id="cb20-9"><a href="#cb20-9" tabindex="-1"></a>plot_predictions(y_train_predicted, y_train, title<span class="op">=</span><span class="st">'Predictions on the training set'</span>)</span></code></pre>
</div>
<figure><img src="fig/03_regression_predictions_trainset.png" alt="Scatter plot between predictions and true sunshine hours in Basel on the train set showing a concise spread" class="figure mx-auto d-block"></figure><div class="codewrapper sourceCode" id="cb21">
<h3 class="code-label">PYTHON<i aria-hidden="true" data-feather="chevron-left"></i><i aria-hidden="true" data-feather="chevron-right"></i>
</h3>
<pre class="sourceCode python" tabindex="0"><code class="sourceCode python"><span id="cb21-1"><a href="#cb21-1" tabindex="-1"></a>plot_predictions(y_test_predicted, y_test, title<span class="op">=</span><span class="st">'Predictions on the test set'</span>)</span></code></pre>
</div>
<figure><img src="fig/03_regression_predictions_testset.png" alt="Scatter plot between predictions and true sunshine hours in Basel on the test set showing a wide spread" class="figure mx-auto d-block"></figure><div id="exercise-reflecting-on-our-results" class="callout challenge">
<div class="callout-square">
<i class="callout-icon" data-feather="zap"></i>
</div>
<div id="exercise-reflecting-on-our-results" class="callout-inner">
<h3 class="callout-title">Exercise: Reflecting on our results</h3>
<div class="callout-content">
<ul><li>Is the performance of the model as you expected (or
better/worse)?</li>
<li>Is there a notable difference between the training set and test set?
And if so, any idea why?</li>
<li>(Optional) When developing a model, you will often vary different
aspects of your model like which features you use, model parameters and
architecture. It is important to settle on a single-number evaluation
metric to compare your models.
<ul><li>What single-number evaluation metric would you choose here and
why?</li>
</ul></li>
</ul></div>
</div>
</div>
<div id="accordionSolution3" class="accordion challenge-accordion accordion-flush">
<div class="accordion-item">
<button class="accordion-button solution-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#collapseSolution3" aria-expanded="false" aria-controls="collapseSolution3">
<h4 class="accordion-header" id="headingSolution3"> Show me the solution </h4>
</button>
<div id="collapseSolution3" class="accordion-collapse collapse" aria-labelledby="headingSolution3" data-bs-parent="#accordionSolution3">