diff --git a/content/chapters/15_regularization/15-09-wd.md b/content/chapters/15_regularization/15-09-wd.md
new file mode 100644
index 0000000..e6449e3
--- /dev/null
+++ b/content/chapters/15_regularization/15-09-wd.md
@@ -0,0 +1,15 @@
+---
+title: "Chapter 15.09: Weight decay and L2"
+weight: 15009
+---
+In this section, we show that L2 regularization with gradient descent is equivalent to weight decay and see how weight decay changes the optimization trajectory.
+
+
+
+### Lecture video
+
+{{< video id="xASHDEAWP0U" >}}
+
+### Lecture slides
+
+{{< pdfjs file="https://github.com/slds-lmu/lecture_sl/blob/main/slides-pdf/slides-regu-wd-vs-l2.pdf" >}}
diff --git a/content/chapters/15_regularization/15-09-geom-l2.md b/content/chapters/15_regularization/15-10-geom-l2.md
similarity index 86%
rename from content/chapters/15_regularization/15-09-geom-l2.md
rename to content/chapters/15_regularization/15-10-geom-l2.md
index 61f213b..52f49ef 100644
--- a/content/chapters/15_regularization/15-09-geom-l2.md
+++ b/content/chapters/15_regularization/15-10-geom-l2.md
@@ -1,6 +1,6 @@
 ---
-title: "Chapter 15.09: Geometry of L2 Regularization"
-weight: 15009
+title: "Chapter 15.10: Geometry of L2 Regularization"
+weight: 15010
 ---
 In this section, we provide a geometric understanding of \\(L2\\) regularization, showing how parameters are shrunk according to the eigenvalues of the Hessian of empirical risk, and discuss its correspondence to weight decay.
diff --git a/content/chapters/15_regularization/15-10-geom-l1.md b/content/chapters/15_regularization/15-11-geom-l1.md
similarity index 83%
rename from content/chapters/15_regularization/15-10-geom-l1.md
rename to content/chapters/15_regularization/15-11-geom-l1.md
index a76c553..489d945 100644
--- a/content/chapters/15_regularization/15-10-geom-l1.md
+++ b/content/chapters/15_regularization/15-11-geom-l1.md
@@ -1,6 +1,6 @@
 ---
-title: "Chapter 15.10: Geometry of L1 Regularization"
-weight: 15010
+title: "Chapter 15.11: Geometry of L1 Regularization"
+weight: 15011
 ---
 In this section, we provide a geometric understanding of \\(L1\\) regularization and show that it encourages sparsity in the parameter vector.
diff --git a/content/chapters/15_regularization/15-11-early-stopping.md b/content/chapters/15_regularization/15-12-early-stopping.md
similarity index 84%
rename from content/chapters/15_regularization/15-11-early-stopping.md
rename to content/chapters/15_regularization/15-12-early-stopping.md
index ce0783a..0286c92 100644
--- a/content/chapters/15_regularization/15-11-early-stopping.md
+++ b/content/chapters/15_regularization/15-12-early-stopping.md
@@ -1,6 +1,6 @@
 ---
-title: "Chapter 15.11: Early Stopping"
-weight: 15011
+title: "Chapter 15.12: Early Stopping"
+weight: 15012
 ---
 In this section, we introduce early stopping and show how it can act as a regularizer.
diff --git a/content/chapters/15_regularization/15-12-ridge-deep.md b/content/chapters/15_regularization/15-13-ridge-deep.md
similarity index 79%
rename from content/chapters/15_regularization/15-12-ridge-deep.md
rename to content/chapters/15_regularization/15-13-ridge-deep.md
index 7df9bf5..aa0bbad 100644
--- a/content/chapters/15_regularization/15-12-ridge-deep.md
+++ b/content/chapters/15_regularization/15-13-ridge-deep.md
@@ -1,6 +1,6 @@
 ---
-title: "Chapter 15.12: Details on Ridge Regression: Deep Dive"
-weight: 15012
+title: "Chapter 15.13: Details on Ridge Regression: Deep Dive"
+weight: 15013
 ---
 In this section, we consider Ridge regression as row-augmentation and as minimizing risk under feature noise. We also discuss the bias-variance tradeoff.
diff --git a/content/chapters/15_regularization/15-13-lasso-deep.md b/content/chapters/15_regularization/15-14-lasso-deep.md
similarity index 77%
rename from content/chapters/15_regularization/15-13-lasso-deep.md
rename to content/chapters/15_regularization/15-14-lasso-deep.md
index 33cf405..b2bbb40 100644
--- a/content/chapters/15_regularization/15-13-lasso-deep.md
+++ b/content/chapters/15_regularization/15-14-lasso-deep.md
@@ -1,6 +1,6 @@
 ---
-title: "Chapter 15.13: Soft-thresholding and L1 regularization: Deep Dive"
-weight: 15013
+title: "Chapter 15.14: Soft-thresholding and L1 regularization: Deep Dive"
+weight: 15014
 ---
 In this section, we prove the previously stated proposition regarding soft-thresholding and L1 regularization.