ic nested resampling (#1160)

* created ic_nested_resampling with pdf * update to include date * update recap on nested resampling ---------
slds-lmu · Nov 14, 2023 · fe203ae · fe203ae
1 parent a3e4dc8
commit fe203ae
Show file tree

Hide file tree

Showing 3 changed files with 39 additions and 0 deletions.
diff --git a/exercises-pdf/ic_nested_resampling.pdf b/exercises-pdf/ic_nested_resampling.pdf
diff --git a/exercises/nested-resampling/ex_rnw/ex_recap_nested_resampling.Rnw b/exercises/nested-resampling/ex_rnw/ex_recap_nested_resampling.Rnw
@@ -0,0 +1,18 @@
+Assume we have a dataset $\D = \Dset$ with $n$ observations of a continuous target variable $y$ and $p$ features $x_1, \ldots, x_p$. We want to build a prediction model that can be deployed and we want to estimate the corresponding generalization error. For this, we build a graph learner that consists of a neural network in one arm and a random forest in the other arm. The neural network shall have one hyperparameter, the number of hidden layers; assume the number of nodes per hidden layer and all other possible hyperparameters are fixed. The random forest shall have two hyperparameters, the maximal depth and the number of trees; assume that all other possible hyperparameters are fixed. In total, we pursue three goals (not necessarily in this order):
+\begin{itemize}
+  \item[A)] Train a final model $\hat{f}$ that can be deployed.
+  \item[B)] Tune the graph learner.
+  \item[C)] Estimate the generalization error.
+\end{itemize}
+
+Answer the following questions:
+\begin{itemize}
+  \item[1)] For each goal:
+    \begin{itemize}
+      \item[a)] Do we need resampling, nested resampling, or no resampling?
+      \item[b)] Which fraction of the available dataset can be used?
+    \end{itemize}
+  \item[2)] In which order (e.g., "A-B-C") can the three goals be tackled?
+  \item[3)] Write down a pseudo-algorithm for carrying out all three steps (in a sensible order as derived in 2))
+  \item[4)] Assume the number of hidden layers is $\in{\{1,2,3,4,5\}}$, the number of trees is $\in{\{10,50,100,200\}}$ and the maximal depth is $\in{\{2,3,4,5\}}$. Use 3-fold cross-validation as outer resampling and 4-fold cross-validaion as inner resampling. Compute the total number of model trainings carried out in 3).
+\end{itemize}
diff --git a/exercises/nested-resampling/ic_nested_resampling.Rnw b/exercises/nested-resampling/ic_nested_resampling.Rnw
@@ -0,0 +1,21 @@
+% !Rnw weave = knitr
+
+<<setup-child, include = FALSE>>=
+library('knitr')
+knitr::set_parent("../../style/preamble_ueb.Rnw")
+@
+
+\input{../../latex-math/basic-math.tex}
+\input{../../latex-math/basic-ml.tex}
+\input{../../latex-math/ml-ensembles.tex}
+\input{../../latex-math/ml-hpo.tex}
+\input{../../latex-math/ml-eval.tex}
+
+
+\kopfic{}{Nested Resampling}
+
+
+\aufgabe{Recap Nested Resampling}{
+<<child="ex_rnw/ex_recap_nested_resampling.Rnw">>=
+@
+}