diff --git a/exercises-pdf/ic_nested_resampling.pdf b/exercises-pdf/ic_nested_resampling.pdf new file mode 100644 index 000000000..d6996677c Binary files /dev/null and b/exercises-pdf/ic_nested_resampling.pdf differ diff --git a/exercises/nested-resampling/ex_rnw/ex_recap_nested_resampling.Rnw b/exercises/nested-resampling/ex_rnw/ex_recap_nested_resampling.Rnw new file mode 100644 index 000000000..af187443c --- /dev/null +++ b/exercises/nested-resampling/ex_rnw/ex_recap_nested_resampling.Rnw @@ -0,0 +1,18 @@ +Assume we have a dataset $\D = \Dset$ with $n$ observations of a continuous target variable $y$ and $p$ features $x_1, \ldots, x_p$. We want to build a prediction model that can be deployed, and we want to estimate its generalization error. For this, we build a graph learner that consists of a neural network in one branch and a random forest in the other branch. The neural network shall have one hyperparameter, the number of hidden layers; assume that the number of nodes per hidden layer and all other possible hyperparameters are fixed. The random forest shall have two hyperparameters, the maximal depth and the number of trees; assume that all other possible hyperparameters are fixed. In total, we pursue three goals (not necessarily in this order): +\begin{itemize} + \item[A)] Train a final model $\hat{f}$ that can be deployed. + \item[B)] Tune the graph learner. + \item[C)] Estimate the generalization error. +\end{itemize} + +Answer the following questions: +\begin{itemize} + \item[1)] For each goal: + \begin{itemize} + \item[a)] Do we need resampling, nested resampling, or no resampling? + \item[b)] Which fraction of the available dataset can be used? + \end{itemize} + \item[2)] In which order (e.g., ``A-B-C'') can the three goals be tackled?
+ \item[3)] Write down a pseudo-algorithm for carrying out all three steps (in a sensible order, as derived in 2)). + \item[4)] Assume the number of hidden layers is $\in \{1, 2, 3, 4, 5\}$, the number of trees is $\in \{10, 50, 100, 200\}$, and the maximal depth is $\in \{2, 3, 4, 5\}$. Use 3-fold cross-validation as outer resampling and 4-fold cross-validation as inner resampling. Compute the total number of model trainings carried out in 3). +\end{itemize} diff --git a/exercises/nested-resampling/ic_nested_resampling.Rnw b/exercises/nested-resampling/ic_nested_resampling.Rnw new file mode 100644 index 000000000..9a7c1c21a --- /dev/null +++ b/exercises/nested-resampling/ic_nested_resampling.Rnw @@ -0,0 +1,21 @@ +% !Rnw weave = knitr + +<>= +library('knitr') +knitr::set_parent("../../style/preamble_ueb.Rnw") +@ + +\input{../../latex-math/basic-math.tex} +\input{../../latex-math/basic-ml.tex} +\input{../../latex-math/ml-ensembles.tex} +\input{../../latex-math/ml-hpo.tex} +\input{../../latex-math/ml-eval.tex} + + +\kopfic{}{Nested Resampling} + + +\aufgabe{Recap Nested Resampling}{ +<>= +@ +} \ No newline at end of file
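As a reviewer's sanity check for question 4), here is one plausible counting under the following assumptions (which a solution file may define differently): grid search evaluates every candidate configuration, the branch choice yields $5$ neural-network configurations plus $4 \times 4 = 16$ random-forest configurations ($21$ candidates in total), and each tuning run ends with one refit of the selected configuration on its training set.

```latex
% Each inner tuning run: 21 configurations x 4 inner folds, plus one
% refit of the selected configuration on the corresponding training set.
% Outer CV repeats this on each of the 3 outer training sets; tuning and
% fitting the deployable model on the full dataset adds one more such run.
\[
\underbrace{3 \times \bigl(21 \times 4 + 1\bigr)}_{\text{outer CV with nested tuning}}
\;+\;
\underbrace{\bigl(21 \times 4 + 1\bigr)}_{\text{final tuning and fit on } \D}
\;=\; 3 \times 85 + 85 \;=\; 340
\]
```

Whether the refit of the winning configuration is counted as an extra training varies between solution conventions; without it, the total is $3 \times 84 + 84 = 336$.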