Cleanup style folder #1116

Closed
wants to merge 33 commits into from
33 commits
8a9e540
Start cleanup process
jemus42 Jun 27, 2023
4db233e
Merge branch 'master' into cleanup
jemus42 Jun 27, 2023
4744aa9
Remove vestigial knitrout environment
jemus42 Jun 27, 2023
df43c79
Merge branch 'master' into cleanup
jemus42 Jun 28, 2023
8ed794f
Deduplication/annotation/misc
jemus42 Jun 28, 2023
7e80fe2
Move unused(?) color jpgs
jemus42 Jul 3, 2023
0069875
Leftover \scriptsize calls
jemus42 Jul 3, 2023
264a11f
Keep knitrout defined just in case
jemus42 Jul 6, 2023
7122f47
Merge branch 'master' into cleanup
jemus42 Jul 6, 2023
edb9c1f
Remove PDFs from slides
jemus42 Jul 7, 2023
1a72e61
Fix duplicate \documentclass
jemus42 Jul 7, 2023
56d253b
Formatting
jemus42 Jul 7, 2023
406fb44
Remove usage of unneeded kframe env
jemus42 Jul 7, 2023
c653605
More preamble cleanup
jemus42 Jul 7, 2023
c4cfe66
Move citebutton to lmu-lecture.sty
jemus42 Jul 7, 2023
a50b1e5
More iteration
jemus42 Jul 7, 2023
b9b7cfc
Merge branch 'master' into cleanup
jemus42 Jul 7, 2023
5ee32ff
Cleanup old files
jemus42 Jul 7, 2023
2257f9d
Cleanup
jemus42 Jul 7, 2023
0cf3439
Merge branch 'master' into cleanup
jemus42 Jul 7, 2023
b2b4efc
Merge branch 'master' into cleanup
jemus42 Jul 7, 2023
aaf0df5
Merge branch 'master' into cleanup
jemus42 Jul 10, 2023
e4235bb
Typo
jemus42 Jul 17, 2023
c8f023d
Merge branch 'master' into cleanup
jemus42 Jul 18, 2023
f8d0997
Merge branch 'master' into cleanup
jemus42 Aug 23, 2023
3b8b4cf
Merge branch 'master' into cleanup
jemus42 Aug 29, 2023
44f602e
merge master
jemus42 Oct 18, 2023
323a4b5
Merge branch 'master' into cleanup
jemus42 Oct 30, 2023
6684b3a
Merge branch 'master' into cleanup
jemus42 Jan 18, 2024
815d757
Merge branch 'master' into cleanup
jemus42 Feb 9, 2024
670e547
Merge branch 'master' into cleanup
jemus42 Feb 13, 2024
c4484f8
Merge branch 'master' into cleanup
jemus42 Mar 19, 2024
701af15
Merge branch 'master' into cleanup
jemus42 Mar 22, 2024

32 changes: 31 additions & 1 deletion .gitignore
@@ -1,6 +1,14 @@
#--------------------------------------------#
# Things that need to exist but not in git #
#--------------------------------------------#

nospeakermargin.tex
speakermargin.tex

#-----------------------------------------------------------------------------#
# TeX intermediate stuff everybody loves to hate and hates to commit to git #
#-----------------------------------------------------------------------------#

*.pdf
*.aux
*.fdb_latexmk
@@ -12,6 +20,11 @@ speakermargin.tex
*.toc
*.vrb
*.synctex.gz

#----------------------------------------------------------#
# Editor-specific stuff that should generally be ignored #
#----------------------------------------------------------#

*.DS_Store
*.Rproj
*.Rhistory
@@ -59,7 +72,12 @@ slides/ml-philosophy/slides-*.pdf
slides/*/speakermargin.tex

# vim swap files
*.swp
# http://stratus3d.com/blog/2018/06/03/stop-excluding-editor-temp-files-in-gitignore/
[._]*.s[a-v][a-z]
[._]*.sw[a-p]
[._]s[a-v][a-z]
[._]sw[a-p]

# used for atom editor
.latexcfg
# Xournal files
@@ -72,3 +90,15 @@ NAMESPACE
.idea/*
*.pkl

# RStudio / R in general
*.Rproj
*.Rhistory
.Rproj.user
.RData
.Rdata

#-----------------------------------#
# OS-specific temp/preview files #
#-----------------------------------#
*.DS_Store

5 changes: 5 additions & 0 deletions .ignore
@@ -0,0 +1,5 @@
# Complementary to .gitignore, this file also affects tools like ripgrep

slides/attic/*
slides/*/attic
attic
23 changes: 0 additions & 23 deletions DESCRIPTION

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
66 changes: 33 additions & 33 deletions slides/cart/slides-cart-computationalaspects.tex
@@ -27,7 +27,7 @@
\begin{columns}[T]
\column{0.49\textwidth}
Original data
\begin{knitrout}\scriptsize
\scriptsize
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}
\begin{tabular}{l|r|r|r|r|r}
\hline
@@ -38,12 +38,12 @@
\end{tabular}


\end{knitrout}

% FIGURE SOURCE: Use picture created in rsrc/monotone_trafo.R
\includegraphics[width = \textwidth]{figure/cart_splitcomp_1}
\column{0.49\textwidth}
Data with log-transformed $x$
\begin{knitrout}\scriptsize
\scriptsize
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}
\begin{tabular}{l|r|r|r|r|r}
\hline
@@ -54,7 +54,7 @@
\end{tabular}


\end{knitrout}

% FIGURE SOURCE: Use picture created in rsrc/monotone_trafo.R
\includegraphics[width = \textwidth]{figure/cart_splitcomp_2}
\end{columns}
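The two tables above contrast the original data with a log-transformed $x$. As I read this slide, the point is that a strictly monotone transformation keeps the ordering of a feature's values, so the set of partitions a split can produce, and hence the chosen partition, stays the same; only the numeric threshold changes. A minimal Python sketch, assuming squared-error risk and made-up toy data (not the values produced by `rsrc/monotone_trafo.R`), checks this:

```python
import math


def best_partition(x, y):
    """Exhaustive search over thresholds on x; returns the set of observation
    indices sent to the left child by the split with the lowest summed
    squared error of the two children."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = math.inf, None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # tied values cannot be separated by a threshold
        sse = 0.0
        for part in (order[:k], order[k:]):
            mean = sum(y[i] for i in part) / len(part)
            sse += sum((y[i] - mean) ** 2 for i in part)
        if sse < best_sse:
            best_sse, best_left = sse, set(order[:k])
    return best_left


# Hypothetical toy data; only the ordering of x matters for the partition.
x = [1.0, 2.0, 7.0, 10.0, 20.0]
y = [1.0, 1.0, 0.5, 10.0, 11.0]
log_x = [math.log(v) for v in x]

print(best_partition(x, y))                              # {0, 1, 2}
print(best_partition(x, y) == best_partition(log_x, y))  # True
```

Because `log` is strictly increasing, both calls visit the candidate splits in the same order and compute identical risks; the threshold on the log scale is simply the transformed one.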
@@ -68,10 +68,10 @@
$$x_j \in \{a,b,c\} \leftarrow \Np \rightarrow x_j \in \{d,e\} $$
\end{itemize}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/tree-categorical.pdf}
\includegraphics[width=0.8\textwidth]{figure/tree-categorical.pdf}
\end{figure}
\end{vbframe}

\begin{vbframe}{Categorical Features}
\begin{itemize}
\item A split on a categorical feature partitions the feature levels:
@@ -83,7 +83,7 @@
\end{itemize}

\end{vbframe}

\begin{frame}{Categorical Features}

For $0-1$ responses, in each node:
@@ -94,7 +94,7 @@
\begin{columns}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
@@ -117,12 +117,12 @@
\begin{columns}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary2.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary2.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
@@ -142,17 +142,17 @@
\begin{columns}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary1.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary2.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary2.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary3.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-binary3.pdf}
\end{figure}
\end{column}
\end{columns}
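The steps listed under "For $0-1$ responses, in each node:" are collapsed in this diff. The usual shortcut, which is my reading of what the three `categoryplot-binary` panels illustrate, is to sort the levels by their proportion of 1-responses and then treat the feature as ordinal, so that only $k-1$ splits have to be checked instead of all $2^{k-1}-1$ two-group partitions of $k$ levels; for impurity measures like the Gini index this is known to still find the best partition. A minimal Python sketch with hypothetical level names and data:

```python
from collections import defaultdict


def ordered_levels(levels, y):
    """Sort the category levels by their proportion of 1-responses."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n_obs, n_ones]
    for lev, yi in zip(levels, y):
        counts[lev][0] += 1
        counts[lev][1] += yi
    return sorted(counts, key=lambda lev: counts[lev][1] / counts[lev][0])


def candidate_splits(order):
    """Treat the sorted levels as ordinal: only k - 1 'prefix' splits remain."""
    return [(set(order[:k]), set(order[k:])) for k in range(1, len(order))]


# Hypothetical node with five levels and a 0-1 target
levels = ["a", "a", "b", "b", "b", "c", "c", "d", "d", "d", "e", "e", "e", "e"]
y = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

order = ordered_levels(levels, y)
print(order)  # ['b', 'e', 'c', 'd', 'a']
for left, right in candidate_splits(order):
    print(sorted(left), "vs", sorted(right))
```

With five levels this leaves 4 candidate splits instead of 15.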
@@ -195,17 +195,17 @@
\begin{columns}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont1.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont1.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont2.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont2.pdf}
\end{figure}
\end{column}
\begin{column}{0.33\textwidth}
\begin{figure}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont3.pdf}
\includegraphics[width=0.8\textwidth]{figure/categoryplot-cont3.pdf}
\end{figure}
\end{column}
\end{columns}
@@ -215,11 +215,11 @@

\begin{vbframe}{Missing feature values}
\begin{itemize}
\item When splits are evaluated, only observations for which the used feature is not missing are used. (This can actually bias splits towards using features with lots of missing values.)
\item When splits are evaluated, only observations for which the used feature is not missing are used. (This can actually bias splits towards using features with lots of missing values.)
\item \textbf{Surrogate splits} can deal with missing values during prediction.
\item Surrogate splits are created during training. They define replacement splitting rules, using a different feature, that result in almost the same child nodes as the original split.
\item When observations are passed down the tree, % (in training or prediction),
and the feature value used in a split is missing, we use the surrogate split instead to decide to which child the data should be assigned.
\item When observations are passed down the tree, % (in training or prediction),
and the feature value used in a split is missing, we use the surrogate split instead to decide to which child the data should be assigned.
\end{itemize}
\end{vbframe}
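The first bullet says that a candidate split is scored only on the observations whose split feature is observed. A minimal Python sketch, assuming misclassification counts as the split criterion and `None` as the marker for a missing value (both choices are illustrative, not taken from the slides):

```python
from collections import Counter


def split_risk_ignoring_missing(x, y, t):
    """Misclassification count of the split 'x <= t', evaluated only on the
    observations whose feature value is observed (None marks a missing value).
    Returns the risk and the number of observations that were actually used."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    risk = 0
    for child in ([yi for xi, yi in observed if xi <= t],
                  [yi for xi, yi in observed if xi > t]):
        if child:
            risk += len(child) - max(Counter(child).values())
    return risk, len(observed)


# Hypothetical feature with two missing values out of six
x = [1.0, None, 2.0, None, 3.0, 4.0]
y = [0, 1, 0, 0, 1, 1]
print(split_risk_ignoring_missing(x, y, 2.5))  # (0, 4)
```

Since the risk here is a plain count, rows with a missing value simply contribute nothing to it, which hints at how features with many missing values can end up favoured.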

@@ -228,7 +228,7 @@
\item Each surrogate split is a decision stump that tries to learn the actual splitting rule
\item Consider this tree with the primary split w.r.t. \texttt{Sepal.Length} where we perform binary classification (\texttt{setosa} vs. \texttt{virginica}):
\begin{figure}
\includegraphics[width=0.75\textwidth]{figure/tree-binary.pdf}
\includegraphics[width=0.75\textwidth]{figure/tree-binary.pdf}
\end{figure}
\item Our surrogate split should optimize a splitting criterion w.r.t. \texttt{Sepal.Length < 5.8}
\end{itemize}
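The next hunk fits a depth-1 tree on all features except `Sepal.Length` so that it reproduces the primary split `Sepal.Length < 5.8`. A minimal Python sketch of that idea, restricted to `Petal.Width` because the other columns are elided (`...`) in the table, and assuming the stump is scored by how many rows it sends to a different side than the primary split (the scoring rule is an assumption, not what any particular package uses):

```python
def best_surrogate_stump(z, target):
    """Find a threshold t on feature z so that the rule 'z < t' (or its
    reverse) reproduces the boolean target with the fewest disagreements.
    Returns (disagreements, threshold, direction)."""
    values = sorted(set(z))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # mid-point between neighbouring observed values
        errors = sum((zi < t) != g for zi, g in zip(z, target))
        # "<" maps z < t to TRUE; ">=" is the reversed rule
        for direction, err in (("<", errors), (">=", len(z) - errors)):
            if best is None or err < best[0]:
                best = (err, t, direction)
    return best


# Petal.Width and the 'Sepal.Length < 5.8' column, rows 1, 4, 9, ..., 99 of the table
petal_width = [0.2, 0.2, 0.2, 0.2, 0.3, 1.9, 1.7, 1.9, 1.8, 2.3]
sepal_lt_58 = [True, True, True, False, True, False, True, False, False, False]

print(best_surrogate_stump(petal_width, sepal_lt_58))  # (1, 1.75, '<')
```

On these ten rows, `Petal.Width < 1.75` agrees with the primary split for 9 of 10 observations. With the elided columns included, the real surrogate could well use a different feature.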
@@ -245,25 +245,25 @@
\centering
\begin{tabular}{rrrrrll}
\hline
& Sepal.Length & ... & Petal.Width & Species & Sepal.Length $<$ 5.8 \\
& Sepal.Length & ... & Petal.Width & Species & Sepal.Length $<$ 5.8 \\
\hline
1 & 5.10 & ... & 0.20 & setosa & TRUE \\
4 & 4.60 & ... & 0.20 & setosa & TRUE \\
9 & 4.40 & ... & 0.20 & setosa & TRUE \\
15 & 5.80 & ... & 0.20 & setosa & FALSE \\
18 & 5.10 & ... & 0.30 & setosa & TRUE \\
52 & 5.80 & ... & 1.90 & virginica & FALSE \\
57 & 4.90 & ... & 1.70 & virginica & TRUE \\
62 & 6.40 & ... & 1.90 & virginica & FALSE \\
77 & 6.20 & ... & 1.80 & virginica & FALSE \\
99 & 6.20 & ... & 2.30 & virginica & FALSE \\
1 & 5.10 & ... & 0.20 & setosa & TRUE \\
4 & 4.60 & ... & 0.20 & setosa & TRUE \\
9 & 4.40 & ... & 0.20 & setosa & TRUE \\
15 & 5.80 & ... & 0.20 & setosa & FALSE \\
18 & 5.10 & ... & 0.30 & setosa & TRUE \\
52 & 5.80 & ... & 1.90 & virginica & FALSE \\
57 & 4.90 & ... & 1.70 & virginica & TRUE \\
62 & 6.40 & ... & 1.90 & virginica & FALSE \\
77 & 6.20 & ... & 1.80 & virginica & FALSE \\
99 & 6.20 & ... & 2.30 & virginica & FALSE \\
\hline
\end{tabular}
\end{table}
\item Add column that indicates whether \texttt{Sepal.Length < 5.8}
%\item As this splitting rule is very good, we will have many instances where \texttt{Sepal.Length < 5.8} is \texttt{TRUE} and \texttt{Species} is \texttt{setosa}
\item Fit tree of depth 1 using all features but \texttt{Sepal.Length} %used
to derive a split that explains
\item Fit tree of depth 1 using all features but \texttt{Sepal.Length} %used
to derive a split that explains
\texttt{Sepal.Length < 5.8} best $\Rightarrow$ surrogate split
\item Typically, software stores the best and a few more surrogate splits
%\item A good surrogate tries to mimic the primary split this way
28 changes: 14 additions & 14 deletions slides/cart/slides-cart-treegrowing.tex
@@ -7,7 +7,7 @@

\newcommand{\titlefigure}{figure_man/tree_depth1_structure.png}
\newcommand{\learninggoals}{
\item Understand how a tree is grown by an exhaustive search
\item Understand how a tree is grown by an exhaustive search
\item Know where and how the split point is set }

\title{Introduction to Machine Learning}
@@ -28,15 +28,15 @@
\item We start with an empty tree, a root node that contains all the data.\\
Trees are then grown by recursively applying \textbf{greedy} optimization to each node $\Np$.

\item Greedy means we do an \textbf{exhaustive search}: Ideally, all possible splits of $\Np$ on all possible points $t$ for all features $x_j$ are compared in terms of their empirical risk $\risk(\Np, j, t)$.
\item Greedy means we do an \textbf{exhaustive search}: Ideally, all possible splits of $\Np$ on all possible points $t$ for all features $x_j$ are compared in terms of their empirical risk $\risk(\Np, j, t)$.

\item The training data is then distributed to child nodes according to the optimal split and the procedure is repeated in the child nodes.

\end{itemize}
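The greedy, exhaustive search and the recursion described in these bullets can be sketched in a few lines of Python. This is a minimal illustration, assuming misclassification counts as the empirical risk, mid-points between neighbouring distinct values as the split points, and a hypothetical `max_depth` stopping rule; it is not the code behind the `tree-classif-depth*` figures:

```python
from collections import Counter


def node_risk(y):
    """Empirical risk of a node predicting its majority class (MCE as a count)."""
    return len(y) - max(Counter(y).values()) if y else 0


def best_split(X, y):
    """Exhaustive search: every feature j, every mid-point t between neighbouring
    distinct values; returns (risk, j, t), or None if no split is possible."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            risk = node_risk(left) + node_risk(right)
            if best is None or risk < best[0]:
                best = (risk, j, t)
    return best


def grow(X, y, depth=0, max_depth=2):
    """Recursively split until a node is pure, cannot be split, or the
    hypothetical max_depth is reached; leaves hold the majority class."""
    split = best_split(X, y)
    if depth == max_depth or split is None or node_risk(y) == 0:
        return Counter(y).most_common(1)[0][0]
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    children = {}
    for side, flag in (("left", True), ("right", False)):
        idx = [i for i, g in enumerate(go_left) if g == flag]
        children[side] = grow([X[i] for i in idx], [y[i] for i in idx],
                              depth + 1, max_depth)
    return {"feature": j, "threshold": t, **children}


# Hypothetical data with two features
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y = ["a", "a", "b", "b"]
print(best_split(X, y))  # (0, 0, 2.5)
print(grow(X, y))        # {'feature': 0, 'threshold': 2.5, 'left': 'a', 'right': 'b'}
```

Ties between equally good splits are resolved here by keeping the first one found; real implementations may break ties differently.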

\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}

{\centering \includegraphics[width=0.65\textwidth]{figure/tree-classif-depth1-ann.pdf}
{\centering \includegraphics[width=0.65\textwidth]{figure/tree-classif-depth1-ann.pdf}

}

@@ -52,7 +52,7 @@

\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}

{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth1.pdf}
{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth1.pdf}

}

@@ -62,13 +62,13 @@

\begin{enumerate}[3]
\item Proceed recursively for each child node:
%Iterate over all features, and for each feature over all possible split points.
%Iterate over all features, and for each feature over all possible split points.
Select best split and divide data from parent node into left and right child nodes.
\end{enumerate}

\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}

{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth2.pdf}
{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth2.pdf}

}

@@ -81,7 +81,7 @@

\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}

{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth3.pdf}
{\centering \includegraphics[width=0.95\textwidth]{figure/tree-classif-depth3.pdf}

}

@@ -90,7 +90,7 @@


\begin{vbframe}{Split placement}
\begin{knitrout}\scriptsize
\scriptsize
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}

{\centering \includegraphics[width=0.5\textwidth]{figure/split_point.pdf}
@@ -99,7 +99,7 @@



\end{knitrout}

\lz
Splits are usually placed at the mid-point of the observations they split: the large margin to the next closest observations makes better generalization on new, unseen data more likely.
\end{vbframe}
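The mid-point rule on this slide can be made concrete. Every threshold strictly between the two neighbouring observations produces the same split of the training data, and the mid-point is the one that maximises the smaller of the two gaps. A minimal Python sketch, with hypothetical values and assuming the `n_left` smallest values go left and are strictly smaller than the rest:

```python
def midpoint_threshold(x, n_left):
    """Place the threshold half-way between the n_left smallest values and the
    rest; returns the threshold and its distance to the closest value on each side.
    Assumes 1 <= n_left < len(x) and no tie across the split."""
    xs = sorted(x)
    lo, hi = xs[n_left - 1], xs[n_left]
    t = (lo + hi) / 2
    return t, t - lo, hi - t


# Hypothetical values: every t in (3, 7) gives the same training split,
# but only the mid-point keeps the same distance (2.0) to both sides.
print(midpoint_threshold([8.0, 1.0, 7.0, 3.0, 2.0], 3))  # (5.0, 2.0, 2.0)
```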
@@ -131,7 +131,7 @@
\begin{itemize}
\item We take the split with lowest MCE: \texttt{Sepal.Length} = $5.5$
\item In real life, we actually search over many more splitting points.
Common strategies involve: a) Searching over all possible split points (exhaustive search), b) searching quantile-wise
Common strategies involve: a) Searching over all possible split points (exhaustive search), b) searching quantile-wise
\item MCE is rarely used, we will cover split criteria in detail later.
%\item We will introduce additional (better) criteria soon
\end{itemize}
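The two strategies named in the bullet differ mainly in how many candidate split points they generate. A minimal Python sketch, assuming mid-points for the exhaustive variant and quartiles computed with the standard library's `statistics.quantiles` (Python 3.8+, default "exclusive" method) for the quantile-wise variant; how a given tree implementation actually picks its quantiles is an assumption here:

```python
import statistics


def exhaustive_candidates(x):
    """a) All mid-points between neighbouring distinct values."""
    v = sorted(set(x))
    return [(lo + hi) / 2 for lo, hi in zip(v, v[1:])]


def quantile_candidates(x, n=4):
    """b) Only the n - 1 empirical quantiles (here quartiles) as split points."""
    return statistics.quantiles(x, n=n)


x = list(range(1, 10))  # hypothetical feature values 1, ..., 9
print(exhaustive_candidates(x))  # 8 candidates: 1.5, 2.5, ..., 8.5
print(quantile_candidates(x))    # [2.5, 5.0, 7.5]
```

For large nodes the quantile-wise search evaluates far fewer splits, at the price of possibly missing the exact best split point.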
@@ -161,19 +161,19 @@
% rownames(ordered.design) = NULL
% kable(ordered.design, digits = 3)
% @
%
%
% \hspace{0.5cm}
% \column{0.7\textwidth}
% % FIGURE SOURCE: No source
% \includegraphics[height = 0.55\textheight]{figure_man/regression_tree}
% \end{columns}
% \vspace{0.5cm}
% Data points (red) were generated from the underlying function (black):
%
%
% $ sin(4x - 4) * (2x - 2)^2 * sin(20x -4) $
%
%
% % \framebreak
%
%
% % BB: doesnt seem too useful to show this, nothing really new in here
% % <<fig.height=5>>=
% % regr.task = makeRegrTask(data = design, target = "y")