diff --git a/chapter_notes/BDA_notes_ch13.pdf b/chapter_notes/BDA_notes_ch13.pdf new file mode 100644 index 00000000..b2aa1a6a Binary files /dev/null and b/chapter_notes/BDA_notes_ch13.pdf differ diff --git a/chapter_notes/BDA_notes_ch13.tex b/chapter_notes/BDA_notes_ch13.tex new file mode 100644 index 00000000..9a28571d --- /dev/null +++ b/chapter_notes/BDA_notes_ch13.tex @@ -0,0 +1,138 @@ +\documentclass[a4paper,11pt,english]{article} + +\usepackage{babel} +\usepackage[latin1]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{times} +\usepackage{amsmath} +\usepackage{microtype} +\usepackage{url} +\urlstyle{same} + +\usepackage[bookmarks=false]{hyperref} +\hypersetup{% + bookmarksopen=true, + bookmarksnumbered=true, + pdftitle={Bayesian data analysis}, + pdfsubject={Comments}, + pdfauthor={Aki Vehtari}, + pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis}, + pdfstartview={FitH -32768} +} + + +% if not draft, smaller printable area makes the paper more readable +\topmargin -4mm +\oddsidemargin 0mm +\textheight 225mm +\textwidth 160mm + +%\parskip=\baselineskip +\def\eff{\mathrm{rep}} + +\DeclareMathOperator{\E}{E} +\DeclareMathOperator{\Var}{Var} +\DeclareMathOperator{\var}{var} +\DeclareMathOperator{\Sd}{Sd} +\DeclareMathOperator{\sd}{sd} +\DeclareMathOperator{\Bin}{Bin} +\DeclareMathOperator{\Beta}{Beta} +\DeclareMathOperator{\Invchi2}{Inv-\chi^2} +\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2} +\DeclareMathOperator{\logit}{logit} +\DeclareMathOperator{\N}{N} +\DeclareMathOperator{\U}{U} +\DeclareMathOperator{\tr}{tr} +%\DeclareMathOperator{\Pr}{Pr} +\DeclareMathOperator{\trace}{trace} +\DeclareMathOperator{\rep}{\mathrm{rep}} + +\pagestyle{empty} + +\begin{document} +\thispagestyle{empty} + +\section*{Bayesian data analysis -- reading instructions 13} +\smallskip +{\bf Aki Vehtari} +\smallskip + +\subsection*{Chapter 13: Modal and distributional approximations} + +Chapter 4 presented normal distribution approximation at the mode 
(also known as the Laplace approximation). Chapter 13 discusses
+distributional approximations in more detail.
+
+Outline of Chapter 13
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item[13.1] Finding posterior modes
+  \begin{itemize}
+  \item[-] Newton's method is very fast if the distribution is close to
+    normal and the computation of the second derivatives is fast
+  \item[-] Stan uses limited-memory Broyden-Fletcher-Goldfarb-Shanno
+    (L-BFGS), a quasi-Newton method that needs only the first
+    derivatives (provided by Stan autodiff). L-BFGS is known for good
+    performance on a wide variety of functions.
+  \end{itemize}
+\item[13.2] Boundary-avoiding priors for modal summaries
+  \begin{itemize}
+  \item[-] Although full integration is preferred, sometimes optimization
+    of some parameters may be sufficient and faster, and then
+    boundary-avoiding priors may be useful.
+  \end{itemize}
+\item[13.3] Normal and related mixture approximations
+  \begin{itemize}
+  \item[-] Discusses how the normal approximation can be used to
+    approximate integrals of a smooth function times the posterior.
+  \item[-] Discusses mixture and $t$ approximations.
+  \end{itemize}
+\item[13.4] Finding marginal posterior modes using EM
+  \begin{itemize}
+  \item[-] Expectation maximization is less important now that
+    efficient probabilistic programming frameworks exist, but can
+    sometimes be useful for extra efficiency.
+  \end{itemize}
+\item[13.5] Conditional and marginal posterior approximations
+  \begin{itemize}
+  \item[-] Even in the time of efficient probabilistic programming, the
+    methods discussed in this section can produce very large speedups
+    for a large set of commonly used models. These methods are an
+    important part of the popular INLA software and are also coming to
+    Stan to speed up latent Gaussian variable models.
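+  \item[-] As a rough sketch of the idea (notation here is ours, not
+    the book's): write the joint posterior of latent values $\gamma$
+    and hyperparameters $\phi$ as
+    $p(\gamma,\phi|y)=p(\gamma|\phi,y)\,p(\phi|y)$. If
+    $\tilde{p}(\gamma|\phi,y)$ is a normal approximation to the
+    conditional posterior, the marginal posterior of $\phi$ can be
+    approximated by
+    \begin{align*}
+      p(\phi|y) \propto
+      \frac{p(\gamma,\phi|y)}{\tilde{p}(\gamma|\phi,y)}
+      \bigg|_{\gamma=\hat{\gamma}(\phi)},
+    \end{align*}
+    where $\hat{\gamma}(\phi)$ is the conditional mode, at which the
+    normal approximation is most accurate. This is the kind of
+    approximation underlying INLA-style computation.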
+  \end{itemize}
+\item[13.6] Example: hierarchical normal model
+\item[13.7] Variational inference
+  \begin{itemize}
+  \item[-] Variational inference (VI) is very popular in machine
+    learning, and this section presents it in terms of BDA. Automatic
+    differentiation variational inference (ADVI) in Stan was developed
+    after BDA3 was published.
+  \end{itemize}
+\item[13.8] Expectation propagation
+  \begin{itemize}
+  \item[-] Practical efficient computation for expectation propagation
+    (EP) is applicable to a more limited set of models than post-BDA3
+    black-box VI, but for those models EP provides a better posterior
+    approximation. Variants of EP can be used to parallelize
+    Bayesian computation for any hierarchical model.
+  \end{itemize}
+\item[13.9] Other approximations
+  \begin{itemize}
+  \item[-] Just brief mentions of INLA (which uses the methods
+    discussed in 13.5), CCD (a deterministic adaptive quadrature
+    approach), and ABC (inference when you can only sample from the
+    generative model).
+  \end{itemize}
+\item[13.10] Unknown normalizing factors
+  \begin{itemize}
+  \item[-] Often the normalizing factor is not needed, but when it is,
+    it can be estimated using importance, bridge, or path sampling.
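+  \item[-] A minimal sketch of the importance sampling estimate
+    (notation ours): if $q(\theta)$ is an unnormalized density and
+    $g(\theta)$ is a normalized proposal distribution that we can
+    sample from, then
+    \begin{align*}
+      z = \int q(\theta)\,d\theta
+        = \int \frac{q(\theta)}{g(\theta)}\,g(\theta)\,d\theta
+        \approx \frac{1}{S}\sum_{s=1}^{S}
+          \frac{q(\theta^s)}{g(\theta^s)},
+      \qquad \theta^s \sim g(\theta).
+    \end{align*}
+    Bridge and path sampling extend this to cases where no single $g$
+    overlaps well enough with $q$.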
+ \end{itemize} +\end{list} + + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: diff --git a/chapter_notes/BDA_notes_ch8.pdf b/chapter_notes/BDA_notes_ch8.pdf new file mode 100644 index 00000000..91db54a5 Binary files /dev/null and b/chapter_notes/BDA_notes_ch8.pdf differ diff --git a/chapter_notes/BDA_notes_ch8.tex b/chapter_notes/BDA_notes_ch8.tex new file mode 100644 index 00000000..df9af219 --- /dev/null +++ b/chapter_notes/BDA_notes_ch8.tex @@ -0,0 +1,123 @@ +\documentclass[a4paper,11pt,english]{article} + +\usepackage{babel} +\usepackage[latin1]{inputenc} +\usepackage[T1]{fontenc} +% \usepackage[T1,mtbold,lucidacal,mtplusscr,subscriptcorrection]{mathtime} +\usepackage{times} +\usepackage{amsmath} +\usepackage{microtype} +\usepackage{url} +\urlstyle{same} + +\usepackage[bookmarks=false]{hyperref} +\hypersetup{% + bookmarksopen=true, + bookmarksnumbered=true, + pdftitle={Bayesian data analysis}, + pdfsubject={Comments}, + pdfauthor={Aki Vehtari}, + pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis}, + pdfstartview={FitH -32768} +} + + +% if not draft, smaller printable area makes the paper more readable +\topmargin -4mm +\oddsidemargin 0mm +\textheight 225mm +\textwidth 160mm + +%\parskip=\baselineskip +\def\eff{\mathrm{rep}} + +\DeclareMathOperator{\E}{E} +\DeclareMathOperator{\Var}{Var} +\DeclareMathOperator{\var}{var} +\DeclareMathOperator{\Sd}{Sd} +\DeclareMathOperator{\sd}{sd} +\DeclareMathOperator{\Bin}{Bin} +\DeclareMathOperator{\Beta}{Beta} +\DeclareMathOperator{\Invchi2}{Inv-\chi^2} +\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2} +\DeclareMathOperator{\logit}{logit} +\DeclareMathOperator{\N}{N} +\DeclareMathOperator{\U}{U} +\DeclareMathOperator{\tr}{tr} +%\DeclareMathOperator{\Pr}{Pr} +\DeclareMathOperator{\trace}{trace} +\DeclareMathOperator{\rep}{\mathrm{rep}} + +\pagestyle{empty} + +\begin{document} +\thispagestyle{empty} + +\section*{Bayesian data analysis -- reading 
instructions 8}
+\smallskip
+{\bf Aki Vehtari}
+\smallskip
+
+\subsection*{Chapter 8}
+
+In the earlier chapters it was assumed that the data collection is
+ignorable. Chapter 8 explains when data collection can be ignorable
+and when we also need to model the data collection.
+We don't have time to go through Chapter 8 in the BDA course at Aalto,
+but it is highly recommended that you read it at the end of the course
+or afterwards. The most important parts are 8.1, 8.5, pp 220--222 of
+8.6, and 8.7, and you can get back to the other sections later.
+
+Outline of Chapter 8 (* denotes the most important parts)
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item 8.1 Bayesian inference requires a model for data collection (*)
+\item 8.2 Data-collection models and ignorability
+\item 8.3 Sample surveys
+\item 8.4 Designed experiments
+\item 8.5 Sensitivity and the role of randomization (*)
+\item 8.6 Observational studies (* pp 220--222)
+\item 8.7 Censoring and truncation (*)
+\end{list}
+
+Most important terms in the chapter
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item observed data
+\item complete data
+\item missing data
+\item stability assumption
+\item data model
+\item inclusion model
+\item complete data likelihood
+\item observed data likelihood
+\item finite-population and superpopulation inference
+\item ignorability
+\item ignorable designs
+\item propensity score
+\item sample surveys
+\item random sampling of a finite population
+\item stratified sampling
+\item cluster sampling
+\item designed experiments
+\item complete randomization
+\item randomized blocks and Latin squares
+\item sequential designs
+\item randomization given covariates
+\item observational studies
+\item censoring
+\item truncation
+\item missing completely at random
+\end{list}
+
+% Gelman: ``All contexts where the model is fit to data that are not
+% necessarily representative of the population that is the target of
+% study.
The key idea is to include in the Bayesian model an inclusion +% variable with a probability distribution that represents the process +% by which data become observed.'' + +\end{document} + + +%%% Local Variables: +%%% TeX-PDF-mode: t +%%% TeX-master: t +%%% End: