diff --git a/chapter_notes/BDA_notes_ch13.pdf b/chapter_notes/BDA_notes_ch13.pdf new file mode 100644 index 00000000..b2aa1a6a Binary files /dev/null and b/chapter_notes/BDA_notes_ch13.pdf differ diff --git a/chapter_notes/BDA_notes_ch13.tex b/chapter_notes/BDA_notes_ch13.tex new file mode 100644 index 00000000..9a28571d --- /dev/null +++ b/chapter_notes/BDA_notes_ch13.tex @@ -0,0 +1,138 @@ +\documentclass[a4paper,11pt,english]{article} + +\usepackage{babel} +\usepackage[latin1]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{times} +\usepackage{amsmath} +\usepackage{microtype} +\usepackage{url} +\urlstyle{same} + +\usepackage[bookmarks=false]{hyperref} +\hypersetup{% + bookmarksopen=true, + bookmarksnumbered=true, + pdftitle={Bayesian data analysis}, + pdfsubject={Comments}, + pdfauthor={Aki Vehtari}, + pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis}, + pdfstartview={FitH -32768} +} + + +% if not draft, smaller printable area makes the paper more readable +\topmargin -4mm +\oddsidemargin 0mm +\textheight 225mm +\textwidth 160mm + +%\parskip=\baselineskip +\def\eff{\mathrm{rep}} + +\DeclareMathOperator{\E}{E} +\DeclareMathOperator{\Var}{Var} +\DeclareMathOperator{\var}{var} +\DeclareMathOperator{\Sd}{Sd} +\DeclareMathOperator{\sd}{sd} +\DeclareMathOperator{\Bin}{Bin} +\DeclareMathOperator{\Beta}{Beta} +\DeclareMathOperator{\Invchi2}{Inv-\chi^2} +\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2} +\DeclareMathOperator{\logit}{logit} +\DeclareMathOperator{\N}{N} +\DeclareMathOperator{\U}{U} +\DeclareMathOperator{\tr}{tr} +%\DeclareMathOperator{\Pr}{Pr} +\DeclareMathOperator{\trace}{trace} +\DeclareMathOperator{\rep}{\mathrm{rep}} + +\pagestyle{empty} + +\begin{document} +\thispagestyle{empty} + +\section*{Bayesian data analysis -- reading instructions 13} +\smallskip +{\bf Aki Vehtari} +\smallskip + +\subsection*{Chapter 13: Modal and distributional approximations} + +Chapter 4 presented normal distribution approximation at the mode 
(also known as the Laplace approximation). Chapter 13 discusses
+distributional approximations in more detail.
+
+Outline of Chapter 13
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item[13.1] Finding posterior modes
+  \begin{itemize}
+  \item[-] Newton's method is very fast if the distribution is close to
+    normal and the computation of the second derivatives is fast
+  \item[-] Stan uses limited-memory Broyden-Fletcher-Goldfarb-Shanno
+    (L-BFGS), a quasi-Newton method that needs only the first
+    derivatives (provided by Stan autodiff). L-BFGS is known for good
+    performance on a wide variety of functions.
+  \end{itemize}
+\item[13.2] Boundary-avoiding priors for modal summaries
+  \begin{itemize}
+  \item[-] Although full integration is preferred, sometimes optimization
+    of some parameters may be sufficient and faster, and then
+    boundary-avoiding priors may be useful.
+  \end{itemize}
+\item[13.3] Normal and related mixture approximations
+  \begin{itemize}
+  \item[-] Discusses how the normal approximation can be used to
+    approximate integrals of a smooth function times the posterior.
+  \item[-] Discusses mixture and $t$ approximations.
+  \end{itemize}
+\item[13.4] Finding marginal posterior modes using EM
+  \begin{itemize}
+  \item[-] Expectation maximization is less important now that
+    efficient probabilistic programming frameworks exist, but can
+    sometimes be useful for extra efficiency.
+  \end{itemize}
+\item[13.5] Conditional and marginal posterior approximations
+  \begin{itemize}
+  \item[-] Even in the time of efficient probabilistic programming, the
+    methods discussed in this section can produce very large speedups
+    for a large set of commonly used models. These methods are an
+    important part of the popular INLA software and are also coming to
+    Stan to speed up latent Gaussian variable models.
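+  \item[-] As a rough sketch of the idea (notation here is ours, not
+    the book's): write the joint posterior of latent values $\gamma$
+    and hyperparameters $\phi$ as
+    $p(\gamma,\phi|y)=p(\gamma|\phi,y)\,p(\phi|y)$. If
+    $\tilde{p}(\gamma|\phi,y)$ is a normal approximation to the
+    conditional posterior, the marginal posterior of $\phi$ can be
+    approximated by
+    \begin{align*}
+      p(\phi|y) \propto
+      \frac{p(\gamma,\phi|y)}{\tilde{p}(\gamma|\phi,y)}
+      \bigg|_{\gamma=\hat{\gamma}(\phi)},
+    \end{align*}
+    where $\hat{\gamma}(\phi)$ is the conditional mode, at which the
+    normal approximation is most accurate. This is the kind of
+    approximation underlying INLA-style computation.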
+  \end{itemize}
+\item[13.6] Example: hierarchical normal model
+\item[13.7] Variational inference
+  \begin{itemize}
+  \item[-] Variational inference (VI) is very popular in machine
+    learning, and this section presents it in terms of BDA. Automatic
+    differentiation variational inference (ADVI) in Stan was developed
+    after BDA3 was published.
+  \end{itemize}
+\item[13.8] Expectation propagation
+  \begin{itemize}
+  \item[-] Practical efficient computation for expectation propagation
+    (EP) is applicable to a more limited set of models than post-BDA3
+    black-box VI, but for those models EP provides a better posterior
+    approximation. Variants of EP can be used to parallelize
+    Bayesian computation for any hierarchical model.
+  \end{itemize}
+\item[13.9] Other approximations
+  \begin{itemize}
+  \item[-] Just brief mentions of INLA (which uses the methods
+    discussed in 13.5), CCD (a deterministic adaptive quadrature
+    approach), and ABC (inference when you can only sample from the
+    generative model).
+  \end{itemize}
+\item[13.10] Unknown normalizing factors
+  \begin{itemize}
+  \item[-] Often the normalizing factor is not needed, but when it is,
+    it can be estimated using importance, bridge, or path sampling.
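+  \item[-] A minimal sketch of the importance sampling estimate
+    (notation ours): if $q(\theta)$ is an unnormalized density and
+    $g(\theta)$ is a normalized proposal distribution that we can
+    sample from, then
+    \begin{align*}
+      z = \int q(\theta)\,d\theta
+        = \int \frac{q(\theta)}{g(\theta)}\,g(\theta)\,d\theta
+        \approx \frac{1}{S}\sum_{s=1}^{S}
+          \frac{q(\theta^s)}{g(\theta^s)},
+      \qquad \theta^s \sim g(\theta).
+    \end{align*}
+    Bridge and path sampling extend this to cases where no single $g$
+    overlaps well enough with $q$.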
+ \end{itemize} +\end{list} + + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: diff --git a/chapter_notes/BDA_notes_ch8.pdf b/chapter_notes/BDA_notes_ch8.pdf new file mode 100644 index 00000000..91db54a5 Binary files /dev/null and b/chapter_notes/BDA_notes_ch8.pdf differ diff --git a/chapter_notes/BDA_notes_ch8.tex b/chapter_notes/BDA_notes_ch8.tex new file mode 100644 index 00000000..df9af219 --- /dev/null +++ b/chapter_notes/BDA_notes_ch8.tex @@ -0,0 +1,123 @@ +\documentclass[a4paper,11pt,english]{article} + +\usepackage{babel} +\usepackage[latin1]{inputenc} +\usepackage[T1]{fontenc} +% \usepackage[T1,mtbold,lucidacal,mtplusscr,subscriptcorrection]{mathtime} +\usepackage{times} +\usepackage{amsmath} +\usepackage{microtype} +\usepackage{url} +\urlstyle{same} + +\usepackage[bookmarks=false]{hyperref} +\hypersetup{% + bookmarksopen=true, + bookmarksnumbered=true, + pdftitle={Bayesian data analysis}, + pdfsubject={Comments}, + pdfauthor={Aki Vehtari}, + pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis}, + pdfstartview={FitH -32768} +} + + +% if not draft, smaller printable area makes the paper more readable +\topmargin -4mm +\oddsidemargin 0mm +\textheight 225mm +\textwidth 160mm + +%\parskip=\baselineskip +\def\eff{\mathrm{rep}} + +\DeclareMathOperator{\E}{E} +\DeclareMathOperator{\Var}{Var} +\DeclareMathOperator{\var}{var} +\DeclareMathOperator{\Sd}{Sd} +\DeclareMathOperator{\sd}{sd} +\DeclareMathOperator{\Bin}{Bin} +\DeclareMathOperator{\Beta}{Beta} +\DeclareMathOperator{\Invchi2}{Inv-\chi^2} +\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2} +\DeclareMathOperator{\logit}{logit} +\DeclareMathOperator{\N}{N} +\DeclareMathOperator{\U}{U} +\DeclareMathOperator{\tr}{tr} +%\DeclareMathOperator{\Pr}{Pr} +\DeclareMathOperator{\trace}{trace} +\DeclareMathOperator{\rep}{\mathrm{rep}} + +\pagestyle{empty} + +\begin{document} +\thispagestyle{empty} + +\section*{Bayesian data analysis -- reading 
instructions 8}
+\smallskip
+{\bf Aki Vehtari}
+\smallskip
+
+\subsection*{Chapter 8}
+
+In the earlier chapters it was assumed that the data collection is
+ignorable. Chapter 8 explains when data collection can be ignorable
+and when we also need to model the data collection.
+We don't have time to go through Chapter 8 in the BDA course at Aalto,
+but it is highly recommended that you read it at the end of the course
+or afterwards. The most important parts are 8.1, 8.5, pp 220--222 of
+8.6, and 8.7, and you can get back to the other sections later.
+
+Outline of Chapter 8 (* denotes the most important parts)
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item 8.1 Bayesian inference requires a model for data collection (*)
+\item 8.2 Data-collection models and ignorability
+\item 8.3 Sample surveys
+\item 8.4 Designed experiments
+\item 8.5 Sensitivity and the role of randomization (*)
+\item 8.6 Observational studies (* pp 220--222)
+\item 8.7 Censoring and truncation (*)
+\end{list}
+
+Most important terms in the chapter
+\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
+\item observed data
+\item complete data
+\item missing data
+\item stability assumption
+\item data model
+\item inclusion model
+\item complete data likelihood
+\item observed data likelihood
+\item finite-population and superpopulation inference
+\item ignorability
+\item ignorable designs
+\item propensity score
+\item sample surveys
+\item random sampling of a finite population
+\item stratified sampling
+\item cluster sampling
+\item designed experiments
+\item complete randomization
+\item randomized blocks and Latin squares
+\item sequential designs
+\item randomization given covariates
+\item observational studies
+\item censoring
+\item truncation
+\item missing completely at random
+\end{list}
+
+% Gelman: ``All contexts where the model is fit to data that are not
+% necessarily representative of the population that is the target of
+% study.
The key idea is to include in the Bayesian model an inclusion +% variable with a probability distribution that represents the process +% by which data become observed.'' + +\end{document} + + +%%% Local Variables: +%%% TeX-PDF-mode: t +%%% TeX-master: t +%%% End: