---
title: A problem with the predictive sampler in R-INLA
author: Stefan Siegert
date: November 2017
layout: default
---
## Update:
**The behavior described here is due to a bug in INLA which has been fixed in version 17.11.11.**
(see also the [thread on the INLA discussion group](https://groups.google.com/forum/#!topic/r-inla-discussion-group/y7dAzbrXb6o))
## Predicting a random walk
```{r}
suppressPackageStartupMessages(library(INLA))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(tidyverse))
knitr::opts_chunk$set(
  cache.path='_knitr_cache/sampler-problem/',
  fig.path='figure/sampler-problem/'
)
```
```{r}
inla.version()
```
This example illustrates a problem with the posterior predictive sampling algorithm in R-INLA. I simulate a random walk of length 130 and set its last 30 values to `NA`, so that INLA treats them as missing values to be predicted.
```{r rw1, fig.height=5}
set.seed(321)
n = 130
inla_data = data_frame(i = 1:n, y = cumsum(rnorm(n, 0, .01))) %>%
  mutate(y = ifelse(i <= 100, y, NA_real_))
ggplot(data=inla_data, aes(x=i, y=y)) + geom_line()
```
```{r inla-rw1, fig.width=10, fig.height=10}
# run inla
inla_formula = y ~ f(i, model='rw1')
inla_result = inla(formula=inla_formula, data=inla_data, family='gaussian',
                   control.family=list(initial=12, fixed=TRUE),
                   control.compute=list(config=TRUE))
# posterior predictive samples
n_sampls = 50
set.seed(321)
inla_sampls = inla.posterior.sample(n=n_sampls, result=inla_result, seed=123)
# extract "Predictor" output
i_pred = str_c('Predictor:', str_pad(1:n, 3, 'left', '0'))
inla_sampls = inla_sampls %>%
  setNames(1:n_sampls) %>%
  map_dfc( ~ .x$latent[i_pred, 1]) %>%   # one column per sample
  mutate(i = 1:n) %>%
  gather(key='sample', value='y', -i) %>%
  mutate_if(is.character, as.integer)
# plot predictive samples
ggplot(data=inla_sampls, aes(x=i, y=y, group=sample)) + geom_line(aes(colour=sample)) +
  scale_colour_continuous(type='viridis') + coord_cartesian(xlim=c(90, n))
```
The predictive samples are highly correlated.
A lot of thinning would be necessary to obtain pseudo-independent samples.
I am wondering whether there are options I could set to improve the sampler.
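As a quick sanity check (just a rough sketch, using the long-format `inla_sampls` with columns `i`, `sample` and `y` built above), the following chunk computes the pairwise correlations between the sampled trajectories over the forecast range `i > 100` and summarises the off-diagonal values. For (pseudo-)independent samples these correlations should scatter around zero.
```{r sample-cor}
# rough check: pairwise correlations between sampled trajectories
# over the forecast range i > 100
cor_mat = inla_sampls %>%
  filter(i > 100) %>%
  spread(key=sample, value=y) %>%   # one column per sample
  select(-i) %>%
  as.matrix() %>%
  cor()
# distribution of correlations between different samples
summary(cor_mat[lower.tri(cor_mat)])
```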
## Avoiding the `seed` argument in `inla.posterior.sample`
The near-perfect correlation of the posterior predictive samples seems to disappear when I remove the `seed` argument in `inla.posterior.sample`:
```{r inla-rw1-noseed, fig.width=10, fig.height=10}
# posterior predictive samples
n_sampls = 50
set.seed(321)
inla_sampls_noseed = inla.posterior.sample(n=n_sampls, result=inla_result)
# extract "Predictor" output
i_pred = str_c('Predictor:', str_pad(1:n, 3, 'left', '0'))
inla_sampls_noseed = inla_sampls_noseed %>%
  setNames(1:n_sampls) %>%
  map_dfc( ~ .x$latent[i_pred, 1]) %>%   # one column per sample
  mutate(i = 1:n) %>%
  gather(key='sample', value='y', -i) %>%
  mutate_if(is.character, as.integer)
# plot predictive samples
ggplot(data=inla_sampls_noseed, aes(x=i, y=y, group=sample)) + geom_line(aes(colour=sample)) +
  scale_colour_continuous(type='viridis') + coord_cartesian(xlim=c(90, n))
```
This looks much better.
The high correlation is probably due to a bug in how the `seed` argument is handled internally.
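For a direct comparison with the summary above (again only a rough sketch, using the long-format `inla_sampls_noseed` from the previous chunk), here is the same pairwise-correlation check for the samples drawn without the `seed` argument:
```{r sample-cor-noseed}
# same rough check as before, for the samples drawn without `seed`
cor_mat_noseed = inla_sampls_noseed %>%
  filter(i > 100) %>%
  spread(key=sample, value=y) %>%   # one column per sample
  select(-i) %>%
  as.matrix() %>%
  cor()
summary(cor_mat_noseed[lower.tri(cor_mat_noseed)])
```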