<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="Robert Kubinec" />
<meta name="date" content="2018-10-30" />
<title>How to Evaluate Models</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200%2010px%2010px%3B%0Apadding%3A%204px%3B%0Awidth%3A%20400px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Aborder%2Dradius%3A%205px%3B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Afont%2Dsize%3A%2013px%3B%0Aline%2Dheight%3A%201%2E3%3B%0A%7D%0A%23TOC%20%2Etoctitle%20%7B%0Afont%2Dweight%3A%20bold%3B%0Afont%2Dsize%3A%2015px%3B%0Amargin%2Dleft%3A%205px%3B%0A%7D%0A%23TOC%20ul%20%7B%0Apadding%2Dleft%3A%2040px%3B%0Amargin%2Dleft%3A%20%2D1%2E5em%3B%0Amargin%2Dtop%3A%205px%3B%0Amargin%2Dbottom%3A%205px%3B%0A%7D%0A%23TOC%20ul%20ul%20%7B%0Amargin%2Dleft%3A%20%2D2em%3B%0A%7D%0A%23TOC%20li%20%7B%0Aline%2Dheight%3A%2016px%3B%0A%7D%0Atable%20%7B%0Amargin%3A%201em%20auto%3B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dcolor%3A%20%23DDDDDD%3B%0Aborder%2Dstyle%3A%20outset%3B%0Aborder%2Dcollapse%3A%20collapse%3B%0A%7D%0Atable%20th%20%7B%0Aborder%2Dwidth%3A%202px%3B%0Apadding%3A%205px%3B%0Aborder%2Dstyle%3A%20inset%3B%0A%7D%0Atable%20td%20%7B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dstyle%3A%20inset%3B%0Aline%2Dheight%3A%2018px%3B%0Apadding%3A%205px%205px%3B%0A%7D%0Atable%2C%20table%20th%2C%20table%20td%20%7B%0Aborder%2Dleft%2Dstyle%3A%20none%3B%0Aborder%2Dright%2Dstyle%3A%20none%3B%0A%7D%0Atable%20thead%2C%20table%20tr%2Eeven%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Ap%20%7B%0Amargin%3A%200%2E5em%200%3B%0A%7D%0Ablockquote%20%7B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Apadding%3A%200%2E25em%200%2E75em%3B%0A%7D%0Ahr%20%7B%0Aborder%2Dstyle%3A%20solid%3B%0Aborder%3A%20none%3B%0Aborde
r%2Dtop%3A%201px%20solid%20%23777%3B%0Amargin%3A%2028px%200%3B%0A%7D%0Adl%20%7B%0Amargin%2Dleft%3A%200%3B%0A%7D%0Adl%20dd%20%7B%0Amargin%2Dbottom%3A%2013px%3B%0Amargin%2Dleft%3A%2013px%3B%0A%7D%0Adl%20dt%20%7B%0Afont%2Dweight%3A%20bold%3B%0A%7D%0Aul%20%7B%0Amargin%2Dtop%3A%200%3B%0A%7D%0Aul%20li%20%7B%0Alist%2Dstyle%3A%20circle%20outside%3B%0A%7D%0Aul%20ul%20%7B%0Amargin%2Dbottom%3A%200%3B%0A%7D%0Apre%2C%20code%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0Aborder%2Dradius%3A%203px%3B%0Acolor%3A%20%23333%3B%0Awhite%2Dspace%3A%20pre%2Dwrap%3B%20%0A%7D%0Apre%20%7B%0Aborder%2Dradius%3A%203px%3B%0Amargin%3A%205px%200px%2010px%200px%3B%0Apadding%3A%2010px%3B%0A%7D%0Apre%3Anot%28%5Bclass%5D%29%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Acode%20%7B%0Afont%2Dfamily%3A%20Consolas%2C%20Monaco%2C%20%27Courier%20New%27%2C%20monospace%3B%0Afont%2Dsize%3A%2085%25%3B%0A%7D%0Ap%20%3E%20code%2C%20li%20%3E%20code%20%7B%0Apadding%3A%202px%200px%3B%0A%7D%0Adiv%2Efigure%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0Aimg%20%7B%0Abackground%2Dcolor%3A%20%23FFFFFF%3B%0Apadding%3A%202px%3B%0Aborder%3A%201px%20solid%20%23DDDDDD%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Amargin%3A%200%205px%3B%0A%7D%0Ah1%20%7B%0Amargin%2Dtop%3A%200%3B%0Afont%2Dsize%3A%2035px%3B%0Aline%2Dheight%3A%2040px%3B%0A%7D%0Ah2%20%7B%0Aborder%2Dbottom%3A%204px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Apadding%2Dbottom%3A%202px%3B%0Afont%2Dsize%3A%20145%25%3B%0A%7D%0Ah3%20%7B%0Aborder%2Dbottom%3A%202px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Afont%2Dsize%3A%20120%25%3B%0A%7D%0Ah4%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23f7f7f7%3B%0Amargin%2Dleft%3A%208px%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Ah5%2C%20h6%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23ccc%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Aa%20%7B%0Acolor%3A%20%230033dd%3B%0Atext%2Ddecoration%3A%20none%3B%0A%7D%0Aa%3Ahover%20%7B%0Acolor%3A%20%236666ff%3B%20%7D%0Aa%3Avisited%20%7B%0Acolor%3A%20%238000
80%3B%20%7D%0Aa%3Avisited%3Ahover%20%7B%0Acolor%3A%20%23BB00BB%3B%20%7D%0Aa%5Bhref%5E%3D%22http%3A%22%5D%20%7B%0Atext%2Ddecoration%3A%20underline%3B%20%7D%0Aa%5Bhref%5E%3D%22https%3A%22%5D%20%7B%0Atext%2Ddecoration%3A%20underline%3B%20%7D%0A%0Acode%20%3E%20span%2Ekw%20%7B%20color%3A%20%23555%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%0Acode%20%3E%20span%2Edt%20%7B%20color%3A%20%23902000%3B%20%7D%20%0Acode%20%3E%20span%2Edv%20%7B%20color%3A%20%2340a070%3B%20%7D%20%0Acode%20%3E%20span%2Ebn%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Efl%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Ech%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Est%20%7B%20color%3A%20%23d14%3B%20%7D%20%0Acode%20%3E%20span%2Eco%20%7B%20color%3A%20%23888888%3B%20font%2Dstyle%3A%20italic%3B%20%7D%20%0Acode%20%3E%20span%2Eot%20%7B%20color%3A%20%23007020%3B%20%7D%20%0Acode%20%3E%20span%2Eal%20%7B%20color%3A%20%23ff0000%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%0Acode%20%3E%20span%2Efu%20%7B%20color%3A%20%23900%3B%20font%2Dweight%3A%20bold%3B%20%7D%20%20code%20%3E%20span%2Eer%20%7B%20color%3A%20%23a61717%3B%20background%2Dcolor%3A%20%23e3d2d2%3B%20%7D%20%0A" rel="stylesheet" type="text/css" />
</head>
<body>
<h1 class="title toc-ignore">How to Evaluate Models</h1>
<h4 class="author"><em>Robert Kubinec</em></h4>
<h4 class="date"><em>2018-10-30</em></h4>
<p>A big part of the purpose of <code>idealstan</code> is to give people different options in fitting ideal point models to diverse data. Along with that, <code>idealstan</code> supports Bayesian model evaluation via <a href="https://CRAN.R-project.org/package=loo/">loo</a> and can also analyze the posterior predictive distribution using <a href="http://mc-stan.org/users/interfaces/bayesplot">bayesplot</a>. <code>loo</code> approximates the predictive error of a Bayesian model under leave-one-out cross-validation (LOO-CV). Exact LOO-CV on Bayesian models is computationally prohibitive because it involves estimating a new model for each data point. For IRT models incorporating thousands or even millions of observations, this is practically infeasible.</p>
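<p>To make the cost of exact LOO-CV concrete, here is a minimal sketch in base R that is not part of <code>idealstan</code>. It assumes a simple linear regression on simulated data and uses plug-in maximum-likelihood estimates rather than a full posterior, so it only illustrates the refitting burden: one model fit per observation. All object names are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Simulated data for illustration only (not idealstan data)
set.seed(42)
n &lt;- 50
x &lt;- rnorm(n)
y &lt;- 1 + 2 * x + rnorm(n)
dat &lt;- data.frame(x = x, y = y)

n_refits &lt;- 0
lpd &lt;- numeric(n)  # log predictive density of each held-out point

for (i in seq_len(n)) {
  # Refit the model without observation i
  fit &lt;- lm(y ~ x, data = dat[-i, ])
  n_refits &lt;- n_refits + 1
  # Score the held-out observation under the refitted model
  mu &lt;- predict(fit, newdata = dat[i, , drop = FALSE])
  lpd[i] &lt;- dnorm(dat$y[i], mean = mu, sd = sigma(fit), log = TRUE)
}

# Sum of held-out log densities, analogous to an elpd estimate
elpd_exact &lt;- sum(lpd)
n_refits  # equals n: one refit per data point</code></pre></div>
<p>With 50 observations this loop is instant, but replacing <code>lm</code> with a full MCMC fit shows why an approximation is attractive for large data.</p>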
<p><code>bayesplot</code> allows us to compare the data we used to estimate the model with data produced by the model, that is, with draws from the posterior predictive distribution. This is very useful as a general summary of model fit to see whether there are values of the outcome that we are over- or under-predicting.</p>
<p><code>idealstan</code> implements functions for each ideal point model that calculate the pointwise log-likelihood of the data, which is the necessary input to use <code>loo</code>’s model evaluation features. This vignette demonstrates the basic usage. I also discuss best practices for evaluating models.</p>
<p>We begin by simulating data for a standard IRT 2-PL ideal point model but with strategically missing data:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">irt_2pl <-<span class="st"> </span><span class="kw">id_sim_gen</span>(<span class="dt">inflate=</span><span class="ot">TRUE</span>)</code></pre></div>
<pre><code>## Warning: package 'bindrcpp' was built under R version 3.4.4</code></pre>
<p>We can then fit two ideal point models to the same data: one that uses the correct binomial model for a 0/1 binary outcome, and a second that tries to fit a Poisson count model to the same binary data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Because of CRAN limitations, only using 2 cores & 2 chains</span>
irt_2pl_correct <-<span class="st"> </span><span class="kw">id_estimate</span>(<span class="dt">idealdata=</span>irt_2pl,
<span class="dt">model_type=</span><span class="dv">2</span>,
<span class="dt">restrict_ind_high =</span> <span class="kw">as.character</span>(<span class="kw">sort</span>(irt_2pl<span class="op">@</span>simul_data<span class="op">$</span>true_person,
<span class="dt">decreasing=</span><span class="ot">TRUE</span>,
<span class="dt">index=</span><span class="ot">TRUE</span>)<span class="op">$</span>ix[<span class="dv">1</span>]),
<span class="dt">restrict_ind_low =</span> <span class="kw">as.character</span>(<span class="kw">sort</span>(irt_2pl<span class="op">@</span>simul_data<span class="op">$</span>true_person,
<span class="dt">decreasing=</span><span class="ot">FALSE</span>,
<span class="dt">index=</span><span class="ot">TRUE</span>)<span class="op">$</span>ix[<span class="dv">1</span>]),
<span class="dt">fixtype=</span><span class="st">'vb_partial'</span>,
<span class="dt">ncores=</span><span class="dv">2</span>,
<span class="dt">nchains=</span><span class="dv">2</span>,
<span class="dt">niters =</span> <span class="dv">500</span>)</code></pre></div>
<pre><code>## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1: 100 -1057.420 1.000 1.000
## Chain 1: 200 -1045.939 0.505 1.000
## Chain 1: 300 -1044.276 0.338 0.011
## Chain 1: 400 -1041.217 0.254 0.011
## Chain 1: 500 -1042.919 0.203 0.003 MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">irt_2pl_incorrect <-<span class="st"> </span><span class="kw">id_estimate</span>(<span class="dt">idealdata=</span>irt_2pl,
<span class="dt">model_type=</span><span class="dv">8</span>,
<span class="dt">restrict_ind_high =</span> <span class="kw">as.character</span>(<span class="kw">sort</span>(irt_2pl<span class="op">@</span>simul_data<span class="op">$</span>true_person,
<span class="dt">decreasing=</span><span class="ot">TRUE</span>,
<span class="dt">index=</span><span class="ot">TRUE</span>)<span class="op">$</span>ix[<span class="dv">1</span>]),
<span class="dt">restrict_ind_low =</span> <span class="kw">as.character</span>(<span class="kw">sort</span>(irt_2pl<span class="op">@</span>simul_data<span class="op">$</span>true_person,
<span class="dt">decreasing=</span><span class="ot">FALSE</span>,
<span class="dt">index=</span><span class="ot">TRUE</span>)<span class="op">$</span>ix[<span class="dv">1</span>]),
<span class="dt">fixtype=</span><span class="st">'vb_partial'</span>,
<span class="dt">ncores=</span><span class="dv">2</span>,
<span class="dt">nchains=</span><span class="dv">2</span>,
<span class="dt">niters=</span><span class="dv">500</span>)</code></pre></div>
<pre><code>## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1: 100 -4154172.311 1.000 1.000
## Chain 1: 200 -462000.315 4.496 7.992
## Chain 1: 300 -1605796.824 3.235 1.000
## Chain 1: 400 -9960842.290 2.636 1.000
## Chain 1: 500 -3660771.278 2.453 1.000
## Chain 1: 600 -208653028151.058 2.211 1.000
## Chain 1: 700 -871621239.323 35.950 1.000
## Chain 1: 800 -25862319.457 35.544 1.721
## Chain 1: 900 -47171459.169 31.645 1.000
## Chain 1: 1000 -13591951.336 28.727 1.721
## Chain 1: 1100 -1871355.503 29.254 2.471 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1200 -1104070282.330 28.554 1.721 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1300 -92502391119.340 28.582 1.721 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1400 -563994.875 16429.680 2.471 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1500 -49620069.571 16429.607 2.471 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1600 -8661.581 17002.282 6.263 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1700 -1810.391 16978.822 3.784 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1800 -2552.708 16975.581 2.471 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1900 -1682.214 16975.588 2.471 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2000 -1659.393 16975.342 0.998 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2100 -1588.202 16974.720 0.989 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2200 -1558.193 16974.622 0.988 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2300 -1547.898 16974.524 0.517 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2400 -1546.471 573.342 0.291 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2500 -1547.006 573.243 0.045 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 2600 -1548.441 0.468 0.019
## Chain 1: 2700 -1547.860 0.090 0.014
## Chain 1: 2800 -1546.677 0.061 0.007 MEDIAN ELBO CONVERGED
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...</code></pre>
<p>The first thing we want to check with any MCMC model is convergence. An easy way to check is by looking at the Rhat distributions. If all these values are below 1.1, then we have good reason to believe that the model converged, and we can get these distributions with the <code>id_plot_rhats</code> function:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">id_plot_rhats</span>(irt_2pl_correct)</code></pre></div>
<pre><code>## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</code></pre>
<p><img src="" style="display: block; margin: auto;" /></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">id_plot_rhats</span>(irt_2pl_incorrect)</code></pre></div>
<pre><code>## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</code></pre>
<p><img src="" style="display: block; margin: auto;" /></p>
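<p>As a rough illustration of what Rhat measures, the following base-R sketch computes the classic (non-split) potential scale reduction factor for a single parameter. The helper <code>rhat_basic</code> and the simulated chains are hypothetical and are not part of <code>idealstan</code>; Stan's own Rhat uses a split-chain refinement of the same idea.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: `draws` is an iterations-by-chains matrix
rhat_basic &lt;- function(draws) {
  n &lt;- nrow(draws)
  chain_means &lt;- colMeans(draws)
  chain_vars &lt;- apply(draws, 2, var)
  W &lt;- mean(chain_vars)        # average within-chain variance
  B &lt;- n * var(chain_means)    # between-chain variance
  var_plus &lt;- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(1)
# Four chains that all sample the same distribution
mixed &lt;- matrix(rnorm(4000), ncol = 4)
# Same draws, but two chains are shifted to a different region
stuck &lt;- sweep(mixed, 2, c(0, 0, 3, 3), "+")

rhat_mixed &lt;- rhat_basic(mixed)  # close to 1
rhat_stuck &lt;- rhat_basic(stuck)  # well above 1.1</code></pre></div>
<p>When chains disagree about where the parameter lives, the between-chain variance inflates the ratio, which is why values above 1.1 are treated as a warning sign.</p>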
<p>We would expect the first model to converge. If the second model failed the Rhat test, at this point we should go back and see whether the model is misspecified or there is something wrong with the data. But for the sake of illustration, we will look at other diagnostics. We can also examine 1) whether the models are able to accurately replicate the data they were fitted on and 2) overall measures of model fit.</p>
<p>We can first look at how well the model reproduces the data by drawing from the posterior predictive distribution. We can obtain these draws using the <code>id_post_pred</code> function:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">post_correct <-<span class="st"> </span><span class="kw">id_post_pred</span>(irt_2pl_correct)</code></pre></div>
<pre><code>## [1] "Processing posterior replications for 1000 scores using 100 posterior samples out of a total of 500 samples."</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">post_incorrect <-<span class="st"> </span><span class="kw">id_post_pred</span>(irt_2pl_incorrect)</code></pre></div>
<pre><code>## [1] "Processing posterior replications for 1000 scores using 100 posterior samples out of a total of 500 samples."</code></pre>
<p>We can then use a wrapper around the <code>bayesplot</code> package called <code>id_plot_ppc</code> to see how well these models replicate their own data. These plots show the original data as blue bars and the posterior predictive estimate as an uncertainty interval. If the interval overlaps with the observed data, then the model was able to replicate the observed data. This is a basic and important validity test for the model: could the model have generated the data that was used to fit it?</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">id_plot_ppc</span>(irt_2pl_correct,<span class="dt">ppc_pred=</span>post_correct)</code></pre></div>
<p><img src="" style="display: block; margin: auto;" /></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">id_plot_ppc</span>(irt_2pl_incorrect,<span class="dt">ppc_pred=</span>post_incorrect)</code></pre></div>
<p><img src="" style="display: block; margin: auto;" /></p>
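<p>The logic behind such a check can be sketched in a few lines of base R. This is a simplified stand-in that assumes a conjugate Beta-Bernoulli model on simulated data, not the ideal point models above, and every object name is hypothetical. It draws replicated datasets from the posterior predictive distribution and asks whether the observed count of 1s falls inside the replicated interval.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(123)
# Simulated binary outcome (illustration only)
y &lt;- rbinom(200, size = 1, prob = 0.3)

# Posterior draws of the success probability under a Beta(1, 1) prior
n_draws &lt;- 1000
p_draws &lt;- rbeta(n_draws, 1 + sum(y), 1 + sum(1 - y))

# For each posterior draw, simulate a replicated dataset and count its 1s
y_rep_ones &lt;- rbinom(n_draws, size = length(y), prob = p_draws)

# 90% posterior predictive interval for the number of 1s
interval &lt;- quantile(y_rep_ones, c(0.05, 0.95))

# Does the interval cover what we actually observed?
covered &lt;- sum(y) &gt;= interval[1] &amp;&amp; sum(y) &lt;= interval[2]
covered</code></pre></div>
<p>A model with the wrong outcome distribution would tend to produce replicated counts whose interval misses the observed value, which is the pattern the plots above make visible.</p>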
<p>It is plainly obvious from these plots that one should not use a Poisson distribution for a Bernoulli (binary) outcome, as it will predict values as high as 10 (although it still puts the majority of the density on 1 and 2). Only two observed values are showing on the Poisson predictive distribution because the missing data model must be shown separately if the outcome is unbounded. To view the missing data model (i.e., a binary response), simply change the <code>output</code> option in <code>id_post_pred</code> to <code>'missing'</code>. We can also look at particular persons or items to see how well the models predict them. For example, let’s look at the first two persons in the simulated data for which we incorrectly used the Poisson model by passing their IDs as a character vector to the <code>group</code> option:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">id_plot_ppc</span>(irt_2pl_incorrect,<span class="dt">ppc_pred=</span>post_incorrect,<span class="dt">group=</span><span class="kw">c</span>(<span class="st">'1'</span>,<span class="st">'2'</span>))</code></pre></div>
<p><img src="" style="display: block; margin: auto;" /></p>
<p>Finally, we can turn to summary measures of model fit that also allow us to compare models directly to each other (if they were fit on the same data). To do so, I first set the <code>type</code> option of the <code>id_post_pred</code> function to <code>'log_lik'</code> to generate log-likelihood values for each of these models.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">log_lik_irt_2pl_correct <-<span class="st"> </span><span class="kw">id_post_pred</span>(irt_2pl_correct,<span class="dt">type=</span><span class="st">'log_lik'</span>)</code></pre></div>
<pre><code>## [1] "Processing posterior replications for 1000 scores using 500 posterior samples out of a total of 500 samples."</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">log_lik_irt_2pl_incorrect <-<span class="st"> </span><span class="kw">id_post_pred</span>(irt_2pl_incorrect,<span class="dt">type=</span><span class="st">'log_lik'</span>)</code></pre></div>
<pre><code>## [1] "Processing posterior replications for 1000 scores using 500 posterior samples out of a total of 500 samples."</code></pre>
<p>To run <code>loo</code>, we must also supply its <code>r_eff</code> argument, which we compute with the <code>relative_eff</code> function; it estimates the relative effective sample size of the draws so that <code>loo</code> can account for autocorrelation within chains. The first argument to <code>relative_eff</code> is the exponentiated log-likelihood matrix we calculated previously. The second argument, <code>chain_id</code>, uses the function <code>derive_chain</code>, which also takes the log-likelihood matrix as its argument.</p>
<p>I set <code>cores</code> to two, but a larger number could be used to speed up the calculations.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">correct_loo <-<span class="st"> </span><span class="kw">loo</span>(log_lik_irt_2pl_correct,
<span class="dt">cores=</span><span class="dv">2</span>,
<span class="dt">r_eff=</span><span class="kw">relative_eff</span>(<span class="kw">exp</span>(log_lik_irt_2pl_correct),
<span class="dt">chain_id=</span><span class="kw">derive_chain</span>(log_lik_irt_2pl_correct)))</code></pre></div>
<pre><code>## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">incorrect_loo <-<span class="st"> </span><span class="kw">loo</span>(log_lik_irt_2pl_incorrect,
<span class="dt">cores=</span><span class="dv">2</span>,
<span class="dt">r_eff=</span><span class="kw">relative_eff</span>(<span class="kw">exp</span>(log_lik_irt_2pl_incorrect),
<span class="dt">chain_id=</span><span class="kw">derive_chain</span>(log_lik_irt_2pl_incorrect)))</code></pre></div>
<pre><code>## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">print</span>(correct_loo)</code></pre></div>
<pre><code>##
## Computed from 500 by 1000 log-likelihood matrix
##
## Estimate SE
## elpd_loo -630.2 14.8
## p_loo 105.4 3.8
## looic 1260.4 29.6
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 936 93.6% 106
## (0.5, 0.7] (ok) 59 5.9% 73
## (0.7, 1] (bad) 5 0.5% 48
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">print</span>(incorrect_loo)</code></pre></div>
<pre><code>##
## Computed from 500 by 1000 log-likelihood matrix
##
## Estimate SE
## elpd_loo -1042.5 12.2
## p_loo 39.3 2.2
## looic 2084.9 24.4
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 992 99.2% 172
## (0.5, 0.7] (ok) 7 0.7% 227
## (0.7, 1] (bad) 1 0.1% 379
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.</code></pre>
<p>With this calculation we can examine the models’ <code>loo</code> values, which summarize each model’s estimated out-of-sample predictive performance. The Pareto k diagnostics indicate that the approximation is reliable for most of the data: only a handful of observations (5 for the correct model and 1 for the incorrect model) have k values above 0.7, the range that <code>loo</code> flags as bad. The LOO-IC, or the leave-one-out information criterion (think AIC or BIC), is much lower (i.e., better) for the correct model (1260.4 versus 2084.9), as we would expect.</p>
<p>We can also compare the two models explicitly using the <code>compare</code> function from <code>loo</code>, which reports the difference in expected log predictive density (<code>elpd_diff</code>) along with its standard error. If the difference is negative, then the first (correct) model has higher predictive accuracy:</p>
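<p>The LOO-IC is simply the <code>elpd_loo</code> estimate placed on the deviance scale. As a quick arithmetic check, assuming the rounded estimates printed above, the relationship can be reproduced by hand (the object names are illustrative):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rounded elpd_loo estimates copied from the printed output above
elpd_loo &lt;- c(correct = -630.2, incorrect = -1042.5)

# looic is -2 times elpd_loo; lower values indicate better predictive fit
looic &lt;- -2 * elpd_loo

# Agrees with the printed looic values up to rounding of the inputs
round(looic, 1)</code></pre></div>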
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">compare</span>(correct_loo,
incorrect_loo)</code></pre></div>
<pre><code>## elpd_diff se
## -412.3 14.8</code></pre>
<p>The first (correct) model is clearly preferred, as we would certainly hope in this extreme example. In less obvious cases, the <code>elpd</code> values can help arbitrate between model choices when a theoretical reason for choosing a model does not exist.</p>
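<p>One informal way to read the comparison is to set the difference against its standard error. The sketch below uses the rounded numbers printed above and assumes a rough normal approximation of plus or minus two standard errors; that interval is an assumption made here for illustration, not something <code>compare</code> computes.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Values copied from the printed comparison and loo summaries above
elpd_diff &lt;- -412.3
se_diff &lt;- 14.8

# The difference is the second model's elpd minus the first model's elpd
elpd_check &lt;- -1042.5 - (-630.2)

# Rough interval under a normal approximation (an assumption of this sketch)
interval &lt;- elpd_diff + c(-2, 2) * se_diff

# How many standard errors the difference is away from zero
ratio &lt;- elpd_diff / se_diff</code></pre></div>
<p>Here the whole interval lies far below zero, which is consistent with a strong preference for the first model. When the interval straddles zero, the data do not clearly favor either model.</p>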
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>