-
Notifications
You must be signed in to change notification settings - Fork 1
/
cancer.Rmd
173 lines (130 loc) · 2.94 KB
/
cancer.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
# 吸烟导致癌症? {#cancer}
```{r cancer_setup,message=FALSE,cache=FALSE}
library(tidyverse)
library(rstan)
```
从肺癌患者和无癌个体中随机抽取两组样本[^cancer],目的是考察两组人吸烟习惯的差异。
数据如下
```{r cancer01}
cancer <- tribble(
~group, ~n, ~smokers,
"Cancer patients", 86, 82,
"Control group", 86, 72
)
cancer
```
## 二项模型
为完成这个二项抽样模型,将每组数据里吸烟人数的比例$\pi_i$作为参数,并使用uniform先验概率分布
$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i)
\end{aligned}
$$
吸烟比例的差值为
$$
\delta = \pi_1 - \pi_2 ,
$$
并让$\pi$使用uniform先验概率分布
$$
\begin{aligned}
\pi_i &\sim \mathsf{Beta}(1, 1)
\end{aligned}
$$
在`generated quantities` block中计算了比例log-odds ratio之差
$$
\lambda = \log\left(\frac{\pi_1}{1 - \pi_1}\right) - \log \left( \frac{\pi_2}{1 - \pi_2} \right) ,
$$
具体 Stan 模型如下
```{r}
cancer_mod1 <- "
data {
int<lower = 0> r[2];
int<lower = 1> n[2];
}
parameters {
vector<lower = 0, upper = 1>[2] p;
}
model {
p ~ beta(1, 1);
r ~ binomial(n, p);
}
generated quantities {
real delta;
int delta_up;
real lambda;
int lambda_up;
delta = p[1] - p[2];
delta_up = delta > 0;
lambda = logit(p[1]) - logit(p[2]);
lambda_up = lambda > 0;
}
"
cancer_data <- list(
r = cancer$smokers,
n = cancer$n
)
cancer_fit1 <- stan(model_code = cancer_mod1, data = cancer_data)
```
```{r}
cancer_fit1
```
## 二项Logit模型
另外一种方法,就是直接模拟比例之差,这里需要用到参数$\alpha$ 和 $\beta$,每组数据的比例都是参数的log-odds
$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i) \\
\pi_1 &= \frac{1}{1 + \exp(-(\alpha + \beta)} \\
\pi_2 &= \frac{1}{1 + \exp(-\alpha))}
\end{aligned}
$$
这里设定$\alpha$ 和$\beta$ 弱先验信息
$$
\begin{aligned}
\alpha &\sim N(0, 10)\\
\beta &\sim N(0, 2.5)
\end{aligned}
$$
```{r cancer_mod2}
stan_program <- "
data {
int<lower = 0> r[2];
int<lower = 1> n[2];
}
parameters {
real a;
real b;
}
transformed parameters {
vector<lower = 0., upper = 1.>[2] p;
p[1] = inv_logit(a + b);
p[2] = inv_logit(a);
}
model {
a ~ normal(0, 10);
b ~ normal(0, 2.5);
r ~ binomial(n, p);
}
generated quantities {
real delta;
int delta_up;
real lambda;
int lambda_up;
delta = p[1] - p[2];
delta_up = delta > 0;
lambda = logit(p[1]) - logit(p[2]);
lambda_up = lambda > 0;
}
"
cancer_data <- list(
r = cancer$smokers,
n = cancer$n
)
cancer_fit2 <- stan(model_code = stan_program, data = cancer_data)
```
```{r}
cancer_fit2
```
## 参考
[^cancer]: This example is derived from Simon Jackman,
"[Cancer: difference in two binomial proportions](https://web-beta.archive.org/web/20070601000000*/http://jackman.stanford.edu:80/mcmc/cancer.odc)",
*BUGS Examples,* 2007-07-24, This examples comes from @JohnsonAlbert1999a, using data from @Dorn1954a.