-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBayesian Multilevel Modeling.Rmd
311 lines (212 loc) · 11.2 KB
/
Bayesian Multilevel Modeling.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
---
title: "Bayesian Multilevel Modeling Exercise"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Continuing with our exercise from last class. We want to build a Bayesian multi-level model. First, Load the following libraries:
```{r, echo = T, message=F, warning=F, fig.width=5, fig.height=3, fig.align="center"}
library(tidyverse)
library(lme4)
library(rstan)
library(cowplot)
library(ggplot2)
library(lubridate)
library(ggridges)
library(kableExtra)
```
Then load the data from last class:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
SalesTrans =
read_csv("C:/Users/ellen/Documents/UH/Fall 2020/Class Materials/Section II/Class 1/Data/Sales.csv")
Location =
read_csv("C:/Users/ellen/Documents/UH/Fall 2020/Class Materials/Section II/Class 1/Data/Location.csv")
MerGroup =
read_csv("C:/Users/ellen/Documents/UH/Fall 2020/Class Materials/Section II/Class 1/Data/MerGroup.csv")
SalesTrans = SalesTrans %>% inner_join(Location, by = "LocationID")
SalesTrans = SalesTrans %>% inner_join(MerGroup, by = "MerGroup")
LocationID = as.factor(SalesTrans$LocationID)
SalesTrans$ProductID = as.factor(SalesTrans$ProductID)
SalesTrans$Description = as.factor(SalesTrans$Description)
SalesTrans$MerGroup = as.factor(SalesTrans$MerGroup)
# breaking out Q4 to simplify exercise
SalesTrans$Qtr = quarter(SalesTrans$Tdate)
SalesTrans = filter(SalesTrans, Qtr == 4)
SalesTransSummary = SalesTrans %>%
group_by(Description, MerGroup, MfgPromo, Wk ) %>%
summarise(Volume = n(), TotSales = sum(Amount) )
```
Now, let's filter it down to 3 Desciptions *(locations)* and 4 MerGroups so it's easier to see. We'll plot this out here - note that lm in ggplot does not generate models for level1/level2 groups where data does not exist, and it only generates independent models *(no pooling)*.
These are the locations and MerGroups we'll be analyzing the rest of the exercise:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
SalesSummarySub = filter(SalesTransSummary,
Description %in% c('Seattle', "Arlington", "San Jose", "Boston"),
MerGroup %in% c("Accessories", "Beauty", "Cooking", "Men"))
p = ggplot(SalesSummarySub, aes(Wk, TotSales, color = MerGroup)) +
geom_point(alpha = .2) +
geom_smooth(method = "lm", se = F, alpha = .05, linetype = "dashed", alpha = .5) +
facet_wrap(~Description) +
theme(panel.background = element_rect(fill = "white"))
p
```
But, we're going to build models based on the total population *(we need to understand how overall groups and effects behave first)*. So, let's back up, and to make this super obvious, we'll create a simple lm model *(full pooling)* for a baseline *(and we'll save the parameters for priors)*.
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
Priors = lm(TotSales ~ Wk , data = SalesTransSummary)
lmI = as.numeric(Priors$coefficients[1])
lmS = as.numeric(Priors$coefficients[2])
pt = ggplot(SalesTransSummary, aes(Wk, TotSales)) +
geom_point(alpha = .2) +
geom_abline(aes(intercept=lmI, slope=lmS)) +
theme(panel.background = element_rect(fill = "white"))
pt
```
Now, we'll take those priors and build a 2 level Bayesian model. First, we'll use a fully pooled lm model to force one intercept and one slope across all the levels and values *(we'll use rep to repeat the same value 10 times)*.
We're doing this just to see how the groups *(levels)* are constructed and controlled by priors and sigmas:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
# build stan model
stanMod <- '
data {
int<lower=0> N;
vector[N] x; // Wk
vector[N] y; // TotSales
int<lower=0> J; // Description
int Description[N];
int<lower=0> K; // MerGroup
int MerGroup[N];
vector[J] p_alpha;
vector[K] p_alpha2;
real<lower=0> p_alphaSigma;
real<lower=0> p_alphaSigma2;
vector[J] p_beta;
vector[K] p_beta2;
real<lower=0> p_betaSigma;
real<lower=0> p_betaSigma2;
}
parameters {
// random effects we found
vector[J] alpha; // intercept for Description
vector[K] alpha2; // intercept for MerGroup
vector[J] beta; // slope for Description
vector[K] beta2; // slope for MerGroup
real<lower=0> sigma; // to control sigma of y_hat
}
transformed parameters {
vector[N] y_hat;
for(i in 1:N)
y_hat[i]=alpha[Description[i]]+alpha2[MerGroup[i]]+
(beta[Description[i]]*x[i])+(beta2[MerGroup[i]]*x[i]);
}
model {
target += normal_lpdf(alpha | p_alpha, p_alphaSigma);
target += normal_lpdf(alpha2 | p_alpha2, p_alphaSigma2);
target += normal_lpdf(beta | p_beta, p_betaSigma);
target += normal_lpdf(beta2 | p_beta2, p_betaSigma2);
target += normal_lpdf(sigma | 50, 50);
// y_hat: our prediction
target += normal_lpdf(y | y_hat, sigma);
}
'
fit <- stan(model_code = stanMod, data = list(
N = nrow(SalesTransSummary),
y = SalesTransSummary$TotSales,
x = SalesTransSummary$Wk,
J = length(unique(SalesTransSummary$Description)),
K = length(unique(SalesTransSummary$MerGroup)),
Description = as.numeric(SalesTransSummary$Description),
MerGroup = as.numeric(SalesTransSummary$MerGroup),
p_alpha = rep(lmI,10), # Description intercept
p_alpha2 = rep(lmI,10), # MerGroup intercept
#
p_alphaSigma = 10,
p_alphaSigma2 = 10,
p_beta = rep(lmS,10), # Description slope (lmer model)
p_beta2 = rep(lmS,10), # MerGroup slope (lmer model)
p_betaSigma = 1,
p_betaSigma2 = 1
), refresh = 0)
```
Now, we'll extract the parameters, put them in a dataframe that we can use for plotting, and narrow the visualization down to the subgroup we began with:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
# Extract coefficients from stan model
sumFit <- data.frame(summary(fit))
# Description
Intercept1 <- summary(fit, pars = c("alpha"), probs = c(0.1, 0.9))$summary
# MerGroup
Intercept2 <- summary(fit, pars = c("alpha2"), probs = c(0.1, 0.9))$summary
# Description slope
Slope1 <- summary(fit, pars = c("beta"), probs = c(0.1, 0.9))$summary
# MerGroup slope
Slope2 <- summary(fit, pars = c("beta2"), probs = c(0.1, 0.9))$summary
# random effects for each Description
CoefMapD <- data.frame(Description = unique(SalesTransSummary$Description),
DIntercept = Intercept1[,1], DSlope = Slope1[,1])
# random effects for each MerGroup
CoefMapM <- data.frame(MerGroup = unique(SalesTransSummary$MerGroup), MIntercept = Intercept2[,1], MSlope = Slope2[,1])
# add all intercepts and all slopes cross each row
CoefMapB = crossing(CoefMapD, CoefMapM) %>%
mutate(Intercept = DIntercept + MIntercept, Slope = DSlope + MSlope)
CoefMapBSub <- filter(CoefMapB,
Description %in% c('Seattle', "Arlington", "San Jose", "Boston"),
MerGroup %in% c("Accessories", "Beauty", "Cooking", "Men"))
```
Taking a look at the models on a plot:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
p1 = p +
geom_abline(data = CoefMapBSub,
aes(intercept=Intercept, slope=Slope, color=MerGroup))
p1
```
Note that our Bayesian model gives us parameter *effects* for every location and merch group *(see dataframe CoefMapBSub)*, even where no data exists *(e.g., Arlington data had not Accessories or Beauty products)*. *(you can't see it on the plot here, because all the models overlap because we severely limited the priors and sigmas for learning purposes)* This is a BIG deal - how would you forecast Accessories for Arlington if Nordies was planning on carrying that group next year? We can do this becuase the model quantifies the *EFFECTS* of each location and the *EFFECTS* of each merch group. Then we just add the effect to the base to get the prosterior for a location, merch group of interest.
In this example however, we have unrealistically limited effects. So let's open this up a bit by using the last model parameters as priors and increasing sigma values, allowing the sampler to weight the liklihood data more:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
fit <- stan(model_code = stanMod, data = list(
N = nrow(SalesTransSummary),
y = SalesTransSummary$TotSales,
x = SalesTransSummary$Wk,
J = length(unique(SalesTransSummary$Description)),
K = length(unique(SalesTransSummary$MerGroup)),
Description = as.numeric(SalesTransSummary$Description),
MerGroup = as.numeric(SalesTransSummary$MerGroup),
p_alpha = as.numeric(CoefMapD[,2]),
p_alpha2 = as.numeric(CoefMapM[,2]),
#
p_alphaSigma = 20000,
p_alphaSigma2 = 20000,
p_beta = as.numeric(CoefMapD[,3]),
p_beta2 = as.numeric(CoefMapM[,3]),
p_betaSigma = 1000,
p_betaSigma2 = 1000
), refresh = 0)
# Extract coefficients from stan model
sumFit <- data.frame(summary(fit))
# Description
Intercept1 <- summary(fit, pars = c("alpha"), probs = c(0.1, 0.9))$summary
# MerGroup
Intercept2 <- summary(fit, pars = c("alpha2"), probs = c(0.1, 0.9))$summary
# Description slope
Slope1 <- summary(fit, pars = c("beta"), probs = c(0.1, 0.9))$summary
# MerGroup slope
Slope2 <- summary(fit, pars = c("beta2"), probs = c(0.1, 0.9))$summary
# random effects for each Description
CoefMapD2 <- data.frame(Description = unique(SalesTransSummary$Description),
DIntercept = Intercept1[,1], DSlope = Slope1[,1])
# random effects for each MerGroup
CoefMapM2 <- data.frame(MerGroup = unique(SalesTransSummary$MerGroup), MIntercept = Intercept2[,1], MSlope = Slope2[,1])
# add all intercepts and all slopes cross each row
CoefMapB2 = crossing(CoefMapD2, CoefMapM2) %>%
mutate(Intercept = DIntercept + MIntercept, Slope = DSlope + MSlope)
CoefMapBSub2 <- filter(CoefMapB2,
Description %in% c('Seattle', "Arlington", "San Jose", "Boston"),
MerGroup %in% c("Accessories", "Beauty", "Cooking", "Men"))
```
Notice how the models have extended to meet the likelihood data:
```{r, echo = T, message=F, warning=F, fig.width=6, fig.height=4, fig.align="center"}
p2 <- p + geom_abline(data = CoefMapBSub2,
aes(intercept=Intercept, slope=Slope, color=MerGroup))
p2
```
Concept Review:
We created 100 models, each with different parameter combinations. These are not independent models - they share covariance with group members (i.e., Sales in Seattle move together, and Sales of Beauty move together). So, if I need to forecast Beauty sales in Arlington, I just add the Beauty parameter to the Arlington group. It won't be the same as adding the Beauty parameter to another location. This is very powerful for analysis and forecasting.
We control the models through priors and sigmas. Sigmas tell us how much confidence we have in our priors. If I had known that a major competitor would be opening a store across the street, I might reduce the location parameter and squeeze the sigma. This is how we integrate experience and judgment into models. Again, this is very powerful for analysis and forecasting.
Note how we managed parameters in dataframes. You can do this anyway you like, but the parameters is what you do all this work for, so you want to secure and manage the data. These parameters are why the models are descriptive, explainable, while also being flexible.