---
title: "ANOVA_R_sameas_SPSS"
author: "liuc"
date: '2022-03-18'
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## ANOVA_R_sameas_SPSS
R's default contrasts for ANOVA differ from the ones SPSS uses. The notes below collect the settings that make the two programs agree:
http://www.statscanbefun.com/rblog/2015/8/27/ensuring-r-generates-the-same-anova-f-values-as-spss
https://medium.com/humansystemsdata/analysis-of-variance-showdown-r-vs-spss-f4e50234a94
The nature of the differences between SPSS and R becomes evident when there is an unequal number of participants across factorial ANOVA cells. A few simple steps ensure that R ANOVA values match those generated by SPSS. These steps involve using Type III sums of squares for the ANOVA, but there is more to it than that.
There are three things you need to do to ensure ANOVA F-values in R match those in SPSS:
1. Set each independent variable as a factor
2. Set the default contrast to helmert
3. Conduct analysis using Type III Sums of Squares
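The three steps can be tried on a built-in dataset before touching the SPSS file. The chunk below is only a sketch under stated assumptions: it assumes the `car` package is installed, it uses `warpbreaks` with three rows dropped (an arbitrary choice made here to unbalance the cells), and the names `dat_unbal`, `tab_helmert` and `tab_treat` are made up for the illustration. It compares the Type III table under `contr.helmert` with the one under R's default `contr.treatment`.

```{r}
library(car)

# Hypothetical unbalanced data: drop three rows so the cells have unequal n
dat_unbal <- warpbreaks[-c(1, 2, 10), ]

# Step 1: make sure every independent variable is a factor
dat_unbal$wool <- factor(dat_unbal$wool)
dat_unbal$tension <- factor(dat_unbal$tension)

# Step 2: helmert contrasts for unordered factors (previous options are saved)
old_opts <- options(contrasts = c("contr.helmert", "contr.poly"))

# Step 3: Type III sums of squares
tab_helmert <- car::Anova(lm(breaks ~ wool * tension, data = dat_unbal), type = "III")

# For comparison: the same model under R's default treatment contrasts
options(contrasts = c("contr.treatment", "contr.poly"))
tab_treat <- car::Anova(lm(breaks ~ wool * tension, data = dat_unbal), type = "III")

# Restore whatever contrasts were active before this chunk
options(old_opts)

tab_helmert
tab_treat
```

The interaction row is the same under both codings, while the main-effect rows differ: with treatment contrasts the Type III main-effect tests refer to the reference level of the other factor rather than to an average over its levels.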
```{r,include=FALSE}
library(tidyverse)
library(easystats)
library(ggeffects)
library(compareGroups)
```
```{r}
# Change the default contrast for unordered factors from "contr.treatment" to "contr.helmert"
options(contrasts = c("contr.helmert", "contr.poly"))
df <- haven::read_sav('~/Downloads/两因素方差分析-SPSS教程-医咖会/16 双因素方差分析.sav') %>%
  rstatix::convert_as_factor(gender, education)
# Two-way ANOVA with Type III sums of squares
dd <- aov(Index ~ gender * education, data = df)
car::Anova(dd, type = 'III')
```
```{r}
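# Same result as dd, obtained from a linear model.
# car::Anova() defaults to Type II, so Type III has to be requested explicitly.
crf.lm <- lm(formula = Index ~ gender * education, data = df)
car::Anova(crf.lm, type = 'III')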
```
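Similarities and differences between `aov` and `stats::oneway.test`:
The main difference is `var.equal`. `oneway.test` defaults to `var.equal = FALSE` (Welch's test), whereas `aov` always assumes equal variances. `aov` is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.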
```{r}
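# Note: dW24 and tvRatio_group are not created anywhere in this document; they must already be columns of df
stats::oneway.test(dW24 ~ tvRatio_group, data = df, var.equal = TRUE)
aov(dW24 ~ tvRatio_group, data = df) %>% summary()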
```
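The role of `var.equal` can be checked on a built-in dataset. The following is a minimal sketch, assuming only base R and the `PlantGrowth` data (not part of the original analysis); the object names `f_aov`, `f_equal` and `f_welch` are made up for the illustration.

```{r}
# One-way ANOVA fitted with aov(); the F statistic is in the first row of the table
fit_pg <- aov(weight ~ group, data = PlantGrowth)
f_aov <- summary(fit_pg)[[1]][["F value"]][1]

# oneway.test() with equal variances assumed: the classical F test
f_equal <- unname(oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)$statistic)

# oneway.test() with its default var.equal = FALSE: Welch's correction
f_welch <- unname(oneway.test(weight ~ group, data = PlantGrowth)$statistic)

c(aov = f_aov, oneway_equal = f_equal, oneway_welch = f_welch)
all.equal(f_aov, f_equal)
```

With `var.equal = TRUE` the two functions give the same F statistic; only the Welch version differs.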
A more complete ANOVA workflow, followed by post-hoc tests
```{r}
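# One-way ANOVA
# Outlier check within each education group
df %>% group_by(education) %>%
  check_outliers(method = c(
    "mahalanobis",
    "iqr",
    "zscore"
  ))
# Shapiro-Wilk normality test per group (shapiro_test() comes from rstatix, which is not attached)
df %>% group_by(education) %>%
  rstatix::shapiro_test(Index)
aov.res.one <- aov(Index ~ education, data = df)
car::Anova(aov.res.one, type = 'III')
# Model diagnostics: normality of residuals, homogeneity of variance, etc.
check_model(aov.res.one)
# Two-way ANOVA
aov.res.two <- aov(Index ~ gender * education, data = df)
car::Anova(aov.res.two, type = 'III')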
```
*Post-hoc analysis*
```{r}
library(multcomp)
library(multcompView)
library(FSA)
```
The Tukey HSD test compares all pairs of groups, at the cost of less power.
```{r}
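# res_aov is never defined in this document; the one-way model aov.res.one fitted above is used instead
TukeyHSD(aov.res.one)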
```
Dunnett's test only compares each group with a reference group, which gives it more power.
```{r}
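# Dunnett's test: compare every education level with the reference (first) level.
# The original call used res_aov and species, which are not defined in this document;
# aov.res.one and education from the one-way ANOVA above are substituted.
post_test <- glht(aov.res.one,
  linfct = mcp(education = "Dunnett")
)
summary(post_test)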
```
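The trade-off between the two tests shows up in the number of comparisons each one makes. Here is a minimal sketch, assuming the `multcomp` package is installed and using the built-in `PlantGrowth` data (three groups, with `ctrl` as the reference level); the object names are made up for the illustration and are not part of the original analysis.

```{r}
library(multcomp)

fit_pg <- aov(weight ~ group, data = PlantGrowth)

# Tukey HSD: every pair of groups is compared
tukey_pg <- TukeyHSD(fit_pg)
tukey_pg

# Dunnett: only treatment-vs-reference comparisons
dunnett_pg <- summary(glht(fit_pg, linfct = mcp(group = "Dunnett")))
dunnett_pg

# Number of comparisons made by each procedure
c(tukey = nrow(tukey_pg$group), dunnett = length(dunnett_pg$test$coefficients))
```

With three groups Tukey makes three comparisons and Dunnett two; adjusting for fewer comparisons is where the extra power comes from.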
```{r}
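# Dunn's test: a non-parametric post-hoc test used after Kruskal-Wallis (not the same as Dunnett's test).
# The original call used flipper_length_mm, species and dat, which are not defined in this document;
# Index, education and df from above are substituted.
FSA::dunnTest(Index ~ education,
  data = df,
  method = "holm"
)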
```
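## MANOVA (multivariate analysis of variance)
MANOVA is an ANOVA model with more than one outcome variable.
For example, three groups of mice (A, B, C) receive different drug treatments (D, E, F), and the researcher wants to look at changes in body length and body weight at the same time. The null hypothesis is that drug treatment has no effect on either outcome, that is, the vectors of group means are equal. MANOVA suits this situation.
MANOVA rests on several assumptions: multivariate normality (checked with `mvnormtest::mshapiro.test()`), *no multicollinearity*, and *no outliers*.
Once the MANOVA result is significant, each dependent variable can be tested separately with a one-way ANOVA.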
> https://www.r-bloggers.com/2022/01/manova-in-r-how-to-implement-and-interpret-one-way-manova/
MANOVA in R uses Pillai's Trace by default, which is then converted to an approximate F-statistic to check the significance of the group mean differences. You can use other tests, such as Wilks' Lambda, Roy's Largest Root, or the Hotelling-Lawley trace, but Pillai's Trace is generally regarded as the most robust to violations of the assumptions.
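Switching between the four statistics only needs the `test` argument of `summary()`. Here is a minimal sketch in base R on the built-in `iris` data; the names `fit_iris`, `test_names` and `manova_stats` are made up for the illustration.

```{r}
# One-way MANOVA: four outcomes, one grouping factor
fit_iris <- manova(
  cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
  data = iris
)

# The test names accepted by summary.manova()
test_names <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")

# Pull the statistic for the Species term out of each summary table
manova_stats <- sapply(test_names, function(tst) {
  summary(fit_iris, test = tst)$stats["Species", tst]
})
manova_stats
```

The statistics are on different scales, so they are not directly comparable; each one is converted to its own approximate F in the corresponding summary table.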
```{r}
dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
independent_var <- iris$Species
# MANOVA test
manova_model <- manova(dependent_vars ~ independent_var, data = iris)
summary(manova_model)
# If the value is 0.14 or greater, we can say the effect size is large.
effectsize::eta_squared(manova_model)
# Look to see which differ
summary.aov(manova_model)
# Post-hoc: linear discriminant analysis to see how the groups separate
iris_lda <- MASS::lda(independent_var ~ dependent_vars, CV = FALSE)
iris_lda
# Discriminant scores for each observation (columns lda.LD1 and lda.LD2)
lda_df <- data.frame(
  species = iris[, "Species"],
  lda = predict(iris_lda)$x
)
lda_df
ggplot(lda_df) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
  theme_classic()
```
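## Repeated-measures ANOVA
#### One-way repeated-measures ANOVA usually means a single Time factor
The discussion below focuses on the two-way case.
#### Two-way repeated-measures ANOVA
Repeated-measures ANOVA applies when the outcome is continuous and the dependent variable is approximately normally distributed.
A typical medical example is a quantitative measurement taken from two groups of patients at several time points, which is one of the most common trial designs. The outcome is usually the measurement at each time point minus the baseline value.
In a design with two groups, A and B, of 10 people each, time is a within-subjects factor and group is a between-subjects factor. If the same 10 people all receive both treatments, time and group are both within-subjects factors.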
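The two designs lead to different `Error()` terms in `aov()`. The chunk below is a sketch on simulated data, not the dataset analysed here: the data frames `mixed_df` and `within_df`, the time labels and the random outcome are all hypothetical and exist only to show the error strata.

```{r}
set.seed(1)

# Mixed design: 20 subjects, 10 per group, each measured at 3 time points
mixed_df <- expand.grid(id = factor(1:20), time = factor(c("t0", "t1", "t2")))
mixed_df$group <- factor(ifelse(as.integer(mixed_df$id) <= 10, "A", "B"))
mixed_df$value <- rnorm(nrow(mixed_df))

# group is between-subjects, time is within-subjects
fit_mixed <- aov(value ~ group * time + Error(id / time), data = mixed_df)

# Fully within design: the same 10 subjects get both treatments at every time point
within_df <- expand.grid(
  id = factor(1:10),
  group = factor(c("A", "B")),
  time = factor(c("t0", "t1", "t2"))
)
within_df$value <- rnorm(nrow(within_df))

# group and time are both within-subjects
fit_within <- aov(value ~ group * time + Error(id / (group * time)), data = within_df)

# The error strata differ between the two designs
names(fit_mixed)
names(fit_within)
```

In the mixed design the group effect is tested in the `id` stratum (between subjects), while in the fully within design it is tested in its own `id:group` stratum.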
Time is the within-subjects variable.
Which variable is the between-subjects variable? group.
How should covariates be interpreted?
*Simple effect:*
*Main effect:*
*Interaction effect:*
```{r}
# Note: categorical variables must be converted to factors
df_wider <- haven::read_sav('./datasets/两因素重复测量的方差分析.sav')
# Wide to long: one row per subject and time point
df <- df_wider %>%
  pivot_longer(cols = starts_with('t'), names_to = 'time', values_to = 'value') %>%
  rstatix::convert_as_factor(group, id, time)
df$time <- factor(df$time, levels = c('t0', 't5', 't30', 't60', 't120'))
# Repeated-measures ANOVA
aov_res <- aov(value ~ group * time + Error(id/time), data = df)
aov_res
summary(aov_res)
parameters::model_parameters(aov_res)
effectsize::eta_squared(aov_res)
# emmeans does not seem to handle the covariance structure of an aov fit...
grafify::posthoc_Levelwise(Model = aov_res,
  Fixed_Factor = c("group"),
  infer = c(TRUE, TRUE)
)
# Same analysis with the rstatix package
aov_res2 <- rstatix::anova_test(formula = value ~ group * time + Error(id/time),
  data = df
)
```
*Repeated-measures ANOVA with the car package*
https://mspeekenbrink.github.io/sdam-r-companion/repeated-measures-anova.html
```{r}
library(car)
library(afex)
# car::Anova() needs wide data; afex (further down) analyses the long data
df_wider <- df_wider %>% rstatix::convert_as_factor(group, id)
mvmod <- lm(cbind(t0, t5, t30, t60, t120) ~ group, data = df_wider)
# Keep the factor levels in the same order as the response columns
time_levels <- c("t0", "t5", "t30", "t60", "t120")
idata <- data.frame(Time = factor(time_levels, levels = time_levels))
contrasts(idata$Time) # check the levels; this interface is hard to follow, so afex is used below
rmaov <- car::Anova(mvmod, idata = idata, idesign = ~Time, type = 3)
rmaov
summary(rmaov, multivariate = FALSE)
# afex provides an abridged ANOVA table, where the Greenhouse-Geisser correction is automatically applied
afmod <- afex::aov_car(value ~ group * time + Error(id/time),
  data = df)
afmod
summary(afmod)
afex::nice(afmod, es = "pes", correction = "none")
# Estimated marginal means per group and the contrast between the two groups
em_version <- emmeans::emmeans(afmod, specs = ~ group)
em_version
emmeans::contrast(em_version, method = list("group1 - group2" = c(1, -1)))
# The original call passed condition = c(diagnose = "severe"); there is no diagnose variable in this model, so it is dropped
plot(ggemmeans(afmod, terms = c("time", "group"))) +
  ggplot2::ggtitle("EMMEANS Plot")
grafify::posthoc_Levelwise(Model = afmod,
  Fixed_Factor = c("group"),
  infer = c(TRUE, TRUE)
)
```