ANOVA_R_sameas_SPSS.Rmd

---
title: "ANOVA_R_sameas_SPSS"
author: "liuc"
date: '2022-03-18'
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## ANOVA_R_sameas_SPSS

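When running an ANOVA, R's default contrasts differ from the ones SPSS uses. These notes collect the settings that make R reproduce the SPSS results: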

http://www.statscanbefun.com/rblog/2015/8/27/ensuring-r-generates-the-same-anova-f-values-as-spss

https://medium.com/humansystemsdata/analysis-of-variance-showdown-r-vs-spss-f4e50234a94

The nature of the differences between SPSS and R becomes evident when there are unequal numbers of participants across factorial ANOVA cells. A few simple steps ensure that R ANOVA values match those generated by SPSS. These steps involve using Type III sums of squares for the ANOVA, but there is more to it than that.


There are three things you need to do to ensure ANOVA F-values in R match those in SPSS:

1. Set each independent variable as a factor
2. Set the default contrast to helmert
3. Conduct analysis using Type III Sums of Squares
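
The three steps above can be sketched on a built-in dataset. This is only an illustration under stated assumptions: `ToothGrowth` with five rows dropped (an arbitrary choice that makes the cells unequal) stands in for real data, and all object names are made up for the example. The same Type III call is also run under R's default treatment contrasts for comparison.

```{r}
library(car)

# Illustrative data: drop five rows from ToothGrowth so the cells are unbalanced
tg <- ToothGrowth[-(1:5), ]

# Step 1: every independent variable must be a factor (dose is numeric by default)
tg$dose <- factor(tg$dose)

# Step 2: switch unordered factors to Helmert contrasts, remembering the old setting
old_opts <- options(contrasts = c("contr.helmert", "contr.poly"))
fit_helmert <- aov(len ~ supp * dose, data = tg)

# Step 3: Type III sums of squares
tab_helmert <- car::Anova(fit_helmert, type = "III")
tab_helmert

# For comparison: the same Type III call under R's default treatment contrasts
options(contrasts = c("contr.treatment", "contr.poly"))
fit_treat <- aov(len ~ supp * dose, data = tg)
tab_treat <- car::Anova(fit_treat, type = "III")
tab_treat

# Restore whatever contrasts were active before this chunk
options(old_opts)
```

The interaction row is the same in both tables, but the main-effect rows are not. With treatment contrasts, a Type III main-effect test is evaluated at the reference level of the other factor instead of being averaged over its levels, which is why `type = "III"` alone is not enough.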


```{r,include=FALSE}
library(tidyverse)
library(easystats)
library(ggeffects)
library(compareGroups)
```


```{r}
# Change the default contrast for unordered factors from "contr.treatment" to "contr.helmert"

options(contrasts = c("contr.helmert", "contr.poly"))

df <- haven::read_sav('~/Downloads/两因素方差分析-SPSS教程-医咖会/16 双因素方差分析.sav') %>% 
  rstatix::convert_as_factor(gender, education)

dd <- aov(Index ~ gender * education, data = df)
car::Anova(dd, type = 'III')
           
```

```{r}
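# Same result as dd, obtained from a linear model
crf.lm <- lm(formula = Index ~ gender * education, data = df)

# car::Anova() defaults to Type II, so Type III must be requested explicitly to match dd
car::Anova(crf.lm, type = 'III')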


```


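Similarities and differences between `aov` and `stats::oneway.test`:
The main difference is the `var.equal` argument: `oneway.test` defaults to `var.equal = FALSE` (Welch's test), whereas `aov` assumes equal variances. The `aov` documentation also warns: aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.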
```{r}
# With var.equal = TRUE, oneway.test() gives the same F test as a one-way aov()
stats::oneway.test(dW24 ~ tvRatio_group, data = df, var.equal = TRUE)

aov(dW24 ~ tvRatio_group, data = df) %>% summary()
```
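
The relationship between the two functions can be checked on a built-in dataset. The sketch below uses `PlantGrowth` purely as stand-in data (an assumption; it is not the dataset analysed in these notes), and the object names are illustrative.

```{r}
# Stand-in data: PlantGrowth (weight by treatment group), not the data used above
pg_aov <- aov(weight ~ group, data = PlantGrowth)
f_aov <- summary(pg_aov)[[1]][["F value"]][1]

# Equal variances assumed: the classical one-way ANOVA F test
ow_equal <- stats::oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

# Default (var.equal = FALSE): Welch's test, which does not assume equal variances
ow_welch <- stats::oneway.test(weight ~ group, data = PlantGrowth)

# Side-by-side F statistics
c(aov = f_aov,
  oneway_equal = unname(ow_equal$statistic),
  oneway_welch = unname(ow_welch$statistic))
```

With `var.equal = TRUE` the F statistic matches `aov`; the Welch version generally differs and uses adjusted denominator degrees of freedom.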


A more complete ANOVA workflow, including post-hoc tests

```{r}
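# One-way ANOVA
# Outlier check within each education group
df %>% group_by(education) %>% 
  check_outliers(method = c(
  "mahalanobis",
  "iqr",
  "zscore"
))
# Shapiro-Wilk normality test within each group (shapiro_test() comes from rstatix)
df %>% group_by(education) %>% 
  rstatix::shapiro_test(Index)

aov.res.one <- aov(Index ~ education, data = df)
car::Anova(aov.res.one, type = 'III')
# Model diagnostics: normality of residuals, homogeneity of variance, etc.
check_model(aov.res.one)


# Two-way ANOVA
aov.res.two <- aov(Index ~ gender * education, data = df)
car::Anova(aov.res.two, type = 'III')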


```
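
`check_model()` reports its diagnostics as plots. As a numerical complement, the two main assumptions can also be tested with base R. The sketch below is a minimal example on `PlantGrowth`, used here only as stand-in data (an assumption); Bartlett's test is one of several possible homogeneity tests and is chosen because it ships with base R.

```{r}
# Stand-in data: PlantGrowth, not the data used above
pg_fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of the residuals (Shapiro-Wilk); a small p-value suggests non-normality
sw <- shapiro.test(residuals(pg_fit))

# Homogeneity of variance across groups (Bartlett's test from base R)
bt <- bartlett.test(weight ~ group, data = PlantGrowth)

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)
```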


*Post-hoc analysis*

```{r}
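library(multcomp)
library(multcompView)
library(FSA)
# Assumption: `dat` and `res_aov` are never defined in these notes; the columns used below match palmerpenguins::penguins
dat <- palmerpenguins::penguins
res_aov <- aov(flipper_length_mm ~ species, data = dat)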
```

The Tukey HSD test compares all pairs of groups, at the cost of lower power.
```{r}
TukeyHSD(res_aov)
```
Dunnett's test only compares each group with a reference group, with the benefit of more power.
```{r}
# Dunnett's test: each species against the reference (first) level
post_test <- glht(res_aov,
  linfct = mcp(species = "Dunnett")
)
summary(post_test)
```
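
The trade-off between the two tests shows up in the number of comparisons each one makes. The following sketch uses `PlantGrowth` as stand-in data (an assumption, chosen because it has a natural control group); the object names are illustrative.

```{r}
library(multcomp)

# Stand-in data: PlantGrowth has a control group ("ctrl") and two treatments
pg_model <- aov(weight ~ group, data = PlantGrowth)

# Tukey HSD: all pairwise comparisons (3 for 3 groups)
tukey_res <- TukeyHSD(pg_model)
tukey_res

# Dunnett: each treatment against the reference level only (2 comparisons);
# the reference is the first factor level, here "ctrl"
dunnett_res <- glht(pg_model, linfct = mcp(group = "Dunnett"))
summary(dunnett_res)
```

To use a different reference group, reorder the factor first, for example with `relevel()`.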

```{r}
# Dunn's test: non-parametric pairwise comparisons, typically used after a Kruskal-Wallis test
FSA::dunnTest(flipper_length_mm ~ species,
  data = dat,
  method = "holm"
)
```


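## MANOVA (multivariate analysis of variance)
MANOVA is an ANOVA model with more than one outcome variable.
An example with several outcomes: three groups of mice (A, B, C) receive different drug treatments (D, E, F), and the researcher wants to look at changes in body length and body weight at the same time. The null hypothesis is that the drug treatment has no effect on the mean vector of the two outcomes. MANOVA is appropriate in this situation.
MANOVA relies on several assumptions: multivariate normality (which can be checked with `mvnormtest::mshapiro.test()`), *no multicollinearity* among the outcomes, and *no outliers*.
If the MANOVA result is significant, each dependent variable can then be tested separately with a one-way ANOVA.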

> https://www.r-bloggers.com/2022/01/manova-in-r-how-to-implement-and-interpret-one-way-manova/

`summary()` on a `manova` fit in R uses Pillai's trace by default, which is converted to an approximate F-statistic to test the significance of the group mean differences. Other statistics can be requested through the `test` argument: Wilks' Lambda, Roy's largest root, or the Hotelling-Lawley trace. Pillai's trace is generally considered the most robust of the four to violations of the assumptions.
```{r}
dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
independent_var <- iris$Species
# MANOVA test
manova_model <- manova(dependent_vars ~ independent_var, data = iris)

summary(manova_model)

# If the value is 0.14 or greater, we can say the effect size is large.
effectsize::eta_squared(manova_model)

# Look to see which differ
summary.aov(manova_model)

# Post-hoc exploration with linear discriminant analysis (LDA)
iris_lda <- MASS::lda(independent_var ~ dependent_vars, CV = FALSE)
iris_lda
lda_df <- data.frame(
  species = iris[, "Species"],
  lda = predict(iris_lda)$x
)
lda_df

ggplot(lda_df) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
  theme_classic()
```
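
To see how the choice of test statistic affects the result, `summary()` can be called once per statistic. This is a small sketch on the same `iris` data; the object names are illustrative, and only the `test` argument of `summary()` for `manova` fits is used.

```{r}
# iris ships with R; the four measurements are the outcomes, Species is the grouping factor
iris_outcomes <- as.matrix(
  iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
)
iris_manova <- manova(iris_outcomes ~ Species, data = iris)

# The four statistics accepted by the `test` argument
test_names <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")

# For each statistic, pull the approximate F and its p-value for the Species term
manova_by_test <- sapply(test_names, function(tt) {
  summary(iris_manova, test = tt)$stats["Species", c("approx F", "Pr(>F)")]
})
manova_by_test
```

When the four statistics agree, as they should for clearly separated groups, the choice matters little; when they disagree, it is worth checking the assumptions before picking one.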


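## Repeated-measures ANOVA

#### One-way repeated-measures ANOVA

This usually means a single within-subject factor, Time.

#### Two-way repeated-measures ANOVA

These notes mainly discuss the two-way case.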

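Repeated-measures ANOVA applies when the outcome is a continuous variable that is approximately normally distributed.

A typical medical example: a quantitative measurement is taken from two groups of patients at several time points, which is one of the most common study designs. The outcome is often the measurement at each time point minus the baseline value.
With two groups, A and B, of 10 people each, time is a within-subject factor and group is a between-subject factor. If the same 10 people receive both treatments, then time and group are both within-subject factors.

Time is the within-subject variable.
Which variable is the between-subject variable? Group.
How should covariates be interpreted?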

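*Simple effect:* the effect of one factor at a single fixed level of another factor.

*Main effect:* the effect of one factor averaged over the levels of the other factors.

*Interaction effect:* the effect of one factor differs depending on the level of another factor.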


```{r}
# Note: categorical variables in the data must be converted to factors
df_wider <- haven::read_sav('./datasets/两因素重复测量的方差分析.sav')
df <- df_wider %>% pivot_longer(cols = starts_with('t'), names_to = 'time', values_to = 'value') %>% 
  rstatix::convert_as_factor(group, id, time)
df$time <- factor(df$time, levels = c('t0', 't5', 't30', 't60', 't120'))

# Repeated-measures ANOVA
aov_res <- aov(value ~ group * time + Error(id/time), data = df)

aov_res
summary(aov_res)

parameters::model_parameters(aov_res)
effectsize::eta_squared(aov_res)

# emmeans does not seem to work with the covariance structure of an aov fit...
grafify::posthoc_Levelwise(Model = aov_res,
                           Fixed_Factor = c("group"),
                           infer = c(TRUE, TRUE)
                           )


## use rstatix package
aov_res2 <- rstatix::anova_test(formula = value ~ group * time + Error(id/time),
                    data = df
                    )


```
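
Since the `.sav` file is not available to every reader, the `Error(id/time)` specification can also be tried on simulated data. The sketch below is a hypothetical mixed design (20 subjects, two groups, five time points); the effect sizes and object names are invented for illustration and say nothing about the real dataset.

```{r}
# Simulated long-format data (hypothetical): one row per subject and time point
set.seed(1)
sim_times <- c("t0", "t5", "t30", "t60", "t120")
sim_long <- expand.grid(
  id = factor(1:20),
  time = factor(sim_times, levels = sim_times)
)
# Subjects 1-10 are in group A, subjects 11-20 in group B (between-subject factor)
sim_long$group <- factor(ifelse(as.integer(sim_long$id) <= 10, "A", "B"))

# Outcome = subject-specific level + a time trend in group B only + noise
subject_effect <- rnorm(20, sd = 2)
sim_long$value <- 10 + subject_effect[as.integer(sim_long$id)] +
  0.5 * as.integer(sim_long$time) * (sim_long$group == "B") +
  rnorm(nrow(sim_long))

# Error(id/time): subjects form one error stratum, time within subject another
sim_aov <- aov(value ~ group * time + Error(id / time), data = sim_long)
summary(sim_aov)
```

In the output, `group` is tested in the `id` stratum (between subjects), while `time` and `group:time` are tested in the `id:time` stratum (within subjects).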

*Repeated-measures ANOVA with the car package*
https://mspeekenbrink.github.io/sdam-r-companion/repeated-measures-anova.html

```{r}
library(car)
library(afex)

# car::Anova() needs wide-format data; afex is used further below for the long-format data
df_wider <- df_wider %>%  rstatix::convert_as_factor(group, id)

mvmod <- lm(cbind(t0, t5, t30, t60, t120) ~ group, data=df_wider)
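# Set the levels explicitly so they follow the column order instead of alphabetical order
time_levels <- c("t0", "t5", "t30", "t60", "t120")
idata <- data.frame(Time = factor(time_levels, levels = time_levels))
contrasts(idata$Time) # check the contrasts; this part is still a bit unclear to me, so afex is used below instead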
rmaov <- car::Anova(mvmod, idata=idata, idesign = ~Time,  type=3)
rmaov
summary(rmaov, multivariate=FALSE)


# afex provides an abridged ANOVA table, where the Greenhouse-Geisser correction is automatically applied
afmod <- afex::aov_car(value ~ group * time + Error(id/time), 
                       data = df)
afmod
summary(afmod)
afex::nice(afmod, es="pes", correction = "none")

em_version <- emmeans::emmeans(afmod, specs = ~ group)
em_version
emmeans::contrast(em_version, method=list("group1 - group2" = c(1,-1)))

plot(ggemmeans(afmod, terms = c("time", "group"))) +
  ggplot2::ggtitle("EMMEANS Plot")

grafify::posthoc_Levelwise(Model = afmod,
                           Fixed_Factor = c("group"),
                           infer = c(TRUE, TRUE)
                           )

```
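
The wide-format route through `car::Anova()` can be easier to follow on a small simulated dataset. The sketch below makes several assumptions: the data are simulated (20 subjects, two groups, five time points), all object names are invented, and sum-to-zero contrasts are set for the between-subject factor because Type III tests are requested.

```{r}
library(car)

# Simulated wide-format data (hypothetical): one row per subject, one column per time point
set.seed(2)
n_subj <- 20
sim_times <- c("t0", "t5", "t30", "t60", "t120")
sim_wide <- data.frame(
  id = factor(seq_len(n_subj)),
  group = factor(rep(c("A", "B"), each = n_subj / 2))
)
subject_effect <- rnorm(n_subj, sd = 2)
for (j in seq_along(sim_times)) {
  sim_wide[[sim_times[j]]] <- 10 + subject_effect +
    0.5 * j * (sim_wide$group == "B") + rnorm(n_subj)
}

# Multivariate linear model: one response column per time point
sim_mv <- lm(cbind(t0, t5, t30, t60, t120) ~ group,
             data = sim_wide, contrasts = list(group = "contr.sum"))

# idata has one row per response column, in the same order as in cbind()
sim_idata <- data.frame(Time = factor(sim_times, levels = sim_times))

sim_rm <- car::Anova(sim_mv, idata = sim_idata, idesign = ~Time, type = 3)

# Univariate (split-plot style) tests, comparable to aov() with an Error() term
sim_uni <- summary(sim_rm, multivariate = FALSE)$univariate.tests
sim_uni
```

The key point is that `idata` describes the columns of the response matrix: each row corresponds to one time point, and `idesign` says how those columns are structured within subjects.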