fix failing checks
rempsyc committed Oct 4, 2023
1 parent 273be72 commit 6d8d3bb
Showing 2 changed files with 38 additions and 22 deletions.
21 changes: 16 additions & 5 deletions inst/WORDLIST
@@ -9,6 +9,7 @@ Ankerst
Archimbaud
Arel
Asq
BCI
BFBayesFactor
BMJ
Baayen
@@ -75,6 +76,7 @@ Gazen
Gelman
Gnanadesikan
Guilford
HDI
HJ
Hastie
Herron
@@ -110,10 +112,10 @@ Killeen
Kliegl
Kristensen
Kullback
Lakens
LOF
LOGLOSS
LOOIC
Lakens
Laniado
Leibler
Lemeshow
@@ -123,6 +125,7 @@ Leys
Lillo
Liu
Lomax
MADs
MSA
Maddala
Magee
@@ -148,6 +151,7 @@ Nakagawa's
Nordhausen
Normed
ORCID
OSF
Olkin
PNFI
Pek
@@ -210,6 +214,7 @@ WAIC
WMK
Weisberg
Windmeijer
Winsorization
Witten
Xu
YL
@@ -237,6 +242,7 @@ brmsfit
cauchy
clusterable
concurvity
datawizard
dbscan
der
detrend
@@ -252,20 +258,17 @@ fpsyg
gam
geoms
ggplot
github
gjo
glm
glmmTMB
glmrob
grey
heteroskedasticity
homoskedasticity
homoscedasticity
homoskedasticity
https
intra
intraclass
io
ize
joss
kmeans
lavaan
@@ -285,18 +288,22 @@ models’
multicollinearity
multimodel
multiresponse
multivariable
nd
nonnest
overfitted
patilindrajeets
poisson
preprint
priori
pscl
quared
quartile
quartiles
rOpenSci
recoding
rempsyc
reproducibility
rescaling
rma
rmarkdown
@@ -308,6 +315,7 @@ se
smicd
sphericity
strengejacke
suboptimal
subscale
subscales
theoreritcal
@@ -317,5 +325,8 @@ und
underfitted
underfitting
visualisation
winsorization
winsorize
winsorized
xy
youtube
39 changes: 22 additions & 17 deletions vignettes/check_outliers.Rmd
@@ -8,7 +8,7 @@ output:
bibliography: paper.bib
vignette: >
\usepackage[utf8]{inputenc}
%\VignetteIndexEntry{Checking model assumption - linear models}
%\VignetteIndexEntry{Checking outliers with *performance*}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
@@ -110,10 +110,10 @@ plot(outliers)

```{r univariate_implicit, fig.cap = "Visual depiction of outliers using the robust z-score method. The distance represents an aggregate score for variables mpg, cyl, disp, and hp.", echo=FALSE}
library(see)
plot(outliers) +
plot(outliers) +
ggplot2::theme(axis.text.x = ggplot2::element_text(
angle = 45, size = 7
))
))
```

Other univariate methods are available, such as using the interquartile range (IQR), or based on different intervals, such as the Highest Density Interval (HDI) or the Bias Corrected and Accelerated Interval (BCI). These methods are documented and described in the function's [help page](<https://easystats.github.io/performance/reference/check_outliers.html>).
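
As a minimal sketch of the idea behind the robust *z* score method, the base R code below divides each value's absolute distance from the median by the MAD and flags values above the `qnorm(p = 1 - 0.001 / 2)` threshold. The vector `x` is hypothetical data, and the code is a hand-rolled illustration under that assumption, not the internals of `check_outliers()`.

```r
# Hand-rolled robust z score (illustrative; not the check_outliers() internals).
x <- c(10, 11, 12, 13, 14, 100) # hypothetical data with one extreme value

# stats::mad() scales the median absolute deviation by 1.4826 by default,
# which makes it comparable to a standard deviation under normality.
robust_z <- abs(x - median(x)) / mad(x)

# Flag observations whose robust z score exceeds the ~3.29 threshold.
threshold <- qnorm(p = 1 - 0.001 / 2)
flagged <- which(robust_z > threshold)
flagged
```

Only the sixth value is flagged here. Because the median and the MAD barely move when that value is added, the extreme observation cannot mask itself the way it could with a mean and standard deviation.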
@@ -140,10 +140,10 @@ plot(outliers)
```

```{r multivariate_implicit, fig.cap = "Visual depiction of outliers using the Minimum Covariance Determinant (MCD) method, a robust version of the Mahalanobis distance. The distance represents the MCD scores for variables mpg, cyl, disp, and hp.", echo=FALSE}
plot(outliers) +
plot(outliers) +
ggplot2::theme(axis.text.x = ggplot2::element_text(
angle = 45, size = 7
))
))
```

Other multivariate methods are available, such as another type of robust Mahalanobis distance that in this case relies on an orthogonalized Gnanadesikan-Kettenring pairwise estimator [@gnanadesikan1972robust]. These methods are documented and described in the function's [help page](https://easystats.github.io/performance/reference/check_outliers.html).
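
For orientation, the sketch below computes the classical, non-robust Mahalanobis distance with base R. It is neither the MCD nor the Gnanadesikan-Kettenring estimator, only the baseline that those methods make robust. Using `mtcars` with the four variables named in the figure caption is an assumption made for illustration.

```r
# Classical (non-robust) Mahalanobis distance in base R.
# Assumption: mtcars with the four variables named in the figure caption.
x <- as.matrix(mtcars[, c("mpg", "cyl", "disp", "hp")])

# stats::mahalanobis() returns *squared* distances from the given center.
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Under multivariate normality, squared distances are approximately
# chi-squared distributed with ncol(x) degrees of freedom.
cutoff <- qchisq(p = 1 - 0.001, df = ncol(x))

# Names of the rows exceeding the cutoff, if any.
rownames(x)[d2 > cutoff]
```

The mean and covariance used here are themselves pulled toward extreme observations, which is the motivation for robust estimators such as the MCD.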
@@ -169,21 +169,26 @@ Table 1 below summarizes which methods to use in which cases, and with what thre
```{r, echo=FALSE}
df <- data.frame(
`Statistical Test` = c(
"Supported regression model",
"Structural Equation Modeling (or other unsupported model)",
"Simple test with few variables (*t* test, correlation, etc.)"),
"Supported regression model",
"Structural Equation Modeling (or other unsupported model)",
"Simple test with few variables (*t* test, correlation, etc.)"
),
`Diagnosis Method` = c(
"**Model-based**: Cook (or Pareto for Bayesian models)",
"**Multivariate**: Minimum Covariance Determinant (MCD)",
"**Univariate**: robust *z* scores (MAD)"),
"**Model-based**: Cook (or Pareto for Bayesian models)",
"**Multivariate**: Minimum Covariance Determinant (MCD)",
"**Univariate**: robust *z* scores (MAD)"
),
`Recommended Threshold` = c(
"`qf(0.5, ncol(x), nrow(x) - ncol(x))` (or 0.7 for Pareto)",
"`qchisq(p = 1 - 0.001, df = ncol(x))`",
"`qnorm(p = 1 - 0.001 / 2)`, ~ 3.29")
"`qf(0.5, ncol(x), nrow(x) - ncol(x))` (or 0.7 for Pareto)",
"`qchisq(p = 1 - 0.001, df = ncol(x))`",
"`qnorm(p = 1 - 0.001 / 2)`, ~ 3.29"
)
)
knitr::kable(
df, col.names = gsub("[.]", " ", names(df)),
caption = "Summary of Statistical Outlier Detection Methods Recommendations.", longtable = TRUE)
df,
col.names = gsub("[.]", " ", names(df)),
caption = "Summary of Statistical Outlier Detection Methods Recommendations.", longtable = TRUE
)
```
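
The threshold expressions in the table can be evaluated directly in base R. The sketch below assumes `x` is a data frame holding the four `mtcars` variables named in the figure captions; the object names are illustrative only.

```r
# Evaluate the recommended thresholds from the table.
# Assumption: x holds the four mtcars variables named in the figure captions.
x <- mtcars[, c("mpg", "cyl", "disp", "hp")]

# Model-based (Cook's distance): median of the F distribution.
cook_threshold <- qf(0.5, ncol(x), nrow(x) - ncol(x))

# Multivariate (MCD): chi-squared quantile with one df per variable.
mcd_threshold <- qchisq(p = 1 - 0.001, df = ncol(x))

# Univariate (robust z score): two-sided normal quantile, ~ 3.29.
z_threshold <- qnorm(p = 1 - 0.001 / 2)

round(c(cook = cook_threshold, mcd = mcd_threshold, z = z_threshold), 2)
```

The Cook and MCD thresholds depend on the dimensions of `x`, so they change with the number of variables and, for Cook, the number of rows. The univariate threshold is a constant.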

## Cook's Distance vs. MCD
@@ -258,7 +263,7 @@ _Removing_ outliers can in this case be a valid strategy, and ideally one would
The _easystats_ ecosystem makes it easy to incorporate this step into your workflow through the `winsorize()` function of *{datawizard}*, a lightweight R package to facilitate data wrangling and statistical transformations [@patil2022datawizard]. This procedure will bring back univariate outliers within the limits of 'acceptable' values, based either on the percentile, the _z_ score, or its robust alternative based on the MAD.
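
To make the clamping idea concrete, here is a hand-rolled base R sketch of winsorizing on robust *z* scores. The vector `x` is hypothetical, and the assumption that out-of-range values are replaced by exactly median ± threshold × MAD is ours; `winsorize()` may differ in its details.

```r
# Hand-rolled winsorization on robust z scores (illustrative sketch only;
# not the datawizard implementation).
x <- c(2, 3, 3, 4, 5, 5, 6, 40) # hypothetical data with one extreme value

center <- median(x)
spread <- mad(x) # scaled by 1.4826 by default
threshold <- 3

# Limits of 'acceptable' values around the median.
lower <- center - threshold * spread
upper <- center + threshold * spread

# Pull values outside the limits back to the nearest limit.
winsorized <- pmin(pmax(x, lower), upper)
winsorized
```

Values within the limits are left untouched, and only the extreme observation is pulled back to the upper limit, so the sample size is preserved.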

```{r winsorization}
data[1501:1502, ] # See outliers rows
data[1501:1502, ] # See outliers rows
# Winsorizing using the MAD
library(datawizard)
winsorized_data <- winsorize(data, method = "zscore", robust = TRUE, threshold = 3)