Commit
Merge pull request #278 from tidymodels/consolidate-contr_one_hot
Consolidate `contr_one_hot()`
Showing 7 changed files with 193 additions and 16 deletions.
@@ -1,6 +1,6 @@
 Package: hardhat
 Title: Construct Modeling Packages
-Version: 1.4.0.9002
+Version: 1.4.0.9003
 Authors@R: c(
     person("Hannah", "Frick", , "[email protected]", role = c("aut", "cre"),
            comment = c(ORCID = "0000-0002-6049-5258")),
@@ -0,0 +1,47 @@
+```{r load, include = FALSE}
+library(dplyr)
+```
+
+By default, `model.matrix()` generates binary indicator variables for factor predictors. When the formula keeps the intercept, an incomplete set of indicators is created; no indicator is made for the first level of the factor.
+
+For example, `species` and `island` both have three levels, but `model.matrix()` creates two indicator variables for each:
+
+```{r ref-cell}
+library(dplyr)
+library(modeldata)
+data(penguins)
+levels(penguins$species)
+levels(penguins$island)
+model.matrix(~ species + island, data = penguins) %>%
+  colnames()
+```
+
+For a formula with no intercept, the first factor is expanded to indicators for _all_ of its levels, but the remaining factors are still expanded to all levels but one (as above):
+
+```{r hybrid}
+model.matrix(~ 0 + species + island, data = penguins) %>%
+  colnames()
+```
+
+For inference, this hybrid encoding can be problematic: in a linear model, each `species` coefficient estimates the outcome for that species at the reference island, while each `island` coefficient is a difference from that reference island.
+
+To generate indicators for all levels of every factor, use the `contr_one_hot()` contrast:
+
+```{r one-hot}
+library(hardhat) # provides contr_one_hot()
+# Switch out the contrast method for unordered factors
+old_contr <- getOption("contrasts")
+new_contr <- old_contr
+new_contr["unordered"] <- "contr_one_hot"
+options(contrasts = new_contr)
+model.matrix(~ species + island, data = penguins) %>%
+  colnames()
+options(contrasts = old_contr)
+```
+
+Removing the intercept here does not affect the factor encodings.
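The three encodings described in the new file can be checked without modeldata or hardhat. The following sketch is not part of this commit. It assumes a small made-up data frame, `toy`, whose factor levels mirror those of `penguins`. It builds full one-hot columns with base R's `contrasts(x, contrasts = FALSE)`, passed through the `contrasts.arg` argument of `model.matrix()`. This is a swapped-in technique, not `contr_one_hot()`, and it has the advantage of leaving the global `contrasts` option untouched.

```r
# Made-up toy data for illustration; level names mirror modeldata::penguins
toy <- data.frame(
  species = factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie")),
  island  = factor(c("Biscoe", "Dream", "Torgersen", "Dream"))
)

# Reference-cell coding: intercept plus all-but-one indicator per factor
ref_cell <- colnames(model.matrix(~ species + island, data = toy))

# No intercept: the first factor gets every level, the second still drops one
hybrid <- colnames(model.matrix(~ 0 + species + island, data = toy))

# Full one-hot for every factor: contrasts(x, contrasts = FALSE) returns an
# identity matrix labelled by the levels of x, which model.matrix() then uses
full_contrasts <- lapply(toy[c("species", "island")], contrasts, contrasts = FALSE)
one_hot <- colnames(
  model.matrix(~ species + island, data = toy, contrasts.arg = full_contrasts)
)

print(ref_cell)
print(hybrid)
print(one_hot)
```

The one-hot design above keeps the intercept, as the `contr_one_hot()` example does. As a result, each factor's indicator columns sum to the `(Intercept)` column, and the matrix is not of full rank.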
@@ -1,38 +1,64 @@
 # `contr_one_hot()` input checks
 
     Code
-      contr_one_hot(n = 1, sparse = TRUE)
+      contr_one_hot(n = 2, sparse = TRUE)
     Condition
       Warning:
       `sparse = TRUE` not implemented for `contr_one_hot()`.
     Output
-        1
-      1 1
+        1 2
+      1 1 0
+      2 0 1
 
 ---
 
     Code
-      contr_one_hot(n = 1, contrasts = FALSE)
+      contr_one_hot(n = 2, contrasts = FALSE)
     Condition
       Warning:
       `contrasts = FALSE` not implemented for `contr_one_hot()`.
     Output
-        1
-      1 1
+        1 2
+      1 1 0
+      2 0 1
 
 ---
 
     Code
       contr_one_hot(n = 1:2)
     Condition
       Error in `contr_one_hot()`:
-      ! `n` must have length 1 when an integer is provided.
+      ! `n` must be a whole number, not an integer vector.
 
 ---
 
     Code
       contr_one_hot(n = list(1:2))
     Condition
       Error in `contr_one_hot()`:
-      ! `n` must be a character vector or an integer of size 1.
+      ! `n` must be a whole number, not a list.
+
+---
+
+    Code
+      contr_one_hot(character(0))
+    Condition
+      Error in `contr_one_hot()`:
+      ! `n` cannot be empty.
+
+---
+
+    Code
+      contr_one_hot(-1)
+    Condition
+      Error in `contr_one_hot()`:
+      ! `n` must be a whole number larger than or equal to 1, not the number -1.
+
+---
+
+    Code
+      contr_one_hot(list())
+    Condition
+      Error in `contr_one_hot()`:
+      ! `n` must be a whole number, not an empty list.
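The snapshot records the input rules for `n`. It must be either a non-empty character vector of levels or a single whole number of at least 1. The snapshot also records warnings for the unimplemented `sparse = TRUE` and `contrasts = FALSE`. The function below is a hypothetical sketch of those rules, not hardhat's implementation. The name `one_hot_sketch()` is illustrative, and its error messages are simplified versions of the ones recorded above.

```r
# Hypothetical sketch of the input handling recorded in the snapshot;
# NOT hardhat's source code. `one_hot_sketch()` is an illustrative name.
one_hot_sketch <- function(n, contrasts = TRUE, sparse = FALSE) {
  # Mirror the snapshot: warn, but still return the dense identity matrix
  if (isTRUE(sparse)) {
    warning("`sparse = TRUE` not implemented for `one_hot_sketch()`.", call. = FALSE)
  }
  if (!isTRUE(contrasts)) {
    warning("`contrasts = FALSE` not implemented for `one_hot_sketch()`.", call. = FALSE)
  }

  if (is.character(n)) {
    # A character vector supplies the factor levels directly
    if (length(n) == 0L) {
      stop("`n` cannot be empty.", call. = FALSE)
    }
    lvls <- n
  } else {
    # Otherwise `n` must be a single, finite whole number...
    is_whole <- is.numeric(n) && length(n) == 1L && is.finite(n) && n == trunc(n)
    if (!is_whole) {
      stop("`n` must be a whole number.", call. = FALSE)
    }
    # ...that is at least 1
    if (n < 1) {
      stop("`n` must be a whole number larger than or equal to 1.", call. = FALSE)
    }
    lvls <- as.character(seq_len(n))
  }

  # One indicator column per level: the identity matrix, labelled by level
  out <- diag(length(lvls))
  dimnames(out) <- list(lvls, lvls)
  out
}

one_hot_sketch(2)
one_hot_sketch(c("Adelie", "Chinstrap", "Gentoo"))
```

Because `&&` short-circuits, a list or a length-2 vector fails the whole-number check before `is.finite()` can return more than one value.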