Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add set_labels to clean_names() #564

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,8 @@ Authors@R: c(
person("Ryan", "Knight", , "[email protected]", role = "ctb"),
person("Malte", "Grosser", , "[email protected]", role = "ctb"),
person("Jonathan", "Zadra", , "[email protected]", role = "ctb"),
person("Olivier", "Roy", role = "ctb")
person("Olivier", "Roy", role = "ctb"),
person("Josep", family = "Pueyo-Ros", email = "[email protected]", role = "ctb")
)
Description: The main janitor functions can: perfectly format data.frame
column names; provide quick counts of variable combinations (i.e.,
Expand Down
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ These are all minor breaking changes resulting from enhancements and are not exp

* The new function `excel_time_to_numeric()` converts times from Excel that do not have accompanying dates into a number of seconds. (#245, thanks to **@billdenney** for the feature.)

* A new argument `set_labels` to `clean_names()` stores the old names as labels in each column. Variable labels are visualized in Rstudio's data viewer or used by default by some packages such as `gt` instead of variable names. Labels can also be used in ggplot labels thanks to the function `easy_labs()` in the `ggeasy` package. Read this wonderful [post](https://www.pipinghotdata.com/posts/2022-09-13-the-case-for-variable-labels-in-r/) for more info about column labels. (#563, thanks to **@jospueyo** for the feature).

## Bug fixes

* `adorn_totals("row")` now succeeds if the new `name` of the totals row is already a factor level of the input data.frame (#529, thanks @egozoglu for reporting).
Expand Down
22 changes: 19 additions & 3 deletions R/clean_names.R
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
#' (characters) to "u".
#'
#' @param dat The input `data.frame`.
#' @param set_labels If set to `TRUE`, old names are stored as labels in each column of the returned data.frame.
#' @inheritDotParams make_clean_names -string
#' @return A `data.frame` with clean names.
#'
Expand All @@ -32,7 +33,8 @@
#' support using `clean_names()` on `sf` and `tbl_graph` (from
#' `tidygraph`) objects as well as on database connections through
#' `dbplyr`. For cleaning other named objects like named lists
#' and vectors, use `make_clean_names()`.
#' and vectors, use `make_clean_names()`. When `set_labels` is set to `TRUE`, the old names,
#' stored as column labels, can be restored using `sjlabelled::label_to_colnames()`.
#'
#' @export
#' @family Set names
Expand Down Expand Up @@ -71,7 +73,7 @@ clean_names <- function(dat, ...) {

#' @rdname clean_names
#' @export
clean_names.default <- function(dat, ...) {
clean_names.default <- function(dat, ..., set_labels = FALSE) {
if (is.null(names(dat)) && is.null(dimnames(dat))) {
stop(
"`clean_names()` requires that either names or dimnames be non-null.",
Expand All @@ -81,14 +83,21 @@ clean_names.default <- function(dat, ...) {
if (is.null(names(dat))) {
dimnames(dat) <- lapply(dimnames(dat), make_clean_names, ...)
} else {
if (set_labels){
old_names <- names(dat)
for (i in seq_along(old_names)){
attr(dat[[i]], "label") <- old_names[[i]]
}
}
names(dat) <- make_clean_names(names(dat), ...)

}
dat
}

#' @rdname clean_names
#' @export
clean_names.sf <- function(dat, ...) {
clean_names.sf <- function(dat, ..., set_labels = FALSE) {
sfirke marked this conversation as resolved.
Show resolved Hide resolved
if (!requireNamespace("sf", quietly = TRUE)) { # nocov start
stop(
"Package 'sf' needed for this function to work. Please install it.",
Expand All @@ -103,6 +112,12 @@ clean_names.sf <- function(dat, ...) {
sf_cleaned <- make_clean_names(sf_names[1:n_cols], ...)
# rename original df
names(dat)[1:n_cols] <- sf_cleaned

if(set_labels){
for (i in seq_along(sf_names[1:n_cols])){
attr(dat[[i]], "label") <- sf_names[[i]]
}
}

return(dat)
}
Expand All @@ -116,6 +131,7 @@ clean_names.tbl_graph <- function(dat, ...) {
call. = FALSE
)
} # nocov end

dplyr::rename_all(dat, .funs = make_clean_names, ...)
}

Expand Down
9 changes: 6 additions & 3 deletions man/clean_names.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/janitor-package.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

40 changes: 40 additions & 0 deletions tests/testthat/test-clean-names.R
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,22 @@ test_that("do not create duplicates (fix #251)", {
)
})

test_that("labels are created in default method (feature request #563)", {
dat_df <- dplyr::tibble(`a a` = c(11, 22), `b b` = c(2, 3))
dat_df_clean_labels <- clean_names(dat_df, set_labels = TRUE)
dat_df_clean <- clean_names(dat_df)

for (i in seq_along(names(dat_df))){
# check that old names are saved as labels when set_labels is TRUE
expect_equal(attr(dat_df_clean_labels[[i]], "label"), names(dat_df)[[i]])
# check that old names are not stored if set_labels is not TRUE
expect_null(attr(dat_df_clean[[i]], "label"))
}

# expect names are always cleaned
expect_equal(names(dat_df_clean), c("a_a", "b_b"))
expect_equal(names(dat_df_clean_labels), c("a_a", "b_b"))
})

test_that("allow for duplicates (fix #495)", {
expect_equal(
Expand Down Expand Up @@ -578,6 +594,30 @@ test_that("Tests for cases beyond default snake for sf objects", {
)
})

test_that("labels are created in sf method (feature request #563)", {
skip_if_not_installed("sf")

dat_df <- dplyr::tibble(`a a` = c(11, 22), `b b` = c(2, 3))
dat_sf <- dat_df
dat_sf$x <- c(1,2)
dat_sf$y <- c(1,2)
dat_sf <- sf::st_as_sf(dat_sf, coords = c("x", "y"))
dat_sf_clean_labels <- clean_names(dat_sf, set_labels = TRUE)
dat_sf_clean <- clean_names(dat_sf)

for (i in seq_along(names(dat_df))){
# check that old names are saved as labels when set_labels is TRUE
expect_equal(attr(dat_sf_clean_labels[[i]], "label"), names(dat_sf)[[i]])

# check that old names are not stored if set_labels is not TRUE
expect_null(attr(dat_sf_clean[[i]], "label"))
}
# expect names are always cleaned
expect_equal(names(dat_sf_clean), c("a_a", "b_b", "geometry"))
expect_equal(names(dat_sf_clean_labels), c("a_a", "b_b", "geometry"))
})


#------------------------------------------------------------------------------#
#------------------------ Tests for tbl_graph method --------------------------#####
#------------------------------------------------------------------------------#
Expand Down
Loading