Add function to generate/evaluate code for variables ending in ranges of numbers #79

Open
schroeder-matt opened this issue Apr 15, 2024 · 2 comments
Labels: enhancement (New feature or request)

Comments

@schroeder-matt

One of the few things that's easier to do in SAS than in R is summing across a numbered range of variables. In SAS, sum(of v1-v4) is interpreted as v1 + v2 + v3 + v4, but R has nothing similar that I'm aware of. This is unfortunate because so many Census Bureau datasets use this naming scheme, and I often need to add up several variables at once.
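For comparison, here is a minimal base-R sketch of the SAS idiom. It assumes a data frame with columns v1 to v4; the values are made up for illustration. The idea is to build the column names with paste0() and pass them to rowSums(). One difference is worth knowing: SAS's sum() skips missing values, which corresponds to na.rm = TRUE here, whereas a literal v1 + v2 + v3 + v4 returns NA as soon as any term is missing.

```r
# Hypothetical example data with numbered columns v1..v4
df <- data.frame(v1 = c(1, 2), v2 = c(3, NA), v3 = c(5, 6), v4 = c(7, 8))

# Rough equivalent of SAS sum(of v1-v4): generate "v1".."v4", then sum row-wise,
# skipping missing values as SAS's sum() does
cols <- paste0("v", 1:4)
df$total <- rowSums(df[cols], na.rm = TRUE)

# The literal expression v1 + v2 + v3 + v4 propagates NA instead
df$total_plus <- with(df, v1 + v2 + v3 + v4)

print(df)
```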

I created the attached function (building on the framework developed by former Research staff member Nicole Sullivan), but I currently keep copies of it in multiple repositories. I figured it would be good to have it in councilR so that it lives in only one place and could potentially help others.

sumRange.txt

Is this something that would be a good fit for councilR? If so, please feel free to make edits to anything and everything, because I am not an expert in designing functions. I also made this with Census Bureau data in mind, so you may find opportunities to generalize the code for other uses.

@schroeder-matt schroeder-matt added the enhancement New feature or request label Apr 15, 2024
@eroten
Collaborator

eroten commented Apr 15, 2024

Well done with the function! Excellent use of the :: operator and parameter documentation. I'm pasting it here for quicker reference.

Do you think across() could work for your needs? It is a relatively new feature of dplyr, part of the tidyverse. I've used it in some code here.

If across() doesn't work, can you help me understand how this function works differently?
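One possible answer to the across() question, sketched under assumptions: tidyselect's num_range() helper (re-exported by dplyr) generates numbered, optionally zero-padded column names, and wrapping across() in rowSums() adds the selected columns row by row. The data below are made up, and the two calls only mirror the corresponding sumRange() examples; this is not councilR code.

```r
library(dplyr)

# Hypothetical ACS-style columns with zero-padded cell numbers
df <- tibble(
  B01234e002 = c(1, 2), B01234e004 = c(3, 4), B01234e006 = c(5, 6),
  B01234e012 = c(7, 8), B01234e014 = c(9, 10), B01234e016 = c(11, 12)
)

df <- mutate(
  df,
  # Cells 2, 4, 6, like sumRange("B01234e", 2, 6, .int = 2, .width = 3)
  total_a = rowSums(across(num_range("B01234e", c(2L, 4L, 6L), width = 3))),
  # Both blocks, like also setting repeatTimes = 2, repeatOffset = 10
  total_b = rowSums(across(num_range("B01234e", c(2L, 4L, 6L, 12L, 14L, 16L), width = 3)))
)

print(df)
```

Unlike sumRange(), this computes the row sums directly instead of returning an expression, so no !! injection is needed; the trade-off is that the cell numbers must be listed (or generated) explicitly.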

```r
#' @title Generate code to sum across multiple cells
#'
#' @param .table character, prefix for table
#' @param .start numeric, first cell in range to be summed
#' @param .end numeric, last cell in range to be summed
#' @param ... currently unused
#' @param .int numeric, interval between cells. Default is `1`.
#' @param .width numeric, pad cell numbers with 0s to this length. Default is `1`.
#' @param repeatTimes numeric, number of times to repeat the sequence. Default
#'     is `1`.
#' @param repeatOffset numeric, jump by this number of cells each time the
#'     sequence repeats. Default is `0`.
#'
#' @return An unevaluated expression (a call) that adds the selected cells,
#'     intended to be injected with `!!` inside dplyr verbs.
#' @export
#'
#' @examples
#' #### sumRange("B01234e", 2, 6) --> B01234e2 + B01234e3 + B01234e4 + B01234e5 + B01234e6
#' #### sumRange("B01234e", 2, 6, .int=2) --> B01234e2 + B01234e4 + B01234e6
#' #### sumRange("B01234e", 2, 6, .int=2, .width=3) --> B01234e002 + B01234e004 + B01234e006
#' #### sumRange("B01234e", 2, 6, .int=2, .width=3, repeatTimes=2, repeatOffset=10) -->
#' ####   B01234e002 + B01234e004 + B01234e006 + B01234e012 + B01234e014 + B01234e016
#' #### can be used in dplyr::mutate(total = !!sumRange("B01234e", 2, 6))

sumRange <- function(.table,
                     .start,
                     .end,
                     ...,
                     .int = 1,
                     .width = 1,
                     repeatTimes = 1,
                     repeatOffset = 0) {

  # One block of "table + cell number" terms per repetition. Repetition i is
  # shifted by repeatOffset * (i - 1), so repeatTimes = 1 means no shift at all.
  a <- purrr::map_chr(seq_len(repeatTimes), function(i) {
    shift <- repeatOffset * (i - 1)
    cells <- seq(from = .start + shift, to = .end + shift, by = .int)
    paste0(.table,
           stringr::str_pad(cells, width = .width, pad = "0"),
           collapse = " + ") # links each table/cell number combination with a "+"
  })

  # Combine the blocks (one per repetition) and parse the string into an expression
  rlang::parse_expr(paste0(a, collapse = " + "))
}
```
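To make the arithmetic behind .int, .width, repeatTimes and repeatOffset concrete, here is a base-R sketch that rebuilds the last roxygen example. It uses formatC() for the zero padding and str2lang() in place of rlang::parse_expr(). The helper cell_numbers() and the data are hypothetical; they are not part of councilR.

```r
# Hypothetical helper, not part of councilR: the cell numbers sumRange()
# generates for a given start/end/interval and repetition settings
cell_numbers <- function(start, end, int = 1, repeat_times = 1, repeat_offset = 0) {
  # Repetition i is shifted by repeat_offset * (i - 1)
  offsets <- repeat_offset * (seq_len(repeat_times) - 1)
  unlist(lapply(offsets, function(off) seq(start + off, end + off, by = int)))
}

# Last roxygen example: .int = 2, .width = 3, repeatTimes = 2, repeatOffset = 10
nums <- cell_numbers(2, 6, int = 2, repeat_times = 2, repeat_offset = 10)

# Zero-pad to width 3 (the role stringr::str_pad() plays in sumRange())
cols <- paste0("B01234e", formatC(as.integer(nums), width = 3, flag = "0"))

# Join with " + " and parse the string into an expression
expr <- str2lang(paste(cols, collapse = " + "))

# Evaluate against made-up data: each name in the expression resolves to a column
df <- data.frame(B01234e002 = 1, B01234e004 = 2, B01234e006 = 3,
                 B01234e012 = 4, B01234e014 = 5, B01234e016 = 6)
df$total <- eval(expr, df)

print(nums)
print(expr)
```

In a dplyr pipeline, the same expression can be injected with !!, as in dplyr::mutate(df, total = !!expr), which is the usage the roxygen note describes.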

@schroeder-matt
Author

schroeder-matt commented Apr 15, 2024

Sure! across() is a simpler way to transform or create multiple variables at once, applying the same operation to each one; this function is a simpler way to add up multiple variables in order to transform or create a single variable. An example is here.
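The distinction can be sketched side by side. In this sketch the data are made up, and the expression is written out by hand as a stand-in for what sumRange() returns. across() produces one output column per selected input, while an injected sum expression combines several inputs into a single new column.

```r
library(dplyr)

# Hypothetical data: three numbered estimate columns
df <- tibble(B01234e002 = c(10, 20), B01234e003 = c(1, 2), B01234e004 = c(5, 5))

# across(): one output column per input column (each estimate divided by 10,
# written to new columns named <original>_k)
scaled <- mutate(df, across(starts_with("B01234e"), ~ .x / 10, .names = "{.col}_k"))

# Injected expression: several input columns combined into ONE new column
total_expr <- rlang::parse_expr("B01234e002 + B01234e003 + B01234e004")
summed <- mutate(df, total = !!total_expr)

print(scaled)
print(summed)
```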
