Recommended way to split data by a variable, apply a function, and return a bound dataframe (i.e., a non-experimental do() replacement)
#7666
Comments
I think the slight weirdness for you is that you'd like access to the metadata about the current group inside your function. You can get access to that with cur_group():
library(dplyr, warn.conflicts = FALSE)
fun <- function(data, group) {
  if (group$Species == "setosa") {
    tail(data, n = 3) |> select(Petal.Length)
  } else {
    head(data, n = 3) |> select(Petal.Length)
  }
}
iris |>
  reframe(.by = Species, {
    fun(pick(everything()), cur_group())
  })
#> Species Petal.Length
#> 1 setosa 1.4
#> 2 setosa 1.5
#> 3 setosa 1.4
#> 4 versicolor 4.7
#> 5 versicolor 4.5
#> 6 versicolor 4.9
#> 7 virginica 6.0
#> 8 virginica 5.1
#> 9 virginica 5.9
I think we are fairly confident that
Hi, a few options. One: you can duplicate the Species column and then pass the data through your function as is (example below).
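A minimal sketch of the duplicate-column idea, assuming a hypothetical function `fun()` that reads `Species` from its input and a hypothetical copy column `Species_key` used only for grouping; this may differ from the example the commenter had in mind:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical function: inspects Species but does not return it.
fun <- function(data) {
  if (data$Species[1] == "setosa") {
    tail(data, n = 3) |> select(Petal.Length)
  } else {
    head(data, n = 3) |> select(Petal.Length)
  }
}

res_dup <- iris |>
  mutate(Species_key = Species) |>   # hypothetical copy, used as the grouping key
  reframe(.by = Species_key, fun(pick(everything())))
```

Because `pick(everything())` leaves out only the `.by` column, the original `Species` column is still visible inside `fun()`, and the key comes back in the result as `Species_key`.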
Alternatively, you can slightly modify your function so that you pass it both the data and the nest_by column.
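One way this could look, as a sketch only: `fun2()` and its `species` argument are hypothetical names, and the approach relies on a rowwise data frame from `nest_by()` exposing each nested data frame as `data`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical two-argument version: the group label is passed in explicitly.
fun2 <- function(data, species) {
  if (species == "setosa") {
    tail(data, n = 3) |> select(Petal.Length)
  } else {
    head(data, n = 3) |> select(Petal.Length)
  }
}

res_nest <- iris |>
  nest_by(Species) |>               # one row per Species, rows stored in `data`
  reframe(fun2(data, Species))      # the key column is kept in the output
```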
One other method uses map(); you just need to slightly alter your function to return the Species name.
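A sketch of that idea, assuming a hypothetical `fun3()` that keeps `Species` in its return value, with base `split()` doing the splitting and `purrr::list_rbind()` (purrr 1.0.0 or later) binding the pieces back together:

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Hypothetical function: returns Species alongside the value of interest.
fun3 <- function(data) {
  rows <- if (data$Species[1] == "setosa") {
    tail(data, n = 3)
  } else {
    head(data, n = 3)
  }
  rows |> select(Species, Petal.Length)
}

res_map <- iris |>
  split(iris$Species) |>   # named list with one data frame per Species
  map(fun3) |>
  list_rbind()
```

Since each piece carries its own `Species` column, nothing is lost when the list is bound into a single data frame.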
Hi folks,
Thanks very much for your replies! I was hoping there'd be a one-to-one I think Davis's suggestion may be the way to go; thanks for your help!
Best,
Hi folks,
My question hinges on this sort of situation. Obviously this example is pretty artificial, but it's a situation in which you have a function which acts on a dataframe, returns a dataframe, doesn't return grouping variables, but might need access to them.
Created on 2025-03-01 with reprex v2.1.1
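For concreteness, a situation of this kind might look like the sketch below. The function `fun()` is a hypothetical stand-in, not the code from the original reprex; it needs `Species` to decide what to do but returns only `Petal.Length`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical function: reads the grouping column, does not return it.
fun <- function(data) {
  rows <- if (data$Species[1] == "setosa") {
    tail(data, n = 3)
  } else {
    head(data, n = 3)
  }
  tibble(Petal.Length = rows$Petal.Length)
}

res_do <- iris |>
  group_by(Species) |>
  do(fun(.)) |>      # `.` is the current group's rows, including Species
  ungroup()
```

With `do()`, the group keys are added back to the bound result, so `res_do` has both `Species` and `Petal.Length`.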
do() can deal with this admirably, but I'm unsure what the modern equivalent is.
purrr::map() doesn't behave the same because it drops the group variables, so you don't know what is what:
nest() also doesn't really work because the nested dataframe has no access to the Species column:
group_modify() is recommended by the do() documentation, but it is experimental, so I don't want to put it in packages until it's stable. It also doesn't use .by, which may imply this isn't a function that's going to be taken forward? It seems to be the closest thing, however.
reframe() I believe can only act on columns, so I don't think that's quite right either? It is also "experimental".
The answer may lie in pick(), but I'm not quite sure how to apply it to this specific use case. It also seems to not 'nest' the grouping variables, so it has the same issue as the above nest() example.
So I'm at a bit of a loss as to what the new do() actually is!
Cheers.
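As a point of comparison for the group_modify() option mentioned above, a sketch assuming a hypothetical two-argument function `fun_keyed()`: group_modify() calls its function with the group's rows (without the grouping columns) and a one-row tibble of keys, so the group label is available without being part of the data.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical function: `data` holds the group's rows, `key` the one-row key tibble.
fun_keyed <- function(data, key) {
  rows <- if (key$Species == "setosa") {
    tail(data, n = 3)
  } else {
    head(data, n = 3)
  }
  rows |> select(Petal.Length)
}

res_gm <- iris |>
  group_by(Species) |>
  group_modify(fun_keyed) |>   # keys are added back to the bound result
  ungroup()
```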