Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap f_* functions as methods #1

Open
chainsawriot opened this issue Sep 10, 2024 · 5 comments
Open

Wrap f_* functions as methods #1

chainsawriot opened this issue Sep 10, 2024 · 5 comments

Comments

@chainsawriot
Copy link

As most dplyr functions are S3 methods (to be used for other backends, such as data.table for dtplyr and duckdb for duckplyr)

https://github.com/tidyverse/dplyr/blob/1d17672a54305170dc75c251f8ae69a85c0bea37/R/distinct.R#L68-L70

https://duckplyr.tidyverse.org/reference/as_duckplyr_df.html

I was thinking it would be possible to create a new S3 class, e.g. fastplyr, and wrap all f_* functions as *.fastplyr, e.g. f_distinct as distinct.fastplyr.

Ultimately, one would then use this package almost as a drop-in replacement of dplyr.

## assuming as_fastplyr() is a function to attach the fastplyr class to a data.frame
flights |> as_fastplyr() |>
  distinct(origin, dest)

If you think that is a good idea, I could explore a bit.

@NicChr
Copy link
Owner

NicChr commented Sep 10, 2024

Thanks for the comments, I like the idea though I'll need to think a bit more about that one!
Definitely open to exploring this, in the meantime here are some of my thoughts:

Part of my motivation was to avoid having the user to perform conversions, hence why I opted for alternative names that were explicit in the fact that they're different from dplyr. One could argue that writing methods like, e.g. distinct.fastplyr might obscure that a different function entirely is being used.
Also this has the nice advantage that users looking to convert their existing code can literally find + replace e.g. distinct with f_distinct very easily, whereas adding a as_fastplyr() line everywhere is not as simple.

On the other hand I can see how this might be quite powerful and make the code more readable, especially to those familiar with dplyr already.

Regarding the implementation, I'm assuming it's similar to the tidytable package where all that's needed is to create a new tbl_df object, e.g. fastplyr_tbl_df?

I have just submitted 0.1.0 to CRAN (pending acceptance), so any changes like this will go in one of the next versions.

@chainsawriot
Copy link
Author

@NicChr Of course, you can keep all f_*() functions as they are. I think an extremely minimal implementation (not tested) would be

as_fastplyr <- function(.data, ...) {
  UseMethod("as_fastplyr")
}

as_fastplyr.data.frame <- function(.data) {
  if ("fastplyr" %in% class(.data)) {
    return(.data)
  }
  class(.data) <- c("fastplyr", class(.data))
  return(.data)
}

distinct.fastplyr <- function(...) {
  as_fastplyr(f_distinct(...))
}

@NicChr
Copy link
Owner

NicChr commented Sep 10, 2024

I'm not sure 'fastplyr' conveys that it's a type of tbl, maybe something like 'fp_tbl' or 'fastplyr_tbl'?

@NicChr
Copy link
Owner

NicChr commented Sep 10, 2024

As I started writing some of the code for these S3 wrappers I realised that since not all of the dplyr functions are s3 methods, e.g. bind_rows(), we would probably have to rename f_bind_rows to bind_rows and basically overwrite the dplyr methods. If we don't do that then users won't be aware that they're using the dplyr version of bind_rows.

@chainsawriot
Copy link
Author

Unfortunately, yes, you are right.

tidyverse/dplyr#6905

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants