Unexpected behavior in filter() when column name matches an environment variable #6885

Closed
bzbradford opened this issue Jul 19, 2023 · 2 comments

@bzbradford

A friend and I have both encountered unexpected behavior from filter() and struggled to debug it until realizing that filter() was comparing a column against itself, instead of against a variable of the same name that had been defined elsewhere in the environment (for example, as a function argument).

library(dplyr)

# mtcars has 32 rows originally
nrow(mtcars)
# > [1] 32

# should show that 11 cars have 4 cylinders
mtcars |> filter(cyl == 4) |> nrow()
# > [1] 11

# expected to show 11 again, but every row is returned
cyl <- 4
mtcars |> filter(cyl == cyl) |> nrow()
# > [1] 32
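The result above follows from data masking: inside filter(), a bare name is looked up among the columns of the data frame first and in the calling environment only if no column matches. A minimal sketch of that lookup order, where n_cyl is a hypothetical variable name chosen only to avoid the clash:

```r
library(dplyr)

cyl <- 4

# Both sides resolve to the column, so each row is compared with itself.
mtcars |> filter(cyl == cyl) |> nrow()
# > [1] 32

# n_cyl (hypothetical name) is not a column, so it is found in the
# calling environment instead.
n_cyl <- 4
mtcars |> filter(cyl == n_cyl) |> nrow()
# > [1] 11
```

Renaming the variable avoids the problem, but it breaks again silently if a column with the new name is added to the data later.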

Possible solutions:

  • filter() warns when comparing a column against itself (which would return all rows), much like it raises an error when a single = is used instead of ==
  • filter() first looks to the environment for a defined variable of the same name as the column when a column is possibly compared against itself

@moodymudskipper

See ?rlang::`dot-data`. You can use the .env pronoun to refer to the variable in the calling environment:

mtcars |> filter(cyl == .env$cyl)

Injecting the value with !! is also common:

mtcars |> filter(cyl == !!cyl)
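Both forms also cover the function-argument case from the original report. The sketch below assumes two hypothetical helpers, count_cyl() and count_cyl_inject(), whose cyl argument shares its name with the column. Neither helper is part of dplyr.

```r
library(dplyr)

# Hypothetical helper: .data$cyl is the column, .env$cyl is the argument.
count_cyl <- function(data, cyl) {
  data |>
    filter(.data$cyl == .env$cyl) |>
    nrow()
}

count_cyl(mtcars, 4)
# > [1] 11

# Hypothetical helper: !! evaluates the argument right away and injects
# its value into the expression before filter() evaluates it.
count_cyl_inject <- function(data, cyl) {
  data |>
    filter(cyl == !!cyl) |>
    nrow()
}

count_cyl_inject(mtcars, 6)
# > [1] 7
```

Using the .data and .env pronouns makes the intent explicit on both sides of the comparison, which may be easier to read in package code than !!.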

That being said, since cyl == cyl never makes sense in filter, it could warn or even fail, because this issue is very common.

@DavisVaughan
Member

I think it would be tough to detect these cases correctly and robustly, so I don't think there is much we can do about this. I'm not sure cyl == cyl is really common enough to warrant an extra check.
