Closed
Description
The first
function in dbplyr returns whatever expression it's given as a string, instead of returning the first value (which should respect previous group_by()
and arrange()
.
It was discovered in this Stack Overflow question. That table (and the version I reproduced locally) is PostgreSQL, but the problem can be reproduced in SQLlite:
library(dplyr)
batting <- tbl(dbplyr::lahman_sqlite(), "Batting")
tbl(dbplyr::lahman_sqlite(), "Batting") %>%
arrange(yearID) %>%
group_by(playerID) %>%
summarize(first = first(G))
Returns:
# Source: lazy query [?? x 2]
# Database: sqlite 3.22.0 [/var/folders/8p/xzrrqphx2qb3d2s_fgqrk5xr0000gn/T//Rtmpe0JeOa/lahman.sqlite]
playerID first
<chr> <chr>
1 aardsda01 `G`
2 aaronha01 `G`
3 aaronto01 `G`
4 aasedo01 `G`
5 abadan01 `G`
6 abadfe01 `G`
7 abadijo01 `G`
8 abbated01 `G`
9 abbeybe01 `G`
10 abbeych01 `G`
# ... with more rows
This matches with the result of show_query()
on the above.
<SQL>
SELECT `playerID`, '`G`' AS `first`
FROM (SELECT *
FROM `Batting`
ORDER BY `yearID`)
GROUP BY `playerID`
It has a similar behavior:
- If it's given
first(blabla)
, or another expression that's not a column - If it's done without an arrange or a
group_by()
(it can thus be reproduced with justtbl(dbplyr::lahman_sqlite(), "Batting") %>% summarize(first(G))
)