Skip to content

first() doesn't work in summaries #129

Closed
@dgrtwo

Description

@dgrtwo

The first function in dbplyr returns whatever expression it's given as a string, instead of returning the first value (which should respect previous group_by() and arrange().

It was discovered in this Stack Overflow question. That table (and the version I reproduced locally) is PostgreSQL, but the problem can be reproduced in SQLlite:

library(dplyr)

batting <- tbl(dbplyr::lahman_sqlite(), "Batting")

tbl(dbplyr::lahman_sqlite(), "Batting") %>%
  arrange(yearID) %>%
  group_by(playerID) %>%
  summarize(first = first(G))

Returns:

# Source:   lazy query [?? x 2]
# Database: sqlite 3.22.0 [/var/folders/8p/xzrrqphx2qb3d2s_fgqrk5xr0000gn/T//Rtmpe0JeOa/lahman.sqlite]
   playerID  first
   <chr>     <chr>
 1 aardsda01 `G`  
 2 aaronha01 `G`  
 3 aaronto01 `G`  
 4 aasedo01  `G`  
 5 abadan01  `G`  
 6 abadfe01  `G`  
 7 abadijo01 `G`  
 8 abbated01 `G`  
 9 abbeybe01 `G`  
10 abbeych01 `G`  
# ... with more rows

This matches with the result of show_query() on the above.

<SQL>
SELECT `playerID`, '`G`' AS `first`
FROM (SELECT *
FROM `Batting`
ORDER BY `yearID`)
GROUP BY `playerID`

It has a similar behavior:

  • If it's given first(blabla), or another expression that's not a column
  • If it's done without an arrange or a group_by() (it can thus be reproduced with just tbl(dbplyr::lahman_sqlite(), "Batting") %>% summarize(first(G)))

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugan unexpected problem or unintended behaviorfunc trans 🌍Translation of individual functions to SQL

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions