Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input containing rename, sort, filter, join results in output ORDER BY referring to wrong table #5097

Open
2 tasks done
kgutwin opened this issue Jan 18, 2025 · 1 comment · May be fixed by #5098
Open
2 tasks done
Labels
bug Invalid compiler output or panic

Comments

@kgutwin
Copy link
Collaborator

kgutwin commented Jan 18, 2025

What happened?

A user built a seemingly simple PRQL query that performs the following operations, in this order:

  • Rename a column using select
  • Sort by that renamed column
  • Filter by that column
  • Join another table on a different column

The output SQL, when passed to DuckDB, produces:

Error: Binder Error: Referenced table "table_1" not found!
Candidate tables: "table_0"
LINE 25:   table_1."AA"

The relevant part of the output SQL is:

SELECT
  table_0."AA",
  table_0.artist_id,
  artists.*
FROM
  table_0
  JOIN artists ON table_0.artist_id = artists.artist_id
ORDER BY
  table_1."AA"

Note that the ORDER BY references table_1, which is referenced in the WITH clause but not in the final SELECT statement.

If you change the order or remove any of the steps, the generated output is correct. In particular, if you toggle the order of the sort and filter so that filter precedes sort, the output is correct.

The PRQL input below, when put into the Playground, will reproduce the error (click the Query Results tab).

PRQL input

from albums
select { AA=album_id, artist_id }
sort AA
filter AA >= 25
join artists (==artist_id)

SQL output

WITH table_1 AS (
  SELECT
    album_id AS "AA",
    artist_id
  FROM
    albums
),
table_0 AS (
  SELECT
    "AA",
    artist_id
  FROM
    table_1
  WHERE
    "AA" >= 25
)
SELECT
  table_0."AA",
  table_0.artist_id,
  artists.*
FROM
  table_0
  JOIN artists ON table_0.artist_id = artists.artist_id
ORDER BY
  table_1."AA"

-- Generated by PRQL compiler version:0.13.2 (https://prql-lang.org)

Expected SQL output

WITH table_1 AS (
  SELECT
    album_id AS "AA",
    artist_id
  FROM
    albums
),
table_0 AS (
  SELECT
    "AA",
    artist_id
  FROM
    table_1
  WHERE
    "AA" >= 25
)
SELECT
  table_0."AA",
  table_0.artist_id,
  artists.*
FROM
  table_0
  JOIN artists ON table_0.artist_id = artists.artist_id
ORDER BY
  table_0."AA"  -- note table_0 rather than table_1

-- Generated by PRQL compiler version:0.13.2 (https://prql-lang.org)

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

@kgutwin kgutwin added the bug Invalid compiler output or panic label Jan 18, 2025
@kgutwin
Copy link
Collaborator Author

kgutwin commented Jan 18, 2025

Looks like this is likely a bug in the 'sort propagates to main pipeline' part of prqlc/src/sql/pq/postprocess.rs, as implemented in #2562 and discussed in detail in #1363. Investigating further...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant