Wrong output when tables are used and columns have same name #4752

cottrell · 2024-07-17T19:03:41Z

What happened?

Deriving columns with same name as existing results in confusing or wrong output.

PRQL input

from invoices
derive {
  total = invoices.total * 0,
  a = invoices.total * 0,
}
into data

from data
select {
total,
data.total,
a
}

SQL output

WITH data AS (
  SELECT
    *,
    total * 0 AS total,
    total * 0 AS a
  FROM
    invoices
)
SELECT
  total,
  total,
  a
FROM
  data

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

Expected SQL output

No response

MVCE confirmation

Minimal example
New issue

Anything else?

No response

max-sixty · 2024-07-18T21:52:50Z

Our semantics should be a bit clearer here, but what would you want the output to be? Currently PRQL returns:

SELECT
  total,
  total,
  a

...based on

total,
data.total,
a

What would we expect? ("Expected SQL Output" is useful there!)

cottrell · 2024-07-19T07:42:46Z

Actually, is this a problem with DuckDB? Or with PRQL?

max-sixty · 2024-07-19T07:48:15Z

What should the SQL be?

cottrell · 2024-07-19T07:51:27Z

I think the problem is with the select * and, at least in DuckDB, it appears to be order dependent what you get as an answer.

The following (putting the '*' last seems to achieve the correct results.

WITH data AS (
  SELECT
    total * 0 AS total,
    total * 0 AS a,
   *
  FROM
    invoices
)
SELECT
  total,
  total,
  a
FROM
  data

Additionally, this PRQL works properly

from invoices
select {
total
}
derive {
  total = invoices.total * 0,
  a = invoices.total * 0,
}
into data

from data
select {
total,
data.total,
a
}

with sql

WITH data AS (
  SELECT
    total AS _expr_0,
    total * 0 AS total,
    total * 0 AS a
  FROM
    invoices
)
SELECT
  total,
  total,
  a
FROM
  data

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

I think the select * should not be there as you know the columns you need if you crawl the DAG. At minimum the * should appear last I think but I'm not sure if that behaviour is the same across SQL dialects.

aljazerzen · 2024-07-21T16:25:51Z

This is a bug.

(I hate wildcard columns)

max-sixty · 2024-07-21T18:08:19Z

We discussed this on the call. It's not an easy problem from the compiler perspective since it doesn't know whether it's creating duplicate columns, we'll need to think more.

For a workaround — replacing derive with select there should work. (Doesn't solve the underlying issue in PRQL, but should fix the specific case)

aljazerzen · 2024-07-21T18:26:50Z

Minimal example:

from invoices
derive {
  total = total * 0,
}
into data

from data
select {
  total,
}

... produces:

WITH data AS (
  SELECT
    *,
    total * 0 AS total
  FROM
    invoices
)
SELECT
  total
FROM
  data;

The problem is that CTE data will two columns named total: one from invoices and one defined in the SELECT clause. When using total in the main query:

DuckDB will use the first column named total and ignore the second one,
PostgreSQL will raise "ambiguous name" error.

aljazerzen · 2024-07-21T18:57:21Z

A correct SQL output to the above input would be:

WITH data AS (
  SELECT
    * EXCLUDE (total),  -- exclude total from wildcard
    total as _expr_0,   -- include total under a different name
    total * 0 AS total  -- this column will now be the only column named total
  FROM
    invoices
)
SELECT
  total
FROM
  data;

Unfortunately this only works for SQL dialects that support a way of excluding columns from a wildcard.

With say PostgreSQL, this is not possible to express because:

we don't know all of the column of invoice, so we need to use wildcard select, which will include column total,
we need to also include the computed column aliased to total,
this produces a relation with multiple columns named total so we cannot refer to them properly.

cottrell · 2024-07-22T12:31:43Z

I think for everyone's benefit, it would be good to lean hard into "please never use wildcards in anything that matters". It really isn't that hard to use strict column naming but everyone needs to be reminded to do it.

If most things work in wildcard mode that is convenient for quick tests and development, but really with PRQL and what it offers, you are going to suffer if you forget to remove wildcards.

Like "strict mode" or something might be reasonable branding.

I would even perhaps enjoy the ability to have some kind of flavour or option I could add to disallow wildcards in the query as this would remind me and throw if I break the rule.

aljazerzen · 2024-07-22T13:18:33Z

That's a good idea. A strict mode for PRQL where it will not infer table types and deal with "unknown columns", but instead require that the table types be specified by the user.

This can already be done in current versions of PRQL:

module default_db {
  let invoices <[{
    invoice_id = int,
    customer_id = int,
    invoice_date = date,
    total = float,
  }]>
}

from default_db.invoices
derive {
  total = total * 0,
}
into data

from data
select {
  total,
}

... this will produce correct SQL, that will return total of all 0, as it should.

cottrell added the bug Invalid compiler output or panic label Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong output when tables are used and columns have same name #4752

Wrong output when tables are used and columns have same name #4752

cottrell commented Jul 17, 2024

max-sixty commented Jul 18, 2024 •

edited

Loading

cottrell commented Jul 19, 2024

max-sixty commented Jul 19, 2024

cottrell commented Jul 19, 2024

aljazerzen commented Jul 21, 2024

max-sixty commented Jul 21, 2024

aljazerzen commented Jul 21, 2024

aljazerzen commented Jul 21, 2024

cottrell commented Jul 22, 2024 •

edited

Loading

aljazerzen commented Jul 22, 2024 •

edited

Loading

Wrong output when tables are used and columns have same name #4752

Wrong output when tables are used and columns have same name #4752

Comments

cottrell commented Jul 17, 2024

What happened?

PRQL input

SQL output

Expected SQL output

MVCE confirmation

Anything else?

max-sixty commented Jul 18, 2024 • edited Loading

cottrell commented Jul 19, 2024

max-sixty commented Jul 19, 2024

cottrell commented Jul 19, 2024

aljazerzen commented Jul 21, 2024

max-sixty commented Jul 21, 2024

aljazerzen commented Jul 21, 2024

aljazerzen commented Jul 21, 2024

cottrell commented Jul 22, 2024 • edited Loading

aljazerzen commented Jul 22, 2024 • edited Loading

max-sixty commented Jul 18, 2024 •

edited

Loading

cottrell commented Jul 22, 2024 •

edited

Loading

aljazerzen commented Jul 22, 2024 •

edited

Loading