Scalars are too verbose in column name output #15395

blaginin · 2025-03-24T18:58:18Z

Is your feature request related to a problem or challenge?

When rendering scalar expressions as column names, DataFusion produces quite verbose output:

>  select 3, array_length([1, 2, 4, 5, 10000, 2.4]);;
+----------+-----------------------------------------------------------------------------------------+
| Int64(3) | array_length(make_array(Int64(1),Int64(2),Int64(4),Int64(5),Int64(10000),Float64(2.4))) |
+----------+-----------------------------------------------------------------------------------------+
| 3        | 6                                                                                       |
+----------+-----------------------------------------------------------------------------------------+

Such detail isn't needed in most cases and adds unnecessary cognitive load for developers. Moreover, the column name can't be fed back in as a query: if you run select Int64(3), you'll get Error during planning: Invalid function 'int64'

Describe the solution you'd like

In most cases, the simplest version will be enough. On the example above it is:

> select 3, array_length([1, 2, 4, 5, 10000, 2.4]);
+---+---------------------------------------------+
| 3 | array_length(make_array(1,2,4,5,10000,2.4)) |
+---+---------------------------------------------+
| 3 | 6                                           |
+---+---------------------------------------------+

IMO there are three ways to fix this problem:

1. Just remove :? here for all cases:

   datafusion/datafusion/expr/src/expr.rs, line 2951 in 0ff8984

   Expr::Literal(v) => write!(f, "{v:?}"),

   This will make queries simpler everywhere. The downside is that we lose some information, e.g., type info in the shell (which one can argue isn't needed and can be checked separately).
2. Make verbose output a ConfigOptions param that can be changed (via datafusion-cli or the SDK).
3. Use short names only if parsing them back preserves the correct type. For example, when formatting Int64(0), output 0 (since parsing 0 results in Int64(0)). When formatting Int32(0), keep Int32(0).
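
To make the three options concrete, here is a minimal Rust sketch. It does not use DataFusion's real types: `Scalar`, `parse_literal`, `name_with_flag` and `round_trip_name` are hypothetical names invented for illustration, and the literal-typing rule (integers parse as Int64, everything else numeric as Float64) is an assumption based on the examples in this issue.

```rust
use std::fmt;

/// Toy stand-in for a scalar literal (hypothetical, not DataFusion's ScalarValue).
/// The derived Debug output is the verbose form, e.g. `Int64(3)`.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int32(i32),
    Int64(i64),
    Float64(f64),
}

/// Option 1: the compact form prints only the value, without the type wrapper.
impl fmt::Display for Scalar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Scalar::Int32(v) => write!(f, "{v}"),
            Scalar::Int64(v) => write!(f, "{v}"),
            Scalar::Float64(v) => write!(f, "{v}"),
        }
    }
}

/// Assumed typing rule for this sketch: text that parses as an integer
/// becomes Int64, any other numeric text becomes Float64.
pub fn parse_literal(text: &str) -> Option<Scalar> {
    if let Ok(i) = text.parse::<i64>() {
        return Some(Scalar::Int64(i));
    }
    text.parse::<f64>().ok().map(Scalar::Float64)
}

/// Option 2: a caller-supplied flag picks between verbose and compact names.
pub fn name_with_flag(value: &Scalar, verbose: bool) -> String {
    if verbose {
        format!("{value:?}")
    } else {
        format!("{value}")
    }
}

/// Option 3: use the compact name only when parsing it back yields the
/// same typed value; otherwise keep the verbose name.
pub fn round_trip_name(value: &Scalar) -> String {
    let short = value.to_string();
    if parse_literal(&short).as_ref() == Some(value) {
        short
    } else {
        format!("{value:?}")
    }
}
```

Under these assumptions, `round_trip_name(&Scalar::Int64(0))` gives `0`, while `round_trip_name(&Scalar::Int32(0))` stays `Int32(0)` because `0` would parse back as Int64. The same check also catches a whole-number float such as `Float64(1.0)`, whose compact form `1` would come back as an integer.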

Additional context

May be quite easy to update after we finish #15178


blaginin · 2025-03-24T18:59:49Z

Created an issue to ask what others think - the change is quite simple, but maybe there were previous discussions or consequences I don't see?

Omega359 · 2025-03-24T21:13:07Z

For context, here is what DuckDB prints:

D select 3, array_length([1, 2, 4, 5, 10000, 2.4]);
┌───────┬───────────────────────────────────────────────────────┐
│   3   │ array_length(main.list_value(1, 2, 4, 5, 10000, 2.4)) │
│ int32 │                         int64                         │
├───────┼───────────────────────────────────────────────────────┤
│   3   │                           6                           │
└───────┴───────────────────────────────────────────────────────┘

blaginin · 2025-03-24T21:16:05Z

Thank you Bruce!!! Btw adding column type (second line of the first row) can actually be a good thing to implement separately

jayzhan211 · 2025-03-24T23:46:00Z

If we have column type, we don't need to display type for inner elements. Maybe we can work on column type first?
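
As a rough illustration of this idea, the sketch below renders a two-line header with the column name on the first line and the column type on the second, so literals inside the name no longer need their own type wrappers. `Column` and `render_header` are hypothetical names, the type labels are supplied by the caller, and this is not how DataFusion or arrow-rs format tables.

```rust
/// Hypothetical column description: a display name plus a type label.
pub struct Column {
    pub name: String,
    pub data_type: String,
}

/// Render a two-line header: names on the first line, types on the second.
/// Each cell is left-aligned and padded to the wider of name and type.
pub fn render_header(columns: &[Column]) -> String {
    let mut names = Vec::new();
    let mut types = Vec::new();
    for column in columns {
        let width = column
            .name
            .chars()
            .count()
            .max(column.data_type.chars().count());
        names.push(format!("{:<width$}", column.name));
        types.push(format!("{:<width$}", column.data_type));
    }
    format!("| {} |\n| {} |", names.join(" | "), types.join(" | "))
}
```

With a header like this, a column can be named simply `3` and still show its type on the line below.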

alamb · 2025-03-25T20:48:14Z

Here is a related ticket:

beautify default column names #2027

> Created an issue to ask what others think - the change is quite simple, but maybe there were previous discussions or consequences I don't see?

I think @irenjj added similar code recently to support new duckdb style tree explain plans via Expr::sql_name:

Simplify display format of AggregateFunctionExpr, add Expr::sql_name #15253

However, it seems like the display is still somewhat 🤮 (I think b/c the column name is bad)

> explain format tree select 3, array_length([1, 2, 4, 5, 10000, 2.4]);;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │        Int64(3): 3        │ |
|               | │                           │ |
|               | │  array_length(make_array  │ |
|               | │     (Int64(1),Int64(2)    │ |
|               | │     ,Int64(4),Int64(5)    │ |
|               | │   ,Int64(10000),Float64   │ |
|               | │          (2.4))):         │ |
|               | │             6             │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │     PlaceholderRowExec    │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

Perhaps we should switch to tree by default as well as use sql_name for the column name 🤔

[DISCUSS] Switch to tree explain by default #15343

irenjj · 2025-03-26T03:18:24Z

sql_name currently only works for aggregate functions; more effort is needed for projection, window, ...

blaginin · 2025-03-26T17:26:36Z

> If we have column type, we don't need to display type for inner elements. Maybe we can work on column type first?

Thank you!!! That's fair, I've created a separate ticket: #15442

blaginin · 2025-03-26T17:29:42Z

> I think @irenjj added similar code recently to support new duckdb style tree explain plans via Expr::sql_name

We can potentially use human_display instead of schema_name when creating schemas. Though tbf I really like the idea of always rendering data in a more compact way (the first option from my list) - it should be even easier to implement 🤗

alamb · 2025-03-26T20:24:04Z

> always rendering data in a more compact way (the first option from my list)

I think it is a better choice too

The challenge is that it will change the schema of the output (the output column name will change) as will the internal column names
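
A small sketch of why this is a breaking change: any code that looks up an output column by its exact name stops finding it once the name becomes compact. `output_names` and `index_of` are hypothetical helpers over plain strings, not DataFusion's schema API.

```rust
/// Build output column names for Int64 literals, either in the current
/// verbose style (`Int64(3)`) or in the proposed compact style (`3`).
pub fn output_names(literals: &[i64], compact: bool) -> Vec<String> {
    literals
        .iter()
        .map(|v| {
            if compact {
                v.to_string()
            } else {
                format!("Int64({v})")
            }
        })
        .collect()
}

/// Find a column's position by exact name match.
pub fn index_of(schema: &[String], name: &str) -> Option<usize> {
    schema.iter().position(|field| field.as_str() == name)
}
```

A caller that hard-codes `index_of(&schema, "Int64(3)")` works against the verbose names and returns `None` against the compact ones, so downstream users would need to update such lookups.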

blaginin added the enhancement New feature or request label Mar 24, 2025

alamb changed the title ~~Scalars are too verbose~~ Scalars are too verbose in column name output Mar 25, 2025

This was referenced Mar 26, 2025

Add an option to display column types in the table #15442

Closed

Add an option to show column type apache/arrow-rs#7335

Merged

blaginin mentioned this issue Apr 21, 2025

Add FormatOptions to Config #15793

Merged
