-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Scalars are too verbose in column name output #15395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Created an issue to ask what others think - the change is quite simple, but maybe there were previous discussions or consequences I don't see? |
for context duckdb:
|
Thank you Bruce!!! Btw adding column type (second line of the first row) can actually be a good thing to implement separately |
If we have column type, we don't need to display type for inner elements. Maybe we can work on column type first? |
Here is a related ticket:
I think @irenjj added similar code recently to support new duckdb style However, it seems like the display is still somewhat 🤮 (I think b/c the column name is bad) > explain format tree select 3, array_length([1, 2, 4, 5, 10000, 2.4]);;
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ ProjectionExec │ |
| | │ -------------------- │ |
| | │ Int64(3): 3 │ |
| | │ │ |
| | │ array_length(make_array │ |
| | │ (Int64(1),Int64(2) │ |
| | │ ,Int64(4),Int64(5) │ |
| | │ ,Int64(10000),Float64 │ |
| | │ (2.4))): │ |
| | │ 6 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ PlaceholderRowExec │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
Perhaps we should switch to |
|
Thank you!!! That's fair, I've created a separate ticket: #15442 |
We can potentially use |
I think it is a better choice too The challenge is that it will change the schema of the output (the output column name will change) as will the internal column names |
Is your feature request related to a problem or challenge?
When parsing scalar expressions, DataFusion makes queries quite complicated:
Such detailed comparison isn't needed in most cases and adds unnecessary cognitive complexity for developers. Moreover, it also breaks the original query - if you put
select Int64(3)
, you'll getError during planning: Invalid function 'int64'
Describe the solution you'd like
In most cases, the simplest version will be enough. On the example above it is:
IMO there are three ways how to fix this problem:
:?
here for all casesdatafusion/datafusion/expr/src/expr.rs
Line 2951 in 0ff8984
This will make queries simpler everywhere. The downside is that we lose some information, e.g., type info in the shell (which one can argue isn't needed and can be checked separately)
ConfigOptions
param and make it changeable (via datafusion-cli or sdk).Int64(0)
, output0
(since parsing0
results inInt64(0)
). When formattingInt32(0)
, keepInt32(0)
.Additional context
May be quite easy to update after we finish #15178
The text was updated successfully, but these errors were encountered: