-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[DISCUSS] Switch to tree
explain by default
#15343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For example, here is the plan from a recent query from #15177 (I actually had to trim it to fit in the 65k limit): > explain SELECT * from 'hits_partitioned' WHERE "WatchID" IN (SELECT "WatchID" FROM 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10);
+---------------+---------
| plan_type | plan |
+---------------+----------
| logical_plan | LeftSemi Join: hits_partitioned.WatchID = __correlated_sq_1.WatchID |
| | TableScan: ... |
| | SubqueryAlias: __correlated_sq_1 |
| | Projection: hits_partitioned.WatchID |
| | Sort: hits_partitioned.EventTime ASC NULLS LAST, fetch=10 |
| | Projection: hits_partitioned.WatchID, hits_partitioned.EventTime |
| | Filter: CAST(hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%") |
| | TableScan: hits_partitioned projection=[WatchID, EventTime, URL], partial_filters=[CAST(hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%")] |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
| | HashJoinExec: mode=Partitioned, join_type=RightSemi, on=[(WatchID@0, WatchID@0)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=Hash([WatchID@0], 16), input_partitions=16 |
| | RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1 |
| | ProjectionExec: expr=[WatchID@0 as WatchID] |
| | SortPreservingMergeExec: [EventTime@1 ASC NULLS LAST], fetch=10 |
| | SortExec: TopK(fetch=10), expr=[EventTime@1 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: CAST(URL@2 AS Utf8View) LIKE %google%, projection=[WatchID@0, EventTime@1] |
| | DataSourceExec: file_groups={16 groups: ... |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=Hash([WatchID@0], 16), input_partitions=16 |
| | DataSourceExec: file_groups={16 groups: ...file_type=parquet |
| | |
+---------------+--
2 row(s) fetched.
Elapsed 0.072 seconds. EXPLAIN FORMAT tree ... I think the > explain format tree SELECT * from 'hits_partitioned' WHERE "WatchID" IN (SELECT "WatchID" FROM 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10);
+---------------+------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ HashJoinExec │ |
| | │ -------------------- │ |
| | │ join_type: RightSemi │ |
| | │ ├──────────────┐ |
| | │ on: │ │ |
| | │ (WatchID = WatchID) │ │ |
| | └─────────────┬─────────────┘ │ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec ││ CoalesceBatchesExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ target_batch_size: ││ target_batch_size: │ |
| | │ 8192 ││ 8192 │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ RepartitionExec ││ RepartitionExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ output_partition_count: ││ output_partition_count: │ |
| | │ 16 ││ 16 │ |
| | │ ││ │ |
| | │ partitioning_scheme: ││ partitioning_scheme: │ |
| | │ Hash([WatchID@0], 16) ││ Hash([WatchID@0], 16) │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ RepartitionExec ││ DataSourceExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ output_partition_count: ││ files: 115 │ |
| | │ 1 ││ format: parquet │ |
| | │ ││ │ |
| | │ partitioning_scheme: ││ │ |
| | │ RoundRobinBatch(16) ││ │ |
| | └─────────────┬─────────────┘└───────────────────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ ProjectionExec │ |
| | │ -------------------- │ |
| | │ WatchID: WatchID │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ SortPreservingMergeExec │ |
| | │ -------------------- │ |
| | │ EventTime ASC NULLS │ |
| | │ LASTlimit: │ |
| | │ 10 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ SortExec │ |
| | │ -------------------- │ |
| | │ EventTime@1 ASC NULLS LAST│ |
| | │ │ |
| | │ limit: 10 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ FilterExec │ |
| | │ -------------------- │ |
| | │ predicate: │ |
| | │ CAST(URL AS Utf8View) LIKE│ |
| | │ %google% │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ files: 115 │ |
| | │ format: parquet │ |
| | │ │ |
| | │ predicate: │ |
| | │ CAST(URL AS Utf8View) LIKE│ |
| | │ %google% │ |
| | └───────────────────────────┘ |
| | |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.083 seconds. |
tree
explain by defaulttree
explain by default
That's cool, |
Plus one for making it the default :) |
+1 |
1 similar comment
+1 |
I think this is very important. We found many bugs using the current, detailed plan print-outs. Let's make the tree mode default for the CLI experience (and tests verifying the CLI behavior), but all the other tests should keep using the current format. |
I agree with this, test coverage should not be weakened. Unless we ensure tree format displays at least as much detail as the ident format, test default should be ident format. |
I agree with others on the tests, let them continue to use the condensed format. Otherwise I agree with making the tree format the default. |
I agree to use the tree explanation format as the default display, as it is more concise. |
I wonder if we could just change the default format for |
Makes sense to change the default format for |
I agree with this change. The |
Here is a proposed PR: |
ss### Is your feature request related to a problem or challenge?
Thanks to @irenjj @zebsme @Standing-Man and others we have beautiful
tree
explains. SeeSQL EXPLAIN
Tree Rendering #14914However the current default explain plan that most users will see uses the classic
indent
mode which includes quite a bit of detail that requires expertise to see and understand the plan (see example below)Describe the solution you'd like
I would like to discuss using the tree mode by default (if you just run
EXPLAIN ...
) as that is the first introduction to DataFusion plans for many users.(and if course I have DuckDB envy many times)
Describe alternatives you've considered
I think we could simply change the default value of
datafusion.explain.format
totree
I think the biggest part of this ticket will be updating all the tests
Most tests that use the indent mode should continue to use the indent mode as it contains more information. In other words the tests should not be updated to use the
tree
modeAdditional context
No response
The text was updated successfully, but these errors were encountered: