Skip to content

Revert Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead #14948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 1, 2025

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 28, 2025

Which issue does this PR close?

Rationale for this change

The code added in #14877 doesn't seem to handle multiple lines well and this explain plans currently look bad

We are trying to get the DataFusion 46 release out, so I think it is the least risky bet to revert these changes for 46 and then work on getting them working well in a new PR for 47

Before this PR:

> create table foo(x int, y int) as values (1,2), (3,4);
0 row(s) fetched.
Elapsed 0.018 seconds.

> explain select * from foo where x = 4;
+---------------+--------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                     |
+---------------+--------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: foo.x = Int32(4)
  TableScan: foo projection=[x, y]                                                              |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192
  FilterExec: x@0 = 4
    DataSourceExec: partitions=1, partition_sizes=[1]
 |
+---------------+--------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.

After this PR:

> create table foo(x int, y int) as values (1,2), (3,4);
explain select * from foo where x = 4;
0 row(s) fetched.
Elapsed 0.029 seconds.

+---------------+-------------------------------------------------------+
| plan_type     | plan                                                  |
+---------------+-------------------------------------------------------+
| logical_plan  | Filter: foo.x = Int32(4)                              |
|               |   TableScan: foo projection=[x, y]                    |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192           |
|               |   FilterExec: x@0 = 4                                 |
|               |     DataSourceExec: partitions=1, partition_sizes=[1] |
|               |                                                       |
+---------------+-------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.006 seconds.

What changes are included in this PR?

Revert the changes from

Are these changes tested?

Are there any user-facing changes?

…corner case for only one small batch which lines are less than max_rows (apache#14921)"

This reverts commit 463ef3b.
…nt, make it totally streaming printing without memory overhead (apache#14877)"

This reverts commit 53fc94f.
@alamb alamb changed the title Alamb/revert cli update Revert changes to datafusion streaming output Feb 28, 2025
@alamb alamb changed the title Revert changes to datafusion streaming output Revert Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead Feb 28, 2025
@zhuqi-lucas
Copy link
Contributor

zhuqi-lucas commented Mar 1, 2025

Thank you @alamb , it makes sense to me.

So next step is, after we releasing 46.0.0, we revert this PR.

And i created a follow-up ticket to fix the remaining multiple lines issue:
#14953

@zhuqi-lucas
Copy link
Contributor

Sorry, i missed that we already has a ticket for it:
#14947

@xudong963 xudong963 merged commit 382e232 into apache:main Mar 1, 2025
27 of 28 checks passed
zhuqi-lucas added a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request Mar 1, 2025
…and print, make it totally streaming printing without memory overhead (apache#14948)"

This reverts commit 382e232.
@alamb alamb deleted the alamb/revert_cli_update branch March 1, 2025 13:13
@alamb
Copy link
Contributor Author

alamb commented Mar 1, 2025

Thank you @alamb , it makes sense to me.

So next step is, after we releasing 46.0.0, we revert this PR.

And i created a follow-up ticket to fix the remaining multiple lines issue: #14953

Ineed -- thank you 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

datafusion-cli regression: explain plan output looks bad (error rendering multi-lines)
4 participants