Skip to content

feat: topk functionality for aggregates should support utf8view and largeutf8 #15152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 14, 2025

Conversation

zhuqi-lucas
Copy link
Contributor

Which issue does this PR close?

  • Closes sub-task topk functionality for aggregates should support utf8view for

#15096

Rationale for this change

topk functionality for aggregates should support utf8view

What changes are included in this PR?

topk functionality for aggregates should support utf8view

Are these changes tested?

Yes

Are there any user-facing changes?

topk functionality for aggregates should support utf8view

@zhuqi-lucas
Copy link
Contributor Author

I will add sqllogictesting after the config PR merged:

#15104

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zhuqi-lucas -- looks good to me

I'll wait for SQL test coverage before hitting approve

Any chance you are willing to add LargeUtf8` support while you are at it?

@zhuqi-lucas
Copy link
Contributor Author

Thanks @zhuqi-lucas -- looks good to me

I'll wait for SQL test coverage before hitting approve

Any chance you are willing to add LargeUtf8` support while you are at it?

Thank you @alamb for review, i am willing to add LargeUtf8 support also!

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Mar 12, 2025
@zhuqi-lucas
Copy link
Contributor Author

Thanks @zhuqi-lucas -- looks good to me

I'll wait for SQL test coverage before hitting approve

Any chance you are willing to add LargeUtf8` support while you are at it?

Thank you @alamb, enabled the slt testing for utf8view and also add LargeUtf8 support in latest PR.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zhuqi-lucas -- I just have one more test request

@@ -19,6 +19,12 @@
# Setup test data table
#######

# Setting to map varchar to utf8view, to test PR https://github.com/apache/datafusion/pull/15152
# Before the PR, the test case would not work because the Utf8View will not be supported by the TopK aggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change now loses coverage for utf8

Instead perhaps we can make a table with stringview?

Something like

CREATE TABLE traces_utf8view 
AS SELECT 
  arrow_cast(trace_id, 'Utf8View') as trace_id,
  timestamp,
  other
)

And then do an explain

explain select trace_id, MAX(timestamp) from traces group by trace_id order by MAX(timestamp) desc limit 4;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it would be great to do the same fir LargeUtf8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thank you @alamb.

@zhuqi-lucas
Copy link
Contributor Author

Thanks @zhuqi-lucas -- I just have one more test request

Thank you @alamb for review, addressed it in latest PR!

@zhuqi-lucas zhuqi-lucas changed the title feat: topk functionality for aggregates should support utf8view feat: topk functionality for aggregates should support utf8view and largeutf8 Mar 13, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience and perseverence @zhuqi-lucas

@alamb
Copy link
Contributor

alamb commented Mar 13, 2025

FYI @avantgardnerio

@alamb alamb merged commit f8828ab into apache:main Mar 14, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 14, 2025

Thanks again @zhuqi-lucas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants