-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat: topk functionality for aggregates should support utf8view and largeutf8 #15152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I will add sqllogictesting after the config PR merged: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zhuqi-lucas -- looks good to me
I'll wait for SQL test coverage before hitting approve
Any chance you are willing to add
LargeUtf8` support while you are at it?
Thank you @alamb for review, i am willing to add |
Thank you @alamb, enabled the slt testing for utf8view and also add LargeUtf8 support in latest PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zhuqi-lucas -- I just have one more test request
@@ -19,6 +19,12 @@ | |||
# Setup test data table | |||
####### | |||
|
|||
# Setting to map varchar to utf8view, to test PR https://github.com/apache/datafusion/pull/15152 | |||
# Before the PR, the test case would not work because the Utf8View will not be supported by the TopK aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change now loses coverage for utf8
Instead perhaps we can make a table with stringview?
Something like
CREATE TABLE traces_utf8view
AS SELECT
arrow_cast(trace_id, 'Utf8View') as trace_id,
timestamp,
other
)
And then do an explain
explain select trace_id, MAX(timestamp) from traces group by trace_id order by MAX(timestamp) desc limit 4;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it would be great to do the same fir LargeUtf8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thank you @alamb.
Thank you @alamb for review, addressed it in latest PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience and perseverence @zhuqi-lucas
FYI @avantgardnerio |
Thanks again @zhuqi-lucas |
Which issue does this PR close?
#15096
Rationale for this change
topk functionality for aggregates should support utf8view
What changes are included in this PR?
topk functionality for aggregates should support utf8view
Are these changes tested?
Yes
Are there any user-facing changes?
topk functionality for aggregates should support utf8view