-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Perf: Support Utf8View datatype single column comparisons for SortPreservingMergeStream #15348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…servingMergeStream
} | ||
|
||
fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool { | ||
unsafe { GenericByteViewArray::compare_unchecked(l, l_idx, r, r_idx).is_eq() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a 'safety:' note to say why is is ok to use unsafe here. An example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Omega359 for review, good example, i will address it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be good to justify the use of unchecked (which I think is ok here)
The docs say https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.compare_unchecked
SO maybe the safety argument is mostly "The left/right_idx must within range of each array"
It also seems like we need to be comparing the Null masks too 🤔 like checking if the values are null before comparing
Given that this comparison is typically the hottest part of a merge operation maybe we should try using unchecked comparisions elswhere
Thank you for the work on better Utf8View support. I tried one sort benchmark with sort-preserving merging on a single Reproducer
main: 8s According to the flamegraph, an extra overhead of |
Thank you @2010YOUY01 for review, i may know the problem about the above Reproducer:
SELECT l_shipmode, l_comment, l_partkey
FROM lineitem
ORDER BY l_shipmode; It will show the performance improvement. And finally, i think we need to create a follow-up ticket to improve and investigate the regression case. It will be valuable for us to improve it. Thanks! |
Updated the result for short string sort which will benefit a lot from StringView type, add here is the Q 11 for sort test: - const SORT_QUERIES: [&'static str; 10] = [
+ const SORT_QUERIES: [&'static str; 11] = [
// Q1: 1 sort key (type: INTEGER, cardinality: 7) + 1 payload column
r#"
SELECT l_linenumber, l_partkey
@@ -159,6 +159,12 @@ impl RunOpt {
FROM lineitem
ORDER BY l_orderkey, l_suppkey, l_linenumber, l_comment
"#,
+ // Q11: 1 sort key (type: VARCHAR, cardinality: 4.5M) + 1 payload column
+ r#"
+ SELECT l_shipmode, l_comment, l_partkey
+ FROM lineitem
+ ORDER BY l_shipmode;
+ "#,
]; This PR: Q11 iteration 0 took 5645.3 ms and returned 59986052 rows
Q11 iteration 1 took 5641.1 ms and returned 59986052 rows
Q11 iteration 2 took 5520.6 ms and returned 59986052 rows
Q11 avg time: 5602.33 ms The main: Q11 iteration 0 took 6687.5 ms and returned 59986052 rows
Q11 iteration 1 took 6504.5 ms and returned 59986052 rows
Q11 iteration 2 took 6544.6 ms and returned 59986052 rows
Q11 avg time: 6578.87 ms About 20% performance improvement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas -- this looks pretty sweet. I think we need to sort out nulls and safety comment and this will be good to go
} | ||
|
||
fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool { | ||
unsafe { GenericByteViewArray::compare_unchecked(l, l_idx, r, r_idx).is_eq() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be good to justify the use of unchecked (which I think is ok here)
The docs say https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.compare_unchecked
SO maybe the safety argument is mostly "The left/right_idx must within range of each array"
It also seems like we need to be comparing the Null masks too 🤔 like checking if the values are null before comparing
Given that this comparison is typically the hottest part of a merge operation maybe we should try using unchecked comparisions elswhere
Thank you @alamb for review, good suggestion, and i checked the nullable check is checked in the parent wrapper call, for example: impl<T: CursorValues> CursorValues for ArrayValues<T> {
fn len(&self) -> usize {
self.values.len()
}
fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool {
match (l.is_null(l_idx), r.is_null(r_idx)) {
(true, true) => true,
(false, false) => T::eq(&l.values, l_idx, &r.values, r_idx),
_ => false,
}
}
fn eq_to_previous(cursor: &Self, idx: usize) -> bool {
assert!(idx > 0);
match (cursor.is_null(idx), cursor.is_null(idx - 1)) {
(true, true) => true,
(false, false) => T::eq(&cursor.values, idx, &cursor.values, idx - 1),
_ => false,
}
}
fn compare(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> Ordering {
match (l.is_null(l_idx), r.is_null(r_idx)) {
(true, true) => Ordering::Equal,
(true, false) => match l.options.nulls_first {
true => Ordering::Less,
false => Ordering::Greater,
},
(false, true) => match l.options.nulls_first {
true => Ordering::Greater,
false => Ordering::Less,
},
(false, false) => match l.options.descending {
true => T::compare(&r.values, r_idx, &l.values, l_idx),
false => T::compare(&l.values, l_idx, &r.values, r_idx),
},
}
}
} I try to address comments and suggestions in latest PR. And for longer string compare regression for StringView, #15348 (comment) |
Added some new testing, we need to improve High Cardinality Performance for sorting with utf8_view, and the most performance regression is with sort_partitioned. Comparison: UTF8 vs UTF8_VIEW Sorting PerformanceBased on the benchmark results, we compare Low Cardinality Performance
Observations
High Cardinality Performance
Observations
Key Takeaways
|
I did some POC of the automatically concat_batches which is totally another improvement ticket besides this PR: Very good performance improvement i can see, need more testing and investigation. And it's not limited to utf8_view enabled, i did not apply this PR to the testing for above comments result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zhuqi-lucas -- this is quite cool and it is very neat it spawned a bunch of observations
@@ -294,16 +294,44 @@ impl CursorValues for StringViewArray { | |||
} | |||
|
|||
fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool { | |||
// SAFETY: Both l_idx and r_idx are guaranteed to be within bounds, | |||
// and any null-checks are handled in the outer layers. | |||
// Fast path: Compare the lengths (or a proxy of the lengths) before full byte comparison. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
unsafe { | ||
GenericByteViewArray::compare_unchecked(cursor, idx, cursor, idx - 1).is_eq() | ||
} | ||
} | ||
|
||
fn compare(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> Ordering { | ||
// SAFETY: Prior assertions guarantee that l_idx and r_idx are valid indices. | ||
// Null-checks are assumed to have been handled in the wrapper (e.g., ArrayValues). | ||
assert!(l_idx < l.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert
s are left in release builds of rust.
I think it would be a great project to improve the performance of utf8_view for sorting high cardinality - maybe we can add a ticket to track that work too |
Thank you @alamb for review, created a follow-up ticket for this also: Strange, latest testing can't produce the regression for some high cardinality result, only the sort partition we can always reproduced described already in above comments: Latest testing result:
I will continue the investigation in the new ticket. |
This one also deserve a new ticket to investigation, created a ticket now: |
Thanks again @zhuqi-lucas |
…servingMergeStream (apache#15348) * Perf: Support Utf8View datatype single column comparisons for SortPreservingMergeStream * Add safety and bench sql * fix * Fix * Add benchmark testing
…servingMergeStream (apache#15348) * Perf: Support Utf8View datatype single column comparisons for SortPreservingMergeStream * Add safety and bench sql * fix * Fix * Add benchmark testing
Which issue does this PR close?
Rationale for this change
Support Utf8View datatype single column comparisons for SortPreservingMergeStream
What changes are included in this PR?
Support Utf8View datatype single column comparisons for SortPreservingMergeStream
Are these changes tested?
Yes
Are there any user-facing changes?
Support Utf8View datatype single column comparisons for SortPreservingMergeStream