Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix performance regression for sort/gather on list/array columns #19564

Merged
merged 1 commit into from
Nov 1, 2024

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Oct 31, 2024

Fixes #19559

It turns out, our slow-paths for take_unchecked are very slow, because the implementation constructs a Growable with a Vec containing 1 unit-length Array per gather index

let arrays = indices
.iter()
.flat_map(|opt_idx| {
opt_idx.map(|index| {
let index = index.to_usize();
let slice = values.clone().sliced(index, 1);
capacity += slice.len();
slice
})
})
.collect::<Vec<ListArray<I>>>();
let arrays = arrays.iter().collect();
if let Some(validity) = indices.validity() {
let mut growable: GrowableList<I> = GrowableList::new(arrays, true, capacity);

And all of them end up being bit-counted in Growable::new :P

if !use_validity & arrays.iter().any(|array| array.null_count() > 0) {

Also, the reason why we regressed in list-gather performance was because we were previously not using that super-slow list gather implementation until a dispatch impl ChunkTakeUnchecked<IdxCa> for ListChunked { was added to it in #19327

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Oct 31, 2024
@nameexhaustion nameexhaustion changed the title fix: Fix performance regression for sort/gather on list/array columns fix: Fix performance regression for sort/gather on listcolumns Oct 31, 2024
@nameexhaustion nameexhaustion changed the title fix: Fix performance regression for sort/gather on listcolumns fix: Fix performance regression for sort/gather on list columns Oct 31, 2024
@nameexhaustion nameexhaustion marked this pull request as ready for review October 31, 2024 22:46
@nameexhaustion nameexhaustion changed the title fix: Fix performance regression for sort/gather on list columns fix: Fix performance regression for sort/gather on list/array columns Oct 31, 2024
@ritchie46
Copy link
Member

Thanks Simon. That slow path was the reason I made the change. I didn't realize we did never hit it though.

@ritchie46 ritchie46 merged commit 802e692 into pola-rs:main Nov 1, 2024
26 of 27 checks passed
@nameexhaustion nameexhaustion deleted the list-array-gather branch November 18, 2024 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance Regression in Sorting between 1.9.0 and 1.12.0
2 participants