-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): in rolling aggregation functions, sort under-the-hood for user if data is unsorted (with warning) #11134
Conversation
e3ccb77
to
435a2e9
Compare
711fd19
to
6ed0188
Compare
…hood for user if data is unsorted (with warning)
6ed0188
to
9f8d828
Compare
Was there some discussion on this? I am curious if we should do this implicitly or require more strictness/awareness. |
yup the discussion's all in this thread #10991 (comment), sorry for not having linked it right away |
crates/polars-plan/src/dsl/mod.rs
Outdated
}, | ||
IsSorted::Not => { | ||
if options.warn_if_unsorted { | ||
eprintln!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use polars_error
's polars_warn
function?
unsafe { by = by.take_unchecked(&sorting_indices)? }; | ||
unsafe { series = s[0].take_unchecked(&sorting_indices)? }; | ||
let int_range = | ||
UInt32Chunked::from_iter_values("", 0..s[0].len() as u32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be IdxCa
and IdxSize
instead of u32
, no? EDIT: see below, you shouldn't save this at all and instead save sorting_indices
.
}, | ||
IsSorted::Not => { | ||
let res = rolling_fn(&series, options)?; | ||
let indices = &original_indices.unwrap().arg_sort(Default::default()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be another sort and take, instead this should be a scatter: out[sorting_indices[i]] = res[i]
. We currently lack a good scatter kernel though, as far as I'm aware. So this may be blocked until I get to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - OK marking as blocked then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @orlp - just checking whether a scatter kernel might be on the horizon, and if not whether you have suggestions on what to do instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli We don't have a scatter kernel yet, but we do have PolarsDataType::ZeroablePhysical
now, which allows you to write a reasonably efficient scatter in a couple lines of code.
You can look in gather_skip_nulls.rs
for an example, but basically you do
let mut out: Vec<T::ZeroablePhysical<'a>> = zeroed_vec(len);
unsafe {
for i in 0..n {
let out_idx = *sorted_indices.get_unchecked(i);
*out.get_unchecked_mut(out_idx) = res.get(i).into();
}
}
let out_arr = T::Array::from_zeroable_vec(out, dtype);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the behavior of nulls you may also have to construct a null bitmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Noob question but where would I get T
from in this function? Do I first need to refactor and make finish_rolling
generic?
closing for now as I think doing this correctly is a bit above me at the moment, but @orlp feel free to take over, would be very interested to see how you'd do it |
got a bit of time in the end, so trying to address this