Extend and simplify API for calculation of range-based rolling window offsets #17807
base: branch-25.04
Unless I'm misunderstanding, you can do this with explicit template instantiation instead of defining helper functions (see the docs under "Function template instantiation", subheading "Explicit instantiation" if you're not familiar). I've made the other corresponding changes in `range_(following|preceding).cu` and at call sites, but please let me know if there's some other reason that you need these functions.

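For readers unfamiliar with the suggestion above, here is a minimal sketch of explicit instantiation. The window tags and `preceding_extent` are hypothetical names invented for illustration; they are not the types or functions in this PR, and the row-based arithmetic is only a placeholder for the real range-based logic.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>

// Hypothetical window tags, for illustration only (not the PR's actual types).
struct unbounded {};
struct bounded_closed {
  int delta;  // window size
};

// Generic implementation. In a real build this definition would typically live
// in one source file, with only the declaration visible in a header.
template <typename WindowType>
int preceding_extent(int row, WindowType window)
{
  assert(row >= 0);
  if constexpr (std::is_same_v<WindowType, unbounded>) {
    return row + 1;  // every row from 0 through `row`
  } else {
    // at most `delta` earlier rows, plus the current row
    return std::min(window.delta, row) + 1;
  }
}

// Explicit instantiation definitions: these force the compiler to emit code for
// exactly these specializations in this translation unit, so no non-template
// wrapper functions are needed to make them linkable from elsewhere.
template int preceding_extent<unbounded>(int, unbounded);
template int preceding_extent<bounded_closed>(int, bounded_closed);
```

Callers in other translation units would then see only the template declaration, optionally paired with matching `extern template` declarations to suppress implicit instantiation.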
Now that I'm more familiar with what the code is doing, is the main reason that we're using a variant here that the bounded windows have a window size (delta) while the other two do not? If so, would things be simpler if we switched to a single class + enum? I guess it would be a little awkward to have to deal with having a delta in some cases and not in others.
Perhaps the better question in the long run is, is there a way for us to unify `range_window_type` and `range_window_bounds`?

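To make the trade-off in this comment concrete, here is a rough sketch of both shapes. All type and function names below are assumptions made up for illustration and may not match the definitions in this PR. With a variant, only the bounded alternatives carry a `delta`; with a single class + enum, the `delta` has to be optional and kept consistent with the kind by hand.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <variant>

// Option A (hypothetical names): one type per window kind; only bounded ones have a delta.
struct unbounded {};
struct current_row {};
struct bounded_closed { int delta; };
struct bounded_open { int delta; };
using window_variant = std::variant<unbounded, current_row, bounded_closed, bounded_open>;

// Returns the delta if the active alternative has one.
std::optional<int> delta_of(window_variant const& w)
{
  return std::visit(
    [](auto const& alt) -> std::optional<int> {
      using T = std::decay_t<decltype(alt)>;
      if constexpr (std::is_same_v<T, bounded_closed> || std::is_same_v<T, bounded_open>) {
        return alt.delta;
      } else {
        return std::nullopt;
      }
    },
    w);
}

// Option B (hypothetical names): a single class plus an enum.
enum class window_kind { UNBOUNDED, CURRENT_ROW, BOUNDED_CLOSED, BOUNDED_OPEN };
struct flat_window {
  window_kind kind;
  std::optional<int> delta;  // meaningful only for the bounded kinds
};

// The invariant the variant gives for free must be checked manually here:
// a delta is present if and only if the window is bounded.
bool is_consistent(flat_window const& w)
{
  bool const bounded =
    w.kind == window_kind::BOUNDED_CLOSED || w.kind == window_kind::BOUNDED_OPEN;
  return bounded == w.delta.has_value();
}

// Small usage example.
inline void usage_example()
{
  assert(delta_of(bounded_open{7}).value() == 7);
  assert(is_consistent(flat_window{window_kind::CURRENT_ROW, std::nullopt}));
}
```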
I want `range_window_bounds` to go away.

Do we have to provide this overload? Can we just require the caller to provide the `null_order`? Is there an easy API they could use to get it?

The public API that calls into these functions doesn't currently have a `null_order` argument. I have not yet deprecated that API in this PR (but could). But we need a way to deduce the null order (implemented as `deduce_null_order` in range_rolling.cu) to polyfill during the deprecation period.
If we're happy to just remove the old API (without a deprecation period) then I don't need this function.
I suppose I don't actually need to make this API public though...

Requiring the null-order in the public APIs would break `spark-rapids` builds. This might not be that hard to resolve. (We'd need to do that segmented null count at our end.) But a deprecation period would be useful for planning.

You don't need to do the segmented null count, you just need to say where the nulls were sorted, which must be known, I think.
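As a rough illustration of what deducing the null order could look like for a single, already-sorted column, here is a simplified sketch. It is not the `deduce_null_order` implementation from range_rolling.cu: the names are hypothetical, it ignores groups entirely, and it assumes all nulls sit together at one end of the column.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a null-order enum.
enum class null_order_sketch { BEFORE, AFTER };

// Assumes `sorted_col` is already sorted and its nulls are grouped at one end.
null_order_sketch deduce_null_order_sketch(std::vector<std::optional<int>> const& sorted_col)
{
  if (sorted_col.empty()) { return null_order_sketch::BEFORE; }
  // Nulls at the front (this also covers an all-null column).
  if (!sorted_col.front().has_value()) { return null_order_sketch::BEFORE; }
  // Nulls at the back.
  if (!sorted_col.back().has_value()) { return null_order_sketch::AFTER; }
  // No nulls at all: the choice is arbitrary, so pick BEFORE.
  return null_order_sketch::BEFORE;
}

// Small usage example.
inline void usage_example()
{
  std::vector<std::optional<int>> col{1, 2, std::nullopt};
  assert(deduce_null_order_sketch(col) == null_order_sketch::AFTER);
}
```

A caller that sorted the column itself already knows this answer, which is the point made above: passing the null order explicitly avoids any inspection of the data.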
I haven't yet added tests for this function; I want to bikeshed the interface first.
For example, should the `min_periods` be part of the request? Do we want a `rolling_request` object similar to the `groupby_request` we have for grouped aggregations?

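To give the bikeshedding something concrete to react to, here is one possible shape for a request object that carries `min_periods` alongside the values. Everything here is hypothetical: the struct name, its fields, and the `window_sum` helper are assumptions for discussion, not an existing or proposed API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical request: the values to aggregate plus the minimum number of
// non-null observations a window needs before it produces a valid result.
struct rolling_request_sketch {
  std::vector<std::optional<double>> values;
  std::size_t min_periods;
};

// Sums the non-null values in the half-open row range [begin, end).
// Returns nullopt when fewer than `min_periods` non-null values are present.
std::optional<double> window_sum(rolling_request_sketch const& req,
                                 std::size_t begin,
                                 std::size_t end)
{
  assert(begin <= end);
  double sum        = 0.0;
  std::size_t count = 0;
  for (std::size_t i = begin; i < end && i < req.values.size(); ++i) {
    if (req.values[i].has_value()) {
      sum += *req.values[i];
      ++count;
    }
  }
  if (count < req.min_periods) { return std::nullopt; }
  return sum;
}
```

Keeping `min_periods` on the request would let each requested aggregation use a different threshold; keeping it as a separate function argument would apply one threshold to all of them.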
I don’t feel like I know the status quo of our rolling APIs or the downstream requirements well enough to opine on this without significant research time. I might be able to circle back to this but it’ll be at least a week.
Perhaps the thing to do for now is to remove this API; then, when we use the rest of the new interface from Python, we might notice things we want.