Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend and simplify API for calculation of range-based rolling window offsets #17807

Open
wants to merge 25 commits into
base: branch-25.04
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
8fba685
New implementation of range-based window bound calculation
wence- Jan 23, 2025
b4f1ba0
Refactor to try and reduce/parallelise compile times
wence- Jan 24, 2025
7f75f8f
Better names for range-window column factories
wence- Jan 24, 2025
8148603
Simplify dispatch
wence- Jan 24, 2025
f5aa177
Reduce repetition in tests
wence- Jan 24, 2025
4cc530c
Explicitly write out ctor
wence- Jan 30, 2025
596cb3e
proclaim_return_type
wence- Jan 30, 2025
64aa4ec
Merge remote-tracking branch 'upstream/branch-25.04' into wence/fea/r…
wence- Feb 13, 2025
6898955
Use explicit template instantiation rather than forwarding functions
wence- Feb 13, 2025
7d04ec8
Move polyfill code to its use site
wence- Feb 13, 2025
0c5f162
Naming
wence- Feb 13, 2025
548171c
No need for an internal enum tag
wence- Feb 14, 2025
c0f9437
Merge remote-tracking branch 'upstream/branch-25.04' into wence/fea/r…
wence- Feb 24, 2025
bb09793
Fix stupid
wence- Feb 24, 2025
fec5788
Partial refactor following Vyas
wence- Feb 24, 2025
a3eab98
Grouping row info is named struct
wence- Feb 24, 2025
fb1af0f
Separate distance functors by window type
wence- Feb 24, 2025
2f7b5e2
More refactoring to reduce duplication
wence- Feb 25, 2025
4405187
Some docs
wence- Feb 25, 2025
107ec10
Tidy a bit more
wence- Feb 25, 2025
3cc2bb5
More doc fixes
wence- Feb 25, 2025
9db3613
Remove unnecessary files
wence- Feb 25, 2025
e1e535e
SFINAE for saturating add/sub
wence- Feb 25, 2025
de42bc2
Refactor a bit to reduce specialisations
wence- Feb 27, 2025
7d6d005
Use `static_cast`
wence- Mar 4, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -643,10 +643,13 @@ add_library(
src/reshape/interleave_columns.cu
src/reshape/tile.cu
src/rolling/detail/optimized_unbounded_window.cpp
src/rolling/detail/range_following.cu
src/rolling/detail/range_preceding.cu
src/rolling/detail/rolling_collect_list.cu
src/rolling/detail/rolling_fixed_window.cu
src/rolling/detail/rolling_variable_window.cu
src/rolling/grouped_rolling.cu
src/rolling/range_rolling.cu
src/rolling/range_window_bounds.cpp
src/rolling/rolling.cu
src/round/round.cu
Expand Down
84 changes: 83 additions & 1 deletion cpp/include/cudf/detail/rolling.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2024, NVIDIA CORPORATION.
* Copyright (c) 2021-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -29,6 +29,26 @@
namespace CUDF_EXPORT cudf {
namespace detail {

namespace rolling {
/**
* @brief Direction tag for a range-based rolling window.
*/
enum class direction : bool {
PRECEDING, ///< A preceding window.
FOLLOWING, ///< A following window.
};
/**
* @brief Wrapper for preprocessed group information from sorted group keys.
*/
struct preprocessed_group_info {
rmm::device_uvector<size_type> const& labels; ///< Mapping from row index to group label
rmm::device_uvector<size_type> const& offsets; ///< Mapping from group label to row offsets
rmm::device_uvector<size_type> const&
nulls_per_group; ///< Mapping from group label to null count in the group
};

} // namespace rolling

/**
* @copydoc std::unique_ptr<column> rolling_window(
* column_view const& input,
Expand All @@ -48,5 +68,67 @@ std::unique_ptr<column> rolling_window(column_view const& input,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);

/**
* @brief Make a column representing the window offsets for a range-based window
*
* @tparam Direction Is this a preceding window or a following one.
*
* @param group_keys Table defining grouping of the windows. May be empty. If
* non-empty, group keys must be sorted.
* @param orderby Column use to define window ranges. If @p group_keys is empty,
* must be sorted. If
* @p group_keys is non-empty, must be sorted within each group. As well as
* being sorted, must be sorted consistently with the @p order and @p null_order
* parameters.
* @param order The sort order of the @p orderby column.
* @param null_order The sort order of nulls in the @p orderby column.
* @param row_delta Pointer to scalar providing the delta for the window range.
* May be null, but only if the @p window_type is @p CURRENT_ROW or @p
* UNBOUNDED. Note that @p row_delta is always added to the current row value.
* @param window_type The type of window we are computing bounds for.
* @param stream CUDA stream used for device memory operations and kernel
* launches.
* @param mr Device memory resource used for allocations.
*/
template <rolling::direction Direction>
[[nodiscard]] std::unique_ptr<column> make_range_window(
column_view const& orderby,
std::optional<rolling::preprocessed_group_info> const& grouping,
order order,
null_order null_order,
range_window_type window,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);

/**
* @copydoc `make_range_window`
*
* This behaves as `make_range_window`, but template-specialised to `PRECEDING` windows. This way we
* can compile it in a separate translation unit from `make_following_range_window`.
*/
[[nodiscard]] std::unique_ptr<column> make_preceding_range_window(
column_view const& orderby,
std::optional<rolling::preprocessed_group_info> const& grouping,
order order,
null_order null_order,
range_window_type window,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);

/**
* @copydoc `make_range_window`
*
* This behaves as `make_range_window`, but template-specialised to `FOLLOWING` windows. This way we
* can compile it in a separate translation unit from `make_preceding_range_window`.
*/
[[nodiscard]] std::unique_ptr<column> make_following_range_window(
column_view const& orderby,
std::optional<rolling::preprocessed_group_info> const& grouping,
order order,
null_order null_order,
range_window_type window,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding, you can do this with explicit template instantiation instead of defining helper functions (see the docs under "Function template instantiation", subheading "Explicit instantiation" if you're not familiar). I've made the other corresponding changes in range_(following|preceding).cu and at call sites, but please let me know if there's some other reason that you need these functions.

Suggested change
/**
* @copydoc `make_range_window`
*
* This behaves as `make_range_window`, but template-specialised to `PRECEDING` windows. This way we
* can compile it in a separate translation unit from `make_following_range_window`.
*/
[[nodiscard]] std::unique_ptr<column> make_preceding_range_window(
column_view const& orderby,
std::optional<rolling::preprocessed_group_info> const& grouping,
order order,
null_order null_order,
range_window_type window,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);
/**
* @copydoc `make_range_window`
*
* This behaves as `make_range_window`, but template-specialised to `FOLLOWING` windows. This way we
* can compile it in a separate translation unit from `make_preceding_range_window`.
*/
[[nodiscard]] std::unique_ptr<column> make_following_range_window(
column_view const& orderby,
std::optional<rolling::preprocessed_group_info> const& grouping,
order order,
null_order null_order,
range_window_type window,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr);

} // namespace detail
} // namespace CUDF_EXPORT cudf
124 changes: 124 additions & 0 deletions cpp/include/cudf/rolling.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,15 @@
#include <cudf/aggregation.hpp>
#include <cudf/rolling/range_window_bounds.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/default_stream.hpp>
#include <cudf/utilities/export.hpp>
#include <cudf/utilities/memory_resource.hpp>

#include <rmm/resource_ref.hpp>

#include <memory>
#include <optional>
#include <variant>

namespace CUDF_EXPORT cudf {
/**
Expand All @@ -31,6 +36,98 @@ namespace CUDF_EXPORT cudf {
* @file
*/

/**
* @brief Strongly typed wrapper for bounded closed rolling windows.
*
* The endpoints of this window are included.
*/
struct bounded_closed {
cudf::scalar const& delta; ///< Delta from the current row in the window. Must be valid,
///< behaviour is undefined if not.
};

/**
* @brief Strongly typed wrapper for bounded closed rolling windows.
*
* The endpoints of this window are excluded.
*/
struct bounded_open {
cudf::scalar const& delta; ///< Delta from the current row in the window. Must be valid,
///< behaviour is undefined if not.
};

/**
* @brief Strongly typed wrapper for unbounded rolling windows.
*
* This window runs to the begin/end of the current row's group.
*/
struct unbounded {};
/**
* @brief Strongly typed wrapper for current_row rolling windows.
*
* This window contains all rows that are equal to the current row.
*/
struct current_row {};

/**
* @brief The type of the range-based rolling window endpoint.
*/
using range_window_type = std::variant<unbounded, current_row, bounded_closed, bounded_open>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm more familiar with what the code is doing, is the main reason that we're using a variant here that the bounded windows have a window size (delta) while the other two do not? If so, would things be simpler if we switched to a single class + enum? I guess it would be a little awkward to have to deal with having a delta and not in others.

Perhaps the better question in the long run is, is there a way for us to unify range_window_type and range_window_bounds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want range_window_bounds to go away


/**
* @brief Constructs preceding and following columns given window range specifications.
*
* @param group_keys Possibly empty table of sorted keys defining groups.
* @param orderby Column defining window ranges. Must be sorted. If `group_keys` is non-empty, must
* be sorted groupwise.
* @param order Sort order of the `orderby` column.
* @param null_order Null sort order in the sorted `orderby` column.
* @param preceding Type of the preceding window.
* @param following Type of the following window.
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
* @return pair of preceding and following columns that define the window bounds for each row,
* suitable for passing to `rolling_window`.
*/
std::pair<std::unique_ptr<column>, std::unique_ptr<column>> make_range_windows(
table_view const& group_keys,
column_view const& orderby,
order order,
null_order null_order,
range_window_type preceding,
range_window_type following,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* @brief Constructs preceding and following columns given window range specifications.
*
* @param group_keys Possibly empty table of sorted keys defining groups.
* @param orderby Column defining window ranges. Must be sorted. If `group_keys` is non-empty, must
* be sorted groupwise.
* @param order Sort order of the `orderby` column.
* @param preceding Type of the preceding window.
* @param following Type of the following window.
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @note Using `make_range_window_bounds(table_view const&, column_view const&, order, null_order,
* window_type, window_type, rmm::cuda_stream_view, rmm::device_async_resource_ref)` is preferred,
* since this function requires the launch of an additional kernel to deduce the null order of the
* orderby column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to provide this overload? Can we just require the caller to provide the null_order? Is there an easy API they could use to get it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API that calls in to these functions doesn't currently have a null_order argument. I have not yet deprecated that API in this PR (but could). But we need a way to deduce the null order (implemented as deduce_null_order in range_rolling.cu) to polyfill during the deprecation period.

If we're happy to just remove the old API (without a deprecation period) then I don't need this function.

I suppose I don't actually need to make this API public though...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're happy to just remove the old API...

Requiring the null-order in the public APIs would break spark-rapids builds. This might not be that hard to resolve. (We'd need to do that segmented null count at our end.) But a deprecation period would be useful for planning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do the segmented null count, you just need to say where the nulls were sorted, which must be known, I think.

*
* @return pair of preceding and following columns that define the window bounds for each row,
* suitable for passing to `rolling_window`.
*/
std::pair<std::unique_ptr<column>, std::unique_ptr<column>> make_range_windows(
table_view const& group_keys,
column_view const& orderby,
order order,
range_window_type preceding,
range_window_type following,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* @brief Applies a fixed-size rolling window function to the values in a column.
*
Expand Down Expand Up @@ -577,6 +674,33 @@ std::unique_ptr<column> grouped_range_rolling_window(
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* @brief Apply a grouping-aware range-based rolling window function to a sequence of columns.
*
* @param group_keys Possibly empty table of sorted keys defining groups.
* @param orderby Column defining window ranges. Must be sorted. If `group_keys` is non-empty, must
* be sorted groupwise.
* @param order Sort order of the `orderby` column.
* @param null_order Null sort order in the sorted `orderby` column.
* @param preceding Type of the preceding window.
* @param following Type of the following window.
* @param min_periods Minimum number of observations in the window required to have a value.
* @param requests Vector of pairs of columns and aggregation requests.
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
* @return A table of results, one column per input request.
*/
std::unique_ptr<table> grouped_range_rolling_window(
table_view const& group_keys,
column_view const& orderby,
order order,
null_order null_order,
range_window_type preceding,
range_window_type following,
size_type min_periods,
std::vector<std::pair<column_view const&, rolling_aggregation const&>> requests,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());
Comment on lines +553 to +563
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't yet added tests of this function, want to bikeshed the interface first.

For example, should the min_periods be part of the request, do we want a rolling_request object similar to the groupby_request we have for grouped aggregations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t feel like I know the status quo of our rolling APIs or the downstream requirements well enough to opine on this without significant research time. I might be able to circle back to this but it’ll be at least a week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the thing to do for now is to remove this API and then when using the other stuff from python we might notice things we want.

/**
* @brief Applies a variable-size rolling window function to the values in a column.
*
Expand Down
Loading
Loading