-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented Decimal Transforms #17968
base: branch-25.04
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
4ee86e3
to
43fef79
Compare
43fef79
to
c3ee2eb
Compare
/ok to test |
/ok to test |
…into transform-decimals
bool const is_scalar = input.size() != output.size(); | ||
return column_type_name(input.type(), is_scalar); | ||
}); | ||
auto const add_column = [&](cudf::data_type data_type, bool is_scalar) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong to me. Why are we splitting up the value and scale?
Why can't we pass the numeric type as defined in cudf/fixed_point/fixed_point.hpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values have a stride of 1 (0 if it is a scalar) and the scale is always 0 (scalar). All rows in a column have the same scale.
I did consider wrapping the fixed point columns' row data pointer in a struct along with the scale, but the kernel segfaults (mismatched global and constant address spaces) as NVRTC needs to know the types of the addresses it is loading from (i.e. const or not, to know whether to use global or constant memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm more worried about the interface of the transform functions, which is the part the user will be exposed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we need to rethink this.
My understanding is the provided kernel (GENERIC_TRANSFORM_OP
) works on a single row which we provide the single data for in kernel.cu (In and Out types)
It seems we could/should be able to put together a single element appropriately for calling the UDF.
I'm thinking we rework the kernel.cu
to be more type-specific in that it can call something like
column_device_view::element<In>()
and mutable_column_device_view::element<Out>()
to resolve a real typed value to pass to the transform-op.
This may be too much for jitify but perhaps there is something else we can do here.
Description
This merge request implements transforms for decimal types.
It requires the UDF writer to manually decode the decimals from its representation and scale.
There's also an NVRTC/CCL issue with not recognizing
__int128_t
as an integral type in the device code that was fixed by defining__SIZEOF_INT128__
.Checklist