Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Decimal Transforms #17968

Open
wants to merge 8 commits into
base: branch-25.04
Choose a base branch
from

Conversation

lamarrr
Copy link
Contributor

@lamarrr lamarrr commented Feb 10, 2025

Description

This merge request implements transforms for decimal types.
It requires the UDF writer to manually decode the decimals from its representation and scale.
There's also an NVRTC/CCL issue with not recognizing __int128_t as an integral type in the device code that was fixed by defining __SIZEOF_INT128__.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Feb 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Java Affects Java cuDF API. pylibcudf Issues specific to the pylibcudf package CMake CMake build issue labels Feb 10, 2025
@lamarrr lamarrr removed Python Affects Python cuDF API. Java Affects Java cuDF API. pylibcudf Issues specific to the pylibcudf package labels Feb 11, 2025
@lamarrr
Copy link
Contributor Author

lamarrr commented Feb 13, 2025

/ok to test

@lamarrr lamarrr added feature request New feature or request non-breaking Non-breaking change labels Feb 13, 2025
@lamarrr
Copy link
Contributor Author

lamarrr commented Feb 13, 2025

/ok to test

@lamarrr lamarrr marked this pull request as ready for review February 13, 2025 18:15
@lamarrr lamarrr requested review from a team as code owners February 13, 2025 18:15
bool const is_scalar = input.size() != output.size();
return column_type_name(input.type(), is_scalar);
});
auto const add_column = [&](cudf::data_type data_type, bool is_scalar) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong to me. Why are we splitting up the value and scale?
Why can't we pass the numeric type as defined in cudf/fixed_point/fixed_point.hpp?

Copy link
Contributor Author

@lamarrr lamarrr Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values have a stride of 1 (0 if it is a scalar) and the scale is always 0 (scalar). All rows in a column have the same scale.

I did consider wrapping the fixed point columns' row data pointer in a struct along with the scale, but the kernel segfaults (mismatched global and constant address spaces) as NVRTC needs to know the types of the addresses it is loading from (i.e. const or not, to know whether to use global or constant memory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more worried about the interface of the transform functions, which is the part the user will be exposed to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we need to rethink this.
My understanding is the provided kernel (GENERIC_TRANSFORM_OP) works on a single row which we provide the single data for in kernel.cu (In and Out types)
It seems we could/should be able to put together a single element appropriately for calling the UDF.
I'm thinking we rework the kernel.cu to be more type-specific in that it can call something like
column_device_view::element<In>() and mutable_column_device_view::element<Out>() to resolve a real typed value to pass to the transform-op.
This may be too much for jitify but perhaps there is something else we can do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants