Implemented Decimal Transforms #17968

lamarrr · 2025-02-10T13:41:47Z

Description

This merge request implements transforms for decimal types.
It requires the UDF writer to manually decode each decimal from its integer representation and scale.
There was also an NVRTC/CCCL issue where __int128_t was not recognized as an integral type in device code; it was fixed by defining __SIZEOF_INT128__.
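
To illustrate what "manually decoding" means for a UDF writer, here is a minimal host-side sketch. It assumes the usual libcudf fixed-point convention that a decimal's value equals rep × 10^scale (so 1.50 is rep 150 with scale -2). The names `decimal64`, `multiply`, `rescale`, and `check_examples` are hypothetical and are not part of this PR or of libcudf; overflow is not handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two pieces a decimal UDF receives:
// the raw integer representation and the base-10 scale.
// Assumed convention: value = rep * 10^scale.
struct decimal64 {
  std::int64_t rep;
  std::int32_t scale;
};

// Multiplying two decimals: representations multiply, scales add.
constexpr decimal64 multiply(decimal64 a, decimal64 b)
{
  return {a.rep * b.rep, a.scale + b.scale};
}

// Re-express a representation at a different scale.
// Moving to a smaller scale multiplies by 10 per step;
// moving to a larger scale divides by 10 per step (truncating toward zero).
constexpr std::int64_t rescale(std::int64_t rep, std::int32_t from, std::int32_t to)
{
  while (from > to) { rep *= 10; --from; }
  while (from < to) { rep /= 10; ++from; }
  return rep;
}

// 1.50 * 2.5 = 3.750, i.e. rep 3750 at scale -3, or rep 375 at scale -2.
inline void check_examples()
{
  decimal64 const product = multiply({150, -2}, {25, -1});
  assert(product.rep == 3750 && product.scale == -3);
  assert(rescale(product.rep, product.scale, -2) == 375);
}
```

A UDF written against separate value and scale arguments would need to do this kind of bookkeeping itself, which is the interface concern raised in the review below.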

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-02-10T13:41:50Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

lamarrr · 2025-02-13T14:30:20Z

/ok to test

lamarrr · 2025-02-13T16:23:00Z

/ok to test


davidwendt · 2025-02-13T19:19:52Z

cpp/src/transform/transform.cpp

-      bool const is_scalar = input.size() != output.size();
-      return column_type_name(input.type(), is_scalar);
-    });
+  auto const add_column = [&](cudf::data_type data_type, bool is_scalar) {


This seems wrong to me. Why are we splitting up the value and scale?
Why can't we pass the numeric type as defined in cudf/fixed_point/fixed_point.hpp?

The values have a stride of 1 (0 if the input is a scalar), and the scale always has a stride of 0 (it is a scalar), because all rows in a column have the same scale.
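
The stride convention described here can be sketched on the host as follows. This is only an illustration under assumptions: `strided` and `add_reps` are hypothetical names, not the kernel code in this PR. A stride of 1 walks a column row by row, while a stride of 0 makes every row read the same single element, which is how a scalar (or the per-column scale) can be read through the same indexing expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical strided accessor: stride 1 for a column, stride 0 for a scalar.
template <typename T>
struct strided {
  T const* data;
  std::size_t stride;
  T at(std::size_t row) const { return data[row * stride]; }
};

// Adds two decimal representations row by row.
// Both inputs are assumed to already share the same scale.
inline std::vector<std::int64_t> add_reps(strided<std::int64_t> lhs,
                                          strided<std::int64_t> rhs,
                                          std::size_t num_rows)
{
  std::vector<std::int64_t> out(num_rows);
  for (std::size_t row = 0; row < num_rows; ++row) {
    out[row] = lhs.at(row) + rhs.at(row);
  }
  return out;
}
```

With a three-row column as `lhs` (stride 1) and a single value as `rhs` (stride 0), every output row is the column value plus that one scalar.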

I did consider wrapping the fixed point columns' row data pointer in a struct along with the scale, but the kernel segfaults because of mismatched global and constant address spaces: NVRTC needs to know the type of each address it loads from (i.e. whether it is const or not) to decide between global and constant memory.

I'm more worried about the interface of the transform functions, which is the part the user will be exposed to.

Perhaps we need to rethink this.
My understanding is that the provided kernel (GENERIC_TRANSFORM_OP) works on a single row, for which we supply the data in kernel.cu (the In and Out types).
It seems we could/should be able to put together a single element appropriately for calling the UDF.
I'm thinking we rework kernel.cu to be more type-specific, so that it can call something like column_device_view::element<In>() and mutable_column_device_view::element<Out>() to resolve a real typed value to pass to the transform-op.
This may be too much for jitify but perhaps there is something else we can do here.

github-actions bot assigned lamarrr Feb 10, 2025

github-actions bot added the libcudf, Python, Java, pylibcudf, and CMake labels Feb 10, 2025

lamarrr force-pushed the transform-decimals branch from 4ee86e3 to 43fef79 Compare February 11, 2025 18:17

lamarrr removed the Python, Java, and pylibcudf labels Feb 11, 2025

lamarrr added 4 commits February 13, 2025 14:05

initial changes for decimal support

56e9f3a

added fixed point header

cd4b17f

added sizeof_int128 define to make int128_t compile

2b14dce

added int128_t support

c3ee2eb

lamarrr force-pushed the transform-decimals branch from 43fef79 to c3ee2eb Compare February 13, 2025 14:28

lamarrr added the feature request and non-breaking labels Feb 13, 2025

Merge branch 'branch-25.04' into transform-decimals

c5644ce

lamarrr added 2 commits February 13, 2025 18:14

refactoring

1468a6e

Merge branch 'transform-decimals' of https://github.com/lamarrr/cudf …

7a20a97

…into transform-decimals

lamarrr marked this pull request as ready for review February 13, 2025 18:15

lamarrr requested review from a team as code owners February 13, 2025 18:15

lamarrr requested review from mhaseeb123 and davidwendt February 13, 2025 18:15

Merge branch 'branch-25.04' into transform-decimals

d459c86

davidwendt reviewed Feb 13, 2025
