refine UDF guide doc #320

Merged
merged 33 commits into from
Oct 24, 2023
a5c4592
refine UDF doc
nvliyuan Sep 20, 2023
f8c2ecb
refine UDF doc
nvliyuan Sep 20, 2023
0147c33
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 21, 2023
7c2beb6
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
d691590
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
83d1f5d
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
2656e06
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
aa3ab06
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
783afe8
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
1a903ee
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
6b836a4
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 26, 2023
79c0fd0
limit 100 chars to each line and add more comments in codes
nvliyuan Sep 26, 2023
9911021
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
76283d3
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
5ed88f9
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
7b5fb67
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
5b9e0d9
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
b1e87ba
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
d1150e7
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
96180b7
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
7005dd8
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
5c5c03d
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
3c10f96
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 27, 2023
f44e61d
add link to prerequisites
nvliyuan Sep 27, 2023
abe6cd5
reformat the file
nvliyuan Sep 27, 2023
7b53c8f
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 28, 2023
31d9214
Update examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
nvliyuan Sep 28, 2023
085621a
update the wrong link
nvliyuan Sep 28, 2023
ba42f6d
update the wrong links
nvliyuan Oct 10, 2023
cc00105
update the links to the new url
nvliyuan Oct 17, 2023
7d8df07
verify the scripts and update the doc
nvliyuan Oct 18, 2023
3372b68
replaced the duplicated instructions with refer links
nvliyuan Oct 20, 2023
124d2a3
update the reference links
nvliyuan Oct 23, 2023
27 changes: 27 additions & 0 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
@@ -1,7 +1,22 @@
# RAPIDS Accelerated UDF Examples
This project contains sample implementations of RAPIDS accelerated user-defined functions.

The ideal solution is usually to translate UDFs to DataFrame or SQL operations. As an alternative to
doing that translation by hand, we also provide an easy-to-use feature called the
[UDF compiler extension](https://nvidia.github.io/spark-rapids/docs/additional-functionality/udf-to-catalyst-expressions.html)
that translates UDFs to Catalyst expressions. The extension only supports compiling simple
operations. For complicated cases, you can choose to implement a RAPIDS Accelerated UDF.

## Spark Scala UDF Examples
This is the simplest example to get started with. The code keeps the original CPU implementation,
which is how a UDF is normally written for the CPU. To add GPU support we only need to implement the
RapidsUDF interface, which has a single method to override, `evaluateColumnar`. The CPU URLDecode
function processes the input row by row, while the GPU `evaluateColumnar` takes the number of rows
and the input columns and returns a cudf ColumnVector. The GPU gets its speed by operating on many
rows at a time, which is how it can run faster than the CPU. Inside `evaluateColumnar` we leverage
the cudf implementation of URL decode, so we don't need to write any native C++ code. This is all
done through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable).
The benefit of the Java API is ease of development, but the JVM memory model is not friendly to GPU
operations: the JVM assumes that everything we do lives in heap memory, so we need to free GPU
resources in a timely manner with many try-finally blocks. GPU memory is also limited, and the more
intermediate results you build, the more you are bound by GPU memory bandwidth.
Note that we need to implement both the CPU and GPU functions so that the UDF keeps working,
instead of crashing the application, when the operation falls back to the CPU.
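
The difference between the row-by-row CPU path and the columnar path, and the try-finally cleanup
of intermediate results, can be sketched in plain Java without any GPU dependency. This is only an
illustration under stated assumptions: `FakeColumn`, `UrlDecodeSketch` and `map` are hypothetical
names invented here, `FakeColumn` is a stand-in for a GPU column and is not the cudf ColumnVector
class, and the real example in this repository calls cudf instead of `java.net.URLDecoder`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a GPU column. It only counts open instances so leaks are visible.
final class FakeColumn implements AutoCloseable {
  static int openCount = 0;
  final List<String> rows;

  FakeColumn(List<String> rows) {
    this.rows = rows;
    openCount++;
  }

  @Override
  public void close() {
    openCount--;
  }
}

final class UrlDecodeSketch {
  // CPU shape: one value in, one value out. Malformed input is returned unchanged.
  static String decodeRow(String s) {
    if (s == null) {
      return null;
    }
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException | IllegalArgumentException e) {
      return s;
    }
  }

  // Applies f to every non-null row and returns a NEW column that the caller must close.
  static FakeColumn map(FakeColumn in, UnaryOperator<String> f) {
    List<String> out = new ArrayList<>(in.rows.size());
    for (String s : in.rows) {
      out.add(s == null ? null : f.apply(s));
    }
    return new FakeColumn(out);
  }

  // Columnar shape: whole columns in, one column out.
  static FakeColumn evaluateColumnar(int numRows, FakeColumn... args) {
    if (args.length != 1 || args[0].rows.size() != numRows) {
      throw new IllegalArgumentException("expected one input column with " + numRows + " rows");
    }
    // Pre-pass that creates an intermediate column; it exists here only to show the cleanup.
    FakeColumn replaced = map(args[0], s -> s.replace('+', ' '));
    try {
      return map(replaced, UrlDecodeSketch::decodeRow);
    } finally {
      replaced.close(); // free the intermediate even if decoding throws
    }
  }

  public static void main(String[] args) {
    // try-with-resources closes the input and the result in reverse order.
    try (FakeColumn input = new FakeColumn(Arrays.asList("a%20b", "c+d", null));
         FakeColumn result = evaluateColumnar(3, input)) {
      System.out.println(result.rows);
    }
  }
}
```

The caller owns both the input and the returned column, so the usage in `main` closes them with
try-with-resources, while the intermediate column never escapes `evaluateColumnar`.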

- [URLDecode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
decodes URL-encoded strings using the
@@ -11,6 +26,17 @@ This project contains sample implementations of RAPIDS accelerated user-defined
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)

## Spark Java UDF Examples
Below are some examples of implementing RAPIDS accelerated Java UDFs with JNI bindings and native
code. If there is no existing Java API we can leverage, we can write custom native code.
The Java class for the UDF is similar to the previous URLDecode/URLEncode example. We implement a
cosineSimilarity function in C++ and have the Java side call into the native code as quickly as
possible, because it is easier to write the code safely there. The native code `reinterpret_cast`s
the inputs to column views, does some sanity checking, converts them to lists column views, and
then computes the cosine similarity. Finally it returns the resulting column, releasing it from its
unique pointer so that the native side no longer owns the underlying resources.
On the Java side we wrap the result in a ColumnVector, which then owns that resource. In
`cosine_similarity.cu` we implement the computation as the actual CUDA kernel. In the CUDA kernel
part, we can leverage the Thrust template library to write standard algorithms that run in
parallel on the GPU.
The benefit of native code is that the UDF runs with the least amount of GPU memory, which can be
good for performance. The trade-off is that we need to build against libcudf, which takes a long
time, so this is an advanced feature.
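
For reference, the per-row computation can be sketched on the CPU in plain Java: the dot product of
the two vectors divided by the product of their magnitudes. This is a hedged illustration, not the
code in this repository. `CosineSimilaritySketch` and both method names are hypothetical, and
nested `float` arrays stand in for the two list columns that the native code receives.

```java
// Minimal CPU reference for cosine similarity, assuming dense float vectors.
final class CosineSimilaritySketch {
  // Cosine similarity of one pair of vectors. A zero-length or all-zero vector yields NaN.
  static float cosineSimilarity(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vectors must have the same length");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
  }

  // Columnar shape: row i of the result compares row i of left with row i of right.
  static float[] cosineSimilarityColumn(float[][] left, float[][] right) {
    if (left.length != right.length) {
      throw new IllegalArgumentException("columns must have the same number of rows");
    }
    float[] result = new float[left.length];
    for (int row = 0; row < left.length; row++) {
      result[row] = cosineSimilarity(left[row], right[row]);
    }
    return result;
  }

  public static void main(String[] args) {
    float[][] left = {{1f, 2f, 3f}, {1f, 0f}};
    float[][] right = {{2f, 4f, 6f}, {0f, 1f}};
    for (float value : cosineSimilarityColumn(left, right)) {
      System.out.println(value);
    }
  }
}
```

Every row is independent of the others, which is what makes the computation a good fit for a GPU
kernel that processes many rows in parallel.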

- [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
decodes URL-encoded strings using the
@@ -23,6 +49,7 @@ This project contains sample implementations of RAPIDS accelerated user-defined
between two float vectors using [native code](src/main/cpp/src)

## Hive UDF Examples
Below are some examples of implementing RAPIDS accelerated Hive UDFs with JNI bindings and native code.

- [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
implements a Hive simple UDF using the