docs/source/library-user-guide/adding-udfs.md
+82-5
@@ -34,7 +34,87 @@ First we'll talk about adding an Scalar UDF end-to-end, then we'll talk about th
34
34
35
35
## Adding a Scalar UDF
36
36
37
-
A Scalar UDF is a function that takes a row of data and returns a single value. For example, this function takes a single i64 and returns a single i64 with 1 added to it:
37
+
A Scalar UDF is a function that takes a row of data and returns a single value. For good performance,
38
+
such functions are "vectorized" in DataFusion, meaning they get one or more Arrow Arrays as input and produce
39
+
an Arrow Array with the same number of rows as output.
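To make "vectorized" concrete, here is a minimal sketch that uses only the Rust standard library. `Int64Column` is a hypothetical stand-in for an Arrow `Int64Array` (one optional value per row, `None` meaning null); it is not a DataFusion or Arrow type. The point is only that the function is called once per batch and returns exactly one output row per input row.

```rust
/// Hypothetical stand-in for an Arrow Int64 array: one optional value per row.
type Int64Column = Vec<Option<i64>>;

/// Row-at-a-time logic: nulls stay null, everything else gets 1 added.
fn add_one_row(value: Option<i64>) -> Option<i64> {
    value.map(|v| v + 1)
}

/// Vectorized form: takes a whole column and returns a column of the same length.
fn add_one_vectorized(input: &[Option<i64>]) -> Int64Column {
    input.iter().map(|v| add_one_row(*v)).collect()
}

fn main() {
    let batch: Int64Column = vec![Some(1), None, Some(41)];
    let out = add_one_vectorized(&batch);
    // The output always has the same number of rows as the input.
    assert_eq!(out.len(), batch.len());
    println!("{:?}", out);
}
```

With real Arrow arrays the per-row loop would typically be replaced by an Arrow compute kernel, but the contract (same number of rows in and out) is the same.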
40
+
41
+
To create a Scalar UDF, you:
42
+
43
+
1. Implement the `ScalarUDFImpl` trait to tell DataFusion about your function, such as what types of arguments it takes and how to calculate the results.
44
+
2. Create a `ScalarUDF` and register it with `SessionContext::register_udf` so it can be invoked by name.
45
+
46
+
In the following example, we will add a function that takes a single `i64` and returns a single `i64` with 1 added to it:
47
+
48
+
For brevity, we'll skip some error handling, but you may want to check, for example, that `args.len()` is the expected number of arguments.
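As an illustration of the kind of check mentioned above, the following sketch validates the argument count before doing any work. It uses only the standard library: `Int64Column` and the `String` error are hypothetical stand-ins for Arrow arrays and DataFusion's error type, so treat it as a shape to adapt rather than the guide's actual implementation.

```rust
/// Hypothetical stand-in for an Arrow Int64 array: one optional value per row.
type Int64Column = Vec<Option<i64>>;

/// Adds 1 to every non-null value, after checking the argument count.
/// The `String` error stands in for a real error type.
fn add_one(args: &[Int64Column]) -> Result<Int64Column, String> {
    if args.len() != 1 {
        return Err(format!(
            "add_one expects exactly 1 argument, got {}",
            args.len()
        ));
    }
    Ok(args[0].iter().map(|v| v.map(|x| x + 1)).collect())
}

fn main() {
    // One argument: succeeds.
    println!("{:?}", add_one(&[vec![Some(1), None]]));
    // No arguments: returns an error instead of panicking on `args[0]`.
    println!("{:?}", add_one(&[]));
}
```

Checking the length up front turns an out-of-bounds panic into an error the caller can report.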
49
+
50
+
### Adding by `impl ScalarUDFImpl`
51
+
52
+
This is a lower-level API that offers more functionality but is more complex. It is also documented in [`advanced_udf.rs`].
For brevity, we'll skip some error handling, but you may want to check, for example, that `args.len()` is the expected number of arguments.
62
-
63
141
This "works" in isolation, i.e. if you have a slice of `ArrayRef`s, you can call `add_one` and it will return a new `ArrayRef` with 1 added to each value.
The challenge, however, is that DataFusion doesn't know about this function. We need to register it with DataFusion so that it can be used in the context of a query.
76
154
77
-
### Registering a Scalar UDF
155
+
#### Registering a Scalar UDF
78
156
79
157
To register a Scalar UDF, you need to wrap the function implementation in a [`ScalarUDF`] struct and then register it with the `SessionContext`.
80
158
DataFusion provides the [`create_udf`] helper function to make this easier.
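To show why registration matters, here is a toy model of looking a function up by name, written with only the standard library. `ToyRegistry`, `ScalarFn`, and `Int64Column` are hypothetical names invented for this sketch; they are not DataFusion types and do not reflect how `SessionContext` is implemented. The idea is simply that a query engine can only call functions it has been told about.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow Int64 array.
type Int64Column = Vec<Option<i64>>;
/// Hypothetical signature for a scalar function implementation.
type ScalarFn = fn(&[Int64Column]) -> Result<Int64Column, String>;

fn add_one(args: &[Int64Column]) -> Result<Int64Column, String> {
    if args.len() != 1 {
        return Err(format!("add_one expects 1 argument, got {}", args.len()));
    }
    Ok(args[0].iter().map(|v| v.map(|x| x + 1)).collect())
}

/// Toy name-to-function table, loosely analogous to registering a UDF.
#[derive(Default)]
struct ToyRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl ToyRegistry {
    fn register(&mut self, name: &str, f: ScalarFn) {
        self.functions.insert(name.to_string(), f);
    }

    /// Looks the function up by name and invokes it, or reports that it is unknown.
    fn call(&self, name: &str, args: &[Int64Column]) -> Result<Int64Column, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("unknown function: {name}"))?;
        f(args)
    }
}

fn main() {
    let mut registry = ToyRegistry::default();
    registry.register("add_one", add_one);
    println!("{:?}", registry.call("add_one", &[vec![Some(1), None]]));
    println!("{:?}", registry.call("not_registered", &[]));
}
```

In DataFusion itself, the wrapping and lookup are handled by `ScalarUDF` and the `SessionContext`, so you only supply the implementation and its metadata.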
81
-
There is a lower level API with more functionality but is more complex, that is documented in [`advanced_udf.rs`].