Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance udf to take additional non-expr arguments #822

Closed
Tracked by #879
timsaucer opened this issue Aug 19, 2024 · 3 comments · Fixed by #890
Closed
Tracked by #879

Enhance udf to take additional non-expr arguments #822

timsaucer opened this issue Aug 19, 2024 · 3 comments · Fixed by #890
Labels
enhancement New feature or request

Comments

@timsaucer
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There are some cases where you would like to pass additional data to a UDF that are not expressions. For example, suppose you wrote a UDF that did some kind of lookup and the function had a signature like

def my_lookup(column: Expr, lookup_values: dict[str, str]) -> Expr

It would be convenient to allow for passing in arbitrary data.

Describe the solution you'd like

Most likely this would mean a change to PyScalarUDF::__call__ and I could see changing the function definition to something like fn __call__(&self, args: Vec<PyAny>) -> PyResult<PyExpr>. It's not immediately obvious to me how to set additional parameters, but it may mean switching from SimpleScalarUDF to a different ScalarUDFImpl to carry the additional data.

Describe alternatives you've considered

Right now you would need to do one of these approaches

  • create a class that you can set these values on and then use the __call__ function as your UDF
  • take advantage of setting the variable scope for lookup_values
  • Pass the variables as literals

Additional context

@timsaucer timsaucer added the enhancement New feature or request label Aug 19, 2024
@timsaucer
Copy link
Contributor Author

This was implemented in #880 for window functions, just need to add it in for UDAF/UDF

@timsaucer
Copy link
Contributor Author

After some initial investigation, the way the window functions are using an instance has a problem in that when the instance is reused it's carrying over the past state. I bet that's why the aggregate functions were set up to use a class type and then instantiate them in to_rust_accumulator. I'm trying to understand why this does work because it seems like the only place we instantiate is during the create_udaf however we're clearly getting two calls to __init__ in the unit test when I add a second aggregation in the same test using the same udf. This requires further investigation.

@Michael-J-Ward
Copy link
Contributor

Did you figure out the double calls to __init__ ?

Was it because to_rust_accumulator creates a factory that is invoked each time the udaf is used in a query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants