feat: expose binary_elementwise_into_string_amortized for plugin authors
MarcoGorelli committed Jul 27, 2024
1 parent 8542d5f commit 871f717
Showing 3 changed files with 10 additions and 17 deletions.
15 changes: 0 additions & 15 deletions crates/polars-core/src/chunked_array/ops/apply.rs
@@ -363,21 +363,6 @@ impl StringChunked {
         });
         StringChunked::from_chunk_iter(self.name(), chunks)
     }
-
-    /// Utility that reuses an string buffer to amortize allocations.
-    /// Prefer this over an `apply` that returns an owned `String`.
-    pub fn apply_to_buffer<'a, F>(&'a self, mut f: F) -> Self
-    where
-        F: FnMut(&'a str, &mut String),
-    {
-        let mut buf = String::new();
-        let outer = |s: &'a str| {
-            buf.clear();
-            f(s, &mut buf);
-            unsafe { std::mem::transmute::<&str, &'a str>(buf.as_str()) }
-        };
-        self.apply_mut(outer)
-    }
 }

 impl BinaryChunked {
4 changes: 4 additions & 0 deletions crates/polars-core/src/chunked_array/ops/arity.rs
@@ -332,6 +332,10 @@ where
     ChunkedArray::from_chunk_iter(lhs.name(), iter)
 }

+/// Apply elementwise binary function which produces string, amortising allocations.
+///
+/// Currently unused within Polars itself, but it's a useful utility for plugin authors.
+#[inline]
 pub fn binary_elementwise_into_string_amortized<T, U, F>(
     lhs: &ChunkedArray<T>,
     rhs: &ChunkedArray<U>,
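For readers unfamiliar with the pattern, the following is a std-only sketch of what "amortising allocations" can look like for a two-input elementwise function that produces strings. It does not use Polars and is not the implementation of `binary_elementwise_into_string_amortized`: the names `zip_into_flat` and `join_with_dash` are hypothetical, the flat `(String, Vec<usize>)` output is a simplified stand-in for a string array, and null handling and chunking are ignored.

```rust
use std::fmt::Write;

/// Hypothetical std-only analogue of a binary elementwise kernel.
/// Calls `op` once per row pair, handing it a single reusable scratch buffer,
/// and copies each result into one flat output string.
/// Row `i` of the result is `data[offsets[i]..offsets[i + 1]]`.
fn zip_into_flat<A, B, F>(lhs: &[A], rhs: &[B], mut op: F) -> (String, Vec<usize>)
where
    A: Copy,
    B: Copy,
    F: FnMut(A, B, &mut String),
{
    assert_eq!(lhs.len(), rhs.len(), "inputs must have equal length");

    let mut buf = String::new(); // scratch buffer, reused for every row
    let mut data = String::new(); // all output rows, concatenated
    let mut offsets = Vec::with_capacity(lhs.len() + 1);
    offsets.push(0);

    for (&a, &b) in lhs.iter().zip(rhs) {
        buf.clear(); // resets the length but keeps the allocated capacity
        op(a, b, &mut buf);
        data.push_str(&buf);
        offsets.push(data.len());
    }
    (data, offsets)
}

/// Example row operation (an assumption for illustration): writes "<name>-<n>".
fn join_with_dash(name: &str, n: i64, buf: &mut String) {
    write!(buf, "{}-{}", name, n).unwrap();
}
```

The row operation never returns an owned `String`; it only writes into the buffer it is given, so the scratch buffer grows to the size of the largest row and is then reused.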
8 changes: 6 additions & 2 deletions docs/user-guide/expressions/plugins.md
@@ -63,11 +63,15 @@ fn pig_latin_str(value: &str, output: &mut String) {
 #[polars_expr(output_type=String)]
 fn pig_latinnify(inputs: &[Series]) -> PolarsResult<Series> {
     let ca = inputs[0].str()?;
-    let out: StringChunked = ca.apply_to_buffer(pig_latin_str);
+    let out: StringChunked = ca.apply_into_string_amortized(pig_latin_str);
     Ok(out.into_series())
 }
 ```

+Note that we use `apply_into_string_amortized`, as opposed to `apply_values`, to avoid allocating a new string for
+each row. If your plugin takes in multiple inputs, operates elementwise, and produces a `String` output,
+then you may want to look at the `binary_elementwise_into_string_amortized` utility function in `polars::prelude::arity`.
+
 This is all that is needed on the Rust side. On the Python side we must setup a folder with the same name as defined in
 the `Cargo.toml`, in this case "expression_lib". We will create a folder in the same directory as our Rust `src` folder
 named `expression_lib` and we create an `expression_lib/__init__.py`. The resulting file structure should look something like this:
@@ -160,7 +164,7 @@ fn append_kwargs(input: &[Series], kwargs: MyKwargs) -> PolarsResult<Series> {
     let ca = input.str().unwrap();

     Ok(ca
-        .apply_to_buffer(|val, buf| {
+        .apply_into_string_amortized(|val, buf| {
             write!(
                 buf,
                 "{}-{}-{}-{}-{}",

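The note added to `plugins.md` about not allocating a new string for each row can be illustrated without Polars. The sketch below is an assumption-laden stand-in, not the library's code: `StrColumn` and `apply_into_column` are hypothetical names, and the body of `pig_latin_str` is a plausible guess, since only its signature appears in the hunk header above.

```rust
use std::fmt::Write;

/// Hypothetical flat string column: every row is stored back to back in `data`,
/// and row `i` spans `offsets[i]..offsets[i + 1]`.
struct StrColumn {
    data: String,
    offsets: Vec<usize>,
}

impl StrColumn {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> &str {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Applies `f` to every input, reusing one scratch buffer across all rows
/// instead of having `f` return a freshly allocated `String` each time.
fn apply_into_column<'a, I, F>(values: I, mut f: F) -> StrColumn
where
    I: IntoIterator<Item = &'a str>,
    F: FnMut(&'a str, &mut String),
{
    let mut buf = String::new(); // allocated once, reused for every row
    let mut out = StrColumn { data: String::new(), offsets: vec![0] };
    for value in values {
        buf.clear(); // keeps the capacity from earlier rows
        f(value, &mut buf);
        out.data.push_str(&buf);
        out.offsets.push(out.data.len());
    }
    out
}

/// Assumed body: moves the first character to the end and appends "ay".
/// Empty input leaves the buffer empty.
fn pig_latin_str(value: &str, output: &mut String) {
    if let Some(first) = value.chars().next() {
        // Slice by the first character's UTF-8 width so multi-byte input is safe.
        write!(output, "{}{}ay", &value[first.len_utf8()..], first).unwrap();
    }
}
```

Because the callback has the shape `FnMut(&str, &mut String)`, the same `pig_latin_str` function can be passed directly, mirroring how it is passed to `apply_into_string_amortized` in the documentation.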