Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add binary as numerical #19542

Closed
wants to merge 10 commits into from

Conversation

balbok0
Copy link
Contributor

@balbok0 balbok0 commented Oct 31, 2024

Related to #18991

Implements from_buffer method that casts from Binary to Numerical types. It is not the most efficient (i.e. size + reinterpret cast would probably better), but provides a consistent code for both LE and BE encodings. Name is inspired by numpy equivalent, but with polars style _ separator between words.

I also added match arm in cast that goes from Binary -> numerical type. It assumes LE encoding, as it is a vastly more popular from the two. Happy to remove that part of code, since it does make an unspecified assumption. Lastly, I noticed that there is missing Numerical -> Binary functionality. Out-of-scope for this MR IMO, but something I can do later on.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Oct 31, 2024
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for following up on this. I've left some comments.

impl_cast!(f32);
impl_cast!(f64);

impl Cast for f16 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support f16, so we can drop this.

@@ -93,7 +140,8 @@ pub fn binary_to_utf8<O: Offset>(
}

/// Casts a [`BinaryArray`] to a [`PrimitiveArray`], making any uncastable value a Null.
pub(super) fn binary_to_primitive<O: Offset, T>(
#[allow(dead_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this isn't casting anymore, and we shouldn't dispatch to this via cast I think we should move this all much higher int he polars dependency chain.

This should instead be implemented in polars-ops. Only the expressions should call into this and lower it is isn't something we should compile.

@@ -394,16 +403,26 @@ pub fn cast(
match to_type {
BinaryView => Ok(arr.to_binview().boxed()),
LargeUtf8 => Ok(binview_to::utf8view_to_utf8::<i64>(arr).boxed()),
UInt8 => binview_to_primitive_dyn::<u8>(&arr.to_binview(), to_type, options),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then we can remove all this from the cast kernels as well.

@balbok0
Copy link
Contributor Author

balbok0 commented Dec 12, 2024

Sorry, I was out for all of November, and only got back recently. Given how out of date this branch is, I will close this MR and re-open, with a much cleaner history + feedback applied. Hope this is ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants