Skip to content

Add Zero-Copy Conversion between Vec and MutableBuffer #3920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 25, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #3516

Rationale for this change

#3756 only added zero-copy conversion between Vec and Buffer in order to keep the scope of the change down. Unfortunately as discovered in #3917 this causes issues for the APIs that allow converting an Array back into a builder, as the builders use MutableBuffer and therefore are restricted by the alignment expectations of MutableBuffer.

What changes are included in this PR?

Adds zero-copy conversion to MutableBuffer by storing the Layout inline.

Are there any user-facing changes?

This changes what can and can't be converted to a MutableBuffer, and removes some low-level deprecated APIs. Downstream impact is likely to be extremely marginal.

@tustvold tustvold added the api-change Changes to the arrow API label Mar 23, 2023
@tustvold
Copy link
Contributor Author

I plan to run the benchmarks to confirm no performance regression

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 23, 2023
@@ -810,7 +800,8 @@ mod tests {
b.into_mutable().unwrap();

let b = Buffer::from_vec(vec![1_u32, 3, 5]);
let b = b.into_mutable().unwrap_err(); // Invalid layout
let b = b.into_mutable().unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now possible 🎉

/// Create a [`MutableBuffer`] from the provided [`Vec`] without copying
#[inline]
pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
// Safety
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic from Buffer::from_vec

pub fn with_capacity(capacity: usize) -> Self {
let capacity = bit_util::round_upto_multiple_of_64(capacity);
let ptr = alloc::allocate_aligned(capacity);
let layout = Layout::from_size_align(capacity, ALIGNMENT).unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous allocation methods used from_size_align_unchecked, this was technically incorrect as it didn't check capacity doesn't overflow isize

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need round_upto_multiple_of_64? from_size_align seems also rounding capacity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_size_align seems also rounding capacity

It doesn't round, it just verifies that the allocation isn't large enough to overflow isize and cause UB

// SAFETY: ALIGNMENT is a non-zero usize which is then casted
// to a *mut T. Therefore, `ptr` is not null and the conditions for
// calling new_unchecked() are respected.
unsafe { NonNull::new_unchecked(ALIGNMENT as *mut u8) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from the alloc module

@tustvold tustvold requested a review from viirya March 23, 2023 20:07
@alamb
Copy link
Contributor

alamb commented Mar 23, 2023

I think @viirya should take a close look at this and make sure it is compatible with their usecase.

cc @avantgardnerio as I think you had a similar usecase with modifying data a while ago

@viirya
Copy link
Member

viirya commented Mar 23, 2023

I'll review this today or tomorrow.

@tustvold
Copy link
Contributor Author

I'll review this today or tomorrow.

Thank you ❤️

To be clear this is a strictly additive change, it allows into_mut for arrays containing Buffer created from Vec, in addition to the current support for Buffer created from MutableBuffer

Deallocation::Standard(layout) if layout.align() == ALIGNMENT => {
layout.size()
}
let layout = match bytes.deallocation() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment of bytes doesn't matter now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we can handle any alignment now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Zero-Copy Conversion from Vec to/from MutableBuffer
3 participants