Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A datatype representing an allocation #505

Open
WorldSEnder opened this issue Dec 13, 2024 · 7 comments
Open

A datatype representing an allocation #505

WorldSEnder opened this issue Dec 13, 2024 · 7 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@WorldSEnder
Copy link

WorldSEnder commented Dec 13, 2024

Proposal

Problem statement

The allocator API is currently unstable. Going through the global {re}alloc functions is possible, but not as general. Still, all these methods are unsafe and require careful use. Most of the complexity comes from providing a consistent layout to the different methods for the associated pointer representing the allocation, checking for nullptr and not causing accidental double-free or leakage.

Further, while there is a way to manage an allocation of typed memory through Box<T>, there is no equivalent for the underlying untyped memory.

While the Allocator API is understandably still unstable and the exact contract with implementations is a bit of a moving target that needs to get stabilized, from a user persective only the unsatisfying alloc, realloc and dealloc methods can be used, which will most likely get deprecated (to my understanding, in favor of the Allocator API) either way in the forseeable future. The biggest pain points on stable are that:

  • You don't get notified if the allocator gives you more memory than you requested, potentially leading to superfluous calls to realloc
  • Can't (re)allocate 0-sized, this is often a separate codepath in every manual usage of alloc::allocate
  • realloc can't change the alignment of the allocation
  • raw pointers and layout data are Copy and express no ownership, making it necessary to carefully argue about duplication and leaking in panic paths
  • There is no stable realloc_zeroed

Most of these are solvable on unstable, though allocator_api is and was a moving target for the past 8 years with no particular stabilization date in sight. Additionally, the API is still unsafe. The proposed API is safe and fairly small, trading that for more explicitly tracked data in the added Allocation type.

Motivating examples or use cases

This PR manually implements buffer management that can't be done with a Vec due to working with heterogeneous datatypes with potentially different alignment. Similar datastructures are BoxBytes in bytemuck (which has conversion from a Box/Vec and deallocates but does not support resizing) and multiple questions on stackoverflow et al about over-aligned vectors.

Solution sketch

A new datatype Allocation in alloc::alloc that tracks all necessary information to safely use the allocation interface. While this often has a small overhead of some bits of information that is available externally (such as the datatype's alignment and capacity in the case of Vec), the unsafety requirements can be tracked and checked making this worth it.

// Not Copy
struct Allocation<A: Allocator = Global>  {
    alloc: A,
    ptr: NonNull<u8>,
    layout: Layout, // layout that was used to allocate the above
}
impl Allocation {
    // Forwards to alloc, handles layout.size() == 0 with a dangling ptr
    fn new(layout: Layout) -> Self;
    fn into_parts(self) -> (NonNull<u8>, Layout);
    unsafe fn from_parts(ptr: NonNull<u8>, layout: Layout) -> Self;
    fn layout(&self) -> Layout;
}
impl<A: Allocator> Allocation<A> {
    // Guarantees: aligned to requested layout, len >= requested size, reflecting the actual allocation
    // Pointer is valid for reads and writes until the next call to realloc or this Allocation being dropped
    fn as_slice(&self) -> NonNull<[MaybeUninit<u8>]>;
    // Calls either grow or shrink, compares against stored layout
    fn realloc(&mut self, new_layout: Layout);
    fn realloc_zeroed(&mut self, new_layout: Layout);
}
unsafe impl Sync<A: Allocator + Sync> for Allocation {}
unsafe impl Send<A: Allocator + Send> for Allocation {}
impl Drop for Allocation {
    // handles deallocation
}

// Extensions depending on allocator API
#[cfg(allocator_api)]
impl Allocation {
    fn try_new(layout: Layout) -> Result<Self, AllocError>;
}
#[cfg(allocator_api)]
impl<A: Allocator> Allocation<A> {
    fn new_in(layout: Layout, alloc: A) -> Self;
    fn try_new_in(layout: Layout, alloc: A) -> Result<Self, AllocError>;
    fn into_parts_with_alloc(self) -> (NonNull<u8>, Layout, A);
    unsafe fn from_parts_in(ptr: NonNull<u8>, layout: Layout, alloc: A) -> Self;
    fn try_realloc(&mut self, new_layout: Layout) -> Option<AllocError>;
    fn try_realloc_zeroed(&mut self, new_layout: Layout) -> Option<AllocError>;
}

Alternatives

All downsides mentioned in the motivation can in theory be worked around with existing API, but an unstable-enabled library like the std could provide a better and more performant interface than a crate working against stable rust ever could by internally using the yet unstable allocator API. This ACP could be stabilized without commiting to the exact details of the underlying Allocator API.

Very little of the above would be unsafe API, the alternative for consumers of std at the moment is writing a lot of unsafe code (which is possible, albeit probably error prone).

Links and related work

MVP implementation of some of the above interface

@WorldSEnder WorldSEnder added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Dec 13, 2024
@pitaj
Copy link

pitaj commented Dec 13, 2024

I think comparing only against the stable APIs is misguided. The first thing we're going to ask is "what about the Allocator trait"? For instance, at least this point is covered already by that

You don't get notified if the allocator gives you more memory than you requested, potentially leading to superfluous calls to realloc

@the8472
Copy link
Member

the8472 commented Dec 17, 2024

We discussed this during today's libs-api meeting and we're inclined to reject this on the ground that that this doesn't implement anything necessary to use the allocator APIs and can exist as a safer abstraction in a crate. One of the concerns was that most containers types, especially collections, but also Box-likes usually do not want to carry around the allocator and layout information for each value, because the for example the layout can be derived from T or a fat pointer.

Though we were also unsure which properties exactly you want from an allocation type. E.g. would it be be ok to make the alignment a generic const argument instead of a runtime value?
Would OveralignedBox<T> or a Box::<[MaybeUninit<u8>]>::resize (or grow/shrink) cover some use-cases?
We're interested if you have any input that could inform Allocator API design.

Most of these are solvable on unstable, though allocator_api is and was a moving target for the past 8 years with no particular stabilization date in sight.

This ACP could be stabilized without commiting to the exact details of the underlying Allocator API.

This does not seem obvious, some open questions around the allocator API - such as the Storage proposal - would change it in deep ways that would also mean that Allocation would not work for those types of memory.

We agree that it's unfortunate that not much progress has been made on the Allocator API, but adding even more unstable API surface is unlikely to improve that situation.

The difficulty here is that the exploration of the flexibility-complexity frontier requires a lot of work and is littered with dead proposals, which can be quite demotivating for the participants. At the same time the libs-team has quite limited review bandwidth for complex proposals.

@WorldSEnder
Copy link
Author

WorldSEnder commented Dec 19, 2024

Would OveralignedBox<T> or a Box::<[MaybeUninit<u8>]>::resize (or grow/shrink) cover some use-cases?
We're interested if you have any input that could inform Allocator API design.

I'm mainly interested in a stable way of changing the alignment of an allocation that avoids memcopying everything over to new memory. realloc allows changing only the size, though the unstable Allocator::{grow, shrink} methods could be used (at least it's legal to simply pass the currently allocated size and just change the alignment).

The above proposal would allow one to freely change both the requested size and align, i.e. the whole Layout, of the allocation freely. How a specific allocator protocol achieves this while avoiding memcopies as best as possible should not be the concern of the user, even though this would most likely imply storing additional information besides the pointer such as the allocated layout instead of reconstructing it from statically known information.

OveralignedBox sounds interesting and might be sufficient, but it's not super clear.

A bit of context on the why. The popular neural net libraries store the weights/tensor elements as a series of bytes and a datatype descriptor on disk. When deserializing, this datatype is thus not statically known. Having one code path per supported datatype (matching on the dynamic dtype from the file and forwarding to a monomorphized version of each method) leads to a lot of code duplication. Most of the time, the data is not even on the CPU in the first place, so no need to actually statically know it. In any case, a second method of constructing a tensor would be the user passing in a Vec<_> of elements. In this case, we do know the element type as a proper rust type. At some point, the data can then be converted back to a Vec<E> of elements.

So, how to actually support the above? The main pain point is connected to the deserialization protocol. serde gives us a Vec<u8> if we ask it for an owned sequence of bytes. This will always be allocated with alignment align_of::<u8>() == 1. Separately, we might decode a datatype descriptor of u32 from the serialized tensor. Most allocators place the data with some stricter alignment which allows us to read, e.g. a u32 from bytes 0..4 without having to copy to a better aligned allocation. But when the user asks for an owned copy of Vec<u32>, there is no way to do this in stable rust without a memcopy, to my knowledge, since this vec will deallocate with alignment align_of::<u32>() == 4 which would be undefined behaviour.

In this example we would like to take the allocation of a Vec<u8> (originating from serde) that in almost all cases will be placed at an address with alignment >=16, and return that allocation as a Vec<u32>. For this a call to Allocator::grow - passing the allocated size, but the stricter alignment for new_layout - would be sufficient. (Allocator::shrink could also be used, it's not clear to me which one is better suited here or we should have a different method entirely). But on stable, we can not avoid a memcopy.

@pitaj
Copy link

pitaj commented Dec 19, 2024

@WorldSEnder

Why does a user want an owned Vec instead of just a slice reference? If the allocation is already aligned then align_to will likely work perfectly.

Vec doesn't even provide a way to interact with the base allocation. You can get the pointer and capacity, but I don't think there are any promises about what happens if you were to call Allocator::grow with them. Certainly the new alignment wouldn't be maintained on the next realloc.

I implore you to try out the unstable Allocator API. I think you can make a thin shim allocator that always requests the desired alignment:

struct Overaligned<A, const ALIGN: usize>(A);

impl<A: Allocator, const ALIGN: usize> Allocator for Overaligned<A, ALIGN> {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        self.0.allocate(layout.align_to(ALIGN).map_err(|_| AllocError)?)
    }

    ...
}

// Then
#[derive(serve::Deserialize)]
struct YourData {
    vec: Vec<u8, Overaligned<Global>>
}

@the8472

I don't see why we'd need a separate type for OveralignedBox. Couldn't we just have a new constructor Box::with_alignment?

@scottmcm
Copy link
Member

scottmcm commented Dec 19, 2024

Further, while there is a way to manage an allocation of typed memory through Box<T>, there is no equivalent for the underlying untyped memory.

For any case where the alignment is statically known, rather than dynamically, there's Box<MaybeUninit<T>>, Box<[MaybeUninit<X>]> or Box<(Y, [MaybeUninit<u8>])>, for an appropriate type X or Y depending on how general a case needs to be handled.

Maybe a smaller thing that could be done here, if people want to reuse those allocations, would be to have something like the inverse to https://doc.rust-lang.org/std/boxed/struct.Box.html#method.write for separating a value and the allocation again, for easy re-use of that allocation later?

Couldn't we just have a new constructor Box::with_alignment?

No, because there's nowhere for the Box to store that alignment, and thus it wouldn't be able to pass the correct alignment in drop.


But overall, the mentioned direction of "close because allocation stored with its Layout doesn't need to be in std" makes sense to me. Adding another unstable type doesn't seem likely to solve the current problem of something being unstable.

@WorldSEnder
Copy link
Author

WorldSEnder commented Dec 19, 2024

@pitaj

Why does a user want an owned Vec instead of just a slice reference?

Why not. I don't think it's exotic to expect to take ownership of the tensor data if a user knows the datatype that their tensor contains, yet the framework needs to be more versatile around it.

Vec doesn't even provide a way to interact with the base allocation.

The docs are not 100% clear on this, but very close. You can though allocate the storage of a Vec<T>, and, by going through a Box<[T]>::from_raw(_).into() end up with a Vec that uses that storage. boxed#memory-layout makes me think that I'm safe here. The other direction of using the allocation from Vec::into_raw_parts (or manually reimplementing this) is arguably less guaranteed to work, but I don't think it's controversial to say that you can take control of the allocated memory of the Vec and use it as if it was allocated from the global allocator (suitably checking that the vector isn't empty and actually has some backing storage - ZSTs are also not relevant to me).

So what I'm saying is that I'm pretty sure you can legally:

  • go from Vec<A> to a pointer+layout that is "currently allocated" and "memory fitting" for the global allocator. One might have to through a Vec<MaybeUninit<_>> first to avoid problems with spare capacity, but after that set_len and Into<Box<[_]>> should be fine while avoiding any memcopy or reallocations (though not perfectly guaranteed I suppose, at least in practice). Then use Box::into_raw.
  • use Allocator API::grow/shrink to realign that allocation to any alignment of your choice
  • go back to a Vec<B> of a different element type, with different alignment, as long as the capacity can fit a whole number of elements. Again, perhaps go through a Box<[_]> and Vec<MaybeUninit<_>> if you care about only following documented behaviour instead of directly calling Vec::from_raw_parts.

The interesting step is in (2) which, as I said, can currently only be achieved unstably and in a bit of a roundabout way. The allocator might (and for my use-cases most likely should) be able to avoid a memcopy by making a reasonable effort to do so. If the passed-in pointer already is aligned to whatever alignment B requires, the allocator would still need to sanctify this operation so that the returned pointer validly "fits" the new alignment, without having to hit the underlying system allocator.

I think you can make a thin shim allocator that always requests the desired alignment.

The alignment is (potentially) different per datatype and "leaking" that statically in the type information á la Vec<_, Overaligned<_>> has the same problem as using a Vec<OverAligned<T>> with an overaligned element type. It's certainly possible, but I certainly don't think this API would be "clean", it leaks implementation details to the user.

@pitaj
Copy link

pitaj commented Dec 19, 2024

Why not. I don't think it's exotic to expect to take ownership of the tensor data if a user knows the datatype that their tensor contains, yet the framework needs to be more versatile around it.

It's not customary in Rust to pass around raw data by value. It's idiomatic to pass through the least permissive type possible, and there is even a clippy lint needless_pass_by_value which generally recommends passing slice references instead of Vec.

The alignment is (potentially) different per datatype

Just use the maximum foreseeable alignment of any datatype in use. That's what you're hoping the global allocator is doing anyways.

and "leaking" that statically in the type information á la Vec<_, Overaligned<_>> has the same problem as using a Vec<OverAligned<T>> with an overaligned element type.

No, because size must be a multiple of alignment. Each element of a hypothetical Vec<Overaligned<u8, 4>> would take four bytes of memory, meaning it would use four times the memory. Whereas Vec<u8, Overaligned<Global, 4>> uses the same amount of memory, with the same element size, but is just aligned differently.

It's certainly possible, but I certainly don't think this API would be "clean", it leaks implementation details to the user.

Not if you wrap it in your own type, which you probably should anyways for convenience. Or you could just provide slice references with the requested element type and everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants