Skip to content

Support for serialization using abomonation #279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 15, 2017

Conversation

milibopp
Copy link
Collaborator

Resolves #277.

I ended up implementing the trait by hand after all. This is basically, because abomonation_derive is not quite as flexible for our types, as it needs to be.

I intend to change that upstream (see TimelyDataflow/abomonation#4), but for now the manual implementation should suffice.

cc @frankmcsherry
I would be very glad, if you could provide some feedback on whether these implementations look sane to you. Almost all of them are pure delegations. I am somewhat uncertain about the usage of slices in src/core/matrix_array.rs though. The test cases seem to work out, but I want to make sure I am not overlooking a critical issue here.

@frankmcsherry
Copy link

frankmcsherry commented Aug 14, 2017

Hello!

Yes, I think the slice use in matrix_array.rs is the only scary one. I have two thoughts on it:

  1. In the current git repo, slices have been removed from Abomonation due to memory safety issues. Mainly, you could take a &'a mut [u8] and deserialize a reference to a &'b [T] for any 'b, which you could then copy and which would ultimately become invalid. This fix to this is (I think) to have a trait Abomonation<'a> which can deserialize things with lifetimes at most 'a, allowing us to state that slices are ok if the lifetime isn't too long. Independent of how it is resolved, the next version will probably break this implementation (even though this version doesn't grab an inappropriate lifetime).

  2. I'm least confident in the correctness of deserialization for &[T] in Abomonation; it could work, but it could also be UB depending on how the Rust core team comes down. Internally it creates a new &mut [T] from the &[T], which would be totally undefined behavior for Rust slices (or whenever Rust re-introduces the use of NOALIAS for LLVM) , but as there never was a &[T] in the first place in Abomonation, it is unclear .. well it is all unclear. So I'd recommend avoiding it. Also part of the reason it has been removed.

  3. I'm taking a read through GenericArray and it seems possible to directly port the logic for e.g. Vec serialization rather than go through slices. To explain, the Vec logic is "generic" (not in the PL sense) for containers that can expose a &mut T reference to their contained elements. We iterate through each of the &mut T references, either calling entomb or exhume as appropriate for the method.

For example, here is the exhume method for Vec<T> where T: Abomonation:

#[inline]
    unsafe fn exhume<'a,'b>(&'a mut self, bytes: &'b mut [u8]) -> Option<&'b mut [u8]> {

        // extract memory from bytes to back our vector
        let binary_len = self.len() * mem::size_of::<T>();
        if binary_len > bytes.len() { None }
        else {
            let (mine, mut rest) = bytes.split_at_mut(binary_len);
            let slice = std::slice::from_raw_parts_mut(mine.as_mut_ptr() as *mut T, self.len());
            std::ptr::write(self, Vec::from_raw_parts(slice.as_mut_ptr(), self.len(), self.len()));
            for element in self.iter_mut() {
                let temp = rest;             // temp variable explains lifetimes (mysterious!)
                rest = try_option!(element.exhume(temp));
            }
            Some(rest)
        }
    }

I think this almost directly applies, except that (edit: see next comment):

  1. Verify that instances of GenericArray are value typed (not pointers), and that once memcpyd into place the remaining work is only visiting and updating each of their members if they have owned data. This seems to be the case, based on e.g. https://github.com/fizyk20/generic-array/blob/master/src/lib.rs#L133-L139, but ... Once so determined, we skip the lines that write back to self. In the Vec implementation we had to correct a pointer, but no pointers for GenericArray.
  2. Instead of self.iter_mut() you would want self.as_mut_slice().iter_mut().

If you think that GenericArray and the nalgebra types are pervasive, I'm happy to add them to abomonation, with the caveat that if they vanish from popularity in the future (e.g. type-level constants) maybe their implementations vanish too.

Also, the tests seem sane but I would mention that it is super hard to test Abomonation without round-tripping things through different address spaces. It is not uncommon to have a bug, but the pointer still points at legitimately interpreted memory and so passes the test. Not sure about how best to test this, though (perhaps re-rigging the tests to forcibly drop the input used for serialization?).

@frankmcsherry
Copy link

frankmcsherry commented Aug 14, 2017

Actually, I think the abomonation methods should be much simpler for GenericArray than for Vec. I forgot that since there is no dynamic memory for this type, there is no need to write to or consume from bytes.

My guess is that the implementation should look roughly like the tuple implementation (based on my understanding of GenericArray):

impl<T: Abomonation> Abomonation for GenericArray<T,N> {
    #[inline]
    unsafe fn embalm(&mut self) {
        // provide each element the opportunity to embalm (clean up representation).
        for element in self.as_mut_slice().iter_mut() {
            element.embalm();
        }
    }
    #[inline]
    unsafe fn entomb(&self, bytes: &mut Vec<u8>) {
        // provide each element the opportunity to entomb data.
        for element in self.as_slice().iter() { element.entomb(bytes); }
    }
    #[inline]
    unsafe fn exhume<'a,'b>(&'a mut self, but bytes: &'b mut [u8]) -> Option<&'b mut [u8]> {
        // provide each element the opportunity to exhume data.
        for element in self.as_mut_slice().iter_mut() {
            let temp = bytes; // temp variable explains lifetimes (mysterious!)
            bytes = try_option!(element.exhume(temp));
        }

        Some(bytes)
    }
}

I think you should be able to slot this in to MatrixArray and have it work, mutatis mutandis.

This is more robust than delegating to a slice, which has been removed
upstream due to unsafety. Since we can rely on there being no pointer
indirection in a GenericArray, we just iterate over the array.
@frankmcsherry
Copy link

The new changes look totally sane to me. Hard to know without being the one testing it, but it looks right. :)

@milibopp
Copy link
Collaborator Author

milibopp commented Aug 14, 2017

@frankmcsherry Thank you very much for the very thorough review. I adopted your suggested implementation for MatrixArray in the latest commit. I wanted to avoid touching another crate (generic-array) for this at first if possible, as it complicates the feature relationship and requires more coordination to get done. However, of course, generic-array could easily take this, too.

cc @fizyk20 if you are interested ^^

I'll have another look at the tests. Adding a drop is easy enough to do.

@frankmcsherry
Copy link

No worries. :)

I'd be happy to move some of these impls to Abomonation, but just a little worried about the back-compat issues (since I can't track if any are actually used). Perhaps they could land there with feature flags? This must have come up before as a pattern, coherence-be-damned.

@milibopp
Copy link
Collaborator Author

milibopp commented Aug 14, 2017

If I were in your position, I would not take in impls for other crates' types, unless you depend on them anyway. It is much more natural for a type provider to maintain the implementation than for the trait provider. Most importantly, you may need access to private fields in order to implement the trait, which can't be done out-of-crate. Also, the implementation tends to change more frequently with the type, as traits are usually more stable as they are contracts by design.

I guess that issue is part of the reason why the community tends to gravitate to common abstractions (like serde) to keep the boilerplate of optional features to a minimum

So yeah, coherence has its ups and downs. There have been heated debates around the issue, see rust-lang/rfcs#493 for example. I don't think anybody felt comfortable bringing modular typeclasses (from ML) or instance arguments (from Agda) to Rust, mostly because safety and semantics of common abstractions like HashMap rely on coherence.

@milibopp
Copy link
Collaborator Author

@sebcrozet from my point of view, this is done. feel free to merge.

@sebcrozet sebcrozet merged commit afef662 into dimforge:master Aug 15, 2017
@sebcrozet
Copy link
Member

Thanks! This will be released on the version 0.13 of nalgebra.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants