Proposal for a re-worked API #246

thomaseizinger · 2023-02-24T00:37:08Z

We are in the process of adopting quick-protobuf in rust-libp2p and a few points came up whilst using the API.

The naming of various traits and functions is too similar: For example, there is MessageWrite::write_message and Writer::write_message. The latter uses the former but also includes a length-prefix. We've tripped over this multiple times.
There is serialize_into_vec which writes a length-prefix but it is not documented.
There are three traits implemented for a message, which all need to be imported so you can work with them.
There is a BytesWriter, a Writer, a WriterBackend and the MessageWrite. The relationship is not entirely clear from the naming.

My suggestions would be:

Unify MessageInfo, MessageRead and MessageWrite into a single trait Message.
The standard library already has an abstraction for writing bytes: std::io::Write. Can Message::write perhaps simply take a &mut std::io::Write? This would remove the layer of constructing a Writer with a WriterBackend. A std::fs::File already implements Write and so does std::vec::Vec.
On a related thought, we might want to consider offering an AsyncMessage type that uses the futures abstraction AsyncRead and AsyncWrite.
This might be controversial but I think writing a message with a length-prefix is out of scope for this library. Length-prefixing messages is a protocol for how to parties can interact. I think we are better off with focusing on reading and writing protobufs in this library. I am guessing protobufs themselves are not self-descriptive for this very reason: It is too opinionated to be really generally useful.
Provide only a handful of utility functions at the module root:
- quick_protobuf::from_slice::<M>(&[u8]) -> Result<M>
- quick_protobuf::to_vec::<M>(M) -> Vec<u8>
- serde is a battle-tested library. We should be able to take some inspiration from what functions they provide: https://docs.rs/serde_json/1.0.93/serde_json/#functions
We will need some abstractions within the library that the generated-code can use to avoid duplication. I think those should be in a separate module like codegen or internal which clearly signals to users that they shouldn't use these directly.

Let me know what you think. I am open to working on this and sending PRs :)

The text was updated successfully, but these errors were encountered:

snproj · 2023-03-06T02:15:39Z

Ah I was planning to remove the length prefix in serialize_into_vec() too as per my comment in #202. I guess you guys would be ok with that too? I don't think we should necessarily retain the old function with the length prefix in any form either; so far no one seems to have voiced out in favor of that.

thomaseizinger · 2023-03-06T02:19:41Z

Cool!

Yeah I am okay with it being removed. That isn't the only part of my proposal though so I am keen to hear your opinion on the remaining points :)

snproj · 2023-03-06T08:29:20Z

Re: Suggestion 1

Click to open

As far as I know, MessageInfo is meant to be kinda optional (as in, users can choose to generate MessageInfo implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying MessageRead and MessageWrite, though. Would that make things easier?

Re: Suggestion 2

Click to open

Rewriting BytesWriter

Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller). If so, one way I can think of doing it is to rewrite BytesWriter to have its buf field be std::io::Write instead of &mut [u8]. Message::write() (which I'm assuming is the same as MessageWrite::write_message()) would then take in std::io::Write, and internally convert it into BytesWriter and then Writer and proceed as per normal. We might then be able to make Writer, WriterBackend and BytesWriter private. We still need Writer and WriterBackend to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc).

So far, I don't think this is ideal

However, ultimately I don't think this is something we want to do; I'd have a few arguments in favor of the current system:

Although write_message() takes in &mut Writer, this is easily constructed from any &mut [u8] using Writer::new(BytesWriter::new(buf)). I feel that being able to use any &mut [u8] might be a bit less restrictive than having to use something that implements std::io::Write.
On that note, I'm not sure what the interface would be for users of no_std.
Using my proposal above (the rewritten BytesWriter) would put the responsibility of optimizing byte writing and error checking on the user's choice implementation of std::io::Write; I would want the library to take care of that instead, since speed optimization is a core focus of this library. In the current system, since we tend to use Writer::new(BytesWriter::new(buf)), where buf is &mut [u8], our library has direct access to the user's byte array through the implementation of BytesWriter.
Users who wish to use std::io::Write should still be able to use the current system by rewriting BytesWriter as proposed above.

Having said that, I'm open to counterarguments! I'm not very experienced, so do let me know if I'm straight up not making sense haha 😅

Re: Suggestion 3

Click to open

I'm not familiar with futures, but I'm not opposed to the idea of this as an addon. I assume that these would be additional structs that wouldn't really affect the existing users, and would provide essentially the same services as the existing ones; just that they would be used in codebases dealing with asynchronous reading and writing.

It would be quite a big addition, so I probably won't have time to work on it myself; I'd welcome a PR though! If you do decide to pursue it, perhaps it would be ideal to open an issue / draft PR early on while working on it, so that other users can discuss their needs before investing large amounts of time in it.

Out of curiosity, have you any thoughts on using tokio's equivalents (tokio::io::AsyncRead, etc)?

Re: Suggestion 4

Yep, I intend to submit a PR to remove the length prefixes soon.

Re: Suggestion 5

Click to open

I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes:

deserialize_from_slice()
serialize_into_slice()
serialize_into_vec()

If I'm not wrong, deserialize_from_slice() is the same as the proposed from_slice().

Meanwhile, serialize_into_vec() is almost the same as the proposed into_vec(), except that serialize_into_vec() returns Result<Vec<u8>>, which seems more in line with serde's into_vec() signature than Vec<u8>.

Since a core focus of this library is on speed optimization, I would want to keep serialize_into_slice() available too, since it's an option that doesn't require memory allocation.

Re: Suggestion 6

Sorry I'm not very sure what you mean by duplication; could you briefly clarify?

thomaseizinger · 2023-10-01T22:56:11Z

Sorry for the late reply, this kind of went under the radar 😅

As far as I know, MessageInfo is meant to be kinda optional (as in, users can choose to generate MessageInfo implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying MessageRead and MessageWrite, though. Would that make things easier?

It is really about reducing the API surface for the user. All three traits are auto-generated, right? But all three have to imported when you want to interact with the generated code.

Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller).

100%. I think the main APIs exposed to the user should be as simple as possible and serve the majority of use cases. Ideally, they only have to deal with concepts they are already familiar with, like std::io::Write.

We might then be able to make Writer, WriterBackend and BytesWriter private. We still need Writer and WriterBackend to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc).

That would be ideal! :)

Not sure how ergonomic the no_std use-case needs to be served but it is likely the less popular one so as long as it is possible, it should be good enough?

I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes:

It is mostly about following similar naming patterns and avoiding unnecessary "naming". If I am using quick_protobuf, I already know that I'll be serializing / deserializing. The function name doesn't have to say that :)
I am a big fan of calling top-level module functions via their crate name. This adds a lot of context to the code without having to look up imports which makes reading and understanding code much easier because it doesn't interrupt your flow. For example:

fn main() {
	let message = quick_protobuf::from_slice::<MyMessage>(&buf).unwrap();
}

Since a core focus of this library is on speed optimization, I would want to keep serialize_into_slice() available too, since it's an option that doesn't require memory allocation.

Sure! But again, we should drop the serialize_ prefix because it is redundant :)

thomaseizinger mentioned this issue Feb 24, 2023

feat: migrate to quick-protobuf libp2p/rust-libp2p#3312

Merged

4 tasks

thomaseizinger mentioned this issue Nov 16, 2023

feat(quick-protobuf-codec): reduce allocations during encoding libp2p/rust-libp2p#4782

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for a re-worked API #246

Proposal for a re-worked API #246

thomaseizinger commented Feb 24, 2023

snproj commented Mar 6, 2023 •

edited

Loading

thomaseizinger commented Mar 6, 2023

snproj commented Mar 6, 2023

Rewriting BytesWriter

So far, I don't think this is ideal

thomaseizinger commented Oct 1, 2023

Proposal for a re-worked API #246

Proposal for a re-worked API #246

Comments

thomaseizinger commented Feb 24, 2023

snproj commented Mar 6, 2023 • edited Loading

thomaseizinger commented Mar 6, 2023

snproj commented Mar 6, 2023

Re: Suggestion 1

Re: Suggestion 2

Rewriting BytesWriter

So far, I don't think this is ideal

Re: Suggestion 3

Re: Suggestion 4

Re: Suggestion 5

Re: Suggestion 6

thomaseizinger commented Oct 1, 2023

snproj commented Mar 6, 2023 •

edited

Loading