Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for a re-worked API #246

Open
thomaseizinger opened this issue Feb 24, 2023 · 4 comments
Open

Proposal for a re-worked API #246

thomaseizinger opened this issue Feb 24, 2023 · 4 comments

Comments

@thomaseizinger
Copy link

We are in the process of adopting quick-protobuf in rust-libp2p and a few points came up whilst using the API.

  • The naming of various traits and functions is too similar: For example, there is MessageWrite::write_message and Writer::write_message. The latter uses the former but also includes a length-prefix. We've tripped over this multiple times.
  • There is serialize_into_vec which writes a length-prefix but it is not documented.
  • There are three traits implemented for a message, which all need to be imported so you can work with them.
  • There is a BytesWriter, a Writer, a WriterBackend and the MessageWrite. The relationship is not entirely clear from the naming.

My suggestions would be:

  1. Unify MessageInfo, MessageRead and MessageWrite into a single trait Message.
  2. The standard library already has an abstraction for writing bytes: std::io::Write. Can Message::write perhaps simply take a &mut std::io::Write? This would remove the layer of constructing a Writer with a WriterBackend. A std::fs::File already implements Write and so does std::vec::Vec.
  3. On a related thought, we might want to consider offering an AsyncMessage type that uses the futures abstraction AsyncRead and AsyncWrite.
  4. This might be controversial but I think writing a message with a length-prefix is out of scope for this library. Length-prefixing messages is a protocol for how to parties can interact. I think we are better off with focusing on reading and writing protobufs in this library. I am guessing protobufs themselves are not self-descriptive for this very reason: It is too opinionated to be really generally useful.
  5. Provide only a handful of utility functions at the module root:
  6. We will need some abstractions within the library that the generated-code can use to avoid duplication. I think those should be in a separate module like codegen or internal which clearly signals to users that they shouldn't use these directly.

Let me know what you think. I am open to working on this and sending PRs :)

@snproj
Copy link
Collaborator

snproj commented Mar 6, 2023

Ah I was planning to remove the length prefix in serialize_into_vec() too as per my comment in #202. I guess you guys would be ok with that too? I don't think we should necessarily retain the old function with the length prefix in any form either; so far no one seems to have voiced out in favor of that.

@thomaseizinger
Copy link
Author

Cool!

Yeah I am okay with it being removed. That isn't the only part of my proposal though so I am keen to hear your opinion on the remaining points :)

@snproj
Copy link
Collaborator

snproj commented Mar 6, 2023

Re: Suggestion 1

Click to open

As far as I know, MessageInfo is meant to be kinda optional (as in, users can choose to generate MessageInfo implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying MessageRead and MessageWrite, though. Would that make things easier?

Re: Suggestion 2

Click to open

Rewriting BytesWriter

Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller). If so, one way I can think of doing it is to rewrite BytesWriter to have its buf field be std::io::Write instead of &mut [u8]. Message::write() (which I'm assuming is the same as MessageWrite::write_message()) would then take in std::io::Write, and internally convert it into BytesWriter and then Writer and proceed as per normal. We might then be able to make Writer, WriterBackend and BytesWriter private. We still need Writer and WriterBackend to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc).

So far, I don't think this is ideal

However, ultimately I don't think this is something we want to do; I'd have a few arguments in favor of the current system:

  1. Although write_message() takes in &mut Writer, this is easily constructed from any &mut [u8] using Writer::new(BytesWriter::new(buf)). I feel that being able to use any &mut [u8] might be a bit less restrictive than having to use something that implements std::io::Write.
  2. On that note, I'm not sure what the interface would be for users of no_std.
  3. Using my proposal above (the rewritten BytesWriter) would put the responsibility of optimizing byte writing and error checking on the user's choice implementation of std::io::Write; I would want the library to take care of that instead, since speed optimization is a core focus of this library. In the current system, since we tend to use Writer::new(BytesWriter::new(buf)), where buf is &mut [u8], our library has direct access to the user's byte array through the implementation of BytesWriter.
  4. Users who wish to use std::io::Write should still be able to use the current system by rewriting BytesWriter as proposed above.

Having said that, I'm open to counterarguments! I'm not very experienced, so do let me know if I'm straight up not making sense haha 😅

Re: Suggestion 3

Click to open

I'm not familiar with futures, but I'm not opposed to the idea of this as an addon. I assume that these would be additional structs that wouldn't really affect the existing users, and would provide essentially the same services as the existing ones; just that they would be used in codebases dealing with asynchronous reading and writing.

It would be quite a big addition, so I probably won't have time to work on it myself; I'd welcome a PR though! If you do decide to pursue it, perhaps it would be ideal to open an issue / draft PR early on while working on it, so that other users can discuss their needs before investing large amounts of time in it.

Out of curiosity, have you any thoughts on using tokio's equivalents (tokio::io::AsyncRead, etc)?

Re: Suggestion 4

Yep, I intend to submit a PR to remove the length prefixes soon.

Re: Suggestion 5

Click to open

I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes:

  • deserialize_from_slice()
  • serialize_into_slice()
  • serialize_into_vec()

If I'm not wrong, deserialize_from_slice() is the same as the proposed from_slice().

Meanwhile, serialize_into_vec() is almost the same as the proposed into_vec(), except that serialize_into_vec() returns Result<Vec<u8>>, which seems more in line with serde's into_vec() signature than Vec<u8>.

Since a core focus of this library is on speed optimization, I would want to keep serialize_into_slice() available too, since it's an option that doesn't require memory allocation.

Re: Suggestion 6

Sorry I'm not very sure what you mean by duplication; could you briefly clarify?

@thomaseizinger
Copy link
Author

Sorry for the late reply, this kind of went under the radar 😅

As far as I know, MessageInfo is meant to be kinda optional (as in, users can choose to generate MessageInfo implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying MessageRead and MessageWrite, though. Would that make things easier?

It is really about reducing the API surface for the user. All three traits are auto-generated, right? But all three have to imported when you want to interact with the generated code.

Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller).

100%. I think the main APIs exposed to the user should be as simple as possible and serve the majority of use cases. Ideally, they only have to deal with concepts they are already familiar with, like std::io::Write.

We might then be able to make Writer, WriterBackend and BytesWriter private. We still need Writer and WriterBackend to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc).

That would be ideal! :)

Not sure how ergonomic the no_std use-case needs to be served but it is likely the less popular one so as long as it is possible, it should be good enough?

I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes:

It is mostly about following similar naming patterns and avoiding unnecessary "naming". If I am using quick_protobuf, I already know that I'll be serializing / deserializing. The function name doesn't have to say that :)
I am a big fan of calling top-level module functions via their crate name. This adds a lot of context to the code without having to look up imports which makes reading and understanding code much easier because it doesn't interrupt your flow. For example:

fn main() {
	let message = quick_protobuf::from_slice::<MyMessage>(&buf).unwrap();
}

Since a core focus of this library is on speed optimization, I would want to keep serialize_into_slice() available too, since it's an option that doesn't require memory allocation.

Sure! But again, we should drop the serialize_ prefix because it is redundant :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants