Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash should serialize as byte string #412

Open
casey opened this issue Jul 9, 2024 · 5 comments
Open

Hash should serialize as byte string #412

casey opened this issue Jul 9, 2024 · 5 comments

Comments

@casey
Copy link

casey commented Jul 9, 2024

Currently, Hash serializes as a sequence of integers, which is inefficient if the underlying codec supports a more compact byte string type. For example, serializing Hash to CBOR will produce an up-to 66 byte integer sequence (the precise size of the sequence will depend on the value of the hash) instead of a fixed size 34 byte byte string.

Serializing to and deserializing from a byte string can be accomplished with serde_bytes:

pub struct Hash(#[serde(with = "serde_bytes") [u8; OUT_LEN]);
@oconnor663
Copy link
Member

oconnor663 commented Jul 10, 2024

Agreed. What do you think of dd0afd6? (https://github.com/BLAKE3-team/BLAKE3/compare/serde_bytes)

@casey
Copy link
Author

casey commented Jul 10, 2024

Looks awesome! Nice test too, 0x58 (decimal 88) indicates a CBOR byte string with a byte length, and 32 is the length. And great that serde_bytes can still deserialize the old representation.

@BurningEnlightenment
Copy link
Collaborator

Two notes from someone who wrote a CBOR serialization library in the past:

  • Use 0xfe / 254 for byte test data instead of 0xff / 255 due to the fact that 0xfe is not a well formed CBOR item.
  • due to the CBOR item structure, it is a lot easier (i.e. possible at all) to read items written in hex form. So I suggest converting the current test vector from decimal encoding to hexadecimal encoding.

oconnor663 added a commit that referenced this issue Jul 15, 2024
This mostly reverts commits 8416b16 and
dd0afd6.

Changing the serialization of Hash can only be backwards-compatible in
self-describing formats like CBOR. In non-self-describing formats like
bincode, the deserializer has to know in advance which serialization
format was used.

Fixes #414.

Reopens #412.
@oconnor663 oconnor663 reopened this Jul 15, 2024
@oconnor663
Copy link
Member

#414 pointed out that this change was backwards-incompatible for non-self-describing serialization formats. I've published v1.5.3 to revert it.

Does anyone have ideas for a change along these lines that wouldn't break formats like bincode?

jefferyq2 pushed a commit to jefferyq2/BLAKE3 that referenced this issue Jul 23, 2024
jefferyq2 pushed a commit to jefferyq2/BLAKE3 that referenced this issue Jul 23, 2024
This mostly reverts commits 8416b16 and
dd0afd6.

Changing the serialization of Hash can only be backwards-compatible in
self-describing formats like CBOR. In non-self-describing formats like
bincode, the deserializer has to know in advance which serialization
format was used.

Fixes BLAKE3-team#414.

Reopens BLAKE3-team#412.
@burdges
Copy link

burdges commented Jul 28, 2024

It's a nice example of serde's shortcomings. Afaik, crypto crates typically either avoid serde entirely, thus serde to serde users only care about one or two formats, or else define their serialization as bytes, preferably fixed length, and provide optional serde wrappers which do bytes, as fixed length if possible.

You'll need some new type here, with impl Froms both ways, but the default matters, as the other becomes niche. I'd guess the default should favor CBOR, like you first did here, but maybe ask if current behavior actually preferable for any serde target format?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants