
refactor: Optimize Hexadecimal Serialization and Deserialization for u64 #21

Open

wants to merge 8 commits into base: main
Conversation

@codeesura codeesura commented Dec 6, 2023

Pull Request type

  • Refactoring (no functional changes, no API changes)

What is the current behavior?

Before these changes, the serialize and deserialize methods for u64 in the NumAsHex trait were not written with performance in mind: serialize built its output string dynamically, and deserialize parsed hexadecimal strings without any particular optimization. Both were fully functional, but left room for improvement in scenarios involving large-scale or frequent serialization/deserialization.

Resolves: #NA

What is the new behavior?

  • Optimized the serialize method in the NumAsHex trait for u64 to use a fixed-size buffer, enhancing performance and reducing memory overhead.
  • Refined the deserialize method for improved efficiency in parsing hexadecimal strings into u64, using a more direct approach and safe overflow handling with checked_mul and checked_add.
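The two bullets above can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's actual diff; the function names `u64_to_hex` and `hex_to_u64` are hypothetical and stand in for the `NumAsHex` trait methods:

```rust
// Illustrative sketch only: fixed-size-buffer serialization and
// checked-arithmetic deserialization, as described in the PR summary.
// Names are hypothetical; the real code lives in the NumAsHex trait.

/// Format `value` as a "0x"-prefixed lowercase hex string using a
/// fixed-size stack buffer (a u64 needs at most 16 hex digits).
fn u64_to_hex(value: u64) -> String {
    let mut buf = [0u8; 18]; // "0x" + up to 16 digits
    buf[0] = b'0';
    buf[1] = b'x';
    if value == 0 {
        buf[2] = b'0';
        return String::from_utf8(buf[..3].to_vec()).unwrap();
    }
    // Produce digits least-significant first, then copy them reversed.
    let mut digits = [0u8; 16];
    let mut n = 0;
    let mut v = value;
    while v != 0 {
        let d = (v & 0xf) as u8;
        digits[n] = if d < 10 { b'0' + d } else { b'a' + (d - 10) };
        v >>= 4;
        n += 1;
    }
    for i in 0..n {
        buf[2 + i] = digits[n - 1 - i];
    }
    String::from_utf8(buf[..2 + n].to_vec()).unwrap()
}

/// Parse a "0x"-prefixed hex string, rejecting overflow via
/// `checked_mul`/`checked_add` instead of wrapping or panicking.
fn hex_to_u64(s: &str) -> Option<u64> {
    let digits = s.strip_prefix("0x")?;
    if digits.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for c in digits.bytes() {
        let d = match c {
            b'0'..=b'9' => c - b'0',
            b'a'..=b'f' => c - b'a' + 10,
            b'A'..=b'F' => c - b'A' + 10,
            _ => return None,
        };
        acc = acc.checked_mul(16)?.checked_add(d as u64)?;
    }
    Some(acc)
}

fn main() {
    assert_eq!(u64_to_hex(0), "0x0");
    assert_eq!(u64_to_hex(0xdead_beef), "0xdeadbeef");
    assert_eq!(hex_to_u64("0xdeadbeef"), Some(0xdead_beef));
    assert_eq!(hex_to_u64("0x1ffffffffffffffff"), None); // 17 digits: overflow
    println!("ok");
}
```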

Does this introduce a breaking change?

No

Other information

This PR focuses on enhancing performance without altering the functionality of the existing code. It has been thoroughly tested to ensure that the new implementations of serialize and deserialize maintain the same behavior as their predecessors.

@0xLucqs
Collaborator

0xLucqs commented Dec 20, 2023

Multiple things: don't remove the useful documentation. Sure, it makes the code less complex, but without a benchmark it's hard to call this an optimization.

@codeesura codeesura requested a review from Oppen December 22, 2023 11:22
@nils-mathieu
Contributor

I am not sure how this modification makes the serialization routines faster. They are doing the same exact thing, except without using iterators.

In the deserialization function, you add a `.collect::<Vec<_>>()` which copies part of the input to the heap. That's bad because allocations are generally slow, and in the context of parsing a number (which is very common), that overhead cannot be ignored. That allocation can be removed, sure, but then it's the same as for the serialization routine: the code does exactly the same thing, just without iterators.

If you really need to optimize those functions, I think the best option is to jump down to unsafe code: access the string without bounds checks within the loops, and avoid checking whether the final string is valid UTF-8 (we know it is because we wrote ASCII characters to it).

That being said, is that gain really worth introducing unsafe code into the codebase?
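The unsafe route described in this comment might look roughly like this. This is a sketch under the stated assumption (every written byte is ASCII), not code proposed in this PR; `serialize_hex_unchecked` is a hypothetical name:

```rust
/// Hypothetical sketch of the unsafe route discussed above: write ASCII
/// hex digits into a byte buffer and skip the UTF-8 validity check.
fn serialize_hex_unchecked(value: u64) -> String {
    let mut bytes = Vec::with_capacity(18); // "0x" + at most 16 digits
    bytes.extend_from_slice(b"0x");
    let start = bytes.len();
    let mut v = value;
    loop {
        let d = (v & 0xf) as u8;
        bytes.push(if d < 10 { b'0' + d } else { b'a' + (d - 10) });
        v >>= 4;
        if v == 0 {
            break;
        }
    }
    bytes[start..].reverse(); // digits were produced least-significant first
    // SAFETY: every byte pushed above is an ASCII character, so the
    // buffer is valid UTF-8 and the validity check can be skipped.
    unsafe { String::from_utf8_unchecked(bytes) }
}

fn main() {
    assert_eq!(serialize_hex_unchecked(0), "0x0");
    assert_eq!(serialize_hex_unchecked(255), "0xff");
    println!("ok");
}
```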

@Oppen
Contributor

Oppen commented Jan 12, 2024

If you really need to optimize those functions, I think the best option is to jump down to unsafe code: access the string without bounds checks within the loops, and avoid checking whether the final string is valid UTF-8 (we know it is because we wrote ASCII characters to it).

The bounds checks are likely optimized out if you check the length beforehand. What may be more interesting here is replacing String::from_utf8 with String::from_utf8_unchecked and, as you mentioned, skipping the copy altogether when stripping zeros by working with slices. str::trim_start_matches is probably what we want, although a more manual approach may or may not be faster.
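The slice-based, allocation-free parse suggested in this comment could be sketched as follows. This is an illustration only, not code from the PR; the name `parse_hex` is hypothetical:

```rust
/// Sketch of the slice-based approach: strip the prefix and the leading
/// zeros by re-slicing (no heap allocation), then fold the digits.
/// The name `parse_hex` is illustrative, not from the PR.
fn parse_hex(s: &str) -> Option<u64> {
    let body = s.strip_prefix("0x")?;
    if body.is_empty() {
        return None; // "0x" alone is not a number
    }
    // str::trim_start_matches returns a sub-slice; nothing is copied.
    let digits = body.trim_start_matches('0');
    if digits.len() > 16 {
        return None; // more than 16 significant digits cannot fit in u64
    }
    let mut acc: u64 = 0;
    for b in digits.bytes() {
        let d = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'f' => b - b'a' + 10,
            b'A'..=b'F' => b - b'A' + 10,
            _ => return None,
        };
        acc = (acc << 4) | d as u64; // cannot overflow: at most 16 digits
    }
    Some(acc)
}

fn main() {
    assert_eq!(parse_hex("0x000ff"), Some(255));
    assert_eq!(parse_hex("0x0"), Some(0));
    assert_eq!(parse_hex("0x1ffffffffffffffff"), None);
    println!("ok");
}
```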

In any case, without a benchmark this is all a bit moot. I'm in favor of it if it simplifies the code, not so much if we claim it's faster, make it more complex, and provide no evidence of speed improvement.

@Oppen
Contributor

Oppen commented Jan 12, 2024

Also, if we have no benchmark I would much rather just use from_str_radix and let the stdlib work for us.
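Delegating to the standard library as suggested here is essentially a one-liner, since `u64::from_str_radix` already handles digit validation and overflow (both surface as a `ParseIntError`). A sketch, with the hypothetical name `parse_hex_std`:

```rust
use std::num::ParseIntError;

/// Sketch of the stdlib route: let `u64::from_str_radix` do the work.
/// Invalid digits and overflow both come back as a ParseIntError.
fn parse_hex_std(s: &str) -> Result<u64, ParseIntError> {
    // strip_prefix removes exactly one "0x" if present; the empty
    // string then fails inside from_str_radix.
    u64::from_str_radix(s.strip_prefix("0x").unwrap_or(s), 16)
}

fn main() {
    assert_eq!(parse_hex_std("0xff"), Ok(255));
    assert!(parse_hex_std("0xzz").is_err());
    assert!(parse_hex_std("0x1ffffffffffffffff").is_err()); // overflow
    println!("ok");
}
```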

@nils-mathieu
Contributor

Also, if we have no benchmark I would much rather just use from_str_radix and let the stdlib work for us.

Ahah, now that I think about it, using the stdlib might even be faster than what I had written 😅

4 participants