Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] use base64 over base64-bytestring #1631

Closed
wants to merge 1 commit into from
Closed

Conversation

chessai
Copy link
Contributor

@chessai chessai commented Apr 20, 2023

I added SIMD to base64 (See PR).

Per profiling, chainweb-node spends about 6% of its time performing base64 decode on blocks. According to the microbenchmarks, this should hopefully offer a speedup of ~15x on decode of things that are the average block size.

The other nice thing about the base64 library is that it has the option of being more type-safe.
All these assertions I've added here are unsafe, but, they were just implicit before.

@jwiegley
Copy link
Collaborator

Does this pass replay?

@chessai chessai marked this pull request as draft April 20, 2023 06:10
@chessai
Copy link
Contributor Author

chessai commented Apr 20, 2023

Does this pass replay?

No; I think that we actually do need to keep base64-bytestring for certain operations where we care about the on-chain error messages. Right now, a replay fails immediately because of one such instance.

Copy link
Contributor

@larskuhtz larskuhtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should double check that all use of base64 is the URL variant without padding.

Regarding replay: this should be fine since chainweb doesn't put base64 encoded things on-chain. Base64 is used only as textual representation of binary data (mostly in JSON). Merkle hashes of on-chain data are computed on the plain binary representation.

However, I am not 100% certain about the use of base64 within pact service. In theory it shouldn't affect on-chain data (in particular errors in pact service shouldn't be included on-chain), but we should double check that.

Comment on lines -328 to -329
, base64-bytestring >= 1.0 && < 1.1
-- due to change in error message that is inlined in pact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 the upper bound is tracked in the cabal.project file.

@@ -325,8 +325,7 @@ library
, attoparsec >= 0.13
, base >= 4.12 && < 5
, base16-bytestring >= 0.1
, base64-bytestring >= 1.0 && < 1.1
-- due to change in error message that is inlined in pact
, base64 >= 0.5 && < 0.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this known to break with base64 >=0.6? Otherwise please remove, since we don't manually track "soft" upper bounds.

Suggested change
, base64 >= 0.5 && < 0.6
, base64 >= 0.5

@@ -746,7 +750,7 @@ getDecimal = do

mockDecode :: ByteString -> Either String MockTx
mockDecode s = do
s' <- B64.decode s
s' <- first T.unpack $ B64U.decodeBase64 $ B64.assertBase64 @'B64.UrlPadded @_ s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned about B64.UrlPadded. Does this mean the it uses padding? All use of base64 throughout chainweb should be base64Url without padding.

Comment on lines -607 to +611
. first (Base64DecodeException . T.pack)
. B64.decode
. first Base64DecodeException
. B64.decodeBase64
. B64.assertBase64 @'B64.StdPadded @_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the functions for the unpadded (and non-url) variants used anywhere in the code? If not we should consider removing them, so that nobody uses them. This is to keep used encodings simple and consistent.

@@ -325,8 +325,7 @@ library
, attoparsec >= 0.13
, base >= 4.12 && < 5
, base16-bytestring >= 0.1
, base64-bytestring >= 1.0 && < 1.1
-- due to change in error message that is inlined in pact
, base64 >= 0.5 && < 0.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that version 0.5 of base64 hasn't yet been released. The latest version on Hackage is 0.4.2.4. 0.5 is only available as source repository from Github. If 0.5 is actually required, we would have to ask the maintainer (@emilypi?) to release 0.5 on Hackage or include it as source repository package in cabal.project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hasn't been released and I"m still working on the PR, which will significantly affect this PR.

@emilypi
Copy link
Member

emilypi commented Apr 20, 2023

I like your enthusiasm @chessai! However, this is jumping the gun alittle bit. The PR that is going in to serve as the basis for base64-0.5 is going to streamline alot of the functionality changed in this PR that does alot of redundant work (adding pads, removing pads, converting to/from text etc), so I ask that we hold off on it here until I'm confident I can nail down the api.

That said, with the addition of SIMD support, I do endorse this change, and I would like to revert from base64-bytestring at some point, because it is very suboptimal (by design, per Bos + HVR's request).

Does this pass replay?

It should @jwiegley, since we're going in at the chainweb level, and the problems with replay w.r.t. Base64 are with the Pact natives wholesale committing to the error format returned by base64-bytestring. I'm working on reverting those changes in the latest base64 major version so that we can switch to that in Pact, but the changes here do not affect Pact, so they shouldn't affect block hashes.

@chessai
Copy link
Contributor Author

chessai commented Apr 20, 2023

I like your enthusiasm @chessai! However, this is jumping the gun alittle bit. The PR that is going in to serve as the basis for base64-0.5 is going to streamline alot of the functionality changed in this PR that does alot of redundant work (adding pads, removing pads, converting to/from text etc), so I ask that we hold off on it here until I'm confident I can nail down the api.

That said, with the addition of SIMD support, I do endorse this change, and I would like to revert from base64-bytestring at some point, because it is very suboptimal (by design, per Bos + HVR's request).

Does this pass replay?

It should @jwiegley, since we're going in at the chainweb level, and the problems with replay w.r.t. Base64 are with the Pact natives wholesale committing to the error format returned by base64-bytestring. I'm working on reverting those changes in the latest base64 major version so that we can switch to that in Pact, but the changes here do not affect Pact, so they shouldn't affect block hashes.

Thanks,

Yeah, after talking to you yesterday/digging around in your PR/the codebase more, it seems like this is indeed jumping the gun.
I just saw those speedups and I was really excited.

I'll leave this open for now, because I do think it's something we should return to.

@chessai chessai closed this Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants