-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chainhash, wire, btcutil, main: Memory efficient txhash #1978
chainhash, wire, btcutil, main: Memory efficient txhash #1978
Conversation
Pull Request Test Coverage Report for Build 7197319343
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization!
chaincfg/chainhash/hashfuncs.go
Outdated
@@ -31,3 +34,15 @@ func DoubleHashH(b []byte) Hash { | |||
first := sha256.Sum256(b) | |||
return Hash(sha256.Sum256(first[:])) | |||
} | |||
|
|||
// DoubleHashRaw calculates hash(hash(h)) and returns the resulting bytes. | |||
func DoubleHashRaw(h hash.Hash) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps specify that the hash.Hash
instance should already have the message pre-image written to it?
I think with that added context, the test below is also easier to understand.
wire/msgtx.go
Outdated
buf := bytes.NewBuffer(make([]byte, 0, msg.SerializeSizeStripped())) | ||
_ = msg.SerializeNoWitness(buf) | ||
return chainhash.DoubleHashH(buf.Bytes()) | ||
txHash := sha256.New() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other optimization that I've found to really speed things up a lot of caching the result after the first instance. During validation, we end up recomputing this hash a few times: merkle tree check, sighash calculation, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btcutil.Tx
caches the hash right? Maybe try to use that more instead of the raw MsgTx
. Though I've not looked yet and maybe that's not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btcutil.Tx caches the hash right?
Yeah that caches (the hash), but in most instances we use wire.MsgTx
instead, mainly for historical reasons (but also because we decode from the wire into that struct). For other areas like sighash calculation, we don't quite want just the hash but also want the fully serialized transaction bytes as well. I was doing some syncing tests the other day and did some profiling of a catch up sync well past the checkpoints: #1376 (comment)
btcutil/psbt/go.mod
Outdated
@@ -24,3 +24,5 @@ require ( | |||
replace github.com/btcsuite/btcd/btcutil => ../ | |||
|
|||
replace github.com/btcsuite/btcd => ../.. | |||
|
|||
replace github.com/btcsuite/btcd/chaincfg/chainhash => ../../chaincfg/chainhash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid making the circular dependency problem between some of the packages more annoying, I think we should first push the change in chaincfg/chainhash
, push a new tag, then update these two go.mod
files to the latest tag.
I wanted to get rid of the other local replace directives as well in another PR, just never got to it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I make a separate PR that only adds the DoubleHashRaw
function?
I can definitely break this up into multiple PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to avoid needing more local replace directives, this needs to be done in two PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ordering wise we'll need to:
- Remove this section.
- Merge the PR.
- Tag a new version.
- Make a new PR that actually uses the new functions.
839657c
to
cd8309f
Compare
The last 2 force pushes change the function signature of
escapes to the heap with how it was implemented in the previous commits. It ended up allocating less memory than the current master but did allocate 2 objects per hash instead of 1. This change allows the compiler to keep the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool optimization!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good, one question about the modification made to the benchmarks
wire/bench_test.go
Outdated
for i := 0; i < b.N; i++ { | ||
var buf bytes.Buffer | ||
if err := genesisCoinbaseTx.Serialize(&buf); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up including the serialization over head in each invocation, which I don't think is what we want? So you can either use StopTimer
, or just move it back to lie outside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading the commit message, I think that' was the purpose of the BenchmarkTxHash
benchmark: to account for that extra overhead. Also if you add ReportAllocs
, then all these calls will also report on the allocations.
// in byte slice. Pre-allocating here saves an allocation on the second | ||
// hash as we can reuse it. This allocation also does not escape to the | ||
// heap, saving an allocation. | ||
buf := make([]byte, 0, HashSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a tangible difference here (re allocations) with delcaring this as a var buf [HashSize]byte
vs using make and a slice here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping re this, not terribly blocking though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the same right?
// DoubleHashRaw calculates hash(hash(w)) where w is the resulting bytes from
// the given serialize function and returns the resulting bytes as a Hash.
func DoubleHashRaw(serialize func(w io.Writer) error) Hash {
// Encode the transaction into the hash. Ignore the error returns
// since the only way the encode could fail is being out of memory
// or due to nil pointers, both of which would cause a run-time panic.
h := sha256.New()
_ = serialize(h)
// This buf is here because Sum() will append the result to the passed
// in byte slice. Pre-allocating here saves an allocation on the second
// hash as we can reuse it. This allocation also does not escape to the
// heap, saving an allocation.
var buf [HashSize]byte
first := h.Sum(buf[:])
h.Reset()
h.Write(first)
res := h.Sum(buf[:])
return *(*Hash)(res)
}
This will fail tests because the buf
already has 0 values occupying it.
[I] calvin@bitcoin ~/b/g/b/g/s/g/b/b/c/chainhash (memory-efficient-txhash)> go test
--- FAIL: TestDoubleHashFuncs (0.00s)
hashfuncs_test.go:148: DoubleHashRaw("") = 0000000000000000000000000000000000000000000000000000000000000000, want 5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456
hashfuncs_test.go:148: DoubleHashRaw("a") = 0000000000000000000000000000000000000000000000000000000000000000, want bf5d3affb73efd2ec6c36ad3112dd933efed63c4e1cbffcfa88e2759c144f2d8
hashfuncs_test.go:148: DoubleHashRaw("ab") = 0000000000000000000000000000000000000000000000000000000000000000, want a1ff8f1856b5e24e32e3882edd4a021f48f28a8b21854b77fdef25a97601aace
hashfuncs_test.go:148: DoubleHashRaw("abc") = 0000000000000000000000000000000000000000000000000000000000000000, want 4f8b42c22dd3729b519ba6f68d2da7cc5b2d606d05daed5ad5128cc03e6c6358
hashfuncs_test.go:148: DoubleHashRaw("abcd") = 0000000000000000000000000000000000000000000000000000000000000000, want 7e9c158ecd919fa439a7a214c9fc58b85c3177fb1613bdae41ee695060e11bc6
hashfuncs_test.go:148: DoubleHashRaw("abcde") = 0000000000000000000000000000000000000000000000000000000000000000, want 1d72b6eb7ba8b9709c790b33b40d8c46211958e13cf85dbcda0ed201a99f2fb9
hashfuncs_test.go:148: DoubleHashRaw("abcdef") = 0000000000000000000000000000000000000000000000000000000000000000, want ce65d4756128f0035cba4d8d7fae4e9fa93cf7fdf12c0f83ee4a0e84064bef8a
hashfuncs_test.go:148: DoubleHashRaw("abcdefg") = 0000000000000000000000000000000000000000000000000000000000000000, want dad6b965ad86b880ceb6993f98ebeeb242de39f6b87a458c6510b5a15ff7bbf1
hashfuncs_test.go:148: DoubleHashRaw("abcdefgh") = 0000000000000000000000000000000000000000000000000000000000000000, want b9b12e7125f73fda20b8c4161fb9b4b146c34cf88595a1e0503ca2cf44c86bc4
hashfuncs_test.go:148: DoubleHashRaw("abcdefghi") = 0000000000000000000000000000000000000000000000000000000000000000, want 546db09160636e98405fbec8464a84b6464b32514db259e235eae0445346ffb7
hashfuncs_test.go:148: DoubleHashRaw("abcdefghij") = 0000000000000000000000000000000000000000000000000000000000000000, want 27635cf23fdf8a10f4cb2c52ade13038c38718c6d7ca716bfe726111a57ad201
hashfuncs_test.go:148: DoubleHashRaw("Discard medicine more than two years old.") = 0000000000000000000000000000000000000000000000000000000000000000, want ae0d8e0e7c0336f0c3a72cefa4f24b625a6a460417a921d066058a0b81e23429
hashfuncs_test.go:148: DoubleHashRaw("He who has a shady past knows that nice guys finish last.") = 0000000000000000000000000000000000000000000000000000000000000000, want eeb56d02cf638f87ea8f11ebd5b0201afcece984d87be458578d3cfb51978f1b
hashfuncs_test.go:148: DoubleHashRaw("I wouldn't marry him with a ten foot pole.") = 0000000000000000000000000000000000000000000000000000000000000000, want dc640bf529608a381ea7065ecbcd0443b95f6e4c008de6e134aff1d36bd4b9d8
hashfuncs_test.go:148: DoubleHashRaw("Free! Free!/A trip/to Mars/for 900/empty jars/Burma Shave") = 0000000000000000000000000000000000000000000000000000000000000000, want 42e54375e60535eb07fc15c6350e10f2c22526f84db1d6f6bba925e154486f33
hashfuncs_test.go:148: DoubleHashRaw("The days of the digital watch are numbered. -Tom Stoppard") = 0000000000000000000000000000000000000000000000000000000000000000, want 4ed6aa9b88c84afbf928710b03714de69e2ad967c6a78586069adcb4c470d150
hashfuncs_test.go:148: DoubleHashRaw("Nepal premier won't resign.") = 0000000000000000000000000000000000000000000000000000000000000000, want 590c24d1877c1919fad12fe01a8796999e9d20cfbf9bc9bc72fa0bd69f0b04dd
hashfuncs_test.go:148: DoubleHashRaw("For every action there is an equal and opposite government program.") = 0000000000000000000000000000000000000000000000000000000000000000, want 37d270687ee8ebafcd3c1a32f56e1e1304b3c93f252cb637d57a66d59c475eca
hashfuncs_test.go:148: DoubleHashRaw("His money is twice tainted: 'taint yours and 'taint mine.") = 0000000000000000000000000000000000000000000000000000000000000000, want 306828fd89278838bb1c544c3032a1fd25ea65c40bba586437568828a5fbe944
hashfuncs_test.go:148: DoubleHashRaw("There is no reason for any individual to have a computer in their home. -Ken Olsen, 1977") = 0000000000000000000000000000000000000000000000000000000000000000, want 49965777eac71faf1e2fb0f6b239ba2fae770977940fd827bcbfe15def6ded53
hashfuncs_test.go:148: DoubleHashRaw("It's a tiny change to the code and not completely disgusting. - Bob Manchek") = 0000000000000000000000000000000000000000000000000000000000000000, want df99ee4e87dd3fb07922dee7735997bbae8f26db20c86137d4219fc4a37b77c3
hashfuncs_test.go:148: DoubleHashRaw("size: a.out: bad magic") = 0000000000000000000000000000000000000000000000000000000000000000, want 920667c84a15b5ee3df4620169f5c0ec930cea0c580858e50e68848871ed65b4
hashfuncs_test.go:148: DoubleHashRaw("The major problem is with sendmail. -Mark Horton") = 0000000000000000000000000000000000000000000000000000000000000000, want 5e817fe20848a4a3932db68e90f8d54ec1b09603f0c99fdc051892b776acd462
hashfuncs_test.go:148: DoubleHashRaw("Give me a rock, paper and scissors and I will move the world. CCFestoon") = 0000000000000000000000000000000000000000000000000000000000000000, want 6a9d47248ed38852f5f4b2e37e7dfad0ce8d1da86b280feef94ef267e468cff2
hashfuncs_test.go:148: DoubleHashRaw("If the enemy is within range, then so are you.") = 0000000000000000000000000000000000000000000000000000000000000000, want 2e7aa1b362c94efdbff582a8bd3f7f61c8ce4c25bbde658ef1a7ae1010e2126f
hashfuncs_test.go:148: DoubleHashRaw("It's well we cannot hear the screams/That we create in others' dreams.") = 0000000000000000000000000000000000000000000000000000000000000000, want e6729d51240b1e1da76d822fd0c55c75e409bcb525674af21acae1f11667c8ca
hashfuncs_test.go:148: DoubleHashRaw("You remind me of a TV show, but that's all right: I watch it anyway.") = 0000000000000000000000000000000000000000000000000000000000000000, want 09945e4d2743eb669f85e4097aa1cc39ea680a0b2ae2a65a42a5742b3b809610
hashfuncs_test.go:148: DoubleHashRaw("C is as portable as Stonehedge!!") = 0000000000000000000000000000000000000000000000000000000000000000, want 1018d8b2870a974887c5174360f0fbaf27958eef15b24522a605c5dae4ae0845
hashfuncs_test.go:148: DoubleHashRaw("Even if I could be Shakespeare, I think I should still choose to be Faraday. - A. Huxley") = 0000000000000000000000000000000000000000000000000000000000000000, want 97c76b83c6645c78c261dcdc55d44af02d9f1df8057f997fd08c310c903624d5
hashfuncs_test.go:148: DoubleHashRaw("The fugacity of a constituent in a mixture of gases at a given temperature is proportional to its mole fraction. Lewis-Randall Rule") = 0000000000000000000000000000000000000000000000000000000000000000, want 6bcbf25469e9544c5b5806b24220554fedb6695ba9b1510a76837414f7adb113
hashfuncs_test.go:148: DoubleHashRaw("How can you write a big system without C++? -Paul Glick") = 0000000000000000000000000000000000000000000000000000000000000000, want 1041988b06835481f0845be2a54f4628e1da26145b2de7ad1be3bb643cef9d4f
FAIL
exit status 1
FAIL github.com/btcsuite/btcd/chaincfg/chainhash 0.001s
wire/msgtx.go
Outdated
buf := bytes.NewBuffer(make([]byte, 0, msg.SerializeSizeStripped())) | ||
_ = msg.SerializeNoWitness(buf) | ||
return chainhash.DoubleHashH(buf.Bytes()) | ||
return chainhash.DoubleHashRaw(msg.SerializeNoWitness) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I think this just needs a rebase! |
DoubleHashRaw provides a simple function for doing double hashes. Since it uses the digest instead of making the caller allocate a byte slice, it can be more memory efficient vs other double hash functions.
cd8309f
to
dff8559
Compare
Rebased For ibd and block verification, I think #2023 is going to be better but that one's still in the works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to land, but we'll need to do the module dance to make sure we don't break things along the way (see the newly added comment).
btcutil/psbt/go.mod
Outdated
@@ -24,3 +24,5 @@ require ( | |||
replace github.com/btcsuite/btcd/btcutil => ../ | |||
|
|||
replace github.com/btcsuite/btcd => ../.. | |||
|
|||
replace github.com/btcsuite/btcd/chaincfg/chainhash => ../../chaincfg/chainhash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ordering wise we'll need to:
- Remove this section.
- Merge the PR.
- Tag a new version.
- Make a new PR that actually uses the new functions.
// in byte slice. Pre-allocating here saves an allocation on the second | ||
// hash as we can reuse it. This allocation also does not escape to the | ||
// heap, saving an allocation. | ||
buf := make([]byte, 0, HashSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping re this, not terribly blocking though.
dff8559
to
375f79d
Compare
Removed commits that were changing the go.mod files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 📸
Will tag the new version, then make a new PR updating relevant |
When profiling the ibd, I noticed that
TxHash()
is allocating quite a bit of memory.I also noticed this with utreexod in the past and I resolved this by creating a new double hash function. I've benchmarked it and made is more efficient in this PR.
Benchstat for
BenchmarkTxHash()
showed a slight bit of increase insec/op
which is likely due to the overhead of serializing into ahash.Hash
instead of thebytes.Buffer
in the previousTxHash()
. However, for real life cases, the newTxHash()
will be faster as we're saving on the unnecessary serialization intobytes.Buffer
as sha256.Sum256()will callWrite()
into the hash anyways.The memory allocation savings is ~37% compared to the old
TxHash()
. This matters a lot becauseTxHash()
gets called a lot.For bigger transactions, the memory savings is even greater. I performed the same test but with
multiTx
instead ofgenesisCoinbaseTx
.Below is the benchstat for that test. We see less of a penalty for speed and see ~41% savings in memory allocated.
The benchmarks for
BenchmarkDoubleHash*
show a significant speedup forDoubleHashRaw()
. This is likely because the other hashes have to calldigest.Write()
insha256.Sum256()
while theDoubleHashRaw()
function gets to skip that.I can also post some before and after ibd profiling if requested.