Poor use of memory / too many allocations in HashLeaf, causing the GC to work in overdrive #216
Comments
If the same allocation is reused in HashLeaf, would that make it not thread-safe? If so, this should be considered; I believe trees are often computed in parallel.
I would try the following, fwiw. Change these appends (hasher.go lines 190 to 192 at 564300a) to `minMaxNIDs := append(append(make([]byte, 0, resLen), nID...), nID...)` and see if that removes the need for additional allocs. Similarly for the other appends (lines 195 to 197 at 564300a).
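The point of the suggested pattern is that `make` pre-sizes the destination once, so neither `append` reallocates. A runnable sketch, where the function name and the `resLen` layout are illustrative assumptions mirroring the snippet above:

```go
package main

import "fmt"

// minMaxNIDs builds the ns|ns prefix in a single allocation: make pre-sizes
// the backing array to resLen, so both appends stay within capacity and
// never trigger a grow-and-copy.
func minMaxNIDs(nID []byte, resLen int) []byte {
	return append(append(make([]byte, 0, resLen), nID...), nID...)
}

func main() {
	nID := []byte{0x00, 0x01}
	resLen := 2*len(nID) + 32 // assumed layout: room left for the hash bytes
	buf := minMaxNIDs(nID, resLen)
	fmt.Println(len(buf), cap(buf)) // prints "4 36": capacity fixed up front
}
```

The capacity stays at `resLen` after both appends, which is exactly what makes the follow-up hash append allocation-free as well.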
Also, it is important to use benchstat for any related changes (since we recently disabled go-bencher): https://pkg.go.dev/golang.org/x/perf/cmd/benchstat. The comparison should include both memory and CPU.
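A typical benchstat workflow for a change like this might look as follows; the bench name matches the one used later in this thread, but the file names and run count are assumptions:

```shell
# Capture several runs before the change (-run='^$' skips unit tests,
# -benchmem adds B/op and allocs/op columns).
go test -run='^$' -bench=BenchmarkComputeRoot -benchmem -count=10 . | tee old.txt

# ...apply the optimization, then capture the same runs again...
go test -run='^$' -bench=BenchmarkComputeRoot -benchmem -count=10 . | tee new.txt

# benchstat reports deltas for ns/op, B/op, and allocs/op, with an
# indication of statistical significance across the 10 samples.
go run golang.org/x/perf/cmd/benchstat@latest old.txt new.txt
```

Ten or more `-count` samples per side is what lets benchstat distinguish real improvements from run-to-run noise.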
This could be somewhat related: #212
Unassigning myself to allow someone else to take over.
In issue #216, it was raised that:

1. More time was spent allocating than hashing in the `HashLeaf` function.
2. The min and max namespace functions were taking up a large amount of RAM.

---

## More time was spent allocating than hashing in the HashLeaf function

The benchmark test described in the issue listed above no longer exists, and when I run `BenchmarkComputeRoot` in the nmt package [HERE](https://github.com/bruscar010/nmt/blob/main/nmt_test.go#L696) I can't replicate memory allocation taking longer than the hashing, although this may be related to my test parameters. Below are flame graphs of CPU/memory usage (I've updated the benchmark to use 20k leaf nodes with a 512-byte transaction size):

[Base Mem Flamegraph](https://flamegraph.com/share/6a04ffab-c866-11ef-9832-26c3e5347170)
[Base CPU Flamegraph](https://flamegraph.com/share/efce1d16-c868-11ef-b027-3a58f36d25ce)

```
BenchmarkComputeRoot/64-leaves-12     100    141785 ns/op     65166 B/op    1535 allocs/op
BenchmarkComputeRoot/128-leaves-12    100    280652 ns/op    123894 B/op    3074 allocs/op
BenchmarkComputeRoot/256-leaves-12    100    501392 ns/op    254611 B/op    6154 allocs/op
```

I've instead focused on the second issue.

---

## The min and max namespace functions were taking up a large amount of RAM

### Solution 1: Reduce the number of calls to MinNamespace & MaxNamespace

The `HashNode` function made four duplicate calls to `MaxNamespace`/`MinNamespace` for the same node:

```
1. HashNode -> ValidateNodes -> ValidateNodeFormat.Min/MaxNamespace
2. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder -> ValidateNodeFormat(left|right).Min/MaxNamespace
3. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder.Min/MaxNamespace
4. HashNode.Min/MaxNamespace
```

After reducing this to a single call, here is the new benchmark output (running `go test -run="none" -bench=. -benchtime=100x -benchmem`):

```
BenchmarkComputeRoot/64-leaves-12     100    137292 ns/op     60100 B/op     905 allocs/op
BenchmarkComputeRoot/128-leaves-12    100    237664 ns/op    113758 B/op    1804 allocs/op
BenchmarkComputeRoot/256-leaves-12    100    462802 ns/op    234241 B/op    3604 allocs/op
```

### Solution 2: Update MinNamespace/MaxNamespace to return a slice of the existing array rather than creating a new underlying array

Copying the slice prevents [array-capacity memory leaks](https://itnext.io/mastering-memory-management-in-go-avoiding-slice-related-leaks-7d5e97ee48d4) in the case where the namespace ID variable's lifetime is longer than that of the original slice, but I don't think that is the case here. [Slicing with a max capacity also prevents appends to the namespace ID slice from editing the underlying](https://build-your-own.org/blog/20230316_go_full_slice/) `ns|ns|hash` array. Benchmark output after this change:

```
BenchmarkComputeRoot/64-leaves-12     100    136436 ns/op     58180 B/op     653 allocs/op
BenchmarkComputeRoot/128-leaves-12    100    241836 ns/op    109747 B/op    1297 allocs/op
BenchmarkComputeRoot/256-leaves-12    100    470231 ns/op    225924 B/op    2583 allocs/op
```

[New Mem Flamegraph](https://flamegraph.com/share/71b8ef18-c86a-11ef-b027-3a58f36d25ce)
[New CPU Flamegraph](https://flamegraph.com/share/7f46d1b4-c86a-11ef-9832-26c3e5347170)

---

The improvements made by the first solution are made redundant by the second, but I was initially reluctant to replace the slice copy in case it would lead to a memory leak. As I got more familiar with the package, I became more confident that slicing the array is the more appropriate approach.

**Note**
- Given that validation is done in the `Push` method, I also considered adding an `UnsafeHashNode` that skips validation, but I felt that this could lead to more complications and would not make any notable difference compared with either of the other solutions.
- After reworking the `ValidateSiblings` function, I also removed the associated tests, but it looks like the same test cases also exist in `TestValidateNodes` [HERE](https://github.com/bruscar010/nmt/blob/main/hasher_test.go#L684)

## Summary by CodeRabbit

- **New Features**
  - Enhanced namespace validation methods with improved error reporting and context.
  - Introduced `nsIDRange` struct for better namespace ID range management.
- **Refactor**
  - Streamlined namespace-related functions in `MinNamespace` and `MaxNamespace`.
  - Improved code readability and maintainability in the namespaced merkle tree hasher.
- **Tests**
  - Added a benchmark for `ComputeRoot` with 20,000 leaves.
  - Removed the existing sibling node validation test.

Co-authored-by: francis <[email protected]>
I would also bump node and run it with the change, to see the diff in Pyroscope.
More time is spent in HashLeaf allocating than hashing:
https://github.com/celestiaorg/nmt/blob/master/hasher.go#L190
https://flamegraph.com/share/a54e891a-1114-11ee-b13f-de9431916b05
Possibly related:
In celestia-node, min and max namespace take up 200MB (20%) of RAM
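`testing.AllocsPerRun` is a quick way to quantify how many heap allocations a HashLeaf-style call incurs, without a full benchmark run. A sketch with an illustrative stand-in for the real hasher:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// hashLeafLike imitates the allocation pattern under discussion: a fresh
// nID|data buffer is built on every call before hashing. Illustrative only.
func hashLeafLike(nID, data []byte) [sha256.Size]byte {
	buf := append(append(make([]byte, 0, len(nID)+len(data)), nID...), data...)
	return sha256.Sum256(buf)
}

func main() {
	nID, data := []byte{0, 1}, make([]byte, 512)
	// AllocsPerRun averages mallocs over many calls, outside a benchmark.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = hashLeafLike(nID, data)
	})
	fmt.Printf("allocs/op: %.0f\n", allocs)
}
```

If a buffer-reuse change works, this number should drop toward zero per call, which is cheaper to check in a tight loop than re-reading a flame graph.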