
Poor use of memory / too many allocations in HashLeaf, leading to GC to work in overdrive #216

Open
musalbas opened this issue Jun 28, 2023 · 7 comments

musalbas commented Jun 28, 2023

More time is spent in HashLeaf allocating, than hashing:

https://github.com/celestiaorg/nmt/blob/master/hasher.go#L190
https://flamegraph.com/share/a54e891a-1114-11ee-b13f-de9431916b05


Possibly related:

In celestia-node, min and max namespace take up 200MB (20%) of RAM

[Screenshot: heap profile showing MinNamespace/MaxNamespace allocations taking ~200MB]

@musalbas musalbas changed the title Poor use of memory / too many allocations for nmt.MinNamespace and nmt.MaxNamespace, leading to GC to work in overdrive Poor use of memory / too many allocations, leading to GC to work in overdrive Jun 28, 2023
musalbas commented:

If the same allocation is reused in HashLeaf, would that make it not thread-safe? If so, this should be considered; I believe trees are often computed in parallel.

@musalbas musalbas changed the title Poor use of memory / too many allocations, leading to GC to work in overdrive Poor use of memory / too many allocations in HashLeaf, leading to GC to work in overdrive Jun 28, 2023
liamsi commented Jun 28, 2023

I would try the following, FWIW. Change these appends

nmt/hasher.go, lines 190 to 192 in 564300a:

```go
minMaxNIDs := make([]byte, 0, resLen)
minMaxNIDs = append(minMaxNIDs, nID...) // nID
minMaxNIDs = append(minMaxNIDs, nID...) // nID || nID
```

to

```go
minMaxNIDs := append(append(make([]byte, 0, resLen), nID...), nID...)
```

and see if that removes the need for additional allocs. Similar for the other appends:

nmt/hasher.go, lines 195 to 197 in 564300a:

```go
leafPrefixedNData := make([]byte, 0, len(ndata)+1)
leafPrefixedNData = append(leafPrefixedNData, LeafPrefix)
leafPrefixedNData = append(leafPrefixedNData, ndata...)
```

liamsi commented Jun 28, 2023

Also, it is important to use benchstat for any related changes (since we disabled go-bencher recently): https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

Should include both mem and CPU.

liamsi commented Jun 28, 2023

This could be somewhat related: #212

staheri14 commented:

Unassigning myself to allow someone else to take over.

@staheri14 staheri14 removed their assignment Dec 2, 2024
rootulp pushed a commit that referenced this issue Jan 17, 2025
In issue #216, two problems were raised:
1. More time was spent allocating than hashing in the HashLeaf function.
2. The min and max namespace functions were taking up a large amount of RAM.

---

## More time was spent allocating than hashing in the HashLeaf function

The benchmark test described in the issue listed above no longer exists,
and when I run the `BenchmarkComputeRoot` in the nmt package
[HERE](https://github.com/bruscar010/nmt/blob/main/nmt_test.go#L696) I
can't replicate memory allocation taking longer than the hashing,
although this may be related to my test parameters.

Below are flamegraphs of CPU/memory usage (I've updated the benchmark to use 20k leaf nodes with a 512-byte transaction size).

[Base Mem
Flamegraph](https://flamegraph.com/share/6a04ffab-c866-11ef-9832-26c3e5347170)
[Base CPU
Flamegraph](https://flamegraph.com/share/efce1d16-c868-11ef-b027-3a58f36d25ce)

```
BenchmarkComputeRoot/64-leaves-12         	     100	    141785 ns/op	   65166 B/op	    1535 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    280652 ns/op	  123894 B/op	    3074 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    501392 ns/op	  254611 B/op	    6154 allocs/op
```

I've instead focused on the second issue

---

## The min and max namespace functions were taking up a large amount of RAM

### Solution 1: Reduce the number of calls to MinNamespace & MaxNamespace

The HashNode function made 4 duplicate function calls to
MaxNamespace/MinNamespace for the same node.
```
1. HashNode -> ValidateNodes -> ValidateNodeFormat.Min/MaxNamespace
2. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder -> ValidateNodeFormat(left|right).Min/MaxNamespace
3. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder.Min/MaxNamespace
4. HashNode.Min/MaxNamespace
```
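One way to collapse these four call sites is to extract the min/max IDs once per node and thread the result through validation. A minimal sketch, assuming a raw `ns_min||ns_max||hash` node layout; `rangeOf` and `validOrder` are hypothetical helper names, not the nmt API:

```go
package main

import (
	"bytes"
	"fmt"
)

// nsIDRange holds a node's min/max namespace IDs so they are extracted
// once per HashNode call instead of at every validation step.
type nsIDRange struct{ min, max []byte }

// rangeOf slices the min and max IDs out of a ns_min||ns_max||hash node.
func rangeOf(node []byte, idSize int) nsIDRange {
	return nsIDRange{min: node[:idSize], max: node[idSize : 2*idSize]}
}

// validOrder checks left.max <= right.min using the cached ranges,
// without re-slicing the nodes.
func validOrder(left, right nsIDRange) bool {
	return bytes.Compare(left.max, right.min) <= 0
}

func main() {
	left := rangeOf([]byte{1, 2, 0xFF}, 1)  // min=1, max=2, hash=0xFF
	right := rangeOf([]byte{3, 4, 0xEE}, 1) // min=3, max=4, hash=0xEE
	fmt.Println(validOrder(left, right))    // true
}
```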

After reducing to a single call, here is the new benchmark output
(running `go test -run="none" -bench=. -benchtime=100x -benchmem`)
```
BenchmarkComputeRoot/64-leaves-12         	     100	    137292 ns/op	   60100 B/op	     905 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    237664 ns/op	  113758 B/op	    1804 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    462802 ns/op	  234241 B/op	    3604 allocs/op
```

### Solution 2: Update MinNamespace/MaxNamespace to return a slice of the existing array rather than creating a new underlying array

Copying the slice prevents [array-capacity memory leaks](https://itnext.io/mastering-memory-management-in-go-avoiding-slice-related-leaks-7d5e97ee48d4) in case the namespace ID variable's lifetime is longer than the lifetime of the original slice, but I don't think that is the case here. [Slicing with a max capacity also prevents appends to the namespace ID slice from editing the underlying](https://build-your-own.org/blog/20230316_go_full_slice/) `ns|ns|hash` array.
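In isolation, the full (three-index) slice expression behaves like this; `minNamespace` here is a simplified stand-in assuming a `ns_min||ns_max||hash` node layout with `idSize`-byte IDs:

```go
package main

import "fmt"

// minNamespace returns the min namespace ID as a view into the node.
// The full slice expression caps the result's capacity at idSize, so a
// later append on it must reallocate instead of writing into ns_max.
func minNamespace(node []byte, idSize int) []byte {
	return node[0:idSize:idSize]
}

func main() {
	node := []byte{1, 2, 0xFF} // idSize=1: ns_min=1, ns_max=2, hash=0xFF
	id := minNamespace(node, 1)
	grown := append(id, 9) // forces a copy; node[1] stays untouched
	fmt.Println(node[1], grown[1]) // 2 9
}
```

Without the third index (`node[0:idSize]`), the result would have capacity `len(node)`, and that same append would silently overwrite `ns_max` in place.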

Benchmark output after this change
```
BenchmarkComputeRoot/64-leaves-12         	     100	    136436 ns/op	   58180 B/op	     653 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    241836 ns/op	  109747 B/op	    1297 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    470231 ns/op	  225924 B/op	    2583 allocs/op
```

[New Mem
Flamegraph](https://flamegraph.com/share/71b8ef18-c86a-11ef-b027-3a58f36d25ce)
[New CPU
Flamegraph](https://flamegraph.com/share/7f46d1b4-c86a-11ef-9832-26c3e5347170)

---

The improvements made by the first solution are made redundant by the
second, but I was initially reluctant to replace the slice copy in case
it would lead to a memory leak. As I got more familiar with the package,
I became more confident that slicing the array may be more
appropriate.

**Note**
- Given that validation is done in the `Push` method, adding an
`UnsafeHashNode` that skips validation was also a method I considered,
but I felt that this could lead to more complications & would not make
any notable differences compared to either of the other solutions.
- After reworking the `ValidateSiblings` function, I also removed the
associated tests, but it looks like the same test cases also exist in
`TestValidateNodes`
[HERE](https://github.com/bruscar010/nmt/blob/main/hasher_test.go#L684)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced namespace validation methods with improved error reporting
and context.
- Introduced `nsIDRange` struct for better namespace ID range
management.

- **Refactor**
- Streamlined namespace-related functions in `MinNamespace` and
`MaxNamespace`.
- Improved code readability and maintainability in namespace merkle tree
hasher.

- **Tests**
	- Added a benchmark for `ComputeRoot` with 20,000 leaves.
	- Removed existing sibling node validation test.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: francis <[email protected]>
cristaloleg commented:

Should we close this after #285 & #287?

Wondertan commented:

I would also bump the node version, run it with this change, and check the diff in Pyroscope.
