Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lbrycrdd block check performance degrades non-linearly with txset #389

Open
hashbender opened this issue May 28, 2020 · 18 comments
Open

lbrycrdd block check performance degrades non-linearly with txset #389

hashbender opened this issue May 28, 2020 · 18 comments

Comments

@hashbender
Copy link

hashbender commented May 28, 2020

Background
I run Luxor mining pool. Since ~5/21 we've seen larger numbers of transactions on the network, which is a good thing, but we've seen this impact the performance of lbrycrdd significantly.

Issue
Blocks should not take >1s to Connect. If they do, this causes a huge problem for miners, especially for short block times like Lbry.

Expectation
Performance in the same order of magnitude as BTC -

Bench output from lbrycrdd for block 771809

2020-05-28T16:11:32Z   - Load block from disk: 18.95ms [2.92s]
2020-05-28T16:11:32Z     - Sanity checks: 1.92ms [1.32s (0.88ms/blk)]
2020-05-28T16:11:32Z     - Fork checks: 0.05ms [0.08s (0.05ms/blk)]
2020-05-28T16:11:34Z       - Connect 398 transactions: 1822.49ms (4.579ms/tx, 2.925ms/txin) [327.93s (217.75ms/blk)]
2020-05-28T16:11:34Z     - Verify 623 txins: 1822.81ms (2.926ms/txin) [328.06s (217.84ms/blk)]
2020-05-28T16:11:34Z     - Index writing: 11.27ms [8.90s (5.91ms/blk)]
2020-05-28T16:11:34Z     - Callbacks: 0.05ms [0.09s (0.06ms/blk)]
2020-05-28T16:11:34Z   - Connect total: 1836.48ms [338.82s (224.98ms/blk)]
2020-05-28T16:11:34Z TrieCache size: 30670 nodes on block 771810, batch writes 3671856 bytes.
2020-05-28T16:11:34Z   - Flush: 95.29ms [68.03s (45.17ms/blk)]
2020-05-28T16:11:34Z   - Writing chainstate: 0.06ms [0.09s (0.06ms/blk)]
2020-05-28T16:11:34Z UpdateTip: new best=f46a34be4f5c791cf0b1d8667bae134c7eb10ee3dcfd89f04514d2e44a39cbdf height=771809 version=0x20000000 log2_work=73.716032 txb=398 tx=13122143 date='2020-05-28T15:47:20Z' progress=0.937661 cache=176.6MiB(548393txo)
2020-05-28T16:11:34Z   - Connect postprocess: 0.82ms [2.73s (1.81ms/blk)]
2020-05-28T16:11:34Z - Connect block: 1951.61ms [412.60s (273.97ms/blk)]

Bench output for a similar block from btc (height 632057)

2020-05-28T16:32:32Z   - Load block from disk: 26.86ms [12.35s]
2020-05-28T16:32:32Z     - Sanity checks: 6.81ms [1.82s (2.91ms/blk)]
2020-05-28T16:32:32Z     - Fork checks: 0.10ms [0.04s (0.06ms/blk)]
2020-05-28T16:32:33Z       - Connect 3034 transactions: 517.85ms (0.171ms/tx, 0.091ms/txin) [466.69s (746.70ms/blk)]
2020-05-28T16:32:33Z     - Verify 5688 txins: 518.47ms (0.091ms/txin) [471.33s (754.12ms/blk)]
2020-05-28T16:32:33Z     - Index writing: 18.54ms [11.76s (18.82ms/blk)]
2020-05-28T16:32:33Z     - Callbacks: 0.18ms [0.06s (0.10ms/blk)]
2020-05-28T16:32:33Z   - Connect total: 545.06ms [485.48s (776.77ms/blk)]
2020-05-28T16:32:33Z   - Flush: 14.22ms [8.43s (13.48ms/blk)]
2020-05-28T16:32:33Z   - Writing chainstate: 0.08ms [0.05s (0.07ms/blk)]
2020-05-28T16:32:33Z UpdateTip: new best=00000000000000000009a30ace6dd38643db1798e34c15a8119e067ee0d4eed9 height=632057 version=0x20c00000 log2_work=91.984454 tx=533818995 date='2020-05-28T16:18:15Z' progress=0.999994 cache=384.0MiB(2932736txo) warning='74 of last 100 blocks have unexpected version'
2020-05-28T16:32:33Z   - Connect postprocess: 6.06ms [7.94s (12.70ms/blk)]
2020-05-28T16:32:33Z - Connect block: 592.27ms [514.24s (822.78ms/blk)]

The most concerning problem is the tx validation speed. You'll notice it's 50x slower on LBRY than BTC.

Further, it seems to get worse as more transactions are added. Notice lbry block 771819 only has 55, and processes at 0.493 ms / txin.
Verify 55 txins: 27.14ms (0.493ms/txin)

This leads me to believe there is some non-linear regression

Reproducer
I can reliably reproduce this issue by running the lbrycrdd daemon with -debug=bench and see how long the block connect takes.

Version:
LBRYcrd Core Daemon version v0.17.3.2-be118de

Machine
2015 Macbook Pro, 2.8 GHz Quad-Core Intel Core i7
16 GB 1600 MHz DDR3

Config
Default config.

@BrannonKing
Copy link
Member

@nitronick600 , I can look into this. In the meantime will you please run the same timing test with v0.17.4.5 and v0.19.1.2 ? You will have to reindex once when going to 17.4.5 but not when going to v19 from there. Run with -dbcache=1200 on those.

@hashbender
Copy link
Author

Alright, I'll run the reindex now for 17.4.5

If I recall correctly, we need to run v0.17.3.2 because of a consensus issue in the later versions?

@hashbender
Copy link
Author

This is also causing the getblocktemplate RPC to take multiple seconds to return a value.

@BrannonKing
Copy link
Member

The consensus issue in 17.4.x is repaired in 17.4.5. It is marked as pre-release so that we could get some more internal miles on it, increase our confidence in it.

@hashbender
Copy link
Author

"CreateNewBlock() packages: 1.56ms (457 packages, 19 updated descendants), validity: 6329.01ms (total 6330.57ms)"

This block generation time is pretty common now.

@BrannonKing
Copy link
Member

I want to clarify a few things:

  1. In both codebases the "connect x transactions" time is included in the "verify x txins" on the line after. That's probably a bug.
  2. In the "connect x transactions" phase, lbrycrd does substantially more work than bitcoin. lbrycrd parses out all the claims included in the transactions and adds them to the claim trie (or removes them in the case of a spend). It then does an "increment block" phase on the trie where it updates any claims that are becoming active or expiring, and handles any new takeover height records.
  3. In the "validity" stage of CreateNewBlock, it does all that claim trie work just mentioned for the new block template.
  4. I'm certain that it's possible to optimize the trie code more, but you should know that a lot of work has gone into optimizing it already. I believe that addressing coins.sqlite contains the full TXO, and this is too slow #386 would help.

@bvbfan
Copy link
Collaborator

bvbfan commented May 29, 2020

computeNodeHash is most expensive function as well as getMerkleHash according to Valgrind cachegrind. Also if we move connected transactions bench log before merkle hash check, we can notice merkle is a 2 to 10x slower

- Connect 416 transactions: 204.59ms (0.492ms/tx, 0.401ms/txin) [7.63s (95.43ms/blk)]
2020-05-29T08:37:21Z 
- Verify 510 txins: 571.33ms (1.120ms/txin) [16.19s (202.32ms/blk)]

Verify is exactly how long is merkle hash computation, that's on v17_master, sqlite backend.
If i recall my attempt to use one query with step/final function, i remember that step function is called in unordered way, despite query is ordered, looks like it's called before actual result ordering, that results in using a map/flat_map to have children ordered then map to vector conversion or anther computation function taking a map. So i left the approach.
cachegrind.zip
Screenshot_20200529_141049

@bvbfan
Copy link
Collaborator

bvbfan commented May 29, 2020

- Connect 1711 transactions: 2210.91ms (1.292ms/tx, 0.762ms/txin) [3.46s (494.53ms/blk)]
- Verify 2902 txins: 5492.32ms (1.893ms/txin) [8.34s (1191.10ms/blk)]

On block 771091 we have 5.5 sec. on merkle hash.
I backport my query from new hash PR, where it has a children present check, it saves some time but it still slow. We should improve merkle hash query somehow.
debug.log
I made a branch of the backport query merkle_improve, @BrannonKing you can check it.

@hashbender
Copy link
Author

hashbender commented May 29, 2020

17.4.5 doesn't seem any better:

2020-05-29T15:12:36Z   - Load block from disk: 0.00ms [344.33s]
2020-05-29T15:12:36Z     - Sanity checks: 0.17ms [78.61s (0.10ms/blk)]
2020-05-29T15:12:36Z     - Fork checks: 0.03ms [119.62s (0.15ms/blk)]
2020-05-29T15:12:49Z       - Connect 193 transactions: 13842.09ms (71.721ms/tx, 26.983ms/txin) [26992.48s (34.95ms/blk)]
2020-05-29T15:12:49Z     - Verify 513 txins: 13842.18ms (26.983ms/txin) [27089.23s (35.07ms/blk)]
2020-05-29T15:12:50Z     - Index writing: 388.68ms [1710.53s (2.21ms/blk)]
2020-05-29T15:12:50Z     - Callbacks: 0.05ms [38.66s (0.05ms/blk)]
2020-05-29T15:12:50Z   - Connect total: 14233.13ms [29289.18s (37.92ms/blk)]
2020-05-29T15:12:50Z   - Flush: 196.57ms [8700.50s (11.27ms/blk)]
2020-05-29T15:12:50Z   - Writing chainstate: 0.12ms [4231.59s (5.48ms/blk)]
2020-05-29T15:12:50Z UpdateTip: new best=2c1b20404b68f240d8cf81254d4bd407a6a5aa229f1a94ef70d7a4286a25ea43 height=772335 version=0x20000000 log2_work=73.717776 txb=193 tx=13233281 date='2020-05-29T15:12:21Z' progress=0.998687 cache=25.3MiB(1079txo)
2020-05-29T15:12:50Z   - Connect postprocess: 11.89ms [81.29s (0.11ms/blk)]
2020-05-29T15:12:50Z - Connect block: 14441.72ms [42646.90s (55.22ms/blk)]

Will try 19.1.2 next

@bvbfan
Copy link
Collaborator

bvbfan commented May 29, 2020

@nitronick600 If you can compile (clone branch merkle_improve follow the instructions on readme) and try it.

@hashbender
Copy link
Author

19.1.2 isn't any better either:

2020-05-29T15:41:13Z Syncing txindex with block chain from height 749986
2020-05-29T15:41:30Z   - Load block from disk: 0.01ms [0.01s]
2020-05-29T15:41:30Z     - Sanity checks: 0.07ms [0.00s (0.08ms/blk)]
2020-05-29T15:41:30Z     - Fork checks: 0.07ms [0.00s (0.04ms/blk)]
2020-05-29T15:41:34Z       - Connect 117 transactions: 4279.97ms (36.581ms/tx, 27.792ms/txin) [37.46s (2676.07ms/blk)]
2020-05-29T15:41:34Z     - Verify 154 txins: 4280.05ms (27.793ms/txin) [37.47s (2676.22ms/blk)]
2020-05-29T15:41:34Z     - Index writing: 1.75ms [0.03s (1.92ms/blk)]
2020-05-29T15:41:34Z     - Callbacks: 0.04ms [0.00s (0.05ms/blk)]
2020-05-29T15:41:34Z   - Connect total: 4283.35ms [37.50s (2678.91ms/blk)]
2020-05-29T15:41:35Z   - Flush: 119.16ms [2.14s (152.72ms/blk)]
2020-05-29T15:41:35Z   - Writing chainstate: 0.06ms [0.00s (0.06ms/blk)]
2020-05-29T15:41:35Z UpdateTip: new best=ba92d47cf65695fe5d3edef80c27670cf9628c263a1205dbcf03546ff3862b0c height=772349 version=0x20000000 log2_work=73.717818 txb=117 tx=13235608 date='2020-05-29T15:41:21Z' progress=0.999366 cache=0.2MiB(647txo)
2020-05-29T15:41:35Z   - Connect postprocess: 1.14ms [0.97s (69.23ms/blk)]
2020-05-29T15:41:35Z - Connect block: 4403.71ms [40.62s (2901.34ms/blk)]

As an aside, it took almost 15 minutes to startup.

@hashbender
Copy link
Author

Have we been able to make any progress on this?

@bvbfan
Copy link
Collaborator

bvbfan commented Jun 18, 2020

We have progress on, #390 was merged to v17_master which makes things faster. You can compile by yourself or wait for release, if you use HDD you may interested to try to increase db cache,
-dbcache=8192

@bvbfan
Copy link
Collaborator

bvbfan commented Jun 25, 2020

@nitronick600 you can test new release https://github.com/lbryio/lbrycrd/releases/tag/v0.17.4.6

@hashbender
Copy link
Author

We're running the branch now merkle_improve branch now, but I don't have enough info to say if this fixes the problem. It does seem that blocks are much faster to verify; will get back to you on submitblock

@bvbfan
Copy link
Collaborator

bvbfan commented Jun 25, 2020

In the release 0.17.4.6 has another one improvement especially indexing by claim id instead of claim name which gives another boost. So the release is faster than merkle_improve.

@hashbender
Copy link
Author

Will this require a reindex?

@bvbfan
Copy link
Collaborator

bvbfan commented Jun 25, 2020

No, just use new executable. Memory usage is lowered by default if you prefer to use higher memory (will be faster) -dbcache=4096 as command line argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants