E35 prune small batches #9088
Conversation
erigon-lib/state/aggregator_v3.go (Outdated)

```go
defer cancel()

for {
	if err := ac.Prune(context.Background(), tx, 100); err != nil {
```
Need to check that prune with a limit is “consistent” - for example, Index.Prune doesn't exit before collector.Load (if the limit is reached).
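For illustration only, a minimal sketch of the consistency property being asked for; the collector type and pruneIndex helper below are hypothetical, not the erigon-lib or etl code. The point it demonstrates: hitting the limit stops collection, but the collector is still loaded before returning, so a limited prune never leaves half-collected work behind.

```go
package main

import "fmt"

// collector accumulates keys scheduled for deletion; Load applies them.
// This mirrors a collect-then-load shape, not the real etl.Collector API.
type collector struct{ keys []string }

func (c *collector) Collect(k string) { c.keys = append(c.keys, k) }

func (c *collector) Load(del func(k string) error) error {
	for _, k := range c.keys {
		if err := del(k); err != nil {
			return err
		}
	}
	return nil
}

// pruneIndex deletes at most limit candidate keys. The important property:
// reaching the limit stops *collection*, but Load still runs before returning.
func pruneIndex(candidates []string, limit int, del func(k string) error) (int, error) {
	c := &collector{}
	for _, k := range candidates {
		if limit > 0 && len(c.keys) >= limit {
			break // stop collecting, but do NOT return before Load
		}
		c.Collect(k)
	}
	if err := c.Load(del); err != nil {
		return 0, err
	}
	return len(c.keys), nil
}

func main() {
	db := map[string]bool{"a": true, "b": true, "c": true}
	n, err := pruneIndex([]string{"a", "b", "c"}, 2, func(k string) error {
		delete(db, k)
		return nil
	})
	fmt.Println(n, err, db) // 2 <nil> map[c:true]
}
```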
Thanks for pointing it out.
This pulled up a few problems with context management and limits as well.
I have not decided yet whether prune continuation still makes sense or whether it's better to remove it right now; I need to make a few more comparisons. Also, this PR raises the question: do we want to prune domain index tables as well (which looks meaningful), or only prune them during the Unwind process?
```go
	break

txNum := binary.BigEndian.Uint64(txnm)
if txNum < stat.MinTxNum {
```
I don't understand - does this code lose the [from, to) semantics or not?
Follow-up to #9031 (comment)
Ordering

AggregatorV3Context pruning happens in the following order: txFrom <= txn <= txTo. Progress goes towards bigger txNumbers. Pruning starts from the .Latest() key, which is the biggest key available. We use inverted steps (^step) as a suffix for domain keys, which gives us an opportunity to prune the smallest steps first. So, starting from the largest available key and the smallest available step, we go backwards to bigger steps and smaller keys. If for a given key we meet savedStep > pruneStep, we safely go to the PrevNoDup() key without scanning and skipping steps.
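To make the ordering concrete, here is a minimal self-contained sketch of the inverted-step idea; the encode/decode helpers and key names are illustrative assumptions, not the actual erigon-lib domain key encoding. With ^step as a suffix, walking a sorted table backwards visits the largest key first and, within each key, the smallest step first:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// encode builds key || ^step (big-endian). Names are illustrative only;
// the real domain key layout lives in erigon-lib.
func encode(key string, step uint64) []byte {
	b := append([]byte(key), make([]byte, 8)...)
	binary.BigEndian.PutUint64(b[len(key):], ^step)
	return b
}

func decode(b []byte) (string, uint64) {
	n := len(b) - 8
	return string(b[:n]), ^binary.BigEndian.Uint64(b[n:])
}

func main() {
	// Two keys, each written at several steps.
	var rows [][]byte
	for _, key := range []string{"addr1", "addr2"} {
		for _, step := range []uint64{1, 2, 3} {
			rows = append(rows, encode(key, step))
		}
	}
	// Sort ascending, as a B-tree/MDBX table would store the entries.
	sort.Slice(rows, func(i, j int) bool { return string(rows[i]) < string(rows[j]) })

	// Walk backwards: largest key first, and within a key the smallest step first.
	for i := len(rows) - 1; i >= 0; i-- {
		key, step := decode(rows[i])
		fmt.Printf("key=%s step=%d\n", key, step)
	}
	// Visit order: addr2/1, addr2/2, addr2/3, addr1/1, addr1/2, addr1/3
}
```

In the real cursor-based walk, meeting savedStep > pruneStep for a key allows a PrevNoDup()-style jump straight to the previous key instead of stepping through its remaining (larger) steps.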
Limiting

Pruning progress obviously changes state and therefore affects execution - invalid reads of obsolete values could happen if pruning is broken.
Pruning indices and histories is coupled, since the history table is bound to index key and txn entries. Since an index is a mapping txNum -> {key, key', ...}, it looks easier to limit their pruning by txNums at once instead of going through the whole list selecting keys by a limit.
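As an illustration of limiting by txNums rather than by key count, a toy sketch; the map-based index and the pruneByTxNum helper are assumptions for clarity, not the real table/cursor code:

```go
package main

import (
	"fmt"
	"sort"
)

// pruneByTxNum removes whole txNum entries (with all their keys) until
// limitTxNums txNums have been pruned, instead of counting individual keys.
// The txNum -> {key, key', ...} map is a simplification of the real tables.
func pruneByTxNum(index map[uint64][]string, txTo uint64, limitTxNums int) []uint64 {
	txns := make([]uint64, 0, len(index))
	for txn := range index {
		txns = append(txns, txn)
	}
	sort.Slice(txns, func(i, j int) bool { return txns[i] < txns[j] })

	var pruned []uint64
	for _, txn := range txns {
		if txn >= txTo || len(pruned) >= limitTxNums {
			break
		}
		delete(index, txn) // history entries bound to this txn are dropped together
		pruned = append(pruned, txn)
	}
	return pruned
}

func main() {
	index := map[uint64][]string{
		10: {"a", "b"},
		11: {"a"},
		12: {"c", "d", "e"},
		13: {"b"},
	}
	fmt.Println(pruneByTxNum(index, 13, 2)) // [10 11]; txn 12 is left for the next batch
}
```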
AggregatorV3Context.PruneSmallBatches() always sets txFrom=0, since its purpose is to keep the db clean, one step at a time.

Domain pruning is limited by the amount of keys removed at once. For slow disks and a big db (>150G), domain pruning could be very slow: the database keeps growing, which slows pruning down as well, to 100,000 kv's per 10-minute session, which is not enough to keep the db at a constant size. So, using smaller values for --batchSize could solve the problem, due to more frequent calls of Prune and smaller changes put into the db.

A domain can be pruned if the savedPruneProgress key for this table is not nil, or the smallest domain key has values of savedStep < pruneStep in domain files. The downside of looking at the smallest step is that the smallest key is not guaranteed to change in each step, which could give us an invalid estimate of the smallest available key. Saved prune progress indicates that we did not finish the latest cleanup, but it does not give us a step number. Meta tables could be used which would contain such info (smallest step in the table?).
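A small sketch of that prune-eligibility rule; canPruneDomain, savedPruneProgress, and smallestSavedStep are placeholder names for illustration, not the real erigon-lib API:

```go
package main

import "fmt"

// canPruneDomain mirrors the rule above: prune if a previous (unfinished) prune
// left a savedPruneProgress key for this table, or if the smallest step seen at
// the smallest domain key is already covered by files (savedStep < pruneStep).
// Caveat from the text: the smallest key is not guaranteed to change every step,
// so smallestSavedStep may give an invalid estimate of what is prunable.
func canPruneDomain(savedPruneProgress []byte, smallestSavedStep, pruneStep uint64) bool {
	if savedPruneProgress != nil {
		return true // previous prune did not finish; continue it
	}
	return smallestSavedStep < pruneStep
}

func main() {
	fmt.Println(canPruneDomain(nil, 3, 5))                   // true: step 3 is already covered by files up to step 5
	fmt.Println(canPruneDomain(nil, 7, 5))                   // false: nothing prunable yet
	fmt.Println(canPruneDomain([]byte("resume-key"), 9, 5))  // true: unfinished prune to resume
}
```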
Takeaways / keep in mind

- --batchSize should be smaller on slower disks (even of size 16-64M) to keep the db small. A balanced batchSize could increase throughput while preserving db size.
- define available steps in db
- when --batchSize is reached, commitment is evaluated and puts its update into that batch, which becomes x1.4-x2 of the size