LedgerDB: prune on garbage collection instead of on every change #1513

amesgen · 2025-05-19T12:26:24Z

This is in preparation for #1424.

This PR is intended to be reviewed commit-by-commit.

Currently, we prune the LedgerDB (i.e. remove all but the last k+1 states) every time we adopt a longer chain. This means that we cannot rely on other threads (like the copyAndSnapshot ChainDB background thread) actually observing all immutable ledger states, just as described in the caveats of our Watcher abstraction.
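To illustrate the problem, here is a minimal sketch of the old behaviour, assuming the LedgerDB is modelled as a plain list of ledger states ordered from oldest (the anchor) to newest; `pruneToLast` and `adoptEagerly` are hypothetical names, not part of the LedgerDB API:

```haskell
-- Hypothetical model (not the LedgerDB API): the LedgerDB as a list of
-- ledger states ordered from oldest (the anchor) to newest (the tip).

-- | Keep only the last k+1 states: the anchor plus k volatile states.
pruneToLast :: Int -> [a] -> [a]
pruneToLast k states = drop (length states - (k + 1)) states

-- | Old behaviour: append each newly adopted state and prune immediately.
adoptEagerly :: Int -> [a] -> [a] -> [a]
adoptEagerly k = foldl (\db new -> pruneToLast k (db ++ [new]))
```

For example, with k = 2, `adoptEagerly 2 [0, 1, 2] [3, 4]` yields `[2, 3, 4]`. A thread that only inspects the anchor after both blocks have been adopted sees it jump from state 0 to state 2, so the immutable state 1 is never observed.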

However, a predictable ledger snapshotting rule (#1424) requires this property; otherwise, when the node is under high load and/or adopting multiple blocks in quick succession, it might not be able to create a snapshot for its desired block.

This PR changes this: when adopting new blocks, the LedgerDB is no longer pruned immediately. Instead, a new dedicated ChainDB background thread for ledger maintenance tasks (flushing/snapshotting/garbage collection) periodically wakes up (on every new immutable block) and, in particular, garbage collects the LedgerDB based on a slot number.
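A minimal sketch of the new rule, assuming each ledger state is tagged with its slot and the maintenance thread is handed the slot of the newest immutable block (`Slot`, `addState` and `garbageCollectBefore` are hypothetical names). Adoption only appends; pruning is deferred until the immutable tip advances:

```haskell
-- Hypothetical model: ledger states tagged with the slot of the block that
-- produced them, ordered from oldest to newest.
type Slot = Word

-- | Adopting a block only appends its state; nothing is pruned here.
addState :: (Slot, s) -> [(Slot, s)] -> [(Slot, s)]
addState new db = db ++ [new]

-- | Garbage collection, run when the immutable tip advances: drop all
-- states strictly older than the given slot. States at or after that slot
-- are kept, so the state of the immutable tip survives as the new anchor.
garbageCollectBefore :: Slot -> [(Slot, s)] -> [(Slot, s)]
garbageCollectBefore immSlot = dropWhile ((< immSlot) . fst)
```

Between two garbage collections, the list may hold more than k+1 states, which is the slack that allows the maintenance thread to look at every immutable state before it is dropped.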

This also makes the semantics more consistent with the existing garbage collection of previously-applied blocks in the LedgerDB, and with how the ChainDB works: there, too, we don't immediately delete blocks from the VolatileDB once they are buried beneath k+1 blocks.

See #1513 (comment) for benchmarks demonstrating that peak memory usage does not increase while syncing (even though we might now briefly hold more than k+1 ledger states in memory).

jasagredo

Looks good.

amesgen · 2025-06-05T14:17:03Z

Sync benchmarks are looking good (mainnet, first 1e6 slots/blocks):

LMDB benchmark (of course, this is a bit degenerate as Byron doesn't have tables, but this still serves as a regression test for the DbChangelog aspects which are touched by this PR).

Note that d1b6215 is crucial; otherwise, there is a significant (2x) regression in max heap size.

geo2a · 2025-07-01T08:08:53Z

...ros-consensus/changelog.d/20250626_193647_alexander.esgen_ledgerdb_garbage_collect_states.md

@@ -0,0 +1,6 @@
+### Breaking
+
+- Changed pruning of immutable ledger states to happen on LedgerDB/ChainDB


The changelog entry here seems misleading, but please correct me if I'm wrong.

If I understand correctly, we are now pruning the LedgerDB in copyAndSnapshotRunner, i.e. when moving blocks from VolatileDB to ImmutableDB. In the scheduled ChainDB GCs, we will not prune the LedgerDB at all.

Good catch, I removed the "ChainDB" part here (it happens as part of LedgerDB garbage collection, which we execute as part of the copyAndSnapshot background thread in the ChainDB 👍)

Based on this discussion and your justified confusion in #1513 (comment): maybe we actually want a separate ChainDB background thread that does LedgerDB snapshotting + pruning?

Pro: clearer separation of concerns; taking a snapshot does not block copying blocks to the ImmutableDB (this improvement is independent of this PR).

The latter point might be especially relevant with "Optional random delay when creating snapshots" (#1573).

Con: more ceremony; more threads mean more RTS contention (seems weak).

FTR, we discussed this in today's sync and decided that a new ChainDB background thread makes sense.

Done in d1b6215
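The dedicated background thread can be sketched as a loop that blocks in STM until the slot of the immutable tip changes and then runs the maintenance work outside of the transaction. This minimal sketch uses `GHC.Conc` from `base`; the `TVar` holding the immutable tip slot and the `maintain` callback (standing in for flushing, snapshotting and garbage collection) are assumptions, not the ChainDB's actual interface:

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, retry, writeTVar)

type Slot = Word

-- | Block until the immutable tip slot differs from the last one we saw.
awaitNewImmutableTip :: TVar Slot -> Slot -> STM Slot
awaitNewImmutableTip tipVar lastSeen = do
  tip <- readTVar tipVar
  if tip == lastSeen then retry else pure tip

-- | One iteration of the hypothetical ledger maintenance thread: wait for a
-- new immutable block, then run the maintenance work (e.g. garbage collect
-- states older than the new tip) outside of the STM transaction.
maintenanceStep :: TVar Slot -> Slot -> (Slot -> IO ()) -> IO Slot
maintenanceStep tipVar lastSeen maintain = do
  newTip <- atomically (awaitNewImmutableTip tipVar lastSeen)
  maintain newTip
  pure newTip

-- | The thread itself repeats the step forever (e.g. started via forkIO).
maintenanceLoop :: TVar Slot -> Slot -> (Slot -> IO ()) -> IO ()
maintenanceLoop tipVar lastSeen maintain = do
  seen <- maintenanceStep tipVar lastSeen maintain
  maintenanceLoop tipVar seen maintain
```

If several blocks become immutable before the thread wakes up, it only sees the latest tip, but the intermediate states are still in the LedgerDB because pruning now only happens inside `maintain`. In this design, a long-running snapshot in `maintain` delays only this thread, not the copying of blocks to the ImmutableDB.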

amesgen · 2025-07-02T08:19:05Z

Thanks for the great reviews, I hope I addressed your comments. Interesting changes:

- d627e84 removes `LedgerDbPruneKeeping`, prompted by #1513 (comment)
- Found a bug in the LedgerDB state machine test (#1576, "LedgerDB.StateMachine test: actually test rollbacks"), which this PR now depends on, as I enriched that test due to #1513 (comment)

The dedicated ledger maintenance background thread is primarily an optimization to reduce maximum memory usage (more relevant with the in-memory backend) now that pruning happens on garbage collection instead of while adding new blocks to the LedgerDB; see the added commit and the benchmark in the pull request. Previously, LedgerDB garbage collection happened as part of VolatileDB garbage collection, which was intentionally rate-limited. It also resolves the current (somewhat weird) behavior that we do not copy any blocks to the ImmutableDB while taking a snapshot (which can take more than 2 minutes), and consequently also do not garbage-collect the VolatileDB in the meantime. Finally, it synergizes with the planned feature to add a random delay when taking snapshots.

`LedgerDbPruneKeeping` was already superseded in the most important places by `LedgerDbPruneBeforeSlot`. Its remaining use cases are non-essential:

- Replay on startup. Here, we never roll back, so not maintaining k states is actually an optimization. We can also remove the now-redundant `InitDB.pruneDb` function.
- Internal functions used for db-analyser. Here, we can just as well use `LedgerDbPruneAll` (which is used by `pruneToImmTipOnly`), as we never need to roll back.
- Testing. In particular, we remove some DbChangelog tests that previously ensured that at most k states are kept. This no longer holds; that property is instead enforced by the LedgerDB built on top of the DbChangelog. A follow-up commit in this PR enriches the LedgerDB state machine test to make sure that the public API functions behave appropriately, so we don't lose test coverage (this also tests V2, which previously had no such tests).
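The two remaining pruning modes can be modelled roughly as follows. This is a simplified, hypothetical sketch: the constructor names follow `LedgerDbPruneAll` and `LedgerDbPruneBeforeSlot` from the text, while the type name `PruneSketch`, the list representation and the semantics of `LedgerDbPruneAll` (assumed here to behave like pruning with k = 0, keeping only the newest state) are assumptions:

```haskell
type Slot = Word

-- | Hypothetical pruning selector mirroring the two remaining modes.
data PruneSketch
  = LedgerDbPruneAll
    -- ^ keep no volatile states: only the newest state remains (as anchor)
  | LedgerDbPruneBeforeSlot Slot
    -- ^ drop all states older than the given slot

-- | Apply a pruning mode to states ordered from oldest to newest.
prune :: PruneSketch -> [(Slot, s)] -> [(Slot, s)]
prune LedgerDbPruneAll states = drop (length states - 1) states
prune (LedgerDbPruneBeforeSlot slot) states = dropWhile ((< slot) . fst) states
```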

amesgen changed the base branch from cardano-node-10.4-backports to main May 20, 2025 15:03

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch 2 times, most recently from 8b48bb3 to 045f1cc May 20, 2025 15:15

amesgen changed the base branch from main to amesgen/v2-ledgerseq-close May 20, 2025 15:15

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch 4 times, most recently from 13e5533 to 68402ed May 20, 2025 17:25

jasagredo approved these changes May 21, 2025

amesgen force-pushed the amesgen/v2-ledgerseq-close branch from 981971e to 0c5b137 May 28, 2025 12:00

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from 68402ed to 4d6fd67 May 28, 2025 12:00

amesgen mentioned this pull request May 28, 2025

LedgerDB.V2: make sure to actually close handles #1516

Merged

jasagredo mentioned this pull request May 29, 2025

Consensus release for node 10.6 #1541

Closed

jasagredo added this to Consensus Team Backlog Jun 5, 2025

jasagredo moved this to 🏗 In progress in Consensus Team Backlog Jun 5, 2025

jasagredo assigned amesgen Jun 5, 2025

amesgen mentioned this pull request Jun 5, 2025

LedgerDB V2: prevent race conditions between using (duplicating) and closing LedgerTableHandle s #1551

Closed

amesgen force-pushed the amesgen/v2-ledgerseq-close branch from 0c5b137 to 7900088 June 5, 2025 11:28

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from 4d6fd67 to 7049fd4 June 5, 2025 14:16

Base automatically changed from amesgen/v2-ledgerseq-close to main June 5, 2025 21:18

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch 2 times, most recently from 2e01b1c to b9e25f5 June 10, 2025 15:47

amesgen changed the base branch from main to amesgen/ledgerdb-v2-locking June 10, 2025 15:49

amesgen force-pushed the amesgen/ledgerdb-v2-locking branch from 19faf20 to 4010598 June 10, 2025 17:54

amesgen mentioned this pull request Jun 10, 2025

LedgerDB.V2: opportunistically reduce lock contention when closing a Forker #1557

Open

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from b9e25f5 to 894940c June 10, 2025 18:09

Base automatically changed from amesgen/ledgerdb-v2-locking to main June 11, 2025 09:07

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from 894940c to a8fa7e2 June 30, 2025 08:11

amesgen marked this pull request as ready for review June 30, 2025 08:22

amesgen requested review from nfrisby, fraser-iohk, dnadales and geo2a as code owners June 30, 2025 08:22

amesgen mentioned this pull request Jun 30, 2025

LedgerDB: implement predictable snapshotting #1575

Open

amesgen moved this from 🏗 In progress to 👀 In review in Consensus Team Backlog Jun 30, 2025

jasagredo reviewed Jun 30, 2025

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from a8fa7e2 to b503dc3 June 30, 2025 11:52

dnadales approved these changes Jun 30, 2025

geo2a reviewed Jul 1, 2025

geo2a reviewed Jul 1, 2025

geo2a approved these changes Jul 1, 2025

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from b503dc3 to 6c78fad July 2, 2025 08:14

amesgen changed the base branch from main to amesgen/ledgerdb-state-machine-precondition-bug July 2, 2025 08:16

amesgen mentioned this pull request Jul 2, 2025

Optional random delay when creating snapshots #1573

Open

Base automatically changed from amesgen/ledgerdb-state-machine-precondition-bug to main July 2, 2025 14:48

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from 6c78fad to 48bb1fe July 7, 2025 08:27

amesgen added 11 commits July 9, 2025 12:01

LedgerDB.garbageCollect: allow (non-STM) effectful cleanup

fab864d

It is not necessary to perform the garbage collection of the LedgerDB and the map of invalid blocks in the same STM transaction. In the past, this was important, but it is not anymore, see #1507.
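Splitting garbage collection into an STM part and a subsequent `IO` cleanup can be sketched as follows, assuming each retained state owns resources (e.g. table handles) that are released by an `IO ()` action. `Entry`, `garbageCollectSTM` and `garbageCollect` are hypothetical names, and `GHC.Conc` from `base` provides the STM primitives:

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

type Slot = Word

-- | Hypothetical LedgerDB entry: a slot plus an action that releases the
-- resources owned by that state.
data Entry = Entry { entrySlot :: Slot, entryClose :: IO () }

-- | Remove stale entries atomically and return the cleanup to run later.
garbageCollectSTM :: TVar [Entry] -> Slot -> STM (IO ())
garbageCollectSTM dbVar immSlot = do
  entries <- readTVar dbVar
  let (stale, live) = span ((< immSlot) . entrySlot) entries
  writeTVar dbVar live
  pure (mapM_ entryClose stale)

-- | Commit the STM part first, then perform the (non-STM) effectful cleanup.
garbageCollect :: TVar [Entry] -> Slot -> IO ()
garbageCollect dbVar immSlot = do
  cleanup <- atomically (garbageCollectSTM dbVar immSlot)
  cleanup
```

Since `STM` transactions cannot perform arbitrary `IO`, the cleanup is returned as an action and only run after the transaction has committed.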

LedgerDB: introduce slot-based pruning

7b8adcd

LedgerDB.V1: prune on garbage collection instead of on every change

36ed517

Also make sure to account for the fact that the DbChangelog might have gotten pruned between opening and committing the forker.
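A minimal sketch of that adjustment, assuming a forker works on a copy of the state sequence taken when it was opened, and that committing replaces the LedgerDB contents with the forker's extended sequence. `commitForker` and the list model are hypothetical, not the V1 implementation:

```haskell
type Slot = Word

-- | Commit a forker's states (copied at open time, then extended) into the
-- LedgerDB. If the LedgerDB was garbage collected in the meantime, its
-- oldest retained slot is newer than the forker's, so the forker's stale
-- prefix is dropped instead of resurrecting already-pruned states.
commitForker :: [(Slot, s)] -> [(Slot, s)] -> [(Slot, s)]
commitForker current forked =
  case current of
    [] -> forked
    ((oldestSlot, _) : _) -> dropWhile ((< oldestSlot) . fst) forked
```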

LedgerDB.V1: adapt queries for DbChangelog of length >k

05b946d
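Once more than k+1 states may be retained, the oldest retained state is no longer necessarily the immutable tip. One plausible shape of such a query, as a hypothetical sketch assuming states are ordered oldest to newest and the immutable tip is the state k blocks behind the newest one (or the oldest state if fewer are retained):

```haskell
-- | The state at the immutable tip: k blocks behind the newest state.
-- With at most k+1 states this is simply the oldest state; with a longer
-- sequence (pending garbage collection) it has to be computed.
immutableTipState :: Int -> [a] -> Maybe a
immutableTipState _ [] = Nothing
immutableTipState k states = Just (states !! max 0 (length states - 1 - k))
```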

LedgerDB.V2: prune on garbage collection instead of on every change

58d8b7b

LedgerDB.V2: adapt queries for DbChangelog of length >k

3e44253

LedgerDB.garbageCollect: update documentation

2f277c6

regarding the previous few commits

LedgerDB.StateMachine test: test invalid rollbacks

8b2a38e

Make sure that we correctly fail when trying to roll back too far.
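Because the LedgerDB may now temporarily retain more than k+1 states, the number of retained states alone no longer bounds how far we can roll back. A hypothetical sketch of the intended check (the name `rollback` and the list model are assumptions):

```haskell
-- | Roll back the newest n states. Rolling back more than k blocks must
-- fail even if older states are still retained (pending garbage
-- collection), and we can never roll back past the oldest state.
rollback :: Int -> Int -> [a] -> Maybe [a]
rollback k n states
  | n < 0 || n > k = Nothing
  | n >= length states = Nothing
  | otherwise = Just (take (length states - n) states)
```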

Add changelogs

ad7acfa

amesgen force-pushed the amesgen/ledgerdb-garbage-collect-states branch from 48bb1fe to ad7acfa July 9, 2025 12:26
