Skip to content

LedgerDB V2: handles aren't closed in the common case #1515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
amesgen opened this issue May 19, 2025 · 0 comments · May be fixed by #1516
Open

LedgerDB V2: handles aren't closed in the common case #1515

amesgen opened this issue May 19, 2025 · 0 comments · May be fixed by #1516
Assignees

Comments

@amesgen
Copy link
Member

amesgen commented May 19, 2025

The V2 LedgerDB (introduced as part of UTxO HD in #1412) maintains a LedgerSeq at its core:

newtype LedgerSeq m l = LedgerSeq {
getLedgerSeq :: AnchoredSeq (WithOrigin SlotNo) (StateRef m l) (StateRef m l)
} deriving (Generic)

data StateRef m l = StateRef {
state :: !(l EmptyMK)
, tables :: !(LedgerTablesHandle m l)
} deriving (Generic)

Now, when we prune a LedgerSeq, we want to close the LedgerTablesHandles from the prefix that we are removing. The current code tries to do this, it it is not effective. Namely, prune returns the LedgerSeq prefix containing states to prune:

prune (LedgerDbPruneKeeping (SecurityParam k)) (LedgerSeq ldb) =
if toEnum nvol <= unNonZero k
then (LedgerSeq $ Empty (AS.anchor ldb), LedgerSeq ldb)
else
-- We remove the new anchor from the @fst@ component so that its handle is
-- not closed.
B.bimap (LedgerSeq . dropNewest 1) LedgerSeq $ AS.splitAt (nvol - fromEnum (unNonZero k)) ldb
where
nvol = AS.length ldb

And then in other places, we try to close the handles:

let (l, s) = prune (LedgerDbPruneKeeping (foeSecurityParam env)) (LedgerSeq newdb)
pure ((toClose, l), s)

mapM_ (close . tables) $ AS.toOldestFirst discardedBySelection ++ AS.toOldestFirst discardedByPruning

closeLedgerSeq :: Monad m => LedgerSeq m l -> m ()
closeLedgerSeq = mapM_ (close . tables) . toOldestFirst . getLedgerSeq

The crucial observation is that toOldestFirst does not return the anchor of the LedgerSeq/AnchoredSeq, so we do not close this. In particular, in the very common case of adopting one block (such that the immutable tip also advances by one block), which is the common case both while syncing and caught-up, we do not close anything.

The fix is simply to also consider the anchor of the sequence of handles to close.


Currently, this isn't problematic as the only V2 backend is the in-memory backend at the moment, and for it, closing handles doesn't matter (as the GC will still claim the ledger state once it is no longer referenced). However, in the upcoming LSM tree backend, this will be crucial.


Concrete "demonstration" of the bug:

Apply this diff on d582af5 (main at time of writing):

Show diff
diff --git a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
index 770a85862d..c4e2ac3115 100644
--- a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
+++ b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
@@ -90,7 +90,8 @@ newInMemoryLedgerTablesHandle ::
 newInMemoryLedgerTablesHandle someFS@(SomeHasFS hasFS) l = do
   !tv <- newTVarIO (LedgerTablesHandleOpen l)
   pure LedgerTablesHandle {
-      close =
+      close = do
+        !_ <- error "foo"
         atomically $ writeTVar tv LedgerTablesHandleClosed
     , duplicate = do
         hs <- readTVarIO tv
diff --git a/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs b/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
index 5015d44106..33f842b615 100644
--- a/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
+++ b/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
@@ -270,14 +270,14 @@ instance StateModel Model where
   arbitraryAction _ model@(Model chain secParam) =
     frequency $ [ (2, pure $ Some GetState)
                 , (2, pure $ Some ForceTakeSnapshot)
-                , (1, Some . DropAndRestore <$> QC.choose (0, fromIntegral $ AS.length chain))
+                , (0, Some . DropAndRestore <$> QC.choose (0, fromIntegral $ AS.length chain))
                 , (4, Some <$> do
                       let maxRollback =
                             min
                               (fromIntegral . AS.length $ chain)
                               (BT.unNonZero $ maxRollbacks secParam)
-                      numRollback  <- QC.choose (0, maxRollback)
-                      numNewBlocks <- QC.choose (numRollback, numRollback + 2)
+                      numRollback  <- pure 0
+                      numNewBlocks <- pure 1
                       let
                         chain' = case modelRollback numRollback model of
                           UnInit     -> error "Impossible"

Then, running cabal run storage-test -- -p InMemV2 succeeds, even though we would expect it to fail if we would actually close the handles.

Concretely, this patch

  • modifies the close logic of the LedgerTablesHandle to always fail with an imprecise exception, and
  • modifies the LedgerDB to only generate ValidateAndCommit commands that add exactly one block, and disables DropAndRestore.

Then, if V2 were correctly closing pruned LedgerTablesHandles, then we would expect the test to fail; however, it passes just fine. Note that it does fail when keeping the test unmodified, showing that we at least sometimes close LedgerTablesHandles.

@amesgen amesgen self-assigned this May 19, 2025
@amesgen amesgen moved this to 👀 In review in Consensus Team Backlog May 19, 2025
@amesgen amesgen moved this to 👀 In review in Consensus Team Backlog May 19, 2025
@amesgen amesgen linked a pull request May 19, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 👀 In review
Development

Successfully merging a pull request may close this issue.

1 participant