UTxO-HD targeting main #1267

Open: wants to merge 1 commit into main

@jasagredo (Contributor) commented Sep 26, 2024

Description

The UTxO-HD changes span ouroboros-consensus, ouroboros-consensus-diffusion and ouroboros-consensus-cardano. The core changes are:

  • The UTxO set is extracted from the LedgerState in the form of LedgerTables.
  • These tables are stored in the LedgerDB, which can keep them in memory or on disk.
  • When performing an action that requires UTxOs, we have to ask the LedgerDB for them, which might perform IO.

Here I explain how I would review this enormous PR. Instead of listing files, I describe concepts. My suggestion is to look at the mentioned files (or search for the concepts) and then mark each file as viewed to offload it from your brain.

The ledger tables

  • The first step is to understand the concept of LedgerTables; see the Ouroboros.Consensus.Ledger.Tables.* modules. LedgerTables are parametrized by l (ultimately by blk) and by mk (a MapKind). MapKinds are just types parametrized by the key and value types of l. These are TxIn|TxOut for unitary blocks and CanonicalTxIn|HardForkTxOut for hard fork blocks.
  • LedgerTables are barbies-like (higher-kinded data); see Ouroboros.Consensus.Ledger.Tables.Combinators.
  • LedgerTables are most commonly one of: empty (EmptyMK), a (possibly restricted) UTxO set (ValuesMK), a set of TxIns (KeysMK), a map of differences (DiffMK), or a combination of values and diffs (TrackingMK). The only non-obvious one is DiffMK, which maps each key to a sequence of changes to its value (in the UTxO case values never change, they are only created and destroyed, so each sequence has at most 2 elements). On top of that there is DiffSeqMK, a fingertree of differences, which is only used in V1 (see below).
  • The LedgerState is itself parametrized by the same mk. Each data instance uses that mk to define the tables associated with its block: the Byron ledger state ignores it, the Shelley ledger state has a new field holding the tables, and the hard fork ledger state propagates the mk through the Telescope, so the tables belong to the particular era state at the tip of the Telescope.
  • LedgerTables can also live on their own. For unitary blocks this makes no difference, but for the CardanoBlock we go from an mk passed to the Telescope (i.e. tables at the tip of the Telescope) to CardanoLedgerTables, in which each value is a HardForkTxOut. This conversion has a non-trivial cost, so we only want to pay it when applying a new block or transaction.
  • LedgerTables can be extracted from and injected into the ledger state via (un)stowLedgerTables.
  • The ledger tables of the extended ledger state (ExtLedgerState) are the same as those of the LedgerState.
  • A very important point, which may not be clear from the above: the HardForkBlock has no canonical tables, because our definitions are not compositional for the HF block; only the CardanoBlock has "hard fork tables". See the constraints of HasHardForkLedgerTables.
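To make the mk parameter concrete, here is a minimal sketch of the idea, assuming a single table with hypothetical TxIn = Int and TxOut = String stand-ins. The map kinds are simplified to plain Data.Map/Data.Set wrappers, and restrictValues, applyDiffs and track are illustrative helpers, not the actual API of Ouroboros.Consensus.Ledger.Tables.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical stand-ins for the real TxIn and TxOut types.
type TxIn  = Int
type TxOut = String

-- One change to a value. UTxO entries are only created or destroyed,
-- so a key's history holds at most two changes.
data Delta v = Insert v | Delete
  deriving (Eq, Show)

-- Simplified map kinds: type constructors over a key and a value type.
data    EmptyMK    k v = EmptyMK                  deriving (Eq, Show)
newtype KeysMK     k v = KeysMK (Set k)           deriving (Eq, Show)
newtype ValuesMK   k v = ValuesMK (Map k v)       deriving (Eq, Show)
newtype DiffMK     k v = DiffMK (Map k [Delta v]) deriving (Eq, Show)
data    TrackingMK k v = TrackingMK (Map k v) (Map k [Delta v])
  deriving (Eq, Show)

-- Tables of a unitary block: one table from TxIn to TxOut whose
-- "shape" is chosen by mk.
newtype LedgerTables mk = LedgerTables (mk TxIn TxOut)

getTable :: LedgerTables mk -> mk TxIn TxOut
getTable (LedgerTables t) = t

-- Restrict a UTxO set to the keys that are actually needed.
restrictValues :: LedgerTables KeysMK -> LedgerTables ValuesMK -> LedgerTables ValuesMK
restrictValues (LedgerTables (KeysMK ks)) (LedgerTables (ValuesMK vs)) =
  LedgerTables (ValuesMK (Map.restrictKeys vs ks))

-- Apply diffs on top of values; the last change recorded for a key wins.
applyDiffs :: LedgerTables ValuesMK -> LedgerTables DiffMK -> LedgerTables ValuesMK
applyDiffs (LedgerTables (ValuesMK vs)) (LedgerTables (DiffMK ds)) =
  LedgerTables (ValuesMK (Map.foldrWithKey step vs ds))
  where
    step k hist acc = case reverse hist of
      Insert v : _ -> Map.insert k v acc
      Delete   : _ -> Map.delete k acc
      []           -> acc

-- Keep the resulting values together with the diffs, as TrackingMK does.
track :: LedgerTables ValuesMK -> LedgerTables DiffMK -> LedgerTables TrackingMK
track vs d@(LedgerTables (DiffMK ds)) =
  let LedgerTables (ValuesMK vs') = applyDiffs vs d
  in LedgerTables (TrackingMK vs' ds)
```

The point of the design is that the same LedgerTables type describes "which keys", "which values" and "what changed" just by switching mk.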

Applying and ticking (Ouroboros.Consensus.Ledger.Abstract/Basics)

Ticking requires no values, but it might create some differences, so the types go from l EmptyMK to Ticked1 l DiffMK. This happens in at least two places: when going from Byron to Shelley (all values are created here) and when going from Shelley to Allegra (AVVM addresses are deleted). See the relevant functions: translateLedgerStateByronToShelley and translateLedgerStateShelleyToAllegra.
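A toy sketch of why these era transitions yield diffs without reading any table values. The two functions below are hypothetical simplifications, not the real translateLedgerState* functions: the Byron UTxO is passed in directly (modelling the fact that the Byron state ignores mk and keeps its UTxO internally), and the set of AVVM TxIns to delete is assumed to be known.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn  = Int      -- hypothetical stand-in
type TxOut = String   -- hypothetical stand-in
data Delta v = Insert v | Delete deriving (Eq, Show)
type Diff = Map TxIn [Delta TxOut]

-- Byron -> Shelley (toy): every Byron UTxO entry becomes an insertion.
-- No LedgerDB read is needed because the values come from the Byron state.
byronToShelleyDiff :: Map TxIn TxOut -> Diff
byronToShelleyDiff = Map.map (\v -> [Insert v])

-- Shelley -> Allegra (toy): AVVM entries are deleted. Deleting only
-- needs their keys, never their values.
shelleyToAllegraDiff :: Set TxIn -> Diff
shelleyToAllegraDiff = Map.fromSet (const [Delete])

-- Diffs from consecutive steps are concatenated key by key.
appendDiff :: Diff -> Diff -> Diff
appendDiff = Map.unionWith (++)
```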

When applying a block, we get the needed inputs (getBlockKeySets, then read those keys from the LedgerDB), tick the ledger state without tables (possibly creating diffs), apply those diffs to the values read from the LedgerDB, and then call the ledger rules. We then diff the input and output tables to obtain the differences produced by applying the block, and prepend the differences from ticking. See applyBlockResult and the Shelley functions for applying blocks.
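The block application steps above can be sketched with plain maps. Everything here is a simplification under stated assumptions: Block, blockKeys and ledgerRules are hypothetical toys (the "rules" just consume inputs and add outputs), and reading from the LedgerDB is modelled as a pure function instead of an IO action.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn   = Int
type TxOut  = String
data Delta v = Insert v | Delete deriving (Eq, Show)
type Values = Map TxIn TxOut
type Diff   = Map TxIn [Delta TxOut]

-- Toy block: the inputs it consumes and the outputs it creates.
data Block = Block { blockInputs :: Set TxIn, blockOutputs :: Map TxIn TxOut }

-- Plays the role of getBlockKeySets: the keys the block needs.
blockKeys :: Block -> Set TxIn
blockKeys = blockInputs

-- Apply diffs to values; the last change recorded for a key wins.
applyDiff :: Diff -> Values -> Values
applyDiff d vs = Map.foldrWithKey step vs d
  where
    step k hist acc = case reverse hist of
      Insert v : _ -> Map.insert k v acc
      Delete   : _ -> Map.delete k acc
      []           -> acc

-- Toy ledger rules: all inputs must exist; consume them, add the outputs.
ledgerRules :: Block -> Values -> Either String Values
ledgerRules b vs
  | blockInputs b `Set.isSubsetOf` Map.keysSet vs =
      Right (Map.union (blockOutputs b) (Map.withoutKeys vs (blockInputs b)))
  | otherwise = Left "missing input"

-- Diff the tables before and after the rules (UTxOs never change in place).
diffValues :: Values -> Values -> Diff
diffValues before after =
  Map.union (Map.map (const [Delete]) (Map.difference before after))
            (Map.map (\v -> [Insert v]) (Map.difference after before))

-- Prepend the ticking diffs to the block diffs, key by key.
appendDiff :: Diff -> Diff -> Diff
appendDiff = Map.unionWith (++)

applyBlock :: (Set TxIn -> Values)  -- read from the LedgerDB (pure here)
           -> Diff                   -- diffs produced by ticking
           -> Block
           -> Either String Diff
applyBlock readTables tickDiff b = do
  let inputs = applyDiff tickDiff (readTables (blockKeys b))
  outputs <- ledgerRules b inputs
  pure (appendDiff tickDiff (diffValues inputs outputs))
```

Note that a value created while ticking and consumed by the block ends up with the history [Insert v, Delete], which is where the "at most 2 elements" of DiffMK comes from.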

The story with transactions is pretty similar.

The LedgerDB versions (Ouroboros.Consensus.Storage.LedgerDB)

There are two flavors of the LedgerDB, each one having two implementations:

  • V1 (Ouroboros.Consensus.Storage.LedgerDB.V1): we keep a sequence of EmptyMK ledger states and dump the values into a BackingStore. We can get values back from the backing store at any ledger state by opening a BackingStoreValueHandle and reading from it. The BackingStore consists of a "complete" UTxO set at some anchor followed by a sequence of differences. To get values at a given point, we read them at the anchor and then reapply the differences up to the desired point. This is "wasteful" if done in memory (why keep diffs and reapply them every time when we could just apply them in place?), but it is useful for the on-disk implementation, which puts the "complete" UTxO set on disk, offloading it from memory. There are two implementations:
    • OnDisk: It uses LMDB underneath. See the Ouroboros.Consensus.Storage.LedgerDB.V1.BackingStore.Impl.LMDB.* modules.
    • InMemory: not intended for real use since, as mentioned above, it is wasteful. It serves as a reference implementation for the OnDisk one.
  • V2 (Ouroboros.Consensus.Storage.LedgerDB.V2): We keep a sequence of StateRefs, which are EmptyMK ledger states together with a tables handle from which we can read values monadically. This is very similar to the previous LedgerDB, in which we kept a sequence of (complete) LedgerStates. There are two implementations:
    • InMemory
    • LSM: still a WIP
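The V1 read path (anchor values plus replayed differences) can be sketched as follows. This is a toy model, not the BackingStore API: V1Store and readAt are hypothetical names, slots stand in for points, and each block's diff is simplified to one change per key.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn   = Int
type TxOut  = String
type SlotNo = Int
data Delta v = Insert v | Delete deriving (Eq, Show)

-- Toy V1 store: a "complete" UTxO set at an anchor (on disk in the OnDisk
-- implementation) followed by per-block differences, oldest first.
data V1Store = V1Store
  { anchorSlot   :: SlotNo
  , anchorValues :: Map TxIn TxOut
  , blockDiffs   :: [(SlotNo, Map TxIn (Delta TxOut))]
  }

-- Read keys as of slot s (assumed >= anchorSlot): read them at the anchor,
-- then reapply every difference up to s, restricted to those keys.
readAt :: V1Store -> SlotNo -> Set TxIn -> Map TxIn TxOut
readAt st s ks =
  foldl' applyDiff
         (Map.restrictKeys (anchorValues st) ks)
         [ Map.restrictKeys d ks | (slot, d) <- blockDiffs st, slot <= s ]
  where
    applyDiff acc d = Map.foldrWithKey step acc d
    step k (Insert v) = Map.insert k v
    step k Delete     = Map.delete k
```

Only the requested keys are read from the anchor, which is what makes this layout attractive when the anchor lives on disk.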

Evaluating forks

To evaluate forks, we introduced Forkers, which each LedgerDB implementation provides in its own way. A Forker is just an abstract interface that allows querying for values and pushing differences, which can eventually be committed back into the LedgerDB (only ChainSelection does this; other components use ReadOnlyForkers). Note that Forkers allocate resources, so there is some juggling with ResourceRegistries there.
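A minimal sketch of the shape of a Forker, assuming a toy "LedgerDB" that is just an IORef holding a UTxO map. The record fields, openForker and withForker are hypothetical; the real interface has more operations (range reads, statistics) and is tied to ResourceRegistries rather than bracket. Commit here simply overwrites the shared state, which is only meaningful in this toy.

```haskell
import Control.Exception (bracket)
import Data.IORef
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn  = Int
type TxOut = String

-- Hypothetical, much-simplified Forker: read values, push changes,
-- commit them back (only chain selection would do that), and close.
data Forker = Forker
  { forkerReadTables :: Set TxIn -> IO (Map TxIn TxOut)
  , forkerPush       :: Map TxIn (Maybe TxOut) -> IO ()  -- Nothing = delete
  , forkerCommit     :: IO ()
  , forkerClose      :: IO ()
  }

openForker :: IORef (Map TxIn TxOut) -> IO Forker
openForker db = do
  local  <- newIORef =<< readIORef db   -- private view starting at the DB tip
  closed <- newIORef False
  let guardOpen = do
        c <- readIORef closed
        if c then ioError (userError "forker already closed") else pure ()
  pure Forker
    { forkerReadTables = \ks -> do
        guardOpen
        (`Map.restrictKeys` ks) <$> readIORef local
    , forkerPush = \d -> do
        guardOpen
        modifyIORef' local (\vs -> Map.foldrWithKey applyOne vs d)
    , forkerCommit = do
        guardOpen
        readIORef local >>= writeIORef db
    , forkerClose = writeIORef closed True
    }
  where
    applyOne k (Just v) = Map.insert k v
    applyOne k Nothing  = Map.delete k

-- Forkers hold resources, so they must always be closed.
withForker :: IORef (Map TxIn TxOut) -> (Forker -> IO a) -> IO a
withForker db = bracket (openForker db) forkerClose
```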

Ledger queries (Ouroboros.Consensus.Ledger.Query)

Some queries have to look at the UTxO set, in particular GetUtxoByAddress, GetUtxoWhole and GetUtxoByTxIn. We categorize queries by means of QueryFootprint and process each category differently.

Queries that don't touch the UTxO set use QFNoTables; GetUtxoByTxIn uses QFLookupTables, as it only has to look up the requested TxIns in the tables; and GetUtxoWhole and GetUtxoByAddress use QFTraverseTables, as they have to scan the whole UTxO set.
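The footprint classification can be sketched like this. The Query, Tables and Result types are hypothetical toys (with constructors named after the queries above), TxOut is modelled as an (address, amount) pair, and Identity stands in for IO. Only the three QueryFootprint constructor names come from the text.

```haskell
import Data.Functor.Identity (Identity (..))
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn  = Int
type Addr  = String
type TxOut = (Addr, Integer)   -- toy output: address and amount

data QueryFootprint = QFNoTables | QFLookupTables | QFTraverseTables
  deriving (Eq, Show)

-- Toy queries, one or more per footprint category.
data Query
  = GetTipSlot
  | GetUtxoByTxIn (Set TxIn)
  | GetUtxoByAddress Addr
  | GetUtxoWhole

footprint :: Query -> QueryFootprint
footprint GetTipSlot           = QFNoTables
footprint (GetUtxoByTxIn _)    = QFLookupTables
footprint (GetUtxoByAddress _) = QFTraverseTables
footprint GetUtxoWhole         = QFTraverseTables

-- Hypothetical access to the tables in some monad m (IO in practice).
data Tables m = Tables
  { lookupKeys  :: Set TxIn -> m (Map TxIn TxOut)  -- point reads
  , readAllUTxO :: m (Map TxIn TxOut)              -- full scan
  }

pureTables :: Map TxIn TxOut -> Tables Identity
pureTables utxo = Tables (Identity . Map.restrictKeys utxo) (Identity utxo)

data Result = TipSlot Int | UTxO (Map TxIn TxOut)
  deriving (Eq, Show)

answerQuery :: Monad m => Int -> Tables m -> Query -> m Result
answerQuery tip t q = case q of
  GetTipSlot         -> pure (TipSlot tip)                               -- QFNoTables
  GetUtxoByTxIn ks   -> UTxO <$> lookupKeys t ks                         -- QFLookupTables
  GetUtxoByAddress a -> UTxO . Map.filter ((== a) . fst) <$> readAllUTxO t -- QFTraverseTables
  GetUtxoWhole       -> UTxO <$> readAllUTxO t                           -- QFTraverseTables
```

Knowing the footprint up front lets the caller decide whether to acquire table access at all before answering the query.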

For the HardForkBlock there is an additional class, Ouroboros.Consensus.HardFork.Combinator.Ledger.Query.BlockSupportsHFLedgerQuery, which provides faster implementations than projecting the tables onto the tip of the Telescope, because we can usually decide whether we want a result without first upgrading its TxOut to the latest era.
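A toy illustration of the "decide first, upgrade later" idea, assuming two hypothetical eras whose outputs both expose an address. OldTxOut, LatestTxOut, HardForkTxOut and upgrade are illustrative types, not the real Cardano ones; the sketch only shows that filtering before translating yields the same result while translating fewer outputs.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type TxIn = Int
type Addr = String

-- Toy per-era outputs: the latest era adds an optional datum.
data OldTxOut    = OldTxOut Addr Integer                  deriving (Eq, Show)
data LatestTxOut = LatestTxOut Addr Integer (Maybe String) deriving (Eq, Show)

-- Toy hard fork output: each entry stays in the era it was created in.
data HardForkTxOut = InOldEra OldTxOut | InLatestEra LatestTxOut
  deriving (Eq, Show)

-- Translation to the latest era (assumed to be costly in general).
upgrade :: HardForkTxOut -> LatestTxOut
upgrade (InOldEra (OldTxOut a c)) = LatestTxOut a c Nothing
upgrade (InLatestEra o)           = o

-- The address can be read in every era without translating.
addressOf :: HardForkTxOut -> Addr
addressOf (InOldEra (OldTxOut a _))         = a
addressOf (InLatestEra (LatestTxOut a _ _)) = a

-- Naive: upgrade every output, then filter.
byAddressNaive :: Addr -> Map TxIn HardForkTxOut -> Map TxIn LatestTxOut
byAddressNaive a = Map.filter (\(LatestTxOut a' _ _) -> a' == a) . Map.map upgrade

-- Faster: decide on the era-specific output, upgrade only the matches.
byAddressFast :: Addr -> Map TxIn HardForkTxOut -> Map TxIn LatestTxOut
byAddressFast a = Map.map upgrade . Map.filter ((== a) . addressOf)
```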

In essence, queries are now monadic. Queries that don't look at the UTxO set are artificially monadic (the already existing logic wrapped in pure).

The mempool

In essence, the mempool has to acquire (read-only) forkers on the LedgerDB at the tip, then read the values for incoming transactions and apply them. The resulting diffs are appended to those already in the mempool, which keeps a TrackingMK with the current values and the accumulated diffs.
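A sketch of the mempool's TrackingMK bookkeeping under simplifying assumptions: Tracking, Tx, addFresh and applyTx are hypothetical names, and the "ledger rules" only check that inputs exist. One subtlety worth showing: values freshly read from the forker must not override keys the mempool already touched, otherwise an input spent by an earlier mempool transaction would reappear.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

type TxIn  = Int
type TxOut = String
data Delta v = Insert v | Delete deriving (Eq, Show)

-- Simplified TrackingMK: current values plus the diffs accumulated by
-- the transactions already in the mempool.
data Tracking = Tracking
  { trValues :: Map TxIn TxOut
  , trDiffs  :: Map TxIn [Delta TxOut]
  } deriving (Eq, Show)

-- Toy transaction.
data Tx = Tx { txInputs :: Set TxIn, txOutputs :: Map TxIn TxOut }

-- Merge values read from the forker, skipping keys the mempool already
-- touched (the forker's view does not know about mempool spends).
addFresh :: Map TxIn TxOut -> Tracking -> Tracking
addFresh fresh t =
  t { trValues = Map.union (trValues t)
                           (Map.withoutKeys fresh (Map.keysSet (trDiffs t))) }

-- Apply a transaction on top of the mempool state, appending its diffs.
applyTx :: Map TxIn TxOut -> Tx -> Tracking -> Either String Tracking
applyTx fresh tx t0
  | txInputs tx `Set.isSubsetOf` Map.keysSet vs = Right Tracking
      { trValues = Map.union (txOutputs tx) (Map.withoutKeys vs (txInputs tx))
      , trDiffs  = Map.unionWith (++) (trDiffs t) txDiff
      }
  | otherwise = Left "input missing or already spent"
  where
    t      = addFresh fresh t0
    vs     = trValues t
    txDiff = Map.union (Map.fromSet (const [Delete]) (txInputs tx))
                       (Map.map (\v -> [Insert v]) (txOutputs tx))
```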

When revalidating transactions, we cannot know whether the UTxO set changed, so we have to re-read the values from the (new) forker.

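The internal state is now a TMVar because we need to do acquire >> read tables >> update, where reading the tables happens in IO and the other two steps happen in STM. A TVar cannot be held across an IO step, whereas taking the TMVar keeps other writers out until the updated state is put back.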
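A minimal sketch of the acquire >> read tables >> update pattern with a TMVar from the stm package. InternalState and withInternalState are hypothetical names; the IO step stands in for reading tables from a forker. The mask/onException combination puts the old state back if the IO step throws, so the TMVar is never left empty.

```haskell
import Control.Concurrent.STM
import Control.Exception (mask, onException)
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type TxIn  = Int
type TxOut = String

-- Toy mempool internal state.
newtype InternalState = InternalState { isValues :: Map TxIn TxOut }

-- Take the TMVar in STM, run the IO step (e.g. reading tables),
-- then put the updated state back in STM.
withInternalState :: TMVar InternalState -> (InternalState -> IO InternalState) -> IO ()
withInternalState var step = mask $ \restore -> do
  st  <- atomically (takeTMVar var)                                    -- acquire
  st' <- restore (step st) `onException` atomically (putTMVar var st)  -- IO step
  atomically (putTMVar var st')                                        -- update and release
```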

The snapshots

We now store snapshots in a new format:

  • V1-OnDisk: a copy of the LMDB database and a (Haskell-CBOR) serialization of the LedgerState.
  • V*-InMemory: a (Haskell-CBOR) serialization of the UTxO set and a (Haskell-CBOR) serialization of the LedgerState.

Note that for V2 we can take a snapshot of the immutable tip at any time, but for V1 we first have to flush some differences from the BackingStore into the anchor to advance it to the immutable tip.

Each implementation abstracts this in Ouroboros.Consensus.Storage.LedgerDB.V*...tryTakeSnapshot.
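The V1 flushing step can be sketched with the same toy anchor-plus-diffs model. V1Store, flushUpTo and takeSnapshot are hypothetical names, not the real tryTakeSnapshot; the anchor is advanced to the last flushed block at or before the immutable tip, and the snapshot is then read from the anchor.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type TxIn   = Int
type TxOut  = String
type SlotNo = Int
data Delta v = Insert v | Delete deriving (Eq, Show)

-- Toy V1 store: anchor values plus per-block diffs, oldest first.
data V1Store = V1Store
  { anchorSlot   :: SlotNo
  , anchorValues :: Map TxIn TxOut
  , blockDiffs   :: [(SlotNo, Map TxIn (Delta TxOut))]
  } deriving (Eq, Show)

-- Flush every diff at or before the immutable tip into the anchor.
flushUpTo :: SlotNo -> V1Store -> V1Store
flushUpTo immTip st = V1Store
  { anchorSlot   = if null toFlush then anchorSlot st else fst (last toFlush)
  , anchorValues = foldl' applyDiff (anchorValues st) (map snd toFlush)
  , blockDiffs   = rest
  }
  where
    (toFlush, rest) = span ((<= immTip) . fst) (blockDiffs st)
    applyDiff acc d = Map.foldrWithKey step acc d
    step k (Insert v) = Map.insert k v
    step k Delete     = Map.delete k

-- A V1 snapshot is taken from the anchor, so flush first.
takeSnapshot :: SlotNo -> V1Store -> (V1Store, (SlotNo, Map TxIn TxOut))
takeSnapshot immTip st = (st', (anchorSlot st', anchorValues st'))
  where st' = flushUpTo immTip st
```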

The forging loop

The forging loop didn't change much. Each iteration runs with its own resource registry (used to allocate the forkers). The forker is then used to provide values when acquiring the mempool snapshot, in case revalidation is needed.

Changes in Byron/Shelley/Cardano

The changes here mostly implement everything described above so that all the types match. Some specific parts are worth looking at because they might be non-trivial:

  • Translation functions (with the two examples I already mentioned)
  • The TxIn|TxOut data instances, the LedgerState data instance and the HasLedgerTables instances
  • applyBlock for Shelley. The Cardano one is just the HFC one, which injects the CardanoTables into the tip of the Telescope (this is where we pay the costly conversion, but it usually won't be that costly because the part of the UTxO set needed by a block is small).
  • The Cardano.Ledger module, which defines CardanoTxIn and CardanoTxOut.

Other changes

The rest of the changes mainly follow GHC's lead, adjusting the types here and there. Most other code doesn't use tables, so an abstract mk or EmptyMK is used to keep the kinds well-formed.

@jasagredo (Contributor, Author) left a comment

I did a pass over the non-testing libraries.

Comment on lines +44 to +49
let keys = foldl' (<>) emptyLedgerTables
         $ map getTransactionKeySets
         $ [ txForgetValidated . TxSeq.txTicketTx $ tx
           | tx <- TxSeq.toList $ isTxs is
           ]
values <- readUntickedTables keys

Perhaps we could keep track of the original tables and reuse them if the hash didn't change.

chainSelection' curChainAndLedger chains'
Just chains' ->
chainSelection' curChainAndLedger chains' >>= \case
Nothing -> pure curChainAndLedger

Aren't we forgetting to close the curForker here?

Comment on lines +562 to +571
    ForkerOpen
  | ForkerCloseUncommitted
  | ForkerCloseCommitted
  | ForkerReadTablesStart
  | ForkerReadTablesEnd
  | ForkerRangeReadTablesStart
  | ForkerRangeReadTablesEnd
  | ForkerReadStatistics
  | ForkerPushStart
  | ForkerPushEnd

Much more information could probably be put here, but we will only trace these events when debugging, where hopefully the context is narrow enough to figure out what is going on. Tracing them on a normal node would (probably) crash it.

Comment on lines +8 to +16
-- | See "Ouroboros.Consensus.Storage.LedgerDB.BackingStore.API" for the
-- documentation. This module just puts together the implementations for the
-- API, currently two:
--
-- * "Ouroboros.Consensus.Storage.LedgerDB.BackingStore.Impl.InMemory": a @TVar@
-- holding a "Data.Map".
--
-- * "Ouroboros.Consensus.Storage.LedgerDB.BackingStore.Impl.LMDB": an external
-- disk-based database.

The links are wrong: the module paths should contain .V1. right after LedgerDB (Ouroboros.Consensus.Storage.LedgerDB.V1.BackingStore...).

Comment on lines +255 to +284
data BackingStoreTrace =
    BSOpening
  | BSOpened !(Maybe FS.FsPath)
  | BSInitialisingFromCopy !FS.FsPath
  | BSInitialisedFromCopy !FS.FsPath
  | BSInitialisingFromValues !(WithOrigin SlotNo)
  | BSInitialisedFromValues !(WithOrigin SlotNo)
  | BSClosing
  | BSAlreadyClosed
  | BSClosed
  | BSCopying !FS.FsPath
  | BSCopied !FS.FsPath
  | BSCreatingValueHandle
  | BSValueHandleTrace !(Maybe Int) !BackingStoreValueHandleTrace
  | BSCreatedValueHandle
  | BSWriting !SlotNo
  | BSWritten !(WithOrigin SlotNo) !SlotNo
  deriving (Eq, Show)

data BackingStoreValueHandleTrace =
    BSVHClosing
  | BSVHAlreadyClosed
  | BSVHClosed
  | BSVHRangeReading
  | BSVHRangeRead
  | BSVHReading
  | BSVHRead
  | BSVHStatting
  | BSVHStatted
  deriving (Eq, Show)

Same here: these are only for debugging, otherwise the node would explode.

{-# OPTIONS_GHC -Wno-orphans #-}

-- | A 'DbChangelog' is the component of the
-- 'Ouroboros.Consensus.Storage.LedgerDB.LedgerDB' implementation that

Wrong link

Comment on lines +637 to +645
case reason of
  ChainDB.InFutureExceedsClockSkew {} -> pure ()
  ChainDB.ValidationError err ->
    case err of
      ExtValidationErrorHeader{} -> pure ()
      ExtValidationErrorLedger{} ->
        whenJust
          (NE.nonEmpty (map (txId . txForgetValidated) txs))
          (lift . removeTxs mempool)

I think this came from main; was it removed there recently?


instance ShelleyBasedEra era => SerializeHardForkTxOut '[ShelleyBlock proto era] where
  encodeHardForkTxOut _ = SL.toEraCBOR @era
  decodeHardForkTxOut _ = SL.fromEraCBOR @era

Perhaps this should use fromEraShareCBOR, as in the normal Shelley block. There are no mainnet ledger states of the unary Shelley HF block, so I think it doesn't really matter.

@jasagredo marked this pull request as ready for review October 24, 2024 13:29
@jasagredo changed the title from "WIP: UTxO-HD targeting main" to "UTxO-HD targeting main" Oct 24, 2024