
Implementation-Independent Ledger Conformance Test Suite #4892

Open
sierkov opened this issue Feb 16, 2025 · 10 comments

sierkov commented Feb 16, 2025

Motivation

The availability of the Plutus Conformance Test Suite has fostered the development of numerous Plutus implementations in Rust, Python, JavaScript, Go, and C++. These implementations support diverse community use cases and enable experimentation with various aspects of Plutus, including performance.

The community is actively working on Cardano Ledger implementations in Rust, Go, and C++. However, without a high-quality, implementation-independent test suite like the one available for Plutus, each implementation must rely on its own custom tests. This increases the risk of non-conformant implementations being actively used simply due to the lack of a universally available conformance test suite. Furthermore, a language-independent test suite would facilitate collaboration, allowing contributions from any team to benefit all implementations.

The task consists of two stages:

  • Alignment: Discuss and define the requirements and general approach for an implementation-independent test suite.
  • Development: Build the test suite, with the C++ implementation committing a share of the necessary resources to support its development.

More details follow. Feedback and suggestions are welcome.

Requirements

  • The test suite must be language- and environment-agnostic.
  • It must be reasonably simple to use in major programming languages.
  • It must follow a black-box approach, evaluating outputs based on given inputs, to enable experimentation with alternative algorithms such as batching and parallelization.
  • It should support parallel execution of individual tests to leverage modern hardware and shorten development cycles.

Assumptions Driving the Proposed Approach

The initial proposed approach is based on the following assumptions:

  • Every ledger implementation requires a functional CBOR decoder and corresponding parsers for Cardano block formats.
  • A ledger implementation can be fully modeled as a black box that processes a sequence of blocks and produces a new ledger state.
  • A common approach to testing a new ledger implementation is to process blocks from Cardano’s mainnet and testnets. The results are then compared against those produced by the reference implementation (Cardano Node). Therefore, it should be possible to leverage some of that data when creating the test suite.
  • The formal specification of the Cardano Ledger serves as the authoritative source for ledger rules and should be used to generate a comprehensive set of test cases. However, automatic source code generation from the formal specification is limited to only a few programming languages, thereby affecting the test suite’s primary objective: implementation independence.

Proposed approach

  • Test case inputs should use the standard CBOR format used for storing blocks on the Cardano mainnet.
  • Test case outputs should use the state snapshot format from the latest stable version of the Cardano Node, which is also CBOR-encoded. This allows for easy benchmarking against the reference implementation. Additionally, if the snapshot format changes, regenerating outputs is straightforward, as test case inputs remain standard Cardano blocks.
  • Each test case consists of an initial ledger state (including the genesis configuration), a sequence of blocks, and a final ledger state (see the sketch after this list).
  • Initial test cases can be created by sampling blocks from the Cardano mainnet and modifying them as needed to ensure consistency of generated sequences with ledger specification rules.
  • A comprehensive test set can be generated either programmatically, using a block generator based on the formal ledger specification, or manually, depending on cost-effectiveness.
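
To make the proposed data set concrete, here is a minimal sketch in Python of what a single test case bundle could look like. The field names and file layout are illustrative assumptions only, not a settled format:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class LedgerTestCase:
    """One conformance test case, per the proposal above (names are hypothetical)."""
    genesis_dir: Path              # genesis configuration files, as used by Cardano Node
    blocks_path: Path              # CBOR file with a sequence of 0+ mainnet-format blocks
    initial_state: Optional[Path]  # CBOR ledger state snapshot; None when starting from genesis
    expected_state: Path           # CBOR snapshot the implementation under test must reproduce

    def blocks(self) -> bytes:
        # Each implementation parses this with its own CBOR decoder.
        return self.blocks_path.read_bytes()
```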
KtorZ (Contributor) commented Feb 18, 2025

Hey @sierkov, we're fully on board with this idea as part of the Amaru project and have even already started working towards it. So let me outline a few of the ongoing efforts:

I - Simulation testing

Inspired by the Maelstrom project from Jepsen, we're working on a simulation engine for node-to-node interactions that is, unlike Jepsen's, deterministic and language-agnostic. The idea is to leverage message-passing between processes and simulate faults between them.

This effort is particularly focused on the overall consensus behavior of nodes, with the hope of offering a validation bench for the Ouroboros protocols as a whole. See more under pragma-org/simulation-testing.

II - Ledger snapshots

To test conformance of the Amaru ledger against the Haskell ledger, we compare epoch snapshots produced from both ends. We currently roll with two kinds of snapshots:

  1. Stake distribution snapshots: contain account balances per stake credential, as well as their delegations. They also contain pool-specific information such as protocol parameters, blocks produced during the epoch, and overall active stake. See this example.

  2. Rewards summary snapshots: contain information specific to the rewards calculation, such as the treasury and reserves amounts during the epoch, the pools’ global efficiency (eta), and the values of the rewards pot and leader rewards for each pool. See another example here.

As you can see, those snapshots are in JSON, mainly for two reasons:

  • They're easier to format this way and to compare as diffs for testing. Performance isn't much of a concern here since they're only serialized this way for testing.
  • It makes them easier to document (using JSON Schema and the myriad of visualisation tools that come with it).

We have yet to document the steps to produce them from a Haskell node but, in brief, we leverage the mini-protocols and dump the NewEpochState at specific points, then slightly re-arrange the data into a more suitable format.
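
For illustration, the "compare as diff" workflow can be reproduced with a few lines of Python; the snapshot file names below are placeholders, and this is only a sketch, not the Amaru tooling itself:

```python
import difflib
import json

def snapshot_lines(path: str) -> list[str]:
    """Pretty-print a JSON snapshot with sorted keys so textual diffs are stable."""
    with open(path) as f:
        return json.dumps(json.load(f), indent=2, sort_keys=True).splitlines(keepends=True)

# Placeholder file names for snapshots dumped from the Haskell node and from Amaru.
for line in difflib.unified_diff(
    snapshot_lines("haskell_snapshot.json"),
    snapshot_lines("amaru_snapshot.json"),
    fromfile="haskell", tofile="amaru",
):
    print(line, end="")
```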

III - Test vectors

Overall, we also often take specific CBOR-serialised blocks from mainnet, preprod, or preview as an easy source of validation. This strategy is, however, sub-optimal at this point because: (a) we do not yet measure coverage, so there's little guarantee that we actually cover the entire serialiser, and (b) it only ever puts the SUT in front of valid data, never in front of potentially problematic data.

So, when necessary, we also generate test vectors from the Haskell codebase covering both valid and invalid scenarios. This has been the case, for example, for block headers, which we used to validate both the deserialisation code and the KES / VRF primitives against specific adversarial mutations.
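
As a toy illustration of the mutation idea (the real vectors target KES / VRF signatures; the header path and validate_header below are stand-ins for a real block header and the implementation under test):

```python
import cbor2  # third-party CBOR codec, used here only to attempt decoding

def mutate(raw: bytes, offset: int) -> bytes:
    """Flip one byte so the vector stays close to valid but is semantically wrong."""
    return raw[:offset] + bytes([raw[offset] ^ 0xFF]) + raw[offset + 1:]

def validate_header(blob: bytes) -> bool:
    """Placeholder for the real header validation of the implementation under test."""
    return True

raw = open("header.cbor", "rb").read()  # placeholder path to a valid CBOR block header
vectors = [("valid", raw)] + [(f"mutated@{i}", mutate(raw, i)) for i in range(0, len(raw), 64)]
for name, blob in vectors:
    try:
        cbor2.loads(blob)                 # a mutated vector may still decode fine...
        accepted = validate_header(blob)  # ...so semantic validation must reject it
    except Exception:
        accepted = False
    print(name, "accepted" if accepted else "rejected")
```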


All in all, as you can see, we try to keep these things as language-agnostic as possible, with the idea of upstreaming some of them to the Cardano blueprint led by @ch1bo.

ch1bo moved this to Opportunities in Cardano Blueprint, Feb 18, 2025
WhatisRT (Contributor) commented:

In principle it shouldn't be too difficult to turn the formal specification into one that can operate on CBOR serialized data. We already have a translation for most datatypes that appear in blocks and states between Haskell and Agda, and obviously the ledger can deserialize CBOR into the Haskell types. So just generating a trace checker out of this should be quite doable in not too much time. I'd assume at least half of the effort would just be getting all the build infrastructure going. This is actually very closely related to a project that I'd really like to do at some point, which is to verify mainnet using the Agda formalization. There are multiple ways to do that, but this would be one.

The bigger problems with this approach would be completeness (there are some things missing from the ledger formalization) and how to actually generate good traces. Completeness is obviously quite high on our priority list but it will take a while. For generating good traces, you could probably also reuse some of the ledger infrastructure that exists, but I don't know how difficult that would be.

KtorZ (Contributor) commented Feb 18, 2025

@WhatisRT:

> it shouldn't be too difficult to turn the formal specification into one that can operate on CBOR serialized data

You mean to say that you are volunteering for this task 😶 ?

> The bigger problems with this approach would be completeness

For lack of better tools, code coverage can at least give some answers (although it will only highlight the obviously non-covered paths). We can at least start with that?

WhatisRT (Contributor) commented:

> You mean to say that you are volunteering for this task 😶 ?

I don't really have time to work on this myself (too busy with Leios), but if there is an ask from the community I'm sure it can be prioritized.

> For lack of better tools, code coverage can at least give some answers (although it will only highlight the obviously non-covered paths). We can at least start with that?

Absolutely! What we have right now works quite well for conformance testing the implementation, and of course completeness is exactly the same problem there.

ch1bo (Contributor) commented Feb 19, 2025

Thanks for opening this discussion @sierkov and for pointing me to it @KtorZ.

> You mean to say that you are volunteering for this task 😶 ?
>
> I don't really have time to work on this myself (too busy with Leios), but if there is an ask from the community I'm sure it can be prioritized.

@WhatisRT @KtorZ @sierkov I would be interested in taking a stab at this, and I have capacity in the course of the cardano-blueprint initiative, if you can confirm that something like the following would be used by you.

I do understand this would roughly mean:

  • Generalize the conformance tests (bi-simulation) of cardano-ledger and the ledger formal specifications such that they can be used by @KtorZ and @sierkov from their respective Rust and C++ environments
    • Executable-in-the-loop?
    • Compile Rust/C++ with C bindings into a test driver?
  • Operate only on the outermost level, i.e. the LEDGER rule, which takes a ledger state and whose transition is a Tx
    • This is quite black-boxy, but maybe enough for a first prototype
    • What format should the LedgerState have to be implementation independent?
  • Ability to run conformance tests against
    • Generated CBOR using generators of cardano-ledger - currently done in their tests?
    • Historic chain data - which data and format to use?

I would love to have input from @lehins on these things too.

ch1bo moved this from Opportunities to In Preparation in Cardano Blueprint, Feb 19, 2025
WhatisRT (Contributor) commented:

> • Operate only on the outermost level, i.e. the LEDGER rule, which takes a ledger state and whose transition is a Tx
>
>   • This is quite black-boxy, but maybe enough for a first prototype

I'd at least put something like TICK in an MVP as well, so you can test going across the epoch boundary.

In principle it shouldn't be too difficult to provide tests for other subsystems as well, but their usefulness may depend on implementation details. We've also had this problem when conformance testing the Haskell implementation against the spec, where certain things happen at different places of the hierarchy. To deal with this, we've provided a 'Conformance' version of the spec, which has an equivalence proof to the actual spec and aligns better with the implementation. We could do something similar for other implementations that want to structure their logic differently, but that's a relatively big piece of work.

sierkov (Author) commented Feb 20, 2025

@KtorZ, @WhatisRT, @ch1bo, thank you very much for the additional context and ideas.
Given the many points raised, I’ve structured my response into the following sections for clarity:

  • Project context – An overview of the C++ ledger implementation's testing and the areas we’d like to address with this suite.
  • Data format – The benefits of using blocks as inputs and complete ledger states as outputs, and the story of why the project transitioned from JSON to CBOR ledger snapshots.
  • Alignment process – A proposal for aligning on requirements and approach moving forward.
  • Additional thoughts – Comments on discussed topics not covered above.

TL;DR The most important section is 'Alignment Process,' particularly the debate over design alternatives and the confirmation of primary areas for initial exploration. I look forward to your feedback.

Project context

The C++ implementation originates from research on parallelizing the most time-consuming operations in Cardano Node, with a focus on batch synchronization (e.g., processing more than one epoch, ~20k+ blocks at a time). The initial goal was to demonstrate that, on mainnet data, it could leverage more powerful hardware to produce identical outputs significantly faster (targeting a 10x speedup).

To verify that the implementation produces the same ledger state, we follow this procedure:

  • A Cardano Node instance runs in a virtual container without Internet access.
  • A script modifies ImmutableDB files to provide one epoch of historical data at a time.
  • The node is restarted, processes the new data, and generates an updated state snapshot.
  • The snapshots, captured at the last slot of each complete epoch, are stored; they total ~600 GB in CBOR format.
  • Another script generates ledger snapshots for several recent epochs using our implementation and compares them to Cardano Node’s using a CBOR diff algorithm.
  • If differences are found, a binary search determines the first diverging epoch (see the sketch below).
  • All structures must match byte-for-byte, except for non-myopic pool likelihoods (float32), where equality is checked up to four significant digits.
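
The binary search in the procedure above is straightforward; here is a minimal sketch, assuming snapshot matches form a prefix (once an epoch diverges, all later ones do too):

```python
def first_diverging_epoch(lo: int, hi: int, snapshots_match) -> int:
    """Return the first epoch in [lo, hi] whose snapshots differ.

    snapshots_match(epoch) compares the Cardano Node and C++ snapshots for
    that epoch (e.g., via a CBOR diff) and returns True when they are identical.
    Precondition: epoch hi is already known to diverge.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if snapshots_match(mid):
            lo = mid + 1   # divergence starts after mid
        else:
            hi = mid       # mid already diverges
    return lo
```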

Benefits of This Approach

  • Tests against complete mainnet data (millions of blocks), ensuring feature compatibility.
  • Provides confidence in performance comparisons.
  • Quickly adapts to new ledger behavior (e.g., Conway voting).
  • Easily extendable to Cardano testnet data.

Downsides to Address with the Proposed Test Suite

  • Does not test negative cases (blocks rejected by the ledger).
  • Lacks edge case testing (valid per spec but never seen on mainnet).
  • Does not assess adversarial behavior (potential attack scenarios).

Data format

Complete blockchain blocks as inputs

Our mental model aligns with Amaru’s simulation testing approach. Since the Cardano network protocol messages that affect the ledger carry blocks, it would be more practical to base the conformance test suite on blockchain blocks as input. This approach enables simulation in both networked and standalone environments.

Thus, an input format consisting of a file containing a sequence of 0 or more fully formed Cardano blocks is proposed. A real-world example of a valid input file is a chunk file from Cardano Node’s ImmutableDB or VolatileDB.

To illustrate why blocks are preferable to transactions (as suggested by @ch1bo), consider a block with zero transactions:

  • If such blocks must be discarded, we need a way to explicitly test this behavior.
  • If such blocks must be accepted, the ledger state still changes because some components are influenced by block structure rather than transactions.
    • For example, the latest slot affects pulsing computations such as rewards and voting.
    • Another example is pool block counters, which impact pool performance, reward calculations, and, therefore, consensus. These behaviors also need explicit testing.

To conclude, in my opinion, using sequences of 0+ blocks makes it possible to model all rules while keeping the inputs sufficiently small for quick run times.
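
To make this concrete, here is a deliberately tiny toy model (field names are illustrative, not the real ledger schema) showing why even a zero-transaction block changes the state:

```python
from dataclasses import dataclass, field

@dataclass
class ToyLedgerState:
    last_slot: int = 0
    blocks_minted: dict = field(default_factory=dict)  # pool id -> block counter

def apply_block(state: ToyLedgerState, slot: int, pool_id: str, txs: list) -> ToyLedgerState:
    # Even with txs == [], the slot advances (driving pulsing computations such as
    # rewards and voting) and the issuing pool's counter grows (feeding pool
    # performance, reward calculations, and therefore consensus).
    state.last_slot = slot
    state.blocks_minted[pool_id] = state.blocks_minted.get(pool_id, 0) + 1
    for tx in txs:
        pass  # transaction processing would go here
    return state

s = apply_block(ToyLedgerState(), slot=42, pool_id="pool_a", txs=[])
assert s.last_slot == 42 and s.blocks_minted["pool_a"] == 1
```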

Complete ledger state as outputs

Through trial and error in teaching the C++ implementation to produce binary-compatible state snapshots with Cardano Node, we’ve learned that almost every component of the ledger state eventually impacts stake distribution or distributed rewards. In turn, this influences consensus.

The only purely informational component we are aware of is non-myopic pool likelihoods. However, since they are relatively small, making an exception for them seems unnecessary.

If needed, we can review the ledger state component by component and present cases where consensus is affected.

JSON vs CBOR

JSON has advantages, including human readability, ease of writing, and readable diffs, which improve developer productivity. However, our experience with test execution, ledger state generation, diff analysis, and test case preparation led us to replace JSON with CBOR. Here’s why:

  • File Size & Readability Limitations
    • A CBOR ledger state snapshot for a recent mainnet epoch is ~2.5 GB.
    • When converted to JSON, it exceeds 10 GB.
    • Editors struggle with files this large, making them effectively unreadable.
  • Performance & Hardware Constraints
    • Engineers often need to quickly generate a reference snapshot from a sequence of blocks.
    • Cardano Node requires significantly more time and RAM (it crashed in a VM with 48 GB of RAM!) to generate full JSON snapshots of the most recent epochs.
    • This makes it impossible to run on a moderate laptop, reducing convenience and increasing execution time.
  • Faster Diff Analysis & Tooling Solutions
    • When analyzing snapshots, programmatic diff analysis is unavoidable.
    • Since CBOR files are smaller, they are faster to process, improving developer efficiency.
    • Although CBOR is less readable, a simple CBOR diff script (sketched after this list) can:
      • Print diffs in human-readable format.
      • Convert sequences of positional indices (e.g., #0.4.3.2.1.11) into descriptive names.
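
A minimal sketch of such a diff script, using the third-party cbor2 library; the NAMES table and file names are hypothetical, and a real table would be derived from the snapshot schema:

```python
import cbor2

NAMES = {"#0.4": "rewards"}  # hypothetical positional-path-to-name mapping

def cbor_diff(a, b, path="#"):
    """Yield (path, left, right) for every position where two CBOR values differ."""
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from cbor_diff(x, y, f"{path}{i}" if path == "#" else f"{path}.{i}")
    elif isinstance(a, dict) and isinstance(b, dict) and a.keys() == b.keys():
        for k in a:
            yield from cbor_diff(a[k], b[k], f"{path}.{k}")
    elif a != b:
        yield (NAMES.get(path, path), a, b)

with open("node.cbor", "rb") as ref, open("ours.cbor", "rb") as sut:
    for where, left, right in cbor_diff(cbor2.load(ref), cbor2.load(sut)):
        print(f"{where}: {left!r} != {right!r}")
```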

Despite CBOR’s poor readability, its smaller size and faster processing time, supported by minimal tooling, make it far more efficient for this workflow. In practice, this results in higher developer productivity.

A Complete Test Case Data Set

A complete test case naturally includes the previous state snapshot and genesis configuration. Thus, a full data set for a single test case consists of:

  • Input — A file with a sequence of CBOR-encoded blocks.
  • Input — A set of genesis configuration files used by Cardano Node.
  • Input — A CBOR-encoded initial ledger state snapshot (or none, if not applicable).
  • Output — The expected output is a single CBOR-encoded final ledger state snapshot.

Benefits of This Format

  • Initial test cases can be created by copying data from a Cardano Node instance.
  • Ensures coverage of any functionality that affects the ledger state and could lead to non-conformant behavior.
  • The data is suitable for further simulation testing, such as the networking tests described by @KtorZ.

Alignment Process

To take a small step forward, I’ve identified key design decisions where alternative suggestions have been proposed, along with their respective proponents. Would it be reasonable to ask each proponent to prepare a minimal working example for their preferred approach?

With concrete examples available, it should be easier and faster to evaluate and align on a final version. Let me know if that works for you.

Design Decisions

Also, I’d like to list some areas for exploration that seem particularly valuable with regard to the incremental value of this new conformance test suite. Please let me know if I’ve missed something.

Areas for Exploration:

  • Generate a test case for the simplest possible edge case that does not occur on the mainnet.
  • Generate a test case for the simplest possible attack scenario.
  • Generate a minimal negative case—blocks rejected under current ledger rules.
  • Collect examples where the formal specification is incomplete compared to the reference implementation. (@WhatisRT, did I capture this correctly?)
  • Gather proposals for generating a comprehensive set of traces from the formal specification.

@WhatisRT, in my view, one of the most important questions in this discussion is the complexity of generating test cases from the formal specification. Given your experience in this area, could you describe the necessary steps to programmatically create a simple test case of your choice from the Agda formalization? I have practical ideas on how to generate traces, but before considering alternative approaches, I’d like to understand the feasibility of programmatic generation from the specification, which I see as the most comprehensive route.

Additional topics

On Implementation-Specific Drivers

@ch1bo, in my view, the test suite should provide only the data, while the responsibility for building a test driver should lie with each implementation. Since different implementations may have their own objectives and toolsets, it makes sense for them to develop their own drivers.

However, the data format should be designed to allow any implementation to develop an initial test driver quickly (within about a week of development time?). To validate this, a reference driver for Cardano Node could be included. Moreover, benchmarking against the reference implementation may be the most widely shared interest across all implementations, despite their differences.

A good example is the UPLC file format from the Plutus Conformance tests—it is implementation-agnostic and simple enough that an initial UPLC encoder and decoder can be developed within a week, enabling teams to proceed with testing.
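
As an illustration of how small such a driver could be, here is a minimal sketch; the directory layout and the run_ledger entry point are placeholders for whatever each implementation provides:

```python
from pathlib import Path

def run_test_case(case_dir: Path, run_ledger) -> bool:
    """Feed one test case to the implementation under test and compare outputs.

    run_ledger(genesis_dir, initial_state, blocks) -> bytes is the
    implementation-specific entry point each team would supply.
    """
    initial = case_dir / "initial_state.cbor"
    actual = run_ledger(
        case_dir / "genesis",                                # genesis config files
        initial.read_bytes() if initial.exists() else None,  # optional initial snapshot
        (case_dir / "blocks.cbor").read_bytes(),             # sequence of 0+ blocks
    )
    # Byte-for-byte comparison, per the snapshot-equality approach described earlier.
    return actual == (case_dir / "expected_state.cbor").read_bytes()
```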

WhatisRT (Contributor) commented:

Overall, this sounds quite reasonable to me. Some points I'd like to add:

  1. A ledger implementation that wants to participate in consensus will have to be able to validate transactions and step across epoch boundaries independently of each other. This is why I suggested adding an interface for TICK, which gives you what you need (if you can already validate transactions); see the sketch after this list. In fact, some of the logic of validating blocks does not happen in the ledger, but as part of the consensus code - we actually do have a separate consensus spec that implements this logic and has just now become executable. It's further away from being used for testing (but it's also a lot smaller, so it should be less work to get it ready for testing), so if you want to validate entire blocks I'd suggest using that instead.
  2. The spec doesn't provide generators; it just provides a reference implementation. The ledger team worked quite hard on the problem of generating good tests, and lots of different approaches have been tried in the past. There is now a quite good library for generating test cases based on constraints - I'd suggest using it so as not to duplicate years of engineering effort. I'm a bit out of the loop on this, but if the ledger team is interested in supporting this use case for the library I'm sure @lehins can point you in the right direction.
  3. The constraint-based generator library should also, in principle, be able to provide negative tests, which is quite useful for conformance testing. We talked about doing that a while ago, but I don't know if that was ever implemented.
  4. In the JSON vs CBOR question, it seems quite obvious to me to use CBOR. It's what is actually being used, and it has much better performance characteristics. If you want to read the data manually, you can always convert it to JSON and then you get the best of both.
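
As a hedged sketch of the two-entry-point interface suggested in point 1 (names and signatures are hypothetical; the authoritative rule signatures live in the spec):

```python
from typing import Protocol

class LedgerUnderTest(Protocol):
    """Hypothetical conformance interface mirroring the LEDGER and TICK rules."""

    def apply_tx(self, state: bytes, tx: bytes) -> bytes:
        """LEDGER: validate one CBOR-encoded transaction against a CBOR-encoded
        state; return the new state or raise on an invalid transaction."""

    def tick(self, state: bytes, slot: int) -> bytes:
        """TICK: advance the state to the given slot, performing epoch-boundary
        work (snapshot rotation, reward calculation) when a boundary is crossed."""
```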

lehins (Collaborator) commented Feb 21, 2025

This is a noble goal. However, it is one that requires an enormous amount of work. I can say this for a fact from the experience of trying to implement conformance testing of the Ledger implementation against the Ledger specification, which is a much smaller task than what is being asked for here. So, I hate to say it, but until we have the budget approved for such a test suite that can be used for testing alternative implementations, and until we have enough people to work on a massive project like that, the Ledger team can't really participate in it.

@ch1bo My opinion on this is that I don't have enough bandwidth to even worry about working on a testing framework for alternative implementations. So, for now, this will have to be a volunteer-driven effort that does not involve the Ledger team. I'll keep this ticket open, since there is a chance that we might get pulled into this effort at some point, but until then we have to focus on work items that we have explicit approval for.

That was my administrative opinion. My personal opinion is that such a testing (or certification, as Charles called it in his AMAs) project should be driven by a totally separate team. I strongly believe that the Ledger team should not be directly involved in this, since it will more than likely be a totally separate beast. Once a team like that is formed, we would be happy to provide our guidance and share the experience we acquired from implementing conformance testing. The constraint generation framework and conformance test suite could potentially even be repurposed into a more general testing tool like the one desired in this ticket. But again, this is not going to be a responsibility of the Ledger team until we have appropriate resources and explicit approval, if ever.

sierkov (Author) commented Feb 22, 2025

@lehins, thank you for being direct about the resourcing situation on your end.

This task does come with associated resources, though they are not explicitly outlined yet since the scope is still being defined. However, all three implementations already have to allocate resources for conformance testing. Additionally, as @ch1bo shared, the Cardano-Blueprint Initiative has expressed resource-backed interest in this effort.

A well-designed conformance test suite could meet the needs of all implementations, making shared contributions more practical than separate efforts. This is why the first stage of the task focuses on aligning scope and approach—its outcome will directly influence the resources available.

Why this issue was created in this repository

  • You manage a reference implementation and have deep expertise in this area, making your feedback especially valuable.
  • Some components of the test suite could provide incremental value to the reference implementation. For example, @WhatisRT pointed out that negative cases might be an area worth strengthening.
  • This repository is the primary reference point for alternative implementations. Keeping this task here increases visibility and may help attract additional resources for future implementation.

Request for Feedback

Would you be open to sharing your thoughts on the following, without making any resource commitments?

  • How practical do you find the proposed data format?
  • Do you foresee any challenges in structuring the test suite as a pre-generated, implementation-independent dataset rather than implementation-specific code?
  • Could any of the discussed testing areas provide additional value to the reference implementation? For example, more comprehensive tests for negative cases or adversarial behavior?
