Implementation-Independent Ledger Conformance Test Suite #4892
Hey @sierkov, we're fully on board with this idea as part of the Amaru project and have already started working towards it. Let me outline a few of the ongoing efforts:

I - Simulation testing

Inspired by the Maelstrom project from Jepsen, we're working on a simulation engine for node-to-node interactions that is (unlike Jepsen's) deterministic and language-agnostic. The idea is to leverage message-passing between processes and simulate faults between them. This effort is particularly focused on the overall consensus behaviour of nodes, with the hope of offering a validation bench for the Ouroboros protocols as a whole. See more under pragma-org/simulation-testing.

II - Ledger snapshots

To test conformance of the Amaru ledger against the Haskell ledger, we compare epoch snapshots produced from both ends. We currently roll with two snapshots:
As you can see, those snapshots are in JSON, mainly for two reasons:
We are yet to document the steps to produce them from a Haskell node but, in brief, we leverage the mini-protocols and dump the ledger state.

III - Test vectors

Overall, we're also often taking specific CBOR-serialised blocks from mainnet, preprod, or preview as an easy source of validations. This strategy is however sub-optimal at this point because: (a) we do not yet measure coverage, so there's little guarantee that we actually cover the entire serialiser, and (b) it only ever puts the SUT in front of valid data, and never in front of potentially problematic data. So we also generate test vectors from the Haskell codebase when necessary, containing both valid and invalid scenarios. This has been the case, for example, for block headers, which we used to validate both the deserialisation parts as well as the KES / VRF primitives with specific adversarial mutations.

All in all, as you can see, we try to keep those things as language-agnostic as possible, with the idea to upstream some of them to the Cardano blueprint led by @ch1bo.
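As an illustration, here is a minimal sketch of what such a dual valid/invalid vector entry could look like, written in Rust with the serde and serde_json crates. The JSON envelope, field names, and the `Expectation` type are all hypothetical, not Amaru's actual format:

```rust
use serde::Deserialize; // assumes the serde (derive) and serde_json crates

/// Hypothetical verdict expected when feeding a vector to the SUT.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum Expectation {
    /// The SUT must accept the input.
    Valid,
    /// The SUT must reject the input (e.g. a KES/VRF adversarial mutation).
    Invalid,
}

/// Hypothetical test-vector entry: a CBOR payload plus the expected verdict.
#[derive(Debug, Deserialize)]
struct TestVector {
    /// Human-readable description, e.g. "header with mutated VRF proof".
    description: String,
    /// Hex-encoded CBOR of the block header (or other payload) under test.
    cbor_hex: String,
    expected: Expectation,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "vectors/headers.json" is a placeholder path for this sketch.
    let raw = std::fs::read_to_string("vectors/headers.json")?;
    let vectors: Vec<TestVector> = serde_json::from_str(&raw)?;
    for v in &vectors {
        println!("{} -> {:?}", v.description, v.expected);
        // A real driver would decode `v.cbor_hex` and run the SUT here.
    }
    Ok(())
}
```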
In principle it shouldn't be too difficult to turn the formal specification into one that can operate on CBOR-serialized data. We already have a translation for most datatypes that appear in blocks and states between Haskell and Agda, and obviously the ledger can deserialize CBOR into the Haskell types. So just generating a trace checker out of this should be quite doable in not too much time. I'd assume at least half of the effort would just be getting all the build infrastructure going.

This is actually very closely related to a project that I'd really like to do at some point, which is to verify mainnet using the Agda formalization. There are multiple ways to do that, but this would be one.

The bigger problems with this approach would be completeness (there are some things missing from the ledger formalization) and how to actually generate good traces. Completeness is obviously quite high on our priority list, but it will take a while. For generating good traces, you could probably also reuse some of the ledger infrastructure that exists, but I don't know how difficult that would be.
You mean to say that you are volunteering for this task 😶 ?
For lack of better tools, code coverage can at least give some answer (although it will only highlight the obvious non-covered paths). We can at least start with that?
I don't really have time to work on this myself (too busy with Leios), but if there is an ask from the community I'm sure it can be prioritized.
Absolutely! What we have right now works quite well for conformance testing the implementation, and of course completeness is exactly the same problem there.
Thanks for opening this discussion @sierkov and for pointing me to it @KtorZ.
@WhatisRT @KtorZ @sierkov I would be interested in taking a stab at this and have capacity in the course of the cardano-blueprint initiative, if you confirm that something like the following would be used by you. I do understand this would roughly mean:
I would love to have input from @lehins on these things too.
I'd at least put something like

In principle it shouldn't be too difficult to provide tests for other subsystems as well, but their usefulness may depend on implementation details. We've also had this problem when conformance testing the Haskell implementation against the spec, where certain things happen at different places of the hierarchy. To deal with this, we've provided a 'Conformance' version of the spec, which has an equivalence proof to the actual spec and aligns better with the implementation. We could do something similar for other implementations that want to structure their logic differently, but that's a relatively big piece of work.
@KtorZ, @WhatisRT, @ch1bo, thank you very much for the additional context and ideas.
TL;DR: The most important section is 'Alignment Process', particularly the debate over design alternatives and the confirmation of primary areas for initial exploration. I look forward to your feedback.

Project context

The C++ implementation originates from research on parallelizing the most time-consuming operations in Cardano Node, with a focus on batch synchronization (e.g., processing more than one epoch, ~20k+ blocks, at a time). The initial goal was to demonstrate that, on mainnet data, it could leverage more powerful hardware to produce identical outputs significantly faster (targeting a 10x speedup). To verify that the implementation produces the same ledger state, we follow this procedure:
Benefits of This Approach
Downsides to Address with the Proposed Test Suite
Data format

Complete blockchain blocks as inputs

Our mental model aligns with Amaru's simulation-testing approach. Since the Cardano network protocol messages that affect the ledger carry blocks, it would be more practical if the conformance test suite were based on blockchain blocks as input. This approach enables simulation in both networked and standalone environments. Thus, the proposed input format is a file containing a sequence of 0 or more fully formed Cardano blocks; a reader sketch follows below. A real-world example of a valid input file is a chunk file from Cardano Node's ImmutableDB or VolatileDB. To illustrate why blocks are preferable to transactions (as suggested by @ch1bo), consider a block with zero transactions: it carries no transaction-level data, yet it still advances the chain, and when it crosses an epoch boundary it triggers epoch-level transitions such as reward calculation, stake-distribution snapshots, and scheduled pool retirements, none of which can be expressed through transaction-based inputs alone.
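As a minimal sketch of how a driver could consume such a file, assuming the minicbor crate and treating the input as a plain concatenation of CBOR items (real chunk files may carry extra framing and separate index files alongside):

```rust
use minicbor::Decoder; // assumes the minicbor crate

/// Splits a byte buffer containing 0+ concatenated CBOR items into
/// one byte slice per item, without interpreting their contents.
fn split_cbor_items(bytes: &[u8]) -> Result<Vec<&[u8]>, minicbor::decode::Error> {
    let mut dec = Decoder::new(bytes);
    let mut items = Vec::new();
    while dec.position() < bytes.len() {
        let start = dec.position();
        dec.skip()?; // walk over one complete CBOR item
        items.push(&bytes[start..dec.position()]);
    }
    Ok(items)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "blocks.cbor" is a placeholder path for this sketch.
    let bytes = std::fs::read("blocks.cbor")?;
    let blocks = split_cbor_items(&bytes).map_err(|e| e.to_string())?;
    println!("file contains {} block(s)", blocks.len());
    Ok(())
}
```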
To conclude, in my opinion, using sequences of 0+ blocks makes it possible to model all rules while keeping the inputs sufficiently small for quick run times.

Complete ledger state as outputs

Through trial and error in teaching the C++ implementation to produce state snapshots binary-compatible with Cardano Node, we've learned that almost every component of the ledger state eventually impacts the stake distribution or the distributed rewards. In turn, this influences consensus. The only purely informational component we are aware of is the non-myopic pool likelihoods. However, since they are relatively small, making an exception for them seems unnecessary. If needed, we can review the ledger state component by component and present cases where consensus is affected.

JSON vs CBOR

JSON has advantages, including human readability, ease of writing, and readable diffs, which improve developer productivity. However, our experience with test execution, ledger state generation, diff analysis, and test case preparation led us to replace JSON with CBOR. Here's why:
Despite CBOR's poor readability, its smaller size and faster processing time, supported by minimal tooling, make it far more efficient for this workflow. In practice, this results in higher developer productivity.

A Complete Test Case Data Set

A complete test case naturally includes the previous state snapshot and the genesis configuration. Thus, a full data set for a single test case consists of:

- the genesis configuration;
- a CBOR snapshot of the ledger state before the first block;
- a CBOR file with the sequence of 0 or more blocks to apply;
- a CBOR snapshot of the expected ledger state after the last block.

A sketch of this layout as a data structure follows below.
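For illustration, a minimal Rust sketch of how a driver might model one such test case on disk; the directory convention and file names are hypothetical, not a proposed standard:

```rust
use std::path::PathBuf;

/// Hypothetical on-disk layout of a single conformance test case.
struct TestCase {
    /// Genesis configuration(s) needed to bootstrap the ledger.
    genesis: PathBuf,
    /// CBOR snapshot of the ledger state before the first block.
    prev_state: PathBuf,
    /// File containing a sequence of 0+ CBOR-encoded blocks.
    blocks: PathBuf,
    /// CBOR snapshot of the expected ledger state after the last block.
    expected_state: PathBuf,
}

impl TestCase {
    /// Assumes one directory per test case with fixed file names.
    fn from_dir(dir: &str) -> Self {
        let dir = PathBuf::from(dir);
        TestCase {
            genesis: dir.join("genesis.cbor"),
            prev_state: dir.join("prev-state.cbor"),
            blocks: dir.join("blocks.cbor"),
            expected_state: dir.join("expected-state.cbor"),
        }
    }
}
```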
Benefits of This Format
Alignment Process

To take a small step forward, I've identified key design decisions where alternative suggestions have been proposed, along with their respective proponents. Would it be reasonable to ask each proponent to prepare a minimal working example of their preferred approach? With concrete examples available, it should be easier and faster to evaluate and align on a final version. Let me know if that works for you.

Design Decisions
Also, I'd like to list some areas for exploration that seem particularly valuable with regard to the incremental value of this new conformance test suite. Please let me know if I've missed something.

Areas for Exploration:
@WhatisRT, in my view, one of the most important questions in this discussion is the complexity of generating test cases from the formal specification. Given your experience in this area, could you describe the necessary steps to programmatically create a simple test case of your choice from the Agda formalization? I have practical ideas on how to generate traces, but before considering alternative approaches, I'd like to understand the feasibility of programmatic generation from the specification, which I see as the most comprehensive route.

Additional topics

On Implementation-Specific Drivers

@ch1bo, in my view, the test suite should provide only the data, while the responsibility for building a test driver should lie with each implementation. Since different implementations may have their own objectives and toolsets, it makes sense for them to develop their own drivers. However, the data format should be designed to allow any implementation to develop an initial test driver quickly (within about a week of development time?). To validate this, a reference driver for Cardano Node could be included. Moreover, benchmarking against the reference implementation may be the most widely shared interest across all implementations, despite their differences. A good example is the UPLC file format from the Plutus conformance tests: it is implementation-agnostic and simple enough that an initial UPLC encoder and decoder can be developed within a week, enabling teams to proceed with testing.
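To make the "initial driver within a week" estimate concrete, here is a hedged Rust sketch of the skeleton such a driver could follow, reusing the hypothetical `TestCase` and `split_cbor_items` helpers sketched earlier and assuming the sha2 crate; the `Ledger` trait is a stand-in for whatever state type an implementation actually exposes:

```rust
use sha2::{Digest, Sha256}; // assumes the sha2 crate

/// Stand-in for an implementation's ledger-state type.
/// Every method here is hypothetical.
trait Ledger: Sized {
    fn from_snapshot(genesis: &[u8], state: &[u8]) -> Self;
    fn apply_block(&mut self, block_cbor: &[u8]) -> Result<(), String>;
    fn to_snapshot(&self) -> Vec<u8>;
}

/// Runs one test case: load the previous state, apply every block,
/// then compare the produced snapshot against the expected one
/// (byte-for-byte, via SHA-256 digests).
fn run_case<L: Ledger>(case: &TestCase) -> Result<bool, Box<dyn std::error::Error>> {
    let genesis = std::fs::read(&case.genesis)?;
    let prev = std::fs::read(&case.prev_state)?;
    let mut ledger = L::from_snapshot(&genesis, &prev);

    let block_bytes = std::fs::read(&case.blocks)?;
    for block in split_cbor_items(&block_bytes).map_err(|e| e.to_string())? {
        ledger.apply_block(block).map_err(|e| format!("block rejected: {e}"))?;
    }

    let got = Sha256::digest(ledger.to_snapshot());
    let want = Sha256::digest(std::fs::read(&case.expected_state)?);
    Ok(got == want)
}
```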
Overall, this sounds quite reasonable to me. Some points I'd like to add:
This is a noble goal. However, it is one that requires an enormous amount of work. I can say this for a fact from the experience of trying to implement conformance testing of the Ledger implementation against the Ledger specification, which is a much smaller task than what is being asked for here. So, I hate to say it, but until we have the budget approved for such a test suite that can be used for testing alternative implementations, and until we have enough people to work on a massive project like that, the Ledger team can't really participate in it.

@ch1bo My opinion on this is that I don't have enough bandwidth to even worry about working on a testing framework for alternative implementations. So, for now, this will have to be a volunteer-driven effort that does not involve the Ledger team. I'll keep this ticket open, since there is a chance that we might get pulled into this effort at some point, but until then we have to focus on work items that we have explicit approval for. That was my administrative opinion.

My personal opinion is that such a testing (or certification) project, as Charles called it in his AMAs, should be driven by a totally separate team. I strongly believe that the Ledger team should not be directly involved in this, since it will more than likely be a totally separate beast. Once a team like that is formed, we would be happy to provide our guidance and share the experience we acquired from implementing conformance testing. The constraint generation framework and conformance test suite could potentially even be repurposed to become a more general testing tool like the one desired in this ticket. But again, this is not going to be a responsibility of the Ledger team until we have appropriate resources and explicit approval, if ever.
@lehins, thank you for being direct about the resourcing situation on your end. This task does come with associated resources, though they are not explicitly outlined yet since the scope is still being defined. However, all three implementations already have to allocate resources for conformance testing. Additionally, as @ch1bo shared, the Cardano-Blueprint initiative has expressed resource-backed interest in this effort. A well-designed conformance test suite could meet the needs of all implementations, making shared contributions more practical than separate efforts. This is why the first stage of the task focuses on aligning scope and approach; its outcome will directly influence the resources available.

Why this issue is created in this repository
Request for Feedback

Would you be open to sharing your thoughts on the following, without making any resource commitments?
Motivation
The availability of the Plutus Conformance Test Suite has fostered the development of numerous Plutus implementations in Rust, Python, JavaScript, Go, and C++. These implementations support diverse community use cases and enable experimentation with various aspects of Plutus, including performance.
The community is actively working on Cardano Ledger implementations in Rust, Go, and C++. However, without a high-quality, implementation-independent test suite like the one available for Plutus, each implementation must rely on its own custom tests. This increases the risk of non-conformant implementations being actively used simply due to the lack of a universally available conformance test suite. Furthermore, a language-independent test suite would facilitate collaboration, allowing contributions from any team to benefit all implementations.
The task consists of two stages:
More details follow. Feedback and suggestions are welcome.
Requirements
Assumptions Driving the Proposed Approach
The initial proposed approach is based on the following assumptions:
Proposed Approach