
Logbook 2024 H2


December 2024

2024-12-24

  • To execute schnorrkel on L2 I had to play with the maxTxExUnits since I would get budget overspent errors.

  • If I alter the genesis-alonzo.json file I can get past this error, but then the init tx is never re-observed (it seems the posting was attempted, but there is no error and the test just hangs).

  • I was expecting to have to change the Evaluate.hs module, since that one contains the same definitions for the L2 ledger values, but altering the values there (just appending one more zero to exUnitsMem and exUnitsSteps) didn't resolve the budget issue.
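
  • For reference, a hedged sketch of the kind of change attempted here, assuming the L2 limits are expressed as a cardano-api ExecutionUnits value (the concrete numbers are purely illustrative):

```haskell
-- Hedged sketch: bump the execution unit limits used on the L2 ledger
-- (field names come from cardano-api; the values are illustrative only).
import Cardano.Api (ExecutionUnits (..))

maxTxExecutionUnits :: ExecutionUnits
maxTxExecutionUnits =
  ExecutionUnits
    { executionSteps = 100000000000 -- "one more zero" compared to a typical limit
    , executionMemory = 140000000
    }
```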

  • Altering the aforementioned values also increases the fees needed for other Hydra transactions. Perhaps I also need to play with the wallet code in order to get everything running, but it seems like the plutus function really needs to run within the budget limits.

  • Maybe the ledger changes are not correctly implemented when it comes to this new schnorrkel function?

  • Ok, even if I don't use the schnorrkel function in our dummy validator I still get the overspent budget error:

  The plutus evaluation error is: CekError An error has occurred:
  The machine terminated part way through evaluation due to overspending the budget.
  The budget when the machine terminated was:
  ({cpu: -100
  | mem: -100})
  Negative numbers indicate the overspent budget; note that this only indicates the budget that was needed for the next step, not to run the program to completion.

so I think there is something off with the plutus fork we are using.

2024-12-03

SB on debugging deposits

  • We tried running a head between two parties and noticed that deposit/increment mostly doesn't work for whatever reason. I saw it work occasionally, but in the majority of cases the deposit succeeds while the increment is either not observed or didn't run at all.

  • As the first step let's figure out if increment actually ran or not so we can go deeper.

  • Ok, it doesn't seem like increment ran at all!

  • We have this condition in code:

case find (\(_, depositUTxO) -> Just depositUTxO == utxoToCommit) (Map.assocs pendingDeposits) of

and in case this find returned Nothing we don't issue any error message at all. Perhaps the recorded deposit is not matching with what we have in the snapshot to commit?

  • I added a CommitIgnored message just so we are able to figure out what is happening.
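
  • For illustration, a self-contained sketch of that change (the types and the names below are simplified stand-ins, not the actual HeadLogic code): surface an explicit outcome when the lookup over pending deposits finds no match, instead of silently doing nothing.

```haskell
-- Simplified stand-in types; the point is the explicit CommitIgnored outcome.
import Data.List (find)
import qualified Data.Map.Strict as Map

newtype TxId = TxId Int deriving (Eq, Ord, Show)
newtype UTxO = UTxO [String] deriving (Eq, Show)

data Outcome
  = RequestIncrement TxId UTxO -- a matching deposit was found
  | CommitIgnored (Maybe UTxO) -- nothing matched; surfaced for debugging
  deriving (Show)

matchDeposit :: Maybe UTxO -> Map.Map TxId UTxO -> Outcome
matchDeposit utxoToCommit pendingDeposits =
  case find (\(_, depositUTxO) -> Just depositUTxO == utxoToCommit) (Map.assocs pendingDeposits) of
    Just (txId, depositUTxO) -> RequestIncrement txId depositUTxO
    Nothing -> CommitIgnored utxoToCommit

main :: IO ()
main = do
  let deposits = Map.fromList [(TxId 1, UTxO ["A"]), (TxId 2, UTxO ["B"])]
  print $ matchDeposit (Just (UTxO ["A"])) deposits -- RequestIncrement (TxId 1) ...
  print $ matchDeposit (Just (UTxO ["C"])) deposits -- CommitIgnored ...
```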

  • Re-running the local demo with this change in place since I think this is what's happening and the reason we don't do any increments.

  • One thing confusing me is this one:

{"timestamp":"2024-12-03T11:16:01.589868581Z","threadId":539,"namespace":"HydraNode-\"2\"","message":{"api":{"reason":"Network.Socket.sendBuf: invalid argument (Bad file descriptor)","tag":"APIConnectionError"},"tag":"APIServer"}}

  • Also it seems like my two nodes are not connected - not able to exchange messages for some reason. Perhaps the port is busy? I'll re-run them with different ports.

  • I had wrongly configured nodes, duh... Once they are properly connected I managed to see H4, which is HeadValueIsNotPreserved, on increment!

  • Very weird. I'll beef up the local e2e test to spin up two nodes and try to deposit from a non-leader node.

  • The test turns out to work and I think part of the problem is that one of the nodes always sees a failing tx in case it was too late to submit it and the other node managed to get ahead first.

  • Testing different scenarios proved to work. I want to perhaps add a condition that you need to wait on decommit to finish before doing any deposits and also that you can't decrement in case there is a pending deposit.

November 2024

2024-11-27

SB on incremental commits on-chain

  • I talked to FT about tx trace testing and we might change our approach but let's see. Seems impossible to get the coverage we expect if we constrain everything in the precondition/validFailingAction

  • I discovered bugs when trying to do close/fanout after the increment in our e2e tests. This is the outcome of the redeemer changes for close/contest and also the way we handle snapshots.

  • HeadLogic needed to not include utxoToDecommit together with the utxo field - this was causing duplicate entries in utxo field. After altering this I found out that we need to specify CloseUsedInc also when calculating the deltaUTxO so that I am able to use deltaUTxO to provide a valid sig on-chain.

  • This change means deltaUTxO is no longer tied just to decommits, so now I am stuck in a problem between close and fanout. I can make close work by using the utxoToDecommit hash as deltaUTxO; this makes the close valid, but fanout expects any utxoToCommit hash to be present in the close datum. Using that instead makes the close invalid but would make fanout valid. So I need to find a way to satisfy both.

  • One option would be to pass to the fanout tx utxoToCommit too and then use it like so

     Head.Fanout
          { numberOfFanoutOutputs = fromIntegral $ length utxo
          , numberOfDecommitOutputs = fromIntegral $ maybe 0 length utxoToDecommit + maybe 0 length utxoToCommit
          }

and also make changes in the CloseUsedInc redeemer to include the utxoToDecommit hash and not rely on deltaUTxOHash.

  • Unfortunately this leads to some weird plutus error I can't reason about from the error message. What could cause this?

  • Plutus debugging is really shit when it comes to situations like this one.

  • I am trying to view the tx and try to figure out from that what can cause this failure.

  • It turned out the fanout had no outputs at all.

  • To solve this with a more explicit approach I decided to add alphaUTxOHash to keep information about any utxo to commit on-chain.

  • This solved our bugs with close/fanout; hopefully we are not only more explicit but also clearer about what we are trying to do.

2024-11-26

SB on incremental commits tx trace testing

  • I don't think we ever successfully do increments. Currently trying to get the increment tx to find the deposit txid.

  • I altered the AppM type to hold information about the deposit txid too, since I needed to record it upon a successful deposit so I can use it to construct the increment tx.

  • This made the deposits work but now I see some problems with contests. H29 - TooOldSnapshot

  • I worked a lot on the close/contest and it seems like the current situation does not allow proper contest modelling. If I have a snapshot with a snapshot number higher than the closed snapshot's, how can I successfully contest?

2024-11-25

SB on incremental commits tx trace testing

  • Ok, I think it is time to wrap things up with these types of tests. I should also remember to add close and fanout to some e2e tests that exercise de/commits just so we are testing the close/contest logic some more.

  • After some changes in decrement to make sure only to do it if there is nothing pending I am seeing CannotFindDepositOutputInIncrement. I think I should investigate if the depositing works with adjustUTxO function.

  • I spent the whole day helping Dan with his branch

2024-11-21

SB on incremental commits tx trace testing

  • I have everything green right now but the coverage for fanout with empty utxo is not making the tests pass.

  • Here and there I see that I generate something to decommit, but for something to commit it seems like there is something preventing deposits from appearing more often.

  • In fact it seems like we never generate something to commit!

  • Perhaps including some utxo with Deposit was a mistake? Perhaps I should use the NewSnapshot for that too? -> No. We need to invoke a deposit tx in order for commit to work out nicely.

SB on sqlite experiment

  • In order to test out the benefits of using a simple db like sqlite and, in general, help the hydra-doom project squeeze out better performance, we decided to do this experiment.

  • Adding a new module with a similar interface (handle) proved to be easy.
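
  • A rough sketch of what such a handle could look like, assuming sqlite-simple as the driver (the PersistenceIncremental name and fields are only an approximation of the real interface):

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Handle-style persistence backed by sqlite, as a sketch (sqlite-simple package).
import Database.SQLite.Simple (Only (..), execute, execute_, open, query_)

data PersistenceIncremental a = PersistenceIncremental
  { append :: a -> IO ()
  , loadAll :: IO [a]
  }

createSqlitePersistence :: FilePath -> IO (PersistenceIncremental String)
createSqlitePersistence dbName = do
  conn <- open dbName
  execute_ conn "CREATE TABLE IF NOT EXISTS events (payload TEXT NOT NULL)"
  pure
    PersistenceIncremental
      { append = \a -> execute conn "INSERT INTO events (payload) VALUES (?)" (Only a)
      , loadAll = map fromOnly <$> query_ conn "SELECT payload FROM events"
      }
```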

  • Spent some time with the tests but all green now

  • Replacing the file persistence with this new module everywhere

  • Some tests need adjusting since we don't use temp files anymore (perhaps I could make that work). Slashes don't play nice with the db name, which is why I don't use tmp folders anymore, but maybe I could find a workaround so we don't get db files all over the place after running the tests. There is a dropDb command I added, but then I need to make sure to use it everywhere.

2024-11-20

SB on incremental commits tx trace testing

  • So the current problem is that we are trying to close with snapshot 0 and something to decommit, and we get an invalid sig error. We rely on the fact that snapshot number 0 is tied to the initial state, and therein lies the problem: the state differs between the input and output datum, which causes the sig failure. For now I'll just make sure to close only if closedSnapshotNumber is > 0 in the precondition.

  • After this change we go further and now the decommit is failing because of VersionNotIncremented.

  • I've progressed further and now constantly trying to solve the problem where close sig is invalid.

  • The problem is that all looks good and aligned in the local state, snapshot and versions so it is not obvious why the signature fails.

  • This is what the actions look like:


       Assertion failed (after 18 tests):
         do action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            action $ Close {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            failingAction $ Contest {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            failingAction $ Contest {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            failingAction $ Contest {actor = Carol, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            failingAction $ Contest {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
            action $ Fanout {utxo = [A,C], deltaUTxO = [B]}
            pure ()
         Model {headState = Open, knownSnapshots = [ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}], currentVersion = 0, closedSnapshotNumber = 0, alreadyContested = [], utxoInHead = [A,C], pendingDeposit = [], pendingDecommit = [B]}
         Close {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
         Expected to pass validation

  • So immediately after the NewSnapshot we are trying to close. This leads me to believe that the input datum contains different information than what we set in the output datum thus the signature failure.

  • Visually I see the datums are different but I should try to parse and see exactly what is not matching in there.

  • After parsing both input and output datums I see that the utxoHash field is different.

  • In my opinion this hash should stay the same if we close without any actions beforehand, and yet we experience a failure.

  • I made sure to produce the same utxoHash as in the input datum by not taking out the utxo to decommit beforehand, but only after we see the decrement tx succeed.

  • Now I can confirm the two hashes are the same but we still get an invalid signature error, and I think I know what's at play here. In the TxTraceSpec we sign the snapshot right before calling the off-chain tx building code. We say in the close that in case the open and snapshot versions are the same and you still have something to decommit in the snapshot, all you need to do is provide an already present signature - but on-chain we still had the deltaUTxOHash parameter for the decommit hash! The tests actually found a bug!

  • On top of this, the tests are failing because the redeemer seems to contain the wrong hash. This is happening because we have a signature for a snapshot that claims it already decommitted something (since versions are the same), but in fact we produced a signature that contains something to decommit - wrong! We should bump the version when generating snapshots, I think.

  • This makes this seed pass so I can move on to the next error in contest TooOldSnapshot. Yay!

2024-11-19

SB on incremental commits on-chain

  • I've made sure to properly generate snapshots that contain valid utxo to de/commit. Currently experiencing some issues with contest and H12, but in general I think I fixed a lot of corner cases we had in there. The current tactic is to generate a bunch of different snapshots and then control them in precondition and validFailingAction to filter out the ones that should work.

  • One note the de/crement tx bumps the snapshot version by itself so we should not bump before some de/crement action is performed.

  • Having problems writing up this logbook entry since I change so many details in the testing so quickly and then forget to write about them.

2024-11-18

SB on incremental commits on-chain

  • Continuing work on modelling the deposit/increment in the TxTrace tests.

  • Focusing on snapshot generation in presence of de/commits and pending deposits.

  • Having problems getting the expected coverage for fanout (with some pending utxo to de/commit)

2024-11-15

SB on incremental commits on-chain

  • Made everything green and then started with some minor refactoring for the close/contest redeemers. Still want to revisit this code since it needs to be bulletproof.

  • I fixed the TxTraceSpecs by making sure we don't produce invalid snapshots - ones that don't have matching versions but still don't have anything to de/commit in the snapshot.

  • Proceeded to fix one failing mithril test and then stubbed out the Deposit and Increment in the TxTraceSpec. Monday will be all about modelling Deposit/Increment in the trace spec.

2024-11-14

SB on reproducing persistence bug

  • I basically wrote an e2e test to see how persistence behaves and it worked without any issues. I even tried parsing the state file to inspect the last tag we have recorded so I think the issue is somewhere in hydra-doom.

SB on incremental commits on-chain

  • I think now that everything is green again I will finally implement contest redeemer changes.

  • I am looking at the spec and pretty much doing what I did with the close redeemers already, but it seems we have a corner case - if the versions do not match, which close/contest redeemer do you want to set? We don't know whether an increment or a decrement bumped the version, so we can't know which redeemer to set.

  • This requires us to think how to handle this situation both in code and the spec.

  • Our local and snapshot versions are the source of truth here. The problems are not as big as I initially thought, and we should probably refactor some code I wrote in the end, but the main thing is that the version determines everything when it comes to knowledge of incremental de/commits.
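
  • A heavily hedged sketch of that idea (the constructor names below only loosely mirror the real redeemers, and the actual decision logic differs in detail): compare the snapshot version to the open state version and look at what the snapshot still lists as pending to pick the redeemer.

```haskell
-- Illustrative only: version comparison plus pending de/commit picks the redeemer.
data CloseRedeemer
  = CloseInitial   -- closing with the initial, number-0 snapshot (not derived below)
  | CloseUnusedDec -- versions equal, snapshot still lists something to decommit
  | CloseUsedDec   -- snapshot one version behind: the decrement already happened
  | CloseUnusedInc -- versions equal, snapshot still lists something to commit
  | CloseUsedInc   -- snapshot one version behind: the increment already happened
  | CloseAny       -- versions equal, nothing pending
  deriving (Show)

selectCloseRedeemer :: Integer -> Integer -> Bool -> Bool -> CloseRedeemer
selectCloseRedeemer openVersion snapshotVersion pendingDecommit pendingCommit
  | snapshotVersion == openVersion, pendingDecommit = CloseUnusedDec
  | snapshotVersion == openVersion, pendingCommit = CloseUnusedInc
  | snapshotVersion == openVersion = CloseAny
  | pendingDecommit = CloseUsedDec -- one behind and a decommit was snapshotted
  | otherwise = CloseUsedInc       -- one behind and a commit was snapshotted
```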

2024-11-13

SB on incremental commits on-chain

  • Continuing from yesterday to try and figure out why hydra-scripts-tx-id parser is not working.

  • After staring at the code for a long time I figured out what makes the hydra-node executable halt and not respond when trying to run it. If you specify a value in your options parser and this value somehow doesn't match the type you are trying to parse, everything halts for whatever reason! omg...

  • And now the publishing fits into the reasonable tx sizes so we're back on track baby!

  • I want to package all changes nicely into a single commit so we can get rid of it fast in case we get the scripts to be smaller in the end. This solution works and I would go for it but just leaving room for some improvements.

  • I think now is finally the time to pay attention to close/contest off-chain and make the necessary changes to the redeemers, which means we are wrapping up the feature. I don't see what else is missing apart from the spec changes.

  • Before doing that I decided to make all tests green again since spec was failing on fanout for some reason.

  • I got all StateSpec green and will do the same for some logging tests and similar stuff.

2024-11-12

SB on incremental commits on-chain

  • Since we merged the initial aiken validator work + plutus V3 to master, now is the time to rebase it on top of the incremental commit work and see how the script sizes look.

  • Had to recompile aiken scripts and fix some stuff I screwed up during rebase but now I see that the publishing tx is 17401 bytes big so I think the next step is to try and use asData plutus trick to cut down on the head script size.

  • The other, completely opposite, approach is to use Plutarch and try to replace the individual Head script checks incrementally. Perhaps I should opt in to do this sooner rather than later since I have heard great things about Plutarch and the produced script sizes.

  • The third option would be the simplest one: use a list of tx-ids as the argument to the hydra-node. Perhaps instead of wasting a lot of time optimizing the Head script, the easiest would be to just make the publishing work with multiple transactions instead of one. There really are no cons to this approach besides making users pass in a list of tx-ids instead of just one. On the command line it would look really similar, so I don't think this is an issue. But on the other hand it is kinda lame to have to change the current hydra-node behavior, so I am not really sure now...

  • I noticed we don't optimize our minting policy so I added conservative-optimizations and optimize pragmas which brought dramatic reduction (from 11111 to 5555 bytes) but since this is not the script we publish I still need to cut down on a Head script size.

  • Ok. Let's try with the asData plutus trick and see where that leads us.

  • As expected there are a lot of changes since asData requires us to disambiguate all record fields and disable DuplicateRecordFields.

  • For some reason in the end I am getting:

hydra-node: Error: Unsupported feature: Type constructor: GHC.Prim.Addr#
Context: Compiling expr at "hydra-plutus-0.20.0-inplace:Hydra.Contract.Head:(761,5)-(761,48)"

and the line that fails is the one compiling the typed validator:

compiledValidator :: CompiledCode ValidatorType
compiledValidator =
  $$(PlutusTx.compile [||wrap headValidator||])
 where
  wrap = wrapValidator @DatumType @RedeemerType

  • When re-compiling with the Strict and no-specialise pragmas removed in HeadState I see:

hydra-node: Error: Reference to a name which is not a local, a builtin, or an external INLINABLE function: Variable Hydra.Contract.HeadState.$mInitial_go
       No unfolding
Context: Compiling definition of: Hydra.Contract.HeadState.$mInitial
Context: Compiling definition of: Hydra.Contract.HeadTokens.validateTokensMinting
Context: Compiling definition of: Hydra.Contract.HeadTokens.validate
Context: Compiling expr at "hydra-plutus-0.20.0-inplace:Hydra.Contract.HeadTokens:(187,5)-(187,100)"

so it seems this has something to do with the minting policy???

  • I couldn't get a proper answer in the plutus channel apart from the fact that there has to be some type in the asData that is not parseable by plutus for some reason.

  • I moved on to implement the script publishing as three separate transactions and got pretty far with it. It is not a problem if the tx-ids are not given in order since we compare the script hashes to figure out which output is which. Just need to get the parser working nicely.
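
  • A minimal sketch of how such a parser could look with optparse-applicative (the option name, separator and validation here are assumptions, not the final hydra-node interface):

```haskell
-- Illustrative parser for a comma-separated list of tx ids.
import Options.Applicative

newtype TxId = TxId String deriving (Show)

hydraScriptsTxIdsParser :: Parser [TxId]
hydraScriptsTxIdsParser =
  option (eitherReader parseTxIds) $
    long "hydra-scripts-tx-id"
      <> metavar "TXID[,TXID...]"
      <> help "Tx id(s) of the transaction(s) that published the Hydra scripts"
 where
  parseTxIds = traverse parseTxId . splitOnCommas
  splitOnCommas = words . map (\c -> if c == ',' then ' ' else c)
  parseTxId s
    | length s == 64 && all (`elem` "0123456789abcdef") s = Right (TxId s)
    | otherwise = Left ("Invalid tx id: " <> s)

main :: IO ()
main = execParser (info (hydraScriptsTxIdsParser <**> helper) fullDesc) >>= print
```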

SB on incremental commits on-chain

  • Continuing to debug the initial validator. Assumption is we run into a validator branch that is not properly covered by some trace.

  • Even returning True from the ViaCommit redeemer yields the same L82;5 error. Omg....

  • Enabling verbose tracing in aiken and trying to figure out if we get the same commit validator hash in the initial argument as we expect. This yields:

         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["redeemer: Redeemer"]))]

Again wat? I am trying to trace the commit validator hash here:

      ViaCommit(committed_refs) ->
        trace @"commit_validator": commit_validator
        True

  • Let's try returning true even before pattern matching on a redeemer:
    expect Some(head_id) = datum
    trace @"commit_validator": commit_validator
    True

  • Still the same. BUT! I got an idea! What if this happens because we are using plutus V2 to assemble the redeemer while initial expects the V3 version? And then the parsing of the datatype fails with this mysterious error? Let's give it a shot!

  • This makes sense to me but the problem is that it seems like the plutus version is baked into the ledger codebase so you can't easily mix them.

  • I think our best bet is to make the branch with plutus V3 work by trying to decrease the size of the initial validator so that it can fit in the publishing transaction and then rebase the incremental-commits branch on top of that one.

  • FT and I managed to cut down on the initial script size by around 300 bytes so now we can publish scripts again.

  • Need to fix two failing mutations related to commit transaction.

  • Commit is not healthy and I see I12 because commit datum is empty for whatever reason. This is weird, I am tracing the commit script hash since commit always has a single output.

  • Seems like we were using the wrong validator hash in aiken blueprint apply... For some reason aiken can't use the hash I have available in plutus.json. It is happy to take in the whole validator cbor on the other hand, but that is just wrong. I saw a mention of calling a cbor.serialize function but the value is already valid cbor?

2024-11-05

SB on incremental commits on-chain

  • Now that we rewrote the initial validator in aiken I can try to see if it fits in the publish transaction.

  • It does! woo-hoo! We still get some mutation errors and need to deal with golden plutus tests since now we need to automate the blueprint apply command but overall I am very happy since we are able to wrap things up in terms of script sizes.

  • I need to be able to do blueprint apply manually in order to automate it later on, but for some reason it always reports a malformed cbor string. If I use the interactive aiken prompt everything works. I am trying to go back to version 1.14 to see if it works there. (no it doesn't)

➜ aiken blueprint apply cf8b5028b108
    Analyzing blueprint
      Parsing inputs

        Error aiken::blueprint::parse::parameter

  × I failed to convert some input into a valid parameter
  help: Invalid Plutus data; malformed CBOR encoding: decode error: unknown tag for plutus data tag

  • Whatever I try I can't make the aiken command take the cbor string directly. I'll leave it off for the time being but we need to do this.

  • Continuing to look at the Commit/is healthy tests I see a weird error:

Redeemer report: fromList [(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["L82;5"]))]
  • Wat? Why can't I have nice things in life.... :)

  • Returning True in ViaCommit path in the initial redeemer didn't help.

  • At this point I am not sure how to proceed. I wanted to print the increment tx and then run the tests and try to detect what is outstanding from looking at the commit tx from the tests.

  • At least one good thing is that basically all of the commit tests fail with L82;5

  • But how to tackle this error?

  • Ok, one thing that is a step forward - if you try to commit using blueprint tx that one also fails with the same error!!! So this has something to do with the blueprint tx somehow.

  • I vaguely remember I had to actually make sure that FullCommitRequest works with commits but not really sure right now.

2024-11-04

SB on initial aiken validator rewrite

  • I think the main stream of work today is to make sure FT's branch works, since that one contains the initial aiken validator and the bump to plutus V3 which allows smaller script sizes for the work on incremental commits.

  • Right now we get a nasty failure that looks like this:

         Redeemer report: fromList [(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.
         Caused by: unBData
                      (Constr 0
                         [ Constr 0
                             [ List
                                 [ Constr 0
                                     [ Constr 0
                                         [ B #7db6c8edf4227f62e1233880981eb1d4d89c14c3c92b63b2e130ede21c128c61
                                         , I 21 ]
                                     , Constr 0
                                         [ Constr 0
                                             [ Constr 0
                                                 [ B #b0e9c25d9abdfc5867b9c0879b66aa60abbc7722ed56f833a3e2ad94 ]
                                             , Constr 1 [] ]
                                         , Map [(B #, Map [(B #, I 231)])]
                                         , Constr 0 []
                                         , Constr 1 [] ] ]
                                 , Constr 0
                                     [ Constr 0

...
  • So it looks like decoding of the redeemer fails here? Initially I thought it was the ScriptHash parameter for the increment transaction.

  • I am trying to figure out exactly what fails here: parameter, redeemer or the datum?

  • From what I see the initial and commit script hashes look correct.

  • In the end we figured out we actually need to apply a validator argument using aiken blueprint apply... duh...

  • Now we need to fix the initial validator since it still throws errors on mutations for commit and collect.

  • Seems like tracing gives fast results compared to looking at the logic to figure it out.

  • Ok, I am not going to try and get rid of Option in a list and fail early but leave it until the end to pattern match.

  • Got everything green but one I03 mutation

  • Seems like we are checking if the value is >= in the validator but in the mutation we definitely produce larger values so this is expected to fail. Negating values worked.

  • Everything is green but the script sizes are not satisfactory. We need to gain like 300 kb somehow if we want to get this merged. Seems like the head script got bigger by updating to V3.

October 2024

2024-10-31

SB on fixing the master

  • Ok, just updating the mithril to the latest version fixed the problem

SB on figuring out why some user blueprint request is not working

  • We got a user complaint on discord that this request:
{
  "blueprintTx": {
    "cborHex": "84a600d9010281825820fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a9000dd9010281825820fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a901018002000ed9010281581c9f55c6328203ab96b079ca0347194836aa8013532b68a8889afa6b840b5820e4522c9439f91001ae113ecba30044593de966600f64a0d170be818703defa2fa207d901028158c358c1010100332323232323223225333004323232323253330093370e900118051baa001132323232330010013758602260246024602460246024602460246024601e6ea8020894ccc044004528099299980799b8f375c602600401a29444cc00c00c004c04c004c03cc040008c038004c02cdd50008b1806180680118058009805801180480098031baa00114984d958dd7000ab9a5573aaae7955cfaba1574498011e581c9f55c6328203ab96b079ca0347194836aa8013532b68a8889afa6b84000105a18200008201821a000186a01a000f4240f5f6",
    "description": "",
    "type": "Tx ConwayEra"
  },
  "utxo": {
    "fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a9#0": {
      "address": "addr_test1wze4wd4nrv5wra70glypjlk7ff2n46ty29qf2dqv53q33us7q0lpp",
      "datum": null,
      "datumhash": null,
      "inlineDatum": "d87980",
      "referenceScript": null,
      "value": {
        "lovelace": 2000000
      }
    }
  }
}

is yielding:


Error in $.blueprintTx: \nunexpected \"l\"\nexpecting hexadecimal digit\nIncorrect transaction id format: Expected Base16-encoded bytestring, but got b; invalid bytestring size

  • Since I need to sync preprod and everything, I suggested they try the request with the txId specified (which you can calculate using cardano-cli) and with the tx type set to Unwitnessed, to see if that fixes the problem before I can take a look.

  • We went back and forth to figure out what the problem was, and finally it was just that the inline datum should be encoded as a json value, not the hash.

SB on initial aiken validator

  • Trying to help FT on this one. We see problems related to data formats and I think it is related to how we pass the commit validator to initial one.

  • The plutus error I get almost certainly points to some large structure (like a script), but we can't mix and match Plutus V2 and V3 when providing the argument to the initial script (the commit validator). Therefore we need to first bump to PlutusV3, then check that FT's work is ok and the initial validator works. If all of that is good, I can finally use it on my branch to see if we are able to publish scripts with the reduced script sizes.

2024-10-30

SB on incremental commits on-chain

  • Continuing with the final step for the aiken deposit validator: I still need to check that all recover outputs match the deposit outputs we recorded in the datum. It might be that the aiken custom type I use on-chain is somehow not aligning with the haskell version we put in the datum. What worries me is that when I trace what I get on-chain it looks like this:

         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["depositOutputs: [_ 121([_ 121([_ 121([_ h'099828E6353CCC63F28FCB11FF71E92A48BE3DECCF7739A564CA3992']), 122([])]), {_ h'': {_ h'': 19 } }, 121([]), 122([])])]","deposited: [_ 121([_ 121([_ h'0000010000000001010000000100010000000000000001010001000101010000']), 79])]","preSerializedCommits: h'4BAECAAF54C46AC8C2CDF7CC256CEBE547806810608AE8B86B5B5AA78AF49FC7'","hashOfOutputs: h'916F843120D16BDE9D98F9D50E7A7708D5A1EABA1CA261BD3A710F7B60AAC271'","D05","Validator returned false"]))]

  • So the deposit outputs are not matching what was deposited, so the hashes of course don't match. I need to see exactly the same values on both ends before I can compare them.

  • I made sure to follow exactly what I did in the haskell implementation of the deposit validator and BAAM! I got it working. What I needed to do was sort by the TxOutRef and then use the preSerializedOutput field for hashing. Nice!
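
  • A self-contained sketch of that hashing scheme (the DepositCommit type and field names are illustrative, not the real Hydra types, and cryptonite stands in for the on-chain sha2_256 builtin): sort the committed outputs by their out-ref and hash the concatenation of their pre-serialized bytes, so both sides arrive at the same digest.

```haskell
-- Sort by out-ref, then hash the concatenated pre-serialized outputs (illustrative).
import Crypto.Hash (Digest, SHA256, hash) -- cryptonite, standing in for sha2_256
import qualified Data.ByteString as BS
import Data.List (sortOn)

data DepositCommit = DepositCommit
  { input :: (BS.ByteString, Integer)    -- stand-in for TxOutRef (tx id bytes, index)
  , preSerializedOutput :: BS.ByteString -- CBOR bytes of the committed TxOut
  }

hashOfCommits :: [DepositCommit] -> Digest SHA256
hashOfCommits =
  hash . BS.concat . map preSerializedOutput . sortOn input
```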

  • Now I'd like to use a deposit script as a reference input to further reduce the tx size. Running tx-cost to see the current sizes and then use the deposit reference.

  • I am realizing I didn't even add the increment transaction to tx-cost. Doing that now.

  • This is how the results look:

## Cost of Increment Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| :------ | ------: | --------: | --------: | --------: |
| 1| 1842 | 20.01 | 9.06 | 0.46 |
| 2| 2079 | 22.29 | 11.03 | 0.50 |
| 3| 2124 | 22.83 | 11.93 | 0.51 |
| 5| 2444 | 26.45 | 15.26 | 0.58 |
| 10| 3227 | 34.65 | 23.19 | 0.74 |
| 50| 9151 | 99.48 | 85.75 | 1.97 |
  • Now let's use the deposit as a reference to see the outcome.

  • I see MissingScript errors but don't want to deal with that right now so I'll move on to do needed changes in contest instead.

  • First I refactored and added the missing ContestUsedDec mutations. We were not really testing the case where the decommit was not used.

  • Now moving on to the new ContestUnusedDec. All these redeemers should have appropriate mutations.

2024-10-29

SB on incremental commits on-chain

  • Today we need to take a look at fixing the two functions that should hash the serialized TxOutRef in the deposit validator.

  • It seems like the produced aiken datum is not in the same shape as the plutus one.

         Caused by: unListData
                      (Constr 0
                         [ Constr 0
                             [ B #3f4610dea46ef10fb5be4396c992dd657a4f5852019c8010b56aff32
                             , I 1730196006889
                             , List
                                 [ Constr 0
                                     [ Constr 0
                                         [ Constr 0
                                             [ B #0101010101010100000101010100000101010001000001010100010000000100 ]
                                         , I 19 ]
                                     , B #d8799fd8799fd8799f581ce0ae7da7d247a68e71b40a81b04b1d3f4dc809a1eeef582f35cd22b8ffd87a80ffa140a14003d87980d87a80ff ] ] ] ])) ["Expected the List constructor but got a different one"])),(ScriptWitnessIndexTxIn 1,R
  • Currently digging through the internet to see if somebody fell into the same trap.

  • I also asked the hydra-doom guys since they are working a lot with aiken.

  • Trying to use plutusV3 to construct the datum and see if that works -> NO

  • Trying to use type alias for datum in deposit, the same way we do for commit.ak -> NO, but I think I am on the right path. Looking at how we did things when rewriting the commit validator should provide a solution.

  • BAM! Success! I was failing to properly encode the datum and redeemer to plutus data, and I used a type alias for the datum just like we do in the commit aiken validator.

  • Now the final piece of the puzzle: how to implement the check that the outputs match what was hashed in the datum?

  • I tried using both list of TxOutRef and already hashed data to compare but didn't get anywhere.

  • Tracing to see what values I get on-chain, and it seems that the list of deposited outputs is a cbor string? There is a nice debugging function, cbor.diagnostic, which I am using to inspect the datum values. In general Aiken feels great, even for me with a lot of years of looking at haskell. The syntax throws me off of course, but I could get used to it, and I should if we plan to use aiken for the head script rewrite.

2024-10-28

SB on incremental commits on-chain

  • Today should be the time to deal with close tx and changes in the redeemer. The code looks a bit unwieldy but once it is correct I can try to improve upon it.

  • I ended up starting from scratch. Feels like there is a need for a new redeemer to signal that we are closing, there is nothing to inc/decrement but we are not using initial snapshot. This is not covered by the new redeemer types and we need to also add this into the spec.

  • This change makes the e2e tests green again.

  • Next I'd like to test the changes to the deposit aiken validator since I cherry picked the commit FT made.

  • If all works I would use the deposit as a reference input to increment and recover.

  • Had to propagate this change across the code-base and make sure I always use PlutusScriptV3 for the deposit script.

SB on initial validator check

  • Hydra doom people had a question related to one of our checks in the initial validator https://github.com/cardano-scaling/hydra/blob/master/hydra-plutus/src/Hydra/Contract/Initial.hs#L125C9-L126C37 . They have a need to change the values distributed on L2, hopefully without an extra tx on L2. The initial validator makes this impossible since it expects that whatever we put in the commit datum matches exactly what we consume in the commit tx, and the same needs to be present in the initial redeemer (ViaCommit). What they would like is to keep the UTxO value check but not check that the exact inputs are also locked, so that they could distribute the UTxO on L2 as they please. Question is: does this affect any security guarantees of the Hydra Head protocol?

  • I wrote up an issue about this and agreed to arrange a meeting with Matthias to go through the implications of this change.

2024-10-25

SB on incremental commits on-chain

  • I wanted to first fix the StateSpec tests I added yesterday. These are property tests, so perhaps they will find some interesting corner cases in our initial validator. Together with the mutation tests these should give us a clear picture of whether our validator was implemented correctly or not.

  • I realized most of the generated types depend on what we have in the deposit generator since increment is spending this output, so I re-used OpenState and HydraContext and improved a couple of things (like deposit deadline re-use), and the tests are green now.

  • Looking briefly at FT's e2e test that failed (related to recover), I realized that threadDelay doesn't behave the same on MacOS and Linux, so we added an extra second in order to get this test green on FT's machine as well.

  • Happy that this worked out nicely I am looking at the e2e tests that fail. On close we try to use CloseInitial but that expects both version and snapshotNumber to be zero (which feels wrong). We could have many snapshots but stay in the same version in case we never did in/decrement.

  • If I revert all changes in Close module to master then the test is green.

  • I am trying to refactor close off-chain logic but I think I better do it with a clear head on my shoulders on Monday since I am tired rn.

SB on hydra doom

  • I need to measure the time it takes for the hydra-node to catch up and see an open head if you use --start-chain-from in the past. This is needed for hydra-doom of course, since they want to pre-open many heads and then make them catch up with the chain.

  • Lots of problems with making a linux command to look at websocket messages and stop once it sees HeadIsOpen and of course measure the time.

  • I even pinged my friend who is a systems guy and he helped with a python script, since all the linux commands we used were not stopping once we saw the mentioned hydra-node message.

  • I tried doing this on AWS a week ago but my cardano-node kept restarting for some reason.

  • Locally I can't get to an open head state for some reason. I restart the hydra-node using --start-chain-from, which is one block before, but still I don't see the head open.

  • Ok got some results at least. So the Head got opened here https://cexplorer.io/tx/cabd84249d9d4ba37c58801f7f0d695c2bb17569ffd23b29b0ccba45aa6d0a75 at 138313510.cabd84249d9d4ba37c58801f7f0d695c2bb17569ffd23b29b0ccba45aa6d0a75

  • If we go back one block to be able to observe Head opening and want to sync to

{
   "block": 11013347,
   "epoch": 517,
   "era": "Conway",
   "hash": "8a26f38d9118f8c7b51ab823e4c3eb8919c51ad9bea343a9c30332c25a43437f",
   "slot": 138390441,
   "slotInEpoch": 409641,
   "slotsToEpochEnd": 22359,
   "syncProgress": "100.00"
}

which is currently 21h31m ago (so less than a day of chain to sync), it takes 2m35.174s

    "block": 11013414,
    "epoch": 517,
    "era": "Conway",
    "hash": "e69b2bb04568837690b7e810125608373fa722a2170a3a04fdbe6d36025cf2ef",
    "slot": 138391676,
    "slotInEpoch": 410876,
    "slotsToEpochEnd": 21124,
    "syncProgress": "100.00"
}%


takes 5m21.054s

  • Curious to see FeeTooSmallUTxO when trying a new tx using 1, 2 and 3 ada. Is the head actually calculating fees correctly on mainnet?

2024-10-24

SB on incremental commits on-chain

  • Went through some questions with SN, all seems good so far just minor clarifications. We should rewrite our initial contract in aiken to be able to publish transactions.

  • Continued with the blueprint script commit test in the e2e, and FT and I had some luck finding the problem. The only thing is, I'd like to know why the utxoToCommit is not the same as the deposit output value?

  • Found out that the value mismatch is happening because when constructing the increment tx we try to find the deposit UTxO in the spendableUTxO and for some reason the UTxO inside has slightly different value. I am investigating further.

  • Phew... After some serious debugging I found out that the deposit output gets balanced, and if the incoming value is too small (just like in my case) the output gets around 700 ada more after balancing. So the observed value and the value in the UTxO don't match. If I produce a high enough lovelace output then the test is green. How to deal with this problem? I think since I try to commit a script input with just above 1 ada, this gets autobalanced in the wallet when posting the deposit, and then when we observe, the value doesn't match what we expect. There should be a note somewhere (in the docs most probably) that users need to be careful to provide the correct amount of ada with each deposit they send.

  • Happy to see this working end-to-end so we now move forward knowing the current situation is working fine.

2024-10-23

SB on incremental-commits on-chain

  • Yesterday I solved two major bugs. Now I want to make sure everything works (as in the spec) and tests are green before diving back to the deposit aiken validator.

  • We found another major bug which we fixed. We were assuming the head output is always the only one, but there is also a change output to the internal wallet which we had neglected.

  • Now increment validator is behaving nicely and I am looking at commit blueprint tx spec in the e2e since that one also fails with H4.

  • Ok, time to gain back some sanity. Why would the full commit request (using a blueprint tx) fail when we check that the deposit value + whatever value the head contained is equal to the head output?

  • This is the corresponding check:

  mustIncreaseValue =
    traceIfFalse $(errorCode HeadValueIsNotPreserved) $
      headInValue <> depositValue == headOutValue

  • I will print the corresponding increment tx that failed in wallet to spot what is wrong.

2024-10-22

SB on incremental commits on-chain

  • In the end I wanted to see: if I use the deposit (aiken) script as a reference, what will the increment tx sizes look like? Right now I am getting a missing script error for some reason and need to investigate, since I already made sure to publish the script beforehand.

  • I think I altered all possible places to mention the new aiken deposit script but still get MissingScript errors in e2e. I decided to step back and try to find the original cause of the plutus script error I got, since it is not gone yet. Yesterday I saw all tests green when I added a big enough contestation period value, but now I just think I am crazy. I still see the same phantom script error, so I want to go back and deal with it once and for all.

  • I am having problems matching the proper version needed for plutus-debug executable to work.

  • Went back two commits since I realized that the mutations are flaky when it comes to checking the snapshot signature and utxoToCommit field. Need to make these bullet-proof before continuing further.

  • The problem was that I was accessing tx inputs by index in the increment validator! Now the increment tx is valid and so are all of the mutations. BUT there is still a problem in the e2e tests that I need to fix before continuing further. The tx fails on submission and the CEK machine error is not pointing to anything meaningful.

  • I'll figure out exactly which check in the validator is doing this by removing all of them and then re-enabling them. Ok, so it was, unexpectedly, the check that the head output needs to be the head input + whatever we deposited! Getting the deposited value on-chain proved to be the problem, so I put it in the redeemer (for now) and the check is working again.

  • A lot of problems at every step, but we are moving forward, at a pace I don't really like, but that's the reality of it. After fixing these two major bugs I was not aware of, I think we are in a good position to move forward and try to reduce the script size using the new deposit validator in aiken.

2024-10-21

SB on incremental commits on-chain

  • Continuing to debug and figure out why increment tx is causing weird plutus errors. I commented out the suspicious check where I compare txOutRefs and left alone all other checks.

  • I feel like this has something to do with the lower validity bounds for increment. What if the tx is valid but time passes between the tx going into the mempool and then getting out of it? I see that the tx was rejected in the cardano-node logs, but no reason is given.

  • As soon as I increased the contestation deadline in the test everything started working again since there was enough time to post increment before the deadline. Woot-woot!

  • Since rewriting the Head script in Aiken will take some sweet time, I could reduce the size of the increment tx by rewriting the deposit in aiken. Since the increment tx consumes the deposit input and needs to attach the deposit script, this should lower the total tx size.

  • Ok, turns out working with Aiken is a breeze even if you never saw it before! The script size for deposit is now 571 bytes and it was 3096 before! Sadly this is still not enough so I'll try to use it as the input reference and see if I can gain any tx size benefits.

  • Trying to finish the aiken deposit validator, I am having some problems translating the functions we have in haskell to hash the tx outputs (sha_256) and compare them to the commits we stored in the datum. A bit disappointed the changes didn't lead to final success, but we need to keep reducing these scripts. Perhaps I should spend some time profiling the head script?

2024-10-17

SB on incremental commits on-chain

  • I got all checks for the increment validator implemented and covered with tests

  • Next problem is that the head script grew too big so I need to do some optimisations

MaxTxSizeUTxO 16554 16384
  • I tried using asData TH function to use the more efficient data encoding but that clashes with DuplicateRecordFields and in the end I couldn't compile the code that easily so I gave up - perhaps I'll revisit

  • Tried Strict pragma but that did basically nothing

  • Looking into using our custom ScriptContext type for the deposit - how to go from BuiltinData to the Interval type?

  • Ok this makes the tx size even bigger for some reason:

MaxTxSizeUTxO 16861 16384
  • What if I used the deposit script as a reference input in the increment/recover transaction? Let's give it a shot. Ok, this didn't help either.
MaxTxSizeUTxO 19697 16384
  • Trying to use the same deposit reference but also use custom ScriptContext in the deposit validator.
MaxTxSizeUTxO 19963 16384
  • Wow! nice work...not

  • Ok, I decided to deal with this problem in the end when I alter the close validator too and then perhaps re-write the deposit validator to aiken or plutarch.

  • Continuing with the close validator/redeemer changes - adding new constructors for increment in the on-chain code was a breeze. Slight problems with the off-chain code where we need to detect whether we de/committed and then use the corresponding constructor.

  • e2e tests for incrementally committing now fail, but I'm not sure why. Increment fails, and it should be unaffected since the changes I am making only touch the close tx.

  • Perhaps it's just the tx size? NO it is some weird plutus failure I am not even sure what happens since I don't see full error. I'll continue in the morning.

Hack and learn!

Golden file changes

  • Learned that looking at the diff of the golden files is not very useful! Better to know that they should change if the associated types changed.

Blueprint tx

  • When committing, you must provide a transaction (which we call the blueprintTx), which specifies how to spend the UTxO that is also provided. The type is CommitBlueprintTx (see the sketch after this list).
  • It's a "blueprint" because we modify it ( see hydra-tx/src/Hydra/Tx/Commit.hs ) by adding the hydra-node signing key to the signers, adding in a requirement for it to spend the initial input
  • By setting attributes of the transaction, for example adding a script output, you can bring additional logic into L2
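
A tiny sketch of the request shape described above (placeholder types; the real CommitBlueprintTx lives in hydra-tx and the field names here follow the HTTP request, not necessarily the Haskell record):

```haskell
-- Placeholder types only, to show the shape of a commit request.
newtype Tx = Tx String
newtype UTxO = UTxO [String]

data CommitBlueprintTx = CommitBlueprintTx
  { blueprintTx :: Tx -- the user's draft transaction, used as a template
  , utxo :: UTxO      -- resolves the inputs the blueprint wants to spend
  }
```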

Trivia

  • "PT" stands for "participation token"

2024-10-16

SB on incremental commits on-chain

  • Ok, trying to figure out why the added check that the deposit script is spent is not working

  • Now I am trying to trace the on-chain representation, since off-chain I don't see any differences between the two TxOutRefs.

  • Ok, I found out that calling toBuiltinData before comparing the two values did the trick! Question is why we can't compare the two on-chain without turning it into BuiltinData?

  • By writing the corresponding mutation I discovered the change is actually wrong since I can use any input in the redeemer and tests are green. Wat?

  • I reverted the recent changes so I can start again. Perhaps comparing hashes should work out?

  • Comparing hashes didn't work either, so I'll just leave this check as the last thing to do since the other ones should be easier.

  • The next check in line is the signature validation. How to get the correct $U_\alpha$? What we include in the head is contained within the deposit datum. I should decode the datum and then use the information inside to produce the correct hash.

  • Correct, this is also what the spec says (though that might be expensive)

  • Again I find it hard to debug when the validator fails, since what I see in code seems correct, but not having an easier way to trace on-chain code is making my life miserable.

  • Ok, I found the problem. Test snapshot is just plain wrong and doesn't contain anything to commit.

  • Another issue is that sometimes I wasn't able to find the deposit input properly. I think it had to do with resolveInputsUTxO, but now I changed the code to not rely on input ordering and instead find the deposit by address, so everything works again.

  • This was actually the problem all along! Even the redeemer check from yesterday is ok now! Oh man, I am happy now

  • I don't understand why altering the order of these two conditions affects the increment tx healthiness:

    && claimedDepositIsSpent
    && checkSnapshotSignature

SB on incremental commits on-chain

  • After writing the initial mutation that should check the deadline, I can't seem to influence the deadline in the increment tx in such a way as to trigger the validator (the deposit deadline seems to be set in the past; it is a negative number).

  • Perhaps I need to make sure the deposit deadline has a meaningful value first in order to resolve this test.

  • Creating the UTCTime from day, month, seconds did the trick but the input datum is for some reason invalid - can't be deserialized.

  • Ok, this happened because I was using the head input instead of deposit one. Now the datum can be decoded properly.

  • Since we are re-using the contestation period for the deadline, both in the http server when setting the deadline and in the handlers when constructing the increment, these values clash now. I opted to set the deposit deadline to double the contestation period. Hopefully this is enough time to be able to post the increment; if not, we can always add a custom command line argument.
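
  • A hedged sketch of that deadline computation (illustrative names, not the actual handler code):

```haskell
-- Deadline = now + 2 * contestation period.
import Data.Time (NominalDiffTime, UTCTime, addUTCTime, getCurrentTime)

newtype ContestationPeriod = ContestationPeriod NominalDiffTime

depositDeadline :: ContestationPeriod -> IO UTCTime
depositDeadline (ContestationPeriod cp) = addUTCTime (2 * cp) <$> getCurrentTime
```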

  • Now the deadline related mutations are green and I also updated the incremental commits documentation PR to reflect these new changes.

  • Next in line is the check for the head id. In the head contract we have the previous and the next datum to compare the headId. I opted in to put the head id in the Claim redeemer instead so that I have some value to compare when spending the output.

  • I got both mutations I wanted green and I added a fixme to address needed changes in the spec for the Claim redeemer

  • Continuing to tackle the increment redeemer. There is immediately some confusion with the redeemer fields, seems like we need to keep the TxOutRef of a deposit we're trying to spend?

  • I added one fixme to tackle the spec wording where n is mentioned in the first check that parameters are unchanged

  • Now I noticed another discrepancy, 𝜙increment vs. 𝜙deposit, where the latter is not introduced properly. Actually... I tell a lie, it is the deposit input.

  • This next check is interesting

Claimed deposit is spent
𝜙increment = 𝜙deposit

  • So this would mean I need to find the deposit input to increment transaction and then compare it to the data I have in the increment redeemer (TxOutRef)

  • I still see the deposit input first in the list; is the tx rendering borked? I see sortBy (compare on fst) (txIns content), so we sort then print, which could be the cause of my confusion.

  • I don't immediately see why the check that deposit script input is spent is not working.

2024-10-14

SB on timing the sync to HeadOpened

  • I got a request from Trym to time how long it would take to open a head, then wait for a couple of days and then --start-chain-from the moment the head was opened, so we cut down on the chain congestion.

  • My plan is to use the faucet keys we have and open a head on mainnet. Then stop the node, give it some slot/block from a couple of days before, and time how long it takes to re-sync.

  • Question is would my aws instance survive two cardano-nodes at the same time? Let's see.

  • Downloading a mithril db snapshot for mainnet.

  • Found a command to look for a specific string in the output; hopefully it works for timing since we need to see HeadIsOpened eventually.

time until websocat ws://localhost:4001 | grep -m 1 "HeadIsOpened"; do : ; done
  • Perhaps I would need to enlarge my aws instance since mainnet db requires a lot of disk space

  • I got the cardano node to run on mainnet with the usual quirks related to the conway-genesis file and its hash in the node config.

  • I got the head opened as well but need to connect using websocat in order to be able to measure the time to catch up

  • I am realizing I need to re-map the ports differently since this aws server already runs one hydra-node and one cardano-node on preview

  • Now that I can connect with websocat I stole a small script to be able to measure the time until we see HeadIsOpen. This is the block before the head opening https://cexplorer.io/block/9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee and this is the slot/block hash

, "--start-chain-from", "137342970.9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee"

  • I'll try to test it out now with one day difference to see the timing

SB on incremental commits on-chain

  • I am starting by looking at the spec - perhaps there are some areas that we can already unmark from the red color since we implemented them?

  • On second thought, we unmarked everything last time once we had the implementation, so I'll wait on this one.

  • Wrote the initial increment mutation that should test that increment happens before the deadline.

  • Funny that the deposit input is the first one; I was expecting the head input to always be the first one, so I need to see what is happening.

  • The tx upper bound seems to be missing for the increment; we need to set its validity so the tx becomes invalid if it is equal to or greater than the deadline.

  • Interesting problem - how to come up with a meaningful value for the upper validity slot for increment? We could use the current slot and calculate the upper bound like we do for contest. I think this is the correct solution; we don't need to depend on saving the deadline in the local state so that we can use it for increment.

2024-10-10

SB on adding more testing for incremental commits off-chain

  • I've fixed the test I was working on yesterday (I keep forgetting we don't issue any network messages on commits)

  • Perhaps we should introduce a network message to signal to other parties that deposits/recovers/increments are happening? Not sure if there is anything related in the spec.

  • The spec doesn't mention any network messages on deposit so all good.

  • Anyway, I think we should always try to snapshot whatever we have pending for deposits, and the logic on AckSn will discard anything that is not valid.

  • Continuing to write similar tests as the one we had for decommits to gain better understanding and perhaps find some quirks in the head logic.

  • One thing I found which is different from decommits is that a node can observe the increment transaction but not have the txid stored locally, since it is not propagated by a network message. Perhaps the commit messages should be propagated over the network.

  • Other thing is we need to try to snapshot any pending deposit on each ReqTx. Similar to decommits, if there is something pending it should go into the next snapshot.

2024-10-10

SN on TxTrace tuning / rework

  • The fact that genSnapshot is so random and hardly meets our coverage drives me to a different strategy: Generate plausible sequences of snapshots and only pick from them when generating Decrement/Close etc. actions.
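
To make that concrete, a toy version of such a generator might look like this (purely illustrative; the real model in TxTraceSpec tracks much more):

    import Test.QuickCheck

    -- Toy "plausible snapshot sequence": numbers strictly increase and the
    -- version only bumps after a snapshot that decommits something.
    data ToySnapshot = ToySnapshot {version :: Int, number :: Int, hasDecommit :: Bool}
      deriving (Show)

    genPlausibleSnapshots :: Gen [ToySnapshot]
    genPlausibleSnapshots = chooseInt (1, 8) >>= go 0 1
     where
      go _ _ 0 = pure []
      go v n k = do
        dec <- arbitrary
        rest <- go (if dec then v + 1 else v) (n + 1) (k - 1)
        pure (ToySnapshot v n dec : rest)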

  • Bringing back ProduceSnapshots action, should it contain a list of snapshots or only a single one and have multiple such actions?

  • Considered to remove snapshots from Model now after use, but this would make the overall test suite very optimistic. For example, we would not keep any snapshots with old versions after a decrement and never test outdated snapshots are rejected.

  • Realized that I need to improve the precondition of NewSnapshot to ensure it remains a plausible sequence of snapshots.

  • Turns out we need to keep track of the state version and inferring from knownSnapshots is not good.

  • At some point a fanout was invalid with a hash mismatch on the delta. Printing the corresponding realWorldModelUTxO shows an interesting TxOut:

        TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "9a81d9534c3eed3e374a921fd06ead4a822018e6a0281ec28423ee67")) (StakeRefBase (ScriptHashObj (ScriptHash "7734a436653455cf24035622e1216c83676446d1088b47d589f81956"))))) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 10) (MultiAsset (fromList [])))) (TxOutDatumInline BabbageEraOnwardsConway (HashableScriptData "\EOT" (ScriptDataNumber 4))) ReferenceScriptNone
    

    That is, it contains a stake credential! Could this be broken right now? Why don't we test with stake keys in other, more "unit" properties?

    • I suspect that the fanout generator is also not generic enough?
    • Could not confirm. Neither stake address nor datum was the issue.
  • For a fanout to have a mismatching hash, it could also be that the hash in the closed state is wrong. And indeed, in the counterexample we have a Close, Contest and then Fanout. Looking closer, the Contest was not resulting in a different delta UTxO hash!

  • The TxTraceSpec is always issuing contest transactions with ContestOutdated redeemers. Why?

  • Now shrunk, this is a very interesting case:

        do action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}}
           action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}}
           action $ Decrement {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}}
           action $ Close {actor = Carol, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}}
           action $ Contest {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}}
           action $ Fanout {utxo = fromList [(A,10),(B,10)], deltaUTxO = fromList [(C,10)]}
           pure ()
    

    Notably, the Close and Contest here are already using outdated snapshots (the Decrement of version 0 already happened). Obviously, the fanout with (C,10) is not allowed here. This is a model error and we should not expect this to succeed.

  • As we will make our genSnapshot generation more model-driven, we should be able to get rid of some complexity in precondition / validFailingAction.

  • When checking coverage again, I realize we are not hitting fanout very often. Especially the one without utxo inside, i.e. all utxo was decremented.

  • Not reducing the values and just decommitting the whole A-C makes it more likely that we reach fanout with an empty head.

  • Confused by the fact that when I reduce frequency of generated Decrements to 0, the coverage still reports "has decrements"?
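
For reference, this is roughly the shape of a QuickCheck coverage check like the one in question (a stand-alone toy, not the actual TxTrace property):

    import Test.QuickCheck

    -- The real property labels things like "has decrements" or fanout variants;
    -- 'cover' only warns (or fails under 'checkCoverage') when a label is rare.
    prop_coverageExample :: [Int] -> Property
    prop_coverageExample xs =
      checkCoverage $
        cover 5 (null xs) "empty list" $
          cover 30 (not (null xs)) "non-empty list" $
            property True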

2024-10-09

SB on adding more tests for incremental commits

  • There is a leftover from the incremental-commits off-chain work which specifies that we need to add more interesting tests for our new commits

  • As a first step I wanted to separate the existing decommit tests in the BehaviorSpec and do something similar for the commits.

  • I realize pretty quickly that we send client inputs for decommits but not for commits so perhaps I need to move the tests someplace else?

  • I think I could just use the mocked chain layer to directly post a deposit, let's see

  • Oh, of course, you can't post a deposit, only increment and recover. Perhaps I can post increment directly but that requires a lot of parameters and I am not too happy with this level of testing.

  • I implemented a couple of missing validations for the new api endpoints

  • Added a test that on recover we check the head id is correct

  • The deposit deadline is not properly configured so I wanted to re-use the contestation deadline for the same purpose (and see later if users would like to configure it differently)

  • To do this I had to pass the environment to the http server

  • Now the recover tests are failing because of the deadline check, which is now set further in the future depending on the contestation deadline. They are fixed just by waiting for a sufficient period of time (the contestation deadline) to pass

  • Now I would like to actually record the deposit deadline and upon each recover warn the users beforehand that recover would fail unless we wait for deposit deadline to pass.

  • I am realizing that these changes conflict with the PR I drafted yesterday to remove some local state and now I need to actually re-add the deadline.

  • The test I am writing is supposed to prove you can do multiple deposits but the issue is, they actually go one at a time since snapshots are sequential. Is it safe to increment multiple at once if the state contains multiple UTxO?

  • Since increment is spending only one deposit output this needs to be sequential for now

  • When I try to do two more or less simultaneously I get a ReqSnCommitNotSettled error which I need to investigate

SB on incremental commits deadline

  • I was looking through code trying to see how to use the deposit deadline in the best way to avoid stuck Head in case when recover spends the output we want to snapshot.

  • What I wanted to do when it comes to deposit/recover actions is to make sure the deadline is correctly set (we can probably define it in terms of contestation deadline + some buffer) and then it seems like the most appropriate place to check the deadline would be when we receive the client input.

  • We could check that, for any pending deposit we want to recover, its deadline has passed so that it is safe to do the recover. Perhaps a good thing to add is to check that there is no snapshot in flight too, and if possible compare the information in the snapshot to make sure the same utxo we want to recover is not in there.

  • This is a bit hand wavy but what we could do is attach a timestamp to each client input (requires a separate type to wrap ClientInput and add time information; see the sketch below). Then in the head logic, when we receive the client input with a timestamp we could check if the recover input time is later than the set deadline and only then try to recover.
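
A minimal sketch of that wrapper, using a stand-in for the real client input type (all names here are hypothetical):

    import Data.Time (UTCTime)

    -- Stand-in for the real hydra-node client input type.
    data ClientInput tx = Recover tx | SomeOtherInput

    -- Hypothetical wrapper carrying the time the input was received, so the
    -- pure HeadLogic can compare it against the stored deposit deadline
    -- without doing IO itself.
    data TimedClientInput tx = TimedClientInput
      { receivedAt :: UTCTime
      , input :: ClientInput tx
      }

    -- Only act on a recover once the deposit deadline has passed.
    canRecover :: UTCTime -> TimedClientInput tx -> Bool
    canRecover depositDeadline timed = receivedAt timed > depositDeadline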

  • What I don't like is the fact that we have to deal with time in the pure HeadLogic but at the same time we would just compare the time we already would receive together with the client input.

  • What are the alternatives to this? I don't see many unless we make recover be received just via an HTTP call and keep the deadline around somehow so we can reject the wrong recovers immediately.

2024-10-08

SB on withdrawal commit bug

  • After drafting a nice test yesterday with SN I added on top the piece of code I already had to make everything green

  • PR review suggested some code changes which are nice so I did those, but there weren't any major objections to merging this

  • I sent the docker image to Micrograx to test it out, hopefully it all works out so we can move on.

SB on remove deposit script UTxO from the local state

  • I continued working on this today, progressing to remove any local UTxO state we actually don't need.

  • Now it is time to use the spendableUTxO and find the correct UTxO to spend in increment/recover in there.

  • Seems like using the isScriptOutput is the easiest way to track down the UTxO I need to spend

  • After making everything compile I don't see the e2e tests green, as I suspected. I think last time I looked at this I suspected that adjustUTxO is doing a bad job when it comes to these new transactions since they are related to L1 so we need to preserve UTxO as they were without adjusting. I'll test my hypothesis real quick.

  • I was wrong about this, both increment and recover can find the deposit output but recover fails to post with D05 - incorrect deposit hash, so debugging this rn.

  • In the end it was just a condition in HeadLogic that was wrong, everything works now so it is time to wrap things up and make all tests green again.

  • One note: I need to think about the deadline since we should incorporate this information we have to perhaps reject recovers that are bound to fail because of the deadline.

2024-10-07

SB on remove deposit script utxo from the local state

  • This is a follow up from the work on the incremental commits off-chain.

  • I kept a local map with the deposit UTxO I need to spend in the recover but this should not be needed since we should have all observed UTxO in the chain state already.

SB on withdrawal commit bug

  • Looking at this user issue again. I noticed the rewarding redeemers are missing so I added them to the commit tx and I also see them after the balancing, so this change should be correct, but the error remains the same one I saw before: WithdrawalsNotInRewardsCERTS

  • If I just try to naively submit the user transaction the error is the same so perhaps the user transaction is invalid (it is missing the outputs anyway) but I am not sure if no outputs are causing this.

  • I also created a cardano-cli test using the same commands as the user did in the issue description to see if it leads to the same result and it actually did.

  • Looking at the ledger code it seems that there is a check to make sure any rewarding is matched with a proper certificate. The user transaction is missing certificates so I could perhaps try to add them programmatically and see if that helps?

  • The proper thing to do is to construct a valid tx that does rewarding and make sure these tests are green.

  • I am pretty sure on hydra end everything should be ok once I add the non-spending redeemers but need to confirm with the user if what they are sending is actually ok and should work since I can't submit a user tx without any error.

SB on incremental commits off-chain

  • We looked at the PR this morning together with the team and decided to merge it (yay!) and identify next steps in order to get it even more in shape

  • I discovered some weird stuff in the api schema so had to fix that and also add a changelog entry to remind us not to release this version by any means since it contains security holes big as... well major flaws with the validators returning true.

  • Now I can move on with my life and have future small PR to fix all I wanted to fix in there.

SB on reproducing the withdraw redeemers bug in blueprint tx

  • I finalized incremental commits off-chain branch and need to look at the user reported bug where our blueprint transaction doesn't allow withdraw redeemers.

  • I have the request json, script used to construct the withdraw script output so I'll start by first inspecting the request UTxO

  • There was already a test I drafted in a branch so looking to improve this test to detect possible bug we have

  • Seems like mocking the entire thing is time consuming, I'll just slap together a quick e2e test since I have the request body available.

  • Ok so this is the error I see when using the user request:

  SubmitTxValidationError (TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraConway (ApplyTxError (ConwayUtxowFailure (MissingRedeemers [(ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}}),ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")]) :| [ConwayUtxowFailure (UtxoFailure (UtxosFailure (CollectErrors [NoRedeemer (ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}})),BadTranslation (BabbageContextError (AlonzoContextError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})))))]))),ConwayUtxowFailure (UtxoFailure (ValueNotConservedUTxO (MaryValue (Coin 21466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))) (MaryValue (Coin 23466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))))),ConwayUtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})]))),ConwayCertsFailure (WithdrawalsNotInRewardsCERTS (fromList [(RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")},Coin 0)]))]))))

notably WithdrawalsNotInRewardsCERTS points to this being a correct reproduction even if the utxo is perhaps invalid.

  • Consulting the ledger code I see that this error means withdrawals that are missing or do not withdraw the entire amount, so hopefully once I fix the missing withdrawal redeemer I can say the bug is fixed.

  • I need to figure out where the problem is - is it in the commit tx building or in our wallet while balancing.

  • Ok seems like the problem is in the commit tx building where we try to re-map spending redeemers since we want to add one (initial) input.

  • Seems like I could just record whatever rewarding redeemers were there and add them on top after the spending redeemers are re-mapped.

  • It worked in the sense that I can get past the error I saw above and now I get the ScriptFailedInWallet which probably means that the wallet code is failing to properly adjust the reward redeemers too.

  • Nice findings - I would like to have a general function that re-maps all of the redeemers so we are covered for any case there may be in the future.
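
A toy illustration of that re-mapping idea, with plain (tag, index) keys standing in for the ledger's redeemer pointers (the +1 shift is illustrative; the real code recomputes indices from the sorted inputs):

    import Data.Map (Map)
    import qualified Data.Map as Map

    -- Simplified purposes; the ledger has more (minting, certifying, ...).
    data Tag = Spending | Rewarding deriving (Eq, Ord, Show)

    -- Re-map only the spending redeemers (here: shift indices after inserting
    -- one extra input) while keeping rewarding redeemers untouched.
    remapRedeemers :: Map (Tag, Int) r -> Map (Tag, Int) r
    remapRedeemers = Map.mapKeys bump
     where
      bump (Spending, i) = (Spending, i + 1)
      bump key = key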

2024-10-03

SB on incremental commits off-chain

  • Did some code review to kick off the day

  • Majority of work is done already and now it is the time to cleanup and make all of the tests green.

  • It turns out all tests are green apart from the logging/api schema validation - the most annoying tests to make pass

  • I realized that I am missing to finish up the tutorial also so this is something to do first and then fix the tests

  • Of course I found a bug by doing a tutorial! We need to do this more often. I discovered that we always serve past deposits in a list since the projection is never deleting any. Fixing this and adding a test to make sure it works.

  • Now all that is left is to tackle the schema errors it seems since the tutorial is updated.

2024-10-02

SB on incremental commits off-chain

  • Yesterday I created the test case that will drive the implementation of blueprint tx in the deposit construction.

  • I noticed the hydra-tx is not compiling because of some changes in the observations so I'll make that compile somehow. Not sure if we even want this executable since it is not really serving its purpose. If you manage to draft a deposit/recover using this executable the tx is not balanced, so the UX is not good at all.

  • We looked at some minor problems with Noon to solve the new endpoint that lists the deposits so that thing is done (still need to document it in our api schema)

  • Wrote something about incremental commits in the monthly report

  • Continuing with the blueprint support for the incremental commits - I should look at how I did commit before and do similar stuff in here too.

  • Adding the blueprint support for deposits was done smoothly, main problems were finding the types between cardano-api and ledger-api but overall nice - the tests are green.

  • Looking at the PR comments and it seems the biggest part is to change TxIn to TxId for recovering so maybe I get that out of the way before starting the final cleanup

  • Ok, this proved to be easy enough. I was wondering if it is safe to assume that any deposit TxId can be turned into TxIn by always using the zero index?
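
For reference, the conversion itself is just this (cardano-api types); whether index zero always points at the deposit output is exactly the open question:

    import Cardano.Api (TxId, TxIn (..), TxIx (..))

    -- Assumes the deposit output always sits at index 0 of the deposit tx.
    depositTxIn :: TxId -> TxIn
    depositTxIn txId = TxIn txId (TxIx 0)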

  • Now for the cleanup

September 2024

2024-09-30

SB on incremental commits off-chain

  • Before diving in to make the recover work with head logic instead of drafting directly from the api code I wanted to fix the observation tests first now that we think our observations are correct.

  • This made me need to return more information when it comes to increment transaction since the head UTxO is not enough to observe this transaction.

  • Now it is the question if I should make all the tests green again or do that after recover is implemented differently. Since nobody will look into this soonish I'll just make the recover changes first and then make all things green again since I don't want to do any duplicated work.

  • I am not sure why I thought this step would have a lot of code changes. I propagated all the changes and now see again D04 which is the deadline validator check.

  • I set the recover deadline to whatever was set as a deadline in deposit with a caveat that I do convert to UTCTime using posixToUTCTime so perhaps the conversion is wrong or I just need to wait in the e2e test?

  • It would also be nice if the deadline is expressed in slots since that way I could emit errors even from the HeadLogic.

  • Let me summarize the situation right now: the deposit deadline is set in the api code (users do not influence it at all), and the recover deadline is set to whatever we already have in the state as our deposit deadline for a certain TxIn (we keep a map of TxIn containing additional data), so obviously the recover deadline is the same as the deposit deadline and the validator check is triggered wrongly. To solve this I could either expect the users to give me a deadline (not very good UX), create a new deadline as soon as the recover endpoint is hit (more feasible), or look at the existing deadline and make sure it is increased in recover (can lead to tx rejection by the cardano-node if the slot is in the future)

  • If I just add some time to the saved deposit deadline the test passes so I realize that I need a way of providing the recover deadline and my plan is to take the currentSlot directly from HeadLogic upon receiving the recover request and then convert the slot to UTCTime in the handlers upon constructing the recover transaction

  • This seems like a correct thing to do here - construct a deadline from the current slot.

  • Ok, the test is green, moving on.

  • I need another endpoint to list the pending deposits - this is very easy to implement now that we record the pending deposits in the local state. I could draft a test and have one of the guys implement this to help me out while I continue with the rest of the changes needed for the off-chain part of the commits (like supporting blueprint transactions)

  • Adding support for the blueprint committing feels like important and also most complex part of the work that is left.

2024-09-27

SB on incremental commits off-chain

  • Ok, so today I'll get the observations in place correctly before I move on to fixing the recover workflow so that it goes through the Head logic completely instead of drafting the recover directly from the api code.

  • Removing the depositScriptUTxO from IncrementTx. I find it a bit hard to envision what exactly I need in order to build the increment transaction successfully. If I don't have the correct UTxO for the deposit script then I need to find it in the local UTxO state (since this is where UTxO ends up if it's being observed) but then I need to pick from this deposit UTxO (potentially many outputs) the one I am actually trying to deposit.

  • Feels natural to include TxIn in the DepositObservation so that I know exactly which deposit I observed so that I can use the same TxIn to find the proper UTxO to increment.

  • I think I see the whole workflow in my head but it will take some sweet compilation time to get to a place where everything compiles again and works. But on the other hand I can say that the deposits/increments are done correctly then and I can focus on fixing the recover workflow. Let's dig in!

  • Lost a lot of time looking into why deposit is not observed after my changes only to realize I removed the constructor from observeHeadTx 🤦

  • The condition to make sure deposit datum contains correct information was wrong. The observations tests didn't seem to care and they were green but the e2e tests proved we are not observing the deposit tx correctly.

  • My focus today is just bad. If all observations and e2e work I will not go into digging why did these changes make sense since I need to move faster.

  • I decided to keep depositScriptUTxO in the deposit observation since I didn't find the convenient way to find the deposit script UTxO to spend in the local UTxO. It seems a bit weird since If I observed the deposit this means that it should be part of spendableUTxO no?

  • Instead I propagate the needed values (so deadline, depositTxIn etc) so that OnDepositTx can bring them in to the HeadLogic.

  • Now I needed to extend the IsTx typeclass with the new associated type TxInType since I want to keep Map TxIn (UTxO, UTxO, UTCTime) locally, where the first utxo is the one you wanted to deposit, the second one is the script utxo to be spent in the increment, and the map key is there so we can have multiple deposits around. The deadline is just the deposit deadline; we might need that one as well, if nothing else to show when it is ok to recover the funds.

  • Ok, after tackling all related code upon adding this new associated type I am moving through the head logic with the new PendingDeposit alias that should hold all information we need to do increment/recover later on.
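
Roughly the shape of that local bookkeeping (a sketch; in the real code the key and utxo types come from the IsTx type class):

    import Data.Map (Map)
    import Data.Time (UTCTime)

    -- Keyed by the deposit TxIn so several deposits can be pending at once.
    -- First utxo: what the user wants to commit; second: the deposit script
    -- output to be spent in the increment; UTCTime: the deposit deadline.
    type PendingDeposits txIn utxo = Map txIn (utxo, utxo, UTCTime)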

  • I propagated changes through the head logic and now would like to use TxIn to find the deposits for increment/recover

  • I also propagated TxIn in Increment and OnIncrementTx to be able to more easily find the deposit I want to remove.

  • Next step is to start with the api changes in recover (do not call handler code from the api but go through the head logic) and that will reveal an easier way to delete recovered deposits by using the new available TxIn.

2024-09-26

SN on etcd network component

  • When handling the error of etcdctl put the question arises: what if broadcast fails? How should the HeadLogic react on this?
  • Not allow broadcast to fail and buffer not successfully sent things instead.
  • For some reason withProcessTerm now correctly stops etcd .. well, fine.
  • Difference between resilience against minority cluster writes and crash-recovery is really minimal. i.e. TBQueue vs. a "persistent queue"
  • Storing the last known revision in a file seems to be sufficient to avoid re-delivering already seen messages. But do we need this? Our application should be fairly resilient against duplication of messages. On the other hand, writing a file on a best-effort basis should not be too much of a problem (this is not write ahead!)
  • With these latest changes temporary packet losses of up to 100 percent are handled just fine! The only reason why benchmarks would stop is because of the timeout of confirmations (20 seconds), but if the packet losses are intermittent it's just fine
  • Using pumba, we can also randomly select nodes which perceive a 100 percent packet loss every couple of seconds.

SB on incremental commits off-chain

  • Continuing the work after yesterday's successful demo

  • The problem I had yesterday with recovers not working is related to the fact that you can't recover from any node (currently). Only the node that crafted the original deposit has in its state the needed utxo that needs to match with the recover.

  • There are many changes that need my attention but I'll focus on the observations first

  • I moved the recover observation to hydra-tx package and removed deposit and recover from tests that make sure head transactions are observed. Now they have their dedicated tests

  • I got the recover observations working but experience ScriptErrorRedeemerPointsToUnknownScriptHash for index 0 which is unexpected here

  • Turns out this is just a consequence of D04 which is the script deadline error

  • I made sure to resolve fixme in the deposit observation which mentions that we need to check that the deposit datum contains exactly the deposit transaction inputs

  • Removed utxo field from the DepositObservation since it is not needed (deposit TxIn will be convenient here tho) and propagated changes

  • This broke the increment creation since we used this deposit script utxo to easily find the deposit we want to spend. Should be simple to fix this.

  • Next work will be probably to track multiple observed deposits in the local state and then just pick one existing one on recover

2024-09-24

SB on withdraw redeemer fix

  • We got the new user issue where they say that the withdraw redeemers are stripped once the wallet balances the transaction.

  • I want to write a test to exercise this theory and I think I could just use the WalletSpec to create a transaction with some withdraw redeemers, then call coverFee and then take a look at the redeemers and if they stayed the same.

  • Upon balancing we adjust the spending redeemers but the others should be intact.

  • I stubbed out the initial test case and noticed that the tx generator doesn't actually produce any redeemers so this is the next step to do.

SB on incremental commits review

  • Main takeaways:

    • changelog update missing
    • inline api docs on post draft commit
    • Deposit should work with a blueprint tx!
  • I continued working on the branch: simplifying the api code to re-use existing types we had in place and making the recover use the path instead of query string

2024-09-23

SB on incremental commits off-chain

  • Wrapping things up so we can have a review soon.

  • I also want to try to decommit from the tui today with our fresh Head

  • Need to address user issue and try to reproduce the issue mentioned with cardano-cli (about the datum hash mismatch)

  • I had a problem where I couldn't ssh into my aws account and it turns out if you have too many ssh keys this sometimes happens. As a workaround I had to provide additional flag -o "IdentitiesOnly yes"
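
For the record, the workaround looks like this (key path and host are placeholders):

    ssh -o "IdentitiesOnly yes" -i ~/.ssh/my-aws-key.pem ubuntu@<aws-host>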

  • Continuing to fix the red tests - it seems like the logging tests are especially a PITA since it is a bit hard to figure out why we get failures sometimes.

SB on blueprint drops withdraw redeemers

  • We got another user issue related to our blueprint transactions which mentions that if you try to commit to a Head using a blueprint transaction the withdraw redeemers are dropped from the transaction.

  • Need to read up on how the withdraw zero trick exactly works but so far I know that you can reduce execution costs if you specify the script purpose to be rewarding instead of spending.

  • I'll run the demo just so I can get to an open Head and try to use the request body in the issue to reproduce the issue on my end.

2024-09-20

SB on incremental commits off-chain

  • Yesterday evening after work I realized what could be the problem with the e2e test for the recover. As I mentioned, we try to balance the tx twice; I should only return the transaction from the handlers and then make the hydra-node post it on client input.

  • Ok, I was right and now I got the recover e2e green! Yee-haw!

  • I'll include the recover workflow also as part of this pr which makes me happy.

  • A question arises: should I also observe the recover transaction and then have a way to clean up the local hydra state? I think so, since we can't rely on doing the cleanup only on posting.

  • For sure, even our protocol.md diagram says so.

  • I'll use the opportunity to clean up unused variables etc. while I add the recover observation.

  • One thing is: how to get the head id from recover transaction? Currently there seems to be no way which means we would need to check the recovered UTxO against whatever we have in the state related to deposits and if the two UTxO match then we could say recover is approved/finalized.

  • I ended up looking for head id in the resolved recover inputs, in the deposit datum and this worked nicely.

  • Tests are green and now it is time for a major cleanup.

2024-09-19

SB on incremental commits off-chain

  • I almost came to the end of the recover implementation but got uncaught exception: JSONException which was surprising.

  • Debugging a bit to see where this happens and it seems sometimes on the api level we get this error but I don't see why exactly would this be happening.

  • Realizing I've changed what I return from the recover endpoint - it is no longer the transaction but just the string OK. Now the recover transaction is posted by the hydra-node and not the user in the end.

  • After making sure all pieces are in the correct place so that I can recover a deposit I get ErrNoFuelUTxOFound error when trying to post recover transaction.

  • Very weird since my wallet UTxO has plenty of ada.

  • I found out that this is happening in mkChange function that is supposed to calculate the change and error out in case inputs are lower or equal to the outputs but there is a fat fixme in there that says we need to account for minimum UTxO value too.

  • Realizing this is just the outcome of some other problem I have (most probably in the Handlers when we construct and balance the recover transaction)

  • Recover is trying to spend the deposit output and produce as outputs whatever was deposited (outputs present in the deposit datum)

  • I need to think more deeply about the recover tx and how does it play with the internal wallet since I need to check if all inputs are properly resolved and how come there is ErrNotEnoughFunds error coming out of the wallet

  • It is time to gain some sanity again I think. One question is how come I didn't see this error before when working with the recover transaction? The answer might be that I previously passed the UTxO the user wants to commit to draftRecoverTx and now I rely on the datum in the deposit output.

  • In the wallet code it seems that the input balance is lower than the output balance by around 6 ada:

> 24687860 - 24681788
6,072.0000000000

What am I not accounting for here?

  • Could it be that the UTxO's are duplicated and I need to remove the deposit and commit UTxO from spendable UTxO before giving the tx to the wallet for balancing?

  • Good thing is that the amount seems to be always the same!

  • I tried several things and I think the bottom line is that we are balancing the recover transaction and then post it which should be doing the balance all over again. In my mind this operation should be idempotent but perhaps it is not?

  • If I just try to submit the recover transaction directly after assembling it I don't see any errors but I fail to query the wallet address to make sure the recover worked.

  • I really feel under the weather right now so I'll leave dealing with recovers for tomorrow or next time while I try to make the rest of the codebase into some reviewable state for the team to look at.

2024-09-18

SB on incremental commits off-chain

  • We did a nice session this morning to see the current state of the PR and discover the next steps

  • These are the steps I wanted to outline that would be needed for the users to test out this new feature:

     ### Prepare the UTxO to deposit

     - How to make the deadline configurable for our internal purposes like testing?

     ### How to issue a http request to kick off the deposit process

     - Can we copy/paste the json utxo representation directly from one of the explorers?


     ### Sign and submit the deposit transaction

     - cardano-cli example most probably

     ### Query the Head to see the deposited UTxO present on L2

     - Issue `GetUTxO` call to view the L2 UTxO state


     ## How to recover deposited funds


     ### Follow the deposit steps

     - How to demonstrate recover if hydra-node will pick up the deposit as soon as it is observed?

     - Play around with the network so one participant doesn't see the new snapshot request?

     ### Once we make sure deposit is not observed we can issue a recover request

     - How to make sure user keeps the committed UTxO representation around so that they are able to
     issue the api request with the same json?

     - They also need to provide the txin which complicates things? How could we make it more simple, could they query the hydra-node
     to get this information?



  • Plan right now is to get the PR into shape as much as possible and outline in the PR what is outstanding which will need to be implemented as series of next PRs.

  • I have a good understanding of where the main problems are and where we are at currently, so this should not be an issue at all.

  • As a concrete next step I'll try to make the recover e2e test pass by shutting down one of the nodes so that the increment is not posted and I can exercise the recover workflow nicely (right now the recover tx gets rejected by the cardano-node because of the lower validity bound).

  • If I don't get it to work in a reasonable time I will leave the complete recover feature to be something we think about and implement as one of the next steps since in the end we do need to think more about the recover and the optimal way of doing it.

  • Even before any of this I should not expect the users to provide the deposit script UTxO (actually any UTxO since we should be able to have it around in our chain state) and take in TxId instead of TxIn at the recover endpoint. Also use the path instead of query string (just remembering all of the stuff we talked about this morning).

  • I removed the unneeded UTxOs from the recover endpoint request and altered the e2e test to spin up two nodes, observe Head opened and commit approved and then shut down one of the nodes in order to be able to do recover instead of observing the increment.

  • Looking at the recover code - so you need to know the deadline to be able to construct the deposit datum so that one can spend the deposit output but how can you know the deadline if that is something hydra-node decides on?

  • The e2e recover tests are failing with the validator error related to invalid commit hash so currently debugging the UTxOs to see where is the discrepancy (commit utxo I am sending to hydra-node api and the local utxo we get from observing head transactions)

  • Realizing I just need to decode the datum of the deposit script output in order to get all parameters I need for the recover.

2024-09-17

SB on incremental commits off-chain

  • I looked at the increment observation tests and it turns out they are flaky. So not all of them fail but sometimes we experience some weird behavior and I decided to leave this problem for later on since the e2e is green and this should be a problem somewhere in how we wire things together (increment depends on deposit output also)

  • The most relevant next chunk of work should be connecting the recover endpoint to the handlers.

  • I slowly progressed one thing following another to make the e2e test that will prove that recover transaction worked.

  • This took some time - I decided to also encode the query string (TxIn) as a base16 string since the hash in the rendered transaction input doesn't play nice with query strings.
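
A small sketch of that encoding step, treating the rendered TxIn as a plain string (the function name is illustrative):

    import qualified Data.ByteString.Base16 as Base16
    import qualified Data.ByteString.Char8 as BS8

    -- Base16-encode the rendered "txid#ix" so the '#' survives in a query string.
    encodeTxInParam :: String -> String
    encodeTxInParam = BS8.unpack . Base16.encode . BS8.pack

    -- e.g. encodeTxInParam "abc123#0" == "6162633132332330"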

  • Now I finally get the unknown input error on the recover transaction so progress!

  • After making sure I use the correct UTxO for recover request (utxoFromTx depositTx) I am getting ErrScriptExecutionFailed D04 which is related to deposit deadline. Yeee-haw our validator is kicking in. I'll make sure to pass in the proper deadline. How to make the users life easier in terms of these recover inputs they need to provide?

  • I commented out the validator check for the deadline to see if the locked commits are matching and I'll worry later on about the deadline. Oh nice...I get the D05 so the locked commits are also not valid. Debug time!

  • Looking at the issue again I realize users don't need to provide the deadline at all so removing the field from the request type.

  • If I just remove true from the recover validator I am getting the NoFuelUTxOFound error. I think we modeled the recover as if we will just run it as a standalone cli tool but to have hydra-node wallet balance these transactions we need more. What I think will work is to provide the UTxO which is locked at the deposit script so that we can satisfy the validator. Then we would also need the deposit script UTxO since we need to spend it so it must be given to the hydra-node wallet.

  • Success. Validator doesn't run currently but at least our internal wallet is happy. Now turning on the validator checks so that I can construct a proper valid request and see the recover working. Feeling pretty close to success.

  • Ok, adding the appropriate slot lower bound made the deadline check pass. I need to make sure these values make sense.

  • The check for locked outputs is naturally ok once I've provided the correct UTxO to the recover tx so that's satisfying.

  • As the final step for this test let's make sure the deposited UTxO is returned back to the user and is not in the Head.

  • For some reason the test is hanging - can the user actually submit the tx node created?

  • I added the recover client input since we need the hydra-node signature on the recover tx so it is easier if the node just does it.

  • Adding the RecoverApproved message so users can have some sort of insight in what is happening inside of Head.

  • I feel burned out a bit, there are a LOT of changes needed.

2024-09-17

SN on using etcd for networking

  • After fixing the Authenticate network module to not drop messages, I ran into an issue where offline mode would not open the head when configured with other --hydra-verification-key parties.

  • Fixing manually run offline mode nodes to communicate

    • Need to simulate other parties' OnCommitTx
    • Still seeing MessageDropped?
    • Disabled signature verification for now
    • Sending the ReqSn now seems not to arrive on the other end. Use distinct keys per node (again).
    • Seeing InvalidMultisignature now:

    BTW: at first the InvalidMultisignature content was tripping me up because it only contains a single vkey when one signature fails

    • Each node is seeing the other node's snapshot signature in AckSn as invalid. I suspect something's wrong with offline-mode.
    • Re-generating keys did not help.
    • Found it: the headId in the snapshot is offline-alice and offline-bob so we can't have networked offline heads because of that.
  • Side note: Is cabal bench hydra-cluster single mode broken?

  • Lots of our tests depend on PeerConnected messages. How would we deal with "connectivity" information in case of etcd? It's not really that we are connected to node x or y, but rather whether we are connected to the majority or not.

  • Etcd config initial-cluster requires a consistent list of peers across all instances. Right now I derive it from the given --host, --port and --peer cli options, but this is very fragile.

  • Problem: submitting messages is not possible while we are not connected to the majority or leader election is ongoing?

  • After giving etcd a bit time in the beginning the benchmarks work reliably. However, since I switched to watch it got slower? Maybe using the command line parsing is not the fastest way to do this (and it depends on the internal loop of etcdctl watch).

2024-09-16

SB on incremental commits off-chain

  • Continuing where I left off yesterday - see how to fix the increment tx posting after changes in the deposit observation.

  • It seems like I need to re-gain some sanity here so let's break down things.

  • I made changes to the deposit observation so now what we return from the observation is not just the deposit UTxO but also what we locked inside of the deposit script, so the code around observing deposits / posting the increment needed to change to accommodate these changes.

  • Now that e2e is green again, and I also added the GetUTxO call in the end to assert the deposited funds ended up in the head on L2, it is time to consolidate and try to get somebody involved with this work since there are too many things in scope and we want to get this work merged asap.

  • One area in which I need to grow is to learn to ask for help! Right now I think the fastest thing to do is keep coding since I know I will lose time explaining what needs to be done but at the same time this puts the pressure on me to deliver all in time.

  • So what is outstanding is to implement the recover e2e, implement an api endpoint to list pending deposits, remove increment parts (observation mainly) but leave the transaction construction since we want to keep unimplemented validators returning False if we want to merge this thing. Also write a tutorial explaining the steps to deposit/recover.

  • I've looked at the increment observation tests and it seems like we will need to find a way to alter these tests so that we can return some UTxO that is working nicely with increment (needs to contain both head and deposit scripts). Main thing I see is we can't find the deposit datum in the observed UTxO

2024-09-15

SB on incremental commits off-chain

  • After changing the deposit observation (reminder: still need to implement increment observation tests) the e2e test is broken now, since what I return from the observation is the input UTxO locked in the deposit script, not the actual UTxO we saw coming in from L1, so there is some work needed to get this right but nothing too hard to do.

2024-09-14

SB on incremental commits off-chain

  • As I realized before the observations are not done correctly confirmed by SN. They should be pretty similar to the commit ones since they are even using the same type Commit.

  • I've made sure to implement the deposit observation in the same manner as commit one (since they are pretty much the same) but didn't have the strength to continue further (too tired) so I'll make it work on monday.

2024-09-12

SB on incremental commits off-chain

  • After adding the increment mutation I want to make sure that the increment tx is valid as a starting point.

  • Did a small change to provide the correct UTxO as the argument to incrementTx. What I need here is the output from a deposit tx. Right now I get:

         Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["PT5"])),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorMissingScript (ScriptWitnessIndexTxIn 1) (ResolvablePointers ShelleyBasedEraConway (fromList [(ConwaySpending (AsIx {unAsIx = 0}),(ConwaySpending (AsItem {unAsItem = TxIn (TxId {unTxId = SafeHash "913afe731ffea84a50b0b

so it seems like I didn't attach the deposit script correctly (index 1 here) and the head script (index 0 here) failed with PT5 (which is expected since the validator returns false).

  • After returning True from a increment validator I can see that the head script got attached properly while the deposit is still missing (need to look at the deposit datum which is missing)

  • Found out this was because of two issues:

    • I used mkScriptReference and we are not referencing the deposit script so it should be mkScriptWitness
    • I was attaching the headScript instead of deposit script derp
  • I quickly added a test to make sure we don't observe deposits from a different head

  • Now I am able to connect the deposit snapshot signing and posting of the increment tx as a result of it. Posting from head logic and connecting the handlers to be able to actually post the increment tx.

  • Wrote some Handler code to be able to post increment on deposit observation. Also added a type for observing the increment. We need this so that we can say the commit is finalized once the increment is observed.

  • I can try the e2e workflow now to make sure everything works up to observing the increment. It seems that deposit was not observed back after hitting the api so I investigate.

  • Turns out that after balancing we end up with a deposit that has two inputs and two outputs. One of these belongs to the hydra-node wallet as it's the one responsible for paying the fees so we need its input, and we also end up with the output that holds the change from balancing the deposit tx.

  • Debugging to see why we don't observe back the deposit. So it turns out we need a utxoFromTx on a deposit tx observation in order to be able to find the correct script output.

  • What I notice is that we never issue AckSn but that is not so important right now.

  • Figured it out: I shouldn't include the deposit tx in localTxs on ReqSn. This got the head stuck by waiting to apply the deposit transaction which is never applicable. (perhaps I am misunderstanding something since my explanation is fragile)

  • We get to the next error which happens on posting the actual increment:

ConwayUtxowFailure (NotAllowedSupplementalDatums (fromList [SafeHash \"3135e9c31e91873b8b3a7d2922038e934785abd68401589bc67f40591af137c3\"
  • This happened because the increment tx construction was wrong and I should have used InlineScriptDatum when constructing the deposit witness.

  • Progressing further since now I get to see the CommitApproved message and all that is left for the e2e to work is the increment observation! Nice, I am happy things are progressing nice and fast although there is really a lot of changes needed.

  • Adding the increment observation now.

  • After needed changes I get to observe increment transaction but the committedTxOuts are not matching with the local deposit utxo. This should be easy to fix.

  • I've decided to return the UTxO directly from the observation and compare it to the local one.

  • BAM! e2e is green now!

2024-09-11

SN on Raft consensus as network

  • Need to change semantics of broadcast as it should only "deliver" once the majority of the network saw a message. Going to give the callback a name too.
  • While NetworkCallback{deliver} makes things more consistent with literature and easier to read in the components, its also more verbose. Need to ask the team about this trade-off.
  • When disabling the broadcast shortcut, the NodeSpec tests failed first, but only because it was asserting on additional message being submitted to ourselves. The mocked, recording network is not short-cutting and does not need to, so simply adding some more received messages and asserting less is fine here.
  • The BehaviorSpec tests were eating the whole memory until I removed the filter for own party in the broadcast implementation (to follow the expectation). Probably some busy loop wait for a message which never came.
  • After fixing the ModelSpec to also broadcast to self (in createMockNetwork), I wondered.. which test is actually covering that short-cutting (or non-existence of it)?
  • Turns out only hydra-cluster end-to-end tests are testing this integration. Anyhow, making the lowest level network component (Hydra.Network.Ouroboros) self-deliver after enqueuing a fire-and-forget should resolve the previous behavior.
  • Not seeing the network layer operational right now (hard to tell through only hydra-cluster tests). I suspect it's because the network stack is filtering when seeing a message delivered at the lowest layer.

SB on incremental commits off-chain

  • Ok so after adding a field to the Snapshot that would hold the information on what will be committed to a Head it is time to move further and make the HeadLogic correct upon observing a deposit transaction.

  • Looking at the issue it seems that deposit observation was a separate task but I did it already so I updated the issue.

  • Added some code to wait on unresolved decommit before issuing a ReqSn on deposit observation.

  • Looking into the spec to figure out if there is any validation needed before including a deposited UTxO into a Head. Seems like there is no validation necessary which makes sense. If you managed to lock a utxo with a deposit transaction then this UTxO can't be spent twice so it is safe to include it in the Head UTxO.

  • I added check upon observing ReqSn where I need to make sure there is no other de/commit in flight before including the deposit UTxO into an active UTxO set and asking for a snapshot signature.

  • Also added the commitUTxO to the local CoordinatedHeadState to keep track of it.

  • Seems like the time has come to actually construct the increment transaction from the observed deposit (and recover also). Perhaps it is better to make sure all tests are green first and implement recover workflow. The increment transaction is most interesting work but we need to build everything else around it since this would be the last step in this PR.

  • Made some logging tests green and now looking into failing deposit observations in hydra-chain-observer and hydra-node. Basic observation seems to work but plugging in the deposit generation + observation seems to fail.

  • Fixing this required me to pass in also the utxoFromTx so that the deposit can be observed since head UTxO doesn't contain the necessary datum as these transactions are not tied to the Head ones.

  • Making almost all green (apart from one test in HTTPServer) paved the way directly into increment transaction construction and observation (I could do the observation in a separate PR).

  • Wondering if we merge this (incomplete) branch to master are we harming the regular hydra-node operation in any way? I'd assume not since users should only rely on the released versions anyway.

  • Detecting a place in code in the HeadLogic where the posting of the increment transaction should be triggered - most probably on snapshot acknowledgement.

  • I was right, we can plug the increment posting easily on AckSn. Let's build the transaction first and then think about the plugNplay :)

  • Doing this soon I realize we will need to add the Deposit to the script registry. BUT on the other hand I have the deposit UTxO so I can find the deposit input.

2024-09-09

SB on incremental commits off-chain

  • I tried running the end-to-end test for the incremental commits and get PPViewHashesDontMatch which is surprising.

  • Seems like our wallet code is assuming we are always having scripts in the Head transactions but deposit is different. It produces the script output but it doesn't run any scripts.

  • Tracing some code in the Hydra wallet module to figure out why we can't have a transaction without any scripts.

  • I did a conditional attaching of script integrity hashes with scriptIntegrityHashTxL and notice we also do recomputeIntegrityHash after a tx construction again.

  • In case this hash is SNothing I just return already constructed tx body so hopefully if there are no scripts the wallet would still be able to construct the correct transaction.

  • If I just remove recomputeIntegrityHash I got success on the deposit but the question is if we actually need this function for other transactions. Why did we need this in the first place?

  • Decommit works just fine without recomputeIntegrityHash let's see about the other transactions but they should also be fine I think.

  • So it turns out that recomputeIntegrityHash was a leftover that needed to be removed and all of the sudden the deposit tx is valid again. Nice!

  • Moving on to the observation code now.

  • Writing a simple observation didn't seem hard at all. Let's see if it actually works.

  • It worked after some tweaks - we look at the input UTxO and don't try to resolve the inputs as we are only interested if we can decode the datum to the correct one. Should we look at the deadline here too?

  • Moving into the changes in the HeadLogic now. I need to add some code to request a snapshot once a deposit tx is observed.

  • I stubbed out a function to be called on observed Deposit, added new field to hold commit utxo data in the ReqSn

  • Added utxoToCommit to the Snapshot and alphaUTxOHash to the Close datum. This required me to build a lot of modules. Tired now, continuing tomorrow.

SB on incremental commits

  • Moving forward, I got the hydra-tx changes so I want to see the deposit end-to-end workflow working up to the API level.

  • Connected the hydra-tx/depositTx to our API and now doing the same for recoverTx

  • Ok, the recover is connected too. I am itching to try out if deposit could be imported into a wallet but I need to work further :)

2024-09-06

SB on missing datum hashes in the hydra txs

  • Continuing where I left off - making the script output that contains the datum hash so to prove that we either have a problem (most likely) or hydra decommit/fanout work as expected.

  • Realized that the createOutputAtAddress is just providing a pub key witness (using makeShelleyKeyWitness). Somehow I need to also add the ScriptWitness but not sure exactly how in this setup and why is this needed now? Shouldn't I add the witness when trying to spend? ... Ok yeah. The error is coming from a hydra-node that tries to spend this script output so makes sense.

  • It is actually funny since I was the one implementing the commit feature. I need to send a transaction that contains the script witness for the script I am trying to spend :)

  • Giving it a shot and the test is green! hmmm.....

SB on implement deposit/recover validators and off-chain code

  • Failing test related to snapshot signature verification. This is surprising since I didn't touch the related code.

  • Trying out with removing the validation of some snapshot fields to see which field trips the validation.

  • It is the utxoToDecommit field that causes this.

  • I can get the valid signature if I make sure that I also produce hash for mempty in case this field is not present and I don't see that any code changed at all which is surprising.

  • It has to be something related to hashing of the empty UTxO by using serializeData $ toBuiltinData mempty. Previously we had Maybe UTxO as the argument and now we have Maybe Hash but I am not sure if I should spend time exploring this.

  • Making the signature accept a Hash instead of Maybe Hash and fixing all the call places. This should introduce changes to our on-chain code mostly while we would still keep Maybe UTxO in the Snapshot.

  • This yields some changes I would need to think about:

    case contestRedeemer of
      Head.ContestCurrent{} -> -- TODO: think of the consequences of this
        toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
      _ -> toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit

  • so the check isNothing deltaUTxOHash should be replaced with the check for empty hash hashTxOuts mempty.

  • Close mutations are affected by these changes. H30 is happening now in the mutations for CloseUsed

  • Turns out in the CloseOutdated redeemer we need to make sure that utxoToDecommit is not present. The CloseOutdated means we are closing while the decommit is pending but I am not entirely sure why then utxoToDecommit needs to be Nothing?

  • This highlights the fact that this knowledge needs to be captured somewhere (it was in my head). Most likely spec has the answer to this - checking.

  • The spec uses different terminology Initial, UnusedDec, UsedDec so UnusedDec reminds me of our CloseOutdated but this is not the case where we expect utxoToDecommit to be bottom! It is actually the filename of the mutation test that points to the correct redeemer on-chain!

  • We need to make the connection between mutations and the spec/on-chain code obvious, so we avoid spending this much time on it in the future and it is clear which is which!

  • Working on incremental commits will highlight this since we will need to add more close redeemers for the close!

2024-09-05

SB on incremental decommits

  • I realized I called deposit drafting draftIncrementalCommitTx so renaming that and connecting the chain handle to our API server.
  • Next step would be to actually construct and serve the deposit transaction but since I still don't have the code to do it I will move to the observation.
  • I will not be able to complete the OnDepositTx observation of course but I can hook things up.
  • After adding OnDepositTx I had to add a corresponding type to ChainTransition but should this really be here? Stubbed out the observation code with undefined's and errors here and there
  • Stubbed out the on-chain increment validator too and now since I got the green light on the PR I'll start using the new code and finally kick off incremental commits for real!

2024-09-04

SB on incremental commits off-chain

  • Continuing with the off-chain changes for incremental commits
  • Separating the recover flow to use REST DELETE so POST /commit route will be responsible for deposit flow as well as normal commits
  • Commit client input now takes a UTxO instead of tx
  • For the next stage I would need the deposit PR merged since I'd like to create and balance the deposit tx and return it to the user for signing/submitting.
  • Could quickly specify DELETE /commit route for recovering pending deposit
  • A question arises: should the hydra-node pay for the recover tx balancing, or should we serve the tx to the user to balance/sign/submit?
  • It is a bit better UX to make the hydra-node do it since we don't have an easy way (yet) to balance the tx using cardano-cli and we are not sure if importing tx and balancing in the user wallet would even work (eternl can import the tx but not sure of other wallets)
  • Let's make the node balance the transactions since we mentioned a plan already that we could have a feature to make only the requester's hydra-node pay for the transactions once they are the next leader.
  • Tackling initial code to handle DELETE /commits/tx-id so that the user is able to also recover the deposit
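
  • A purely illustrative wai-style routing sketch for that endpoint (not our actual API server code; assumes OverloadedStrings):

{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import           Network.HTTP.Types (status200, status404)
import           Network.Wai (Application, pathInfo, requestMethod, responseLBS)

-- Match DELETE /commits/<tx-id> and hand the tx id to a recover handler.
recoverRoute :: (Text -> IO ()) -> Application
recoverRoute recoverDeposit req respond =
  case (requestMethod req, pathInfo req) of
    ("DELETE", ["commits", txId]) -> do
      recoverDeposit txId
      respond $ responseLBS status200 [] "OK"
    _ -> respond $ responseLBS status404 [] "Not found"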

SB on PR changes in deposit/recover workflow

  • Moving to this since I got a review and I need this work to continue on incremental commits

  • Not too bad, I can get this done quickly it seems

  • I should get the deadline and headId from the decoded datum of a recover UTxO

  • Recover mutations seem to be missing here also so I'll need to spend some time to add them

  • One mutation should be one that mutates the deadline of the recover transaction. I expect to see DepositDeadlineNotReached D04.

  • In order to do that I need to mutate the input datum deadline.

  • If I just remove the lower bound validity I should get DepositNoLowerBoundDefined D03

  • It took some sweet time to compile the mutation that changes the input datum deadline and I still don't get the wanted failure. This happens because of slot/time conversion so I am looking into this right now

  • I noticed I was using the same deadline for deposit and recover so that's one problem. I got the mutation working but broke the recover transaction so that's a bit puzzling. Working with time is always a headache especially in the pure Haskell setting like this.

  • Debugging the current seed:

input datum deadline is 7364703000
recover datum deadline is 7364703001
tx lower validity slot is 7364703

and we get D04, which means we tried to recover before the deadline has passed, or said differently, the recover lower validity slot is not after the deadline.
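
  • To keep the D04 rule straight, a tiny illustrative sketch of the check (assuming both values are Plutus POSIXTime; the module path may differ with the plutus version we pin):

import PlutusLedgerApi.V2 (POSIXTime)

-- Recovering is only allowed once the tx validity lower bound is strictly
-- after the deposit deadline stored in the datum.
recoverAllowed :: POSIXTime -> POSIXTime -> Bool
recoverAllowed depositDeadline txLowerBound = txLowerBound > depositDeadline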

  • Ok, I was re-using the recover slot so found a bug. I really like logging the work since it helps me immensely.

2024-09-03

SB on incremental commits

  • I could start working on first implementing the off-chain parts of incremental commits since the deposit/recover PR is now in review. I'll check with Noon where they left off since they started looking at this initially.
  • Ended up looking at haddock builds and our website's haskell documentation. Turns out you can't build haddocks if they are in the testlib (at least it seems that way)
  • Ok, hydra-tx initial work is merged, deposit/recover validators and tx building is PR'd so I'll continue further and see where things are standing on the initial commit branch
  • First I'll make everything compile again and then focus on the mermaid graphs and make changes which should be well specified already in our Hydra spec
  • I removed the mermaid diagram in protocol.md which I suppose was added as a possible alternative, and kept the one we are currently implementing.
  • Did a small refactor to resolve a TODO in HTTPServerSpec and now looking into issue itself to make sure I don't miss something important before diving in.
  • Added the new Chain handle to tackle the new request (could reuse the draftCommitTx too but wanted to be verbose at the beginning at least) and looked around the API server code to determine what changes are needed.
  • Seems like we need two more types in order to support deposit/recover drafting. Possible code solution:
...
  | IncrementalCommitDepositRequest
      { utxo :: UTxOType tx
      , deadline :: UTCTime
      }
  | IncrementalCommitRecoverRequest
      { utxo :: UTxOType tx
      , deposittedTxIn :: Text -- TODO: This should be a TxIn but in the API server we are not yet using the Tx (Cardano) type.
      , deadline :: UTCTime
      , recoverStart :: Natural
      }

  • Slightly problematic is that at the API server level we are still abstract over the transaction type, so using for example TxIn as the argument to recover should be done differently. We can always just try to parse the request field into the type we expect, of course.
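
  • For example, a naive way to split the textual tx-in from the request into its id and index before converting it to the concrete type (just a sketch, not the parser we would actually ship):

import           Data.Text (Text)
import qualified Data.Text as T
import           Text.Read (readMaybe)

-- Naive sketch: split "<txid>#<ix>" into the id text and a numeric index.
parseTxInText :: Text -> Maybe (Text, Word)
parseTxInText t =
  case T.splitOn (T.pack "#") t of
    [txId, ix] -> (\i -> (txId, i)) <$> readMaybe (T.unpack ix)
    _ -> Nothing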

  • I wanted to explore first to get understanding and so that we all agree on the execution plan in order to minimize implementation time.

  • Will we call hydra-tx directly from the API server? Do we even need new Chain handle?

2024-09-02

SB on wrapping things up in hydra-tx/recover/deposit

  • I wanted to add the number of recover outputs to the recover redeemer

  • While doing this I realized the recover script witness is not properly attached

  • To fix the healthcheck for the recover tx I need to do exactly what I mentioned a couple of days ago - build a depositTx and obtain the resulting utxo so I can create arguments to the recover tx from it.

  • Ok after all this is done my mutations still reveal the issue D05 which means the locked utxo hash is not matching with the outputs.

  • I realized I actually create a script output from a recover transaction. Ouch. What I should do is try to decode the datum off-chain and produce whatever was locked in the datum.

  • I like the current work on this as I feel like I am gaining a better understanding, but it can also be that I was too distracted when I implemented this originally?

  • Looking at what we did to unlock the committed outputs - logic should be exactly the same

  • Ok, so I ended up decoding the datum from the user-provided TxOut and had a minor bug with the number of hashed outputs, therefore the mutations turned out to be red

August 2024

2024-08-29

SB on Increment transaction and mutation

  • I have stubbed out the deposit and recover transactions and now it is time to do the same for the increment one.

  • Working on increment transaction should reveal missing parts in the deposit/recover (if any) and allow us to write mutation tests to test out the validators.

  • I drafted increment tx for the mutation tests but it is not yet valid:

Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left
(ScriptErrorTranslationError (RedeemerPointerPointsToNothing (AlonzoSpending
(AsIx {unAsIx = 1}))))),(ScriptWitnessIndexTxIn 1,Left
(ScriptErrorRedeemerPointsToUnknownScriptHash (ScriptWitnessIndexTxIn 1)))]
  • It seems I didn't attach the deposit script properly but from what I see all is good

  • Since the deposit datum needs to match in the increment transaction I could assemble a deposit tx and grab a UTxO from it to pass it to the increment transaction. Maybe this is the cause of increment not being valid?

  • Ok, this seems to produce nicer errors

Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Right (ExecutionUnits
{executionSteps = 209565180, executionMemory =
672406})),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError
An error has occurred: The machine terminated because of an error, either from
a built-in function or from an explicit use of 'error'.) ["C02","PT5"]))]

  • By the way, C01 means the PT token is missing and if I disregard this for now I get to see the D02 error, which means the deposit doesn't have an upper bound defined. Nice! We are making progress in terms of the deposit tx now. Mutation tests like this are really valuable since they allow us to make sure all parts of the assembled transactions are as we expect.

  • If I want to add upper validity for the deposit transaction I need a slot and currently we accept UTCTime in hydra-tx cmd line arguments.

  • If I choose to accept slots, how hard would it be to build a deposit datum?

  • I couldn't find appropriate functions for now to go from SlotNo -> POSIXTime since this type is what gets set in the deposit datum.
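
  • As a back-of-the-envelope sketch (assuming a constant slot length and a system start, both expressed in POSIX milliseconds; the real code should use the node's era history / time handle):

import Data.Word (Word64)

-- Convert a slot number to POSIX milliseconds given the system start and a
-- fixed slot length, both in milliseconds.
slotToPosixMillis :: Word64 -> Word64 -> Word64 -> Word64
slotToPosixMillis systemStartMs slotLengthMs slotNo =
  systemStartMs + slotNo * slotLengthMs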

  • Turns out I was looking at outdated tx traces 🤦 The spec PR contains the new changes related to these validators.

2024-08-16

SB on implement deposit validators and off-chain code

  • Starting to work on https://github.com/cardano-scaling/hydra/issues/1574
  • We want to have a deposit (both $\mu_{deposit}$ and $\nu_{deposit}$) and refund validators in order to be able to commit to a Head or refund to the user in case of an unsuccessful commit
  • I am looking at the tx trace graphs and mermaid diagrams for these workflows
  • $\mu_{deposit}$ is parameterized by the seed (the outputs the user wants to commit) and the $\nu_{deposit}$ validator hash
  • It seems like it would be easiest to start with $\nu_{deposit}$ and specify the needed mutations first, but in order to specify mutations we need the off-chain tx construction together with the on-chain code, so I'll start with drafting the needed types/functions for $\nu_{deposit}$ instead and then $\mu_{deposit}$
  • I added checks for before/after deadline for the Claim/Recover redeemers, check that head id is correct and deposit token is burned (not complete yet)
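
  • Roughly the redeemer shape this implies (illustrative names only, not the final on-chain types):

import PlutusLedgerApi.V2 (CurrencySymbol)

-- Claim moves the deposit into the given head and is only valid before the
-- deadline; Recover gives the deposited outputs (their count as the argument)
-- back to the user and is only valid after the deadline.
data DepositRedeemer
  = Claim CurrencySymbol
  | Recover Integer
  deriving (Show)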

2024-08-16

SN on magic-nix

2024-08-04

SB on prevent decommit with the same version and different utxo to decommit

  • When looking at the BehaviorSpec tests related to the fix we wanted to do I noticed an actor is requesting a decommit with UTxO they don't own.

  • Another problem (and this is the original issue we started to look at) is "can process transactions while decommit pending". Here the problem is that we still have the decommit transaction around, which was snapshotted but not observed (so it is still both in the new snapshot and the confirmed one), and we don't have a way of detecting if the next snapshot (an L2 tx in this case) should ignore the pending decommit and just sign the snapshot.

  • I am tempted to add a field to ReqSn to detect if the request for a snapshot is just an L2 tx and in this case not check for a decommit tx in the snapshot.

  • Would this approach somehow influence also requests for snapshots that have some decommit tx? Not sure at this point but I don't think so.

  • So it is possible to have multiple snapshots with the same decommit tx in case there is a pending decommit, but this decommit tx can't be altered until the decommit is settled. Only then can you request a new decommit, and the local txs would still continue to work with the pending decommit.

2024-08-01

SN exploring peerbit

SB on new transactions work even if decommit is pending


  • The fix I did to make this work enables us to not apply the decommit transaction twice, by looking at the previous confirmed snapshot.
  • We may want to exclude utxoToDecommit on the next snapshot if this is the case.
  • I will draw transaction traces on the miro board to see possible scenarios
  • There is another problem with the PR: by adding the threadDelay to make sure we observe transactions on the next block we broke some logging test.
  • Perhaps I can insert this thread delay only if the caller says so?
  • Ended up wrapping the withSimulatedChainAndNetwork so it takes an extra parameter that determines if there should be a delay when posting/observing a tx.
  • The other main issue is that we are not sure what the consequences are of keeping/removing utxoToDecommit in the following snapshot(s).
  • It seems that we want to make sure we do not sign a snapshot if the version is the same as the confirmed one but the utxoToDecommit is different. In that case we should reject the ReqSn request, and in case both version and utxoToDecommit are the same we just want to skip applying it (see the sketch at the end of this entry).
  • There are tests already that help us reason about these changes and I'll focus next on trying to come up with a design for both spec and code that we care to maintain. Currently it feels fragile and it seems we can't predict the exact behaviour of the decommit feature since we came a long way just to realize this problem right now.
  • Looking at "can only process one decommit at once" in the BehaviorSpec
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":13,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":13,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"inputId":13,"tag":"EndInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":14,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"input":{"chainEvent":{"newChainState":{"slot":7},"observedTx":{"distributedOutputs":[42],"headId":"31323334","newVersion":0,"tag":"OnDecrementTx"},"tag":"Observation"},"tag":"ChainInput"},"inputId":14,"tag":"BeginInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effectId":0,"inputId":14,"tag":"EndEffect"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"inputId":14,"tag":"EndInput"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"outcome":{"error":{"message":"decrement observed but no decommit pending","tag":"AssertionFailed"},"tag":"Error"},"tag":"LogicOutcome"}
  • Seems like another issue we have is that we somehow lose the local decommit tx before the observation happens?
  • I see 4 observations of OnDecommitTx, so on the first occurrence we delete the local decommit tx but afterwards we experience the error?
  • Then afterwards we see a new DecommitRequested and a ReqSn:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"effects":[{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"}],"stateChanges":[{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"newLocalUTxO":[],"tag":"DecommitRecorded"}],"tag":"Continue"},"tag":"LogicOutcome"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"},"effectId":0,"inputId":15,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":15,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"message":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"snapshotNumber":2,"snapshotVersion":0,"tag":"ReqSn","transactionIds":[]},"tag":"NetworkEffect"},"effectId":1,"inputId":15,"tag":"BeginEffect"}

  • And an eventual failure because the snapshot version was not bumped:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"lastSeenSv":0,"requestedSv":0,"tag":"ReqSvNumberInvalid"},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}

July 2024

2024-07-30

SB on No processing of transactions during a decommit

  • I start with a failing test case already where we try to decommit, wait to see the decommit approved and then do a new tx and expect it to be snapshotted correctly

  • Snapshot number 1 is confirmed with utxo = 2 and utxoToDecommit = 42

  • Then we see TxValid/ReqTx by both nodes; after ReqSn one node can't apply the next snapshot:

{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"error":{"reason":"cannot apply transaction"},"requestedSn":2,"tag":"SnapshotDoesNotApply","txid":1},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
  • So we have a utxoToDecommit in the next snapshot and it is not applicable since the utxoToDecommit was already removed from the confirmedUTxO but we also didn't observe the decommit tx.
  • How would we know, when the next snapshot comes, whether we already applied the decommit tx or not?
  • One idea I had is to check, before trying to apply the decommit tx, whether the local confirmed snapshot already has the same utxo; if that's the case we would not try to apply the decommit transaction again (sketched below).
  • This works but we need to think whether this strategy will work in all the different scenarios we can have with L2 transactions while the decommit tx is still kept locally/not observed.
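
  • A sketch of that check (illustrative only; the real code works on our UTxO type, not a bare Map):

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Treat the decommit as already applied when none of the outputs it would
-- remove are still present in the locally confirmed UTxO.
decommitAlreadyApplied :: Ord txIn => Map txIn txOut -> Map txIn txOut -> Bool
decommitAlreadyApplied confirmedUtxo utxoToDecommit =
  Map.null (confirmedUtxo `Map.intersection` utxoToDecommit)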

2024-07-25

SB on prevent empty decommit

  • Turns out the ledger rules are actually preventing empty inputs/outputs in some transactions. In case you don't provide any inputs you get TxBodyEmptyTxIns, since it doesn't make sense to accept a transaction with empty inputs, and in case there are no outputs ValueNotConservedUTxO is thrown.

  • I'll add one test that will assert this is true for decommit transaction and assert on the error so we can call this non-issue done.

SN on hydra-doom

  • Exploring hydra-doom watch mode
  • Start by adding a command line flag to individually enable -send-hydra and -recv-hydra
  • Looking around for command line arguments and found the -wss and -connect args from doom websockets.. should we integrate as a net_module too?
  • The doom-workers repo is a nice, small, vanilla JS application to take inspiration from (good shortcuts like display)
  • When only watching with -hydra-recv I realized that the datum decoding was actually not working.
  • Disabling the demo makes it nice and idle if ?watch is set
  • After figuring out encoding and decoding the redeemer and submitting ticcmd_t pieces, this was quite easy. Now the question is.. how would we get a synchronized view/start of the game? How is this working in multiplayer?

2024-07-22

SB on Reviewing final PR for incremental decommits

  • Everything looks good to me. Nice job, I am glad this is finally done.

SB on Add Snapshot type to the spec

  • I introduced new snapshot type in our transaction traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764594997541131&cot=14
  • There can be Normal, Inc or Dec types of snapshots, each signaling their purpose. Normal ones are reserved for new L2 transactions, an Inc snapshot includes some new UTxO into the Head, and a Dec snapshot takes something back to L1 (see the sketch after this list).
  • For the spec changes I decided on the letter $\kappa$ to represent this new snapshot type, for no specific reason: $\kappa \in \{Normal, Inc, Dec\}$
  • The tricky part is to explain all of the possible decisions we can make in close transaction based on this snapshot type and the type of a close redeemer.
  • In the end it wasn't that bad to update the close transaction. I just had to specify 4 different close current/outdated with respect to Inc and Dec snapshot types and also update the figure to include the snapshot type $\kappa$ in the close redeemer.
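
  • For reference, a sketch of the proposed snapshot type (the spec writes it as $\kappa$; names here are illustrative, not the final code):

-- A snapshot either only contains L2 transactions, brings new UTxO into the
-- Head, or releases UTxO back to L1.
data SnapshotType = Normal | Inc | Dec
  deriving (Eq, Show)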

2024-07-19

Draw transaction traces for unordinary scenarios (de/commit)

  • I updated the tx traces we added yesterday to depict the flow with the increment/decrement transactions
  • I think we gained some clarity together with Franco
  • Updated the drawing with the new snapshot type I propose and the close redeemer addition to specify this type
  • Now close needs to distinguish what to do based on the snapshot type: Normal, Inc or Dec and the redeemer type: CloseCurrent, CloseOutdated, CloseInitial
  • It is time to update the spec with these proposed changes
  • Changes in spec mainly need to consist of explaining the new snapshot type where appropriate and use the new type in the increment/decrement transaction.
  • Validators need to contain check for conditions I explained in the tx traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764595074776662&cot=14 which take into account the new snapshot type together with close redeemer type
  • This snapshot type will be propagated through the close redeemer too

Open a Head on preview

  • Seems hard to find information on when the head was initialized on-chain?
  • I used the slot/block when I published hydra scripts for that but how would you find it otherwise? Answer: We could search for tx metadata. All hydra init transactions contain HydraV1/InitTx.
  • Do we need to expand on the documentation for these corner cases in order to provide more information to our users?

2024-07-11

SN on incremental decommit cleanup

  • After putting the close redeemer casing, I had also aligned the "outdated" case signature verification to use version - 1 -> this made tests fail!?
  • Using version instead of version - 1 makes mutation tests green. This is wrong?
  • EndToEnd tests now also fail with "MustNotChangeVersion" -> this is connected!
  • Fixing the version used on close to the last known open state version (offChainVersion), makes E2E pass
  • While the "can decommit utxo" was green right away, "can close with pending decommit utxo" was red first. After adding a delay of 2 secs after submitting the decrement it was green -> race condition in hydra-node when trying to close while decrement was not yet observed back. I think this is expected and we need to manage the UX here (e.g. try close again).
  • In any case, the pending commit E2E test is redundant.. it's not strictly a pending decommit and the test is the same as the one above
  • After mutation and e2e tests are fine, model tests still choke on new MustNotChangeVersion
  • Found an original bug (or current inconsistency?) in HeadLogic: version is incremented whenever the snapshot leader sees the decrement, but not to what the chain observation includes.. could be a race condition (model tests fail)
  • When trying to tie the knot correctly in onDecrementTx, more and more abstract functions are needed. What is the most common denominator?
  • Testing correct observation of decrement is not easy as all our tests use "pure transactions" and only e2e tests spot misalignment on "real" transactions (with fees paid, change output etc.)
  • Removing snapshot number check from decrement: should motivate not having number in open state datum. Running mutation tests, tx trace and model tests after dropping it.
  • Only UseDifferentSnapshotNumber failed, and fails with a different reason: SignatureVerificationFailed.
  • This is expected as the snapshot number to verify in decrement txs is still taken from the datum instead of the redeemer. Let's change that now.
  • Also an interesting TxTrace failure with a single Decrement of snapshot number 0. Was expected to fail and does not fail now (is this a problem?)
  • Sub-typing all datums and redeemers likely makes sense. At what cost?
  • When fixing the TxTraces to not expect failure on decrement with snapshot 0, I run into another issue where contestTx is called with version 0 or otherwise too old version, resulting in a "should not happen" case.
  • I realize there are pros and cons about utxoToDecommit and deltaUTxOHash being a Maybe UTxO and Maybe Hash respectively.

2024-07-05

SB on incremental decommit

  • Maybe it can be worthwhile exploring the idea of how to trick the Head protocol into decommitting twice.

  • So we would ask for a decommit after which there would be no further snapshots.

  • Then we would close but the contestation should try to contest with the snapshot we used for decommit.

  • Would the fanout re-distribute already decommitted funds again?

June 2024

2024-06-27

SB & FT on failing decommit test

  • Franco wrote an e2e test that wanted to send a decommit request, wait to see it approved but then close instead.

  • The test was failing because of SignatureVerificationFailed and it was happening because of the version mismatch.

  • I fixed this by updating the version check:

-  snapshotVersion = alreadyBumpedVersion - 1
+  snapshotVersion = max (version - 1) 0
  • This way we don't go below 0 which was causing the snapshot validation to fail.

  • This worked and got us to another interesting problem. Close transaction was failing with:

InsufficientCollateral (DeltaCoin 0) (Coin 2588312)

  • This is interesting since the collateral input had around 21ADA available which should be more than enough.

  • I was looking at the cardano-node logs and what seems to happen is that the posted decommit tx got into the mempool just before the close one. The close tx was then failing with the error above but this doesn't explain the error I was seeing. What I would have expected is the close tx to fail with BadInputsUTxO.

  • Our suspicion is that since the hydra-node has never observed the DecrementTx taking place on-chain, our local view of the wallet UTXO is invalid. This is likely causing the collateral error we are experiencing.

  • We decided to leave exploring this for now since it seemed like a race condition between the decommit and close tx but still I'd like to get to the bottom of it.


2024-06-17

SB on decommit demo

  • I wanted to prepare a small demo for the team and demonstrate that decommits are working as expected.

  • In the process of decommitting I had small problems submitting a decommit tx since the balancing is on the user side, so setting the fees is a smallish annoyance.

  • After successfully decommitting (a single utxo of 10 ADA), the head is still open but there are no funds inside.

  • Posting a close yields:

An error happened while trying to post a transaction on-chain: FailedToPostTx {failureReason = "TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError (UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [AlonzoSpending (AsIx {unAsIx = 0})])) :| [UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"0e96eb70e55ac2cfa0b9d93aa5ef3fd57dff54de704d72b7d355896581e70a85\")) (SJust (SafeHash \"2862039cf16f77fc21cb3673a4cb6e9185636c9b6a3e519f3ec8416a4af469e5\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue (Coin 0) (MultiAsset (fromList []))) (MaryValue (Coin 9982912420) (MultiAsset ( fromList [(PolicyID {policyID = ScriptHash \"ff0fc67587997fc3330fb29dff6c388166baaf0d62f934882deee38d\"},fromList [(\"4879647261486561645631\",1),(\"f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 2)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (DeltaCoin 0) (Coin 2586728))))])))"}

  • So we get InsufficientCollateral errors and now I want to see why would hydra-node's internal wallet yield this error.

  • If there are no funds in the Head this should not be related to what hydra-node internal wallet owns.

  • Ok, the error above seems to be just the outcome of trying to Close multiple times without waiting for a close tx to be observed BUT fanout yields two UTxO with the same amount!

{
    "11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2#1": {
        "address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
        "datum": null,
        "datumhash": null,
        "inlineDatum": null,
        "referenceScript": null,
        "value": {
            "lovelace": 9836259
        }
    },
    "cddf3345a486396a8516418244c54161aab511efb99234580836114fa0f5b6cc#0": {
        "address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
        "datum": null,
        "datumhash": null,
        "inlineDatum": null,
        "referenceScript": null,
        "value": {
            "lovelace": 9836259
        }
    }
}

which means we decommitted 9836259 lovelace, then closed with the same ConfirmedSnapshot which still had the utxoToDecommit field populated, and that ended up creating another output with the same amount! A BUG!

May 2024

2024-05-17

SB on Decommit model spec

  • We went through the decommit model for testing and determined we can't just randomly produce some utxo to snapshot. We need to maintain the utxo set of the head if we want to be able to decommit something but not alter the head value.

  • I did a spike yesterday and tried to evolve the head utxo set in the model but decided to drop it since I noticed even before this change there were some flaky tests.

  • Now I want to make sure everything is green all the time before I try to add decommits again.

  • I ended up removing the Augmented snapshot number constructor. We didn't account for them in the validFailingAction and it was causing weird behavior in the presence of negative testing. Now all non-decrement actions are green and I need to come up with a way of evolving the head utxo and make sure there is some value to decrement before actually attempting to decrement (TxBodyOutputNegative errors).

  • I enabled Decrement actions again and made sure that the head utxo value contains the initial utxo value at the start. Then, when decrementing I make sure to pick some utxo from the spendableUTxO and decrement it. This way all is green but we also need to make sure to test actions with various snapshots.

  • This is not so straightforward to do and I need to think about the easiest way to do this.

  • Perhaps keep the seen snapshots in the model and occasionally pick one of the old ones?

  • Are the negative actions we produce enough to say the decrement is working properly?

  • Seems like everything is green just because we adjustUTxO and don't really remove anything from the head utxo.

  • I want to add a check to make sure the value in the Head is reduced right after a decrement.
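
  • A sketch of the property I want the model to check (illustrative; the real check compares the lovelace locked at the head output before and after a decrement):

-- After a decrement, the head value must drop by exactly the decommitted
-- amount.
headValueReduced :: Integer -> Integer -> Integer -> Bool
headValueReduced headValueBefore decommittedValue headValueAfter =
  headValueAfter == headValueBefore - decommittedValue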

2024-05-16

SN on hydra-doom

  • Lots of pain with emscripten and typescript, but forgot to keep track of log items... anyhow:
  • Websocket.addEventListeners were leaking heavily, with removeEventListener things seem fine. I wonder though, is there a way to re-use "subscription infra" on the frontend and put events into a queue of some sorts?
  • When running the scenario for a while, the log of the hydra-node grew heavily (we should control log density). So I removed the file from disk (the file descriptor likely still existed) .. and perceived performance got a bump!?

2024-05-14

SB on Decommit model spec

  • Last thing I remember is that we experienced CannotFindHeadOutput errors sometimes when running the TxTraceSpec tests. There is some flakiness here but most of the time the tests are green, which is not something I would expect. We wanted to see failures in close/fanout related to Uw (U omega) addition/checks.

  • Sprinkled some tracing in the close/contest/fanout tx creation to see what kind of UTxO is there and detect why Head output is not found.

    "Close: 9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace\n52f744b6ac#71 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14"
    
    "Contest: 1d3367a1b0#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
    
    "Fanout: f5110cd941#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
    
    
  • From the traces above I don't clearly see the head output with some PTs

  • I disabled the Decommit action generation and the same problem still exists

    • so this is not tied to the changes I added for decommit action. I'll try to go back few commits to assert tests were green before decommit snapshot addition.
  • Going back just leads to missing decommit utxo in the snapshot errors.

  • Tracing in the State module surprisingly reveals the head UTxO is actually there???

  • Ok, removing shrinking with qc-max-shrinks=0 reveals the problem again.

       Assertion failed (after 4 tests):
         do action $ Close {actor = Alice, snapshot = Augmented 3}
            failingAction $ Fanout {snapshot = Normal 3}
            action $ Fanout {snapshot = Augmented 3}
            pure ()
         Model {headState = Closed, latestSnapshot = Augmented 3, alreadyContested = []}
         Fanout {snapshot = Augmented 3}
         Failed to construct transaction: CannotFindHeadOutputToFanout

  • Seems like we always produce something to decrement (decommitUTxO in the snapshot can be Just mempty), which doesn't make sense (and there was a comment in the PR related to the semantics of Just mempty vs Nothing). Let's fix this first since I think it is tied to the missing Head utxo problem above.

  • After directly producing the snapshot with something to decrement we get the problem of a negative output, TxBodyOutputNegative, which is expected. We just generate snapshots without taking care that the actual UTxO is part of the Head UTxO.

  • I think the next step is to make sure whatever we decrement needs to be part of the Head UTxO.

  • After this change I see H09 or HeadValueIsNotPreserved error in the decrement tx.

  • This means that the head input value does not match the head out value since we added decommit outputs.

  • If I update the UTxO state in the run model with the new UTxO containing the decommit outputs then I get H24 which is FannedOutUtxoHashNotEqualToClosedUtxoHash.

  • This points to the fact that we don't do a good job of maintaining the UTxO set between close/fanout in our specs. But we wanted to keep it minimal so that was fine so far, maybe it is time to add more UTxO modelling in our specs.

2024-05-13

SB on fixing the tx parsing from text envelope

  • We seem to think that the cardano-cli is using the TextEnvelope transaction format. At least this is what we expressed in one of our tests (accepts transactions produced via cardano-cli).

  • This test was generating an arbitrary transaction, then serialising it to the text envelope format and checking if we can parse it back.

  • This test now fails because the tx type we expect is not correct. The transaction type depends on whether the tx is signed or not ("Un/Witnessed Tx BabbageEra") but serialiseToTextEnvelope produces just "Tx BabbageEra", which makes our test red.

  • I tried assembling a tx using cardano-cli and noticed that what we get is the "Ledger CDDL" format instead and proper string for the tx type ("Un/Witnessed Tx BabbageEra").

  • So it seems there is some discrepancy between what we expect and what we actually get as the cardano-cli tx format.
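
  • For reference, roughly the roundtrip the test relies on, using cardano-api's text envelope functions (the exact era and type parameters in our code base may differ):

import Cardano.Api

-- Serialise a tx to a text envelope and read it back; the test additionally
-- expects the envelope's type string to match what cardano-cli produces.
roundtripTextEnvelope :: Tx BabbageEra -> Either TextEnvelopeError (Tx BabbageEra)
roundtripTextEnvelope tx =
  deserialiseFromTextEnvelope (AsTx AsBabbageEra) (serialiseToTextEnvelope Nothing tx)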

2024-05-10

SB on fixing Auxiliary data hash in commit tx

  • Progress. I can see the blockfrost submit error with the instructions provided by MLabs.

  • Seems like the correct hash for the commit tx is detected (12e5af821e4510d6651e57185b4dae19e8bd72bf90a8c1e6cd053606cbc46514) but for whatever reason it doesn't hash to 61f765d1abd17495366a286ab2da9dbcb95d34729928f0f2117f1e5ad9f9e57b, which was the expected hash.

SB on Model spec for decommits

  • Last thing I did in the new model tests for decommits is to add the actual UTxO to decommit to our modelling of the Snapshots.

  • This led to H35/34 validator errors which means the Snapshot signature is invalid and in turn it means whatever outputs wanted to decommit are not complete in the Snapshot and/or they are not properly signed.

  • Tracing the produced Snapshot with something to decommit reveals that decommit output is actually present in the decommit transaction. Does this mean the snapshot is not properly signed?

  • Indeed this is the case. We create a snapshot and the signatures but don't set utxoToDecommit. Afterwards we set the utxoToDecommit before calling decrement function but the signatures are invalid at that point.

  • After re-signing the proper snapshot with decommit output we go further and the model spec fails with invalid Decrement action which is expected to fail.

  • Turns out the problem was that I was removing the decommit outputs from the utxo set and this is not what we want to do since the snapshot utxo is just generated and was never actually committed.

  • Now we come to a problem where head output was not found for contest/fanout txs.

2024-05-09

SB on fixing Auxiliary data hash in commit tx

  • Our users MLabs are having issues with the commit tx produced by hydra-node. The issue is that the auxiliary data hash is not correct.

  • Our current tests are all green and we explored the wallet code in order to figure out at which point the auxiliary data hash becomes invalid but with no success.

  • There are both property tests and the tests that actually complete the whole workflow and return the signed commit tx to the user but nothing reveals the issue.

  • The last attempt was to recalculate the aux hash in the conversion from/to ledger tx but this turned out to not be a proper fix.

  • After this I tried to assert the following for different combinations of:

    • Aux data and hash are valid when creating a tx using the commitTx function against draftCommitTx_
    • Aux data and hash are valid when creating a tx using the draftCommitTx_ function against a signed commitTx tx
    • Aux data and hash are valid when we encode/decode the tx to make sure that the aux data is not lost in the process - and all the tests are green ...
  • Is this somehow blockfrost specific? I know MLabs are using blockfrost to submit this transaction and so maybe this is worth exploring.

  • They have the workaround by just recalculating the aux hash since they have a way to re-sign the hydra transaction but this is not a good solution.

  • I think further steps are to try to reproduce the issue together with MLabs and see if we can find the root cause.

February 2024

2024-02-14

SB on Validating transactions in MockChain

  • When working on reproducing the contest bug we found out that our mock chain layer is not validating transactions properly.

  • When calling evaluateTx we get Either EvaluationError EvaluationReport but we never check whether the EvaluationReport contains some ScriptExecutionError (see the sketch at the end of this entry)

  • Another problem we found is that addNewBlockToChain updates the new slot and block but keeps the UTxO the same.

  • We added a check when trying to submitTx to see if the tx we are trying to submit is actually valid, but with the changes above we observe the H03 error, which is ReimbursedOutputsDontMatch.

  • We error out in the checkAbort with this error in case the hash of the committed UTxO doesn't match with our hashed outputs in the abort.

  • So since the error suggests there is perhaps something wrong in the way we simulate commits, I added a check in performCommit to make sure that when we see the Committed output, the UTxO inside matches what we actually committed. I think we were observing other nodes' commits and that led to the H03 we were seeing.

  • Now we get a completely different failure (seed is 590033147):

TransactionInvalid (TransactionValidityTranslationError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c"}) (TxIx 5))))
  • The only thing is, when printing a tx I actually see the input 2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c with Ix 5 so WAT is happening.

  • I see that we need to invoke toApi on our utxo in the evaluateTransactionExecutionUnits and this is just coerce so it shouldn't matter but I added roundtrip tests anyway just to regain some sanity.
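
  • A sketch of the missing report check (illustrative; our EvaluationReport is keyed by redeemer pointer and holds either a script error or the execution units):

import           Data.Either (isRight)
import           Data.Map (Map)
import qualified Data.Map as Map

-- A transaction only counts as valid when evaluation succeeded overall and
-- every redeemer in the report evaluated without a script execution error.
isValidEvaluation :: Either evalErr (Map ptr (Either scriptErr units)) -> Bool
isValidEvaluation result =
  case result of
    Left _ -> False
    Right report -> all isRight (Map.elems report)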


2024-02-14

SN on reproducing the contest bug

  • Worked on the model and some back and forth; succeeded in issuing close transactions of the initialUTxO available in the mock chain (surprised that it is kept around and made available as futureCommits?)
  • The model did not fail and we seemingly can contest and fanout without problems. Let's look back at the logs of the bug issue
  • State changed events: Both nodes see the HeadClosed at slot 38911656 with deadline 2024-01-18T08:48:47Z
  • After four blocks with TickObserved, the HeadIsReadyToFanout
  • Node 1 stores a ChainRolledBack state changed event, seemingly to slot 38911656 (which would be exactly the point where the head closed)
  • Node 2 has two rollback events, one to slot 38911660 (after close) and then to 38911656 (at close)
  • In the logs, node 1 observes at 2024-01-18T08:47:36.188861Z in slot 38911656 a OnCloseTx, at 2024-01-18T08:47:40.410448Z in slot 38911660 a OnContestTx with snapshot 1, and at 2024-01-18T09:09:22.762856Z in slot 38911660 another OnContestTx of snapshot 1 (Arnaud mentioned he tried to wipe the state, re-observe and try fanout again)
  • Node 1 tried to post twice the CloseTx, first on 2024-01-18T08:47:04.257801Z with success producing tx 94a7f0b86ee448dedf3667114f968dd825055bee4c90379a4659d8563b2e7d4a, the second attempt at 2024-01-18T08:47:28.565181Z failed with
    TxValidationErrorInMode (ShelleyTxValidationError ShelleyBasedEraBabbage
    (ApplyTxError [UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers
    [RdmrPtr Spend 0])),UtxowFailure (AlonzoInBabbageUtxowPredFailure
    (PPViewHashesDontMatch (SJust (SafeHash
    \"1119bb152dc2c11c54389752f0d6c8398912582a73d2517570e9898c1818bb3e\"))
    (SJust (SafeHash
    \"7bf7ccdce09da7e95d17ed5faf048b52f3d77a9c96928dd7656ad4e1647059b1\")))),UtxowFailure
    (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO
    (MaryValue 0 (MultiAsset (fromList []))) (MaryValue 89629254 (MultiAsset
    (fromList [(PolicyID {policyID = ScriptHash
    \"1f8f4076f72a0427ebe14952eb0709c046a49083baef749920cf6bfa\"},fromList
    [(\"1ad7cb51c9e2d6250bd80395a5c920f6f628adc4b1bd057a81c9be98\",1),(\"dc9f50cac6e292e1aeaa438c647010f7f6f4fc85010092ff6cb21cf3\",1)]),(PolicyID
    {policyID = ScriptHash
    \"3ee6157268c7a3b7d21111b6772bc06aed0f425ef64b285223b308fc\"},fromList
    [(\"2d2da703f0e1fb682f997f121e3f90084d12c11203dc8c1314da16d3\",1),(\"4879647261486561645631\",1),(\"c312067b108b2ddd1a50ad1e757720570100fad3f14d6f88fd64ed42\",1)])])))))),UtxowFailure
    (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn
    (TxId {unTxId = SafeHash
    \"3b6ded925ce15cffc3f47ebee62242e75bb57b63d7dad943397ebe55112c7094\"}) (TxIx
    0),TxIn (TxId {unTxId = SafeHash
    \"b3704d8d7d7067c4b6910e4dd84b2074ffa4713027c14d81f18928ae3700d40d\"}) (TxIx
    1)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure
    NoCollateralInputs)),UtxowFailure (UtxoFailure
    (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (Coin 0) (Coin
    5293200))))]))
    
  • Node 1 did post a ContestTx at 2024-01-18T08:47:36.189167Z with id 495878f0639aa2aed384af4b72bcdea058b339e6fd5fc6f63b5e7238d7297baf
  • Then node 1 did post a FanoutTx at 2024-01-18T08:49:18.378833Z which failed validation with error code H25
  • All further attempts to fanout from node 1 failed for the same reason (2024-01-18T08:50:07.49598Z, 2024-01-18T08:58:38.158696Z, 2024-01-18T09:03:42.313769Z, 2024-01-18T09:17:42.869217Z)
  • Node 2 did close the head with the initial snapshot at 2024-01-18T08:46:49.728882Z with tx id 5a4ed22b4e8137118eff07f26b366e031d880e66ebf11b3fafa130b50fc95441 on slot 38911656
  • All fanout attempts of node 2 failed with error code H24, on 2024-01-18T08:49:18.323083Z, 2024-01-18T09:04:42.427032Z and 2024-01-18T09:18:44.892685Z
  • Node 2 observed OnCloseTx at 2024-01-18T08:47:36.18Z on slot 38911656, OnContestTx at 2024-01-18T08:47:40.478408Z on slot 38911660 and the same ContestTx again at 2024-01-18T09:09:26.563872Z

2024-02-06

SB/SN on debugging the stuck head on preview

  • Unclear what actually caused it
  • Situation: Tx requestor did not sign snapshot, the others (including snapshot leader) did
  • Consequence: No re-requesting of snapshot, original requestor can't do anything now?
    • Can we test this?
    • Could / should simulate this in tests (e.g. by deliberately using wrong hydra keys in BehaviorSpec)
  • Manipulating state is hard, especially as we also have acks and network-messages which need to be consistent
    • Why are we storing non-confirmed data? e.g. things that aggregate into localUTxO and localTxs
  • Coordinated reset https://github.com/input-output-hk/hydra/issues/1284?
    • Not loading in-flight information from persistence would have the same effect
    • Gossip of signed snapshots is related to this as well (and would not require persisting network messages?)
    • Use the chain to checkpoint a snapshot as a coordinated (safe) reset point?
  • Multiple ReqSn for same snapshot (upon new transactions "in the mempool"?)
    • Can't recover from non-consensus
  • Remove localUTxO to at least void "inconsistent" local views and instead do that only upon Snapshot request / validation?
  • This whole thing feels super ad-hoc
  • Concrete next step: Sasha pays for closing and initiates a new head

January 2024

2024-01-22

SB on Importing hydra-node transaction to eternl

  • Tried removing the commit witness in order to test out if the produced cbor is compliant with the HW/CIP-21 but we get ScriptFailedInWallet when trying to cover the fee using internal wallet.

  • Not sure if I really want to pursue this further but at the same time it is interesting to see the needed changes.

  • I might just do this in my free time. One option is to make the user pay for the fees instead of hydra-node wallet.

2024-01-15

SB on Importing hydra-node transaction to eternl

  • Trying to figure out how to produce commit tx from hydra-node that is compatible with HW.

  • There is no option to import the tx into trezor-suite and simply test that it works.

  • Pasting here the answer from the Eternl guys just for the reference:

The issue comes down to the body map ordering like I said in the beginning of
this thread. I then said that it was a requirement to have all maps sorted
canonically according to CIP21. This is the issue we face with Eternl as well.

Eternl parses the input cbor, extracts the body cbor raw as is, ie as its
sorted in original cbor.
This raw body cbor is hashed and signed.
Eternl then needs to re-construct a final tx, for this, CSL is used to
de-serialize the tx, inject the witness Eternl creates and then outputs the new
tx cbor.
But when this is done, the body map is sorted in canonical order and thus the
body doesn't match the original one any more.

So the witness created by Etenl is correct, but the tx cbor returned contain a
different structure for the body

Possible solutions

- Make sure that tx created follow CIP-21
- After creating hydra tx, pass it through cardano-hw-cli transaction transform
  to reconstruct body to follow CIP-21. Must be done before any witnesses are
  added.
- Change Eternl to put tx back together using the original body cbor and not CSL parsed body.

  • cardano-hw-cli tool is not really acceptable to us since we should be able to figure out how to produce appropriate tx using cardano-api. Seems like what we produce is just not compatible with CIP-21.

SB on Decrement observation

  • Continuing work on incremental decommits I wanted to add the off-chain observation code.

  • Looking at contest tx which should be similar to decrement in a sense that it consumes Head output and produces another one.

  • I added the observation code without too many problems but need to revisit to make sure I didn't skip some expected checks.

  • Now our test fails because of NotEnoughFuel which makes sense since we are actually setting fees to zero in mkSimpleTx.

  • Increasing the amount of seeded lovelace for the hydra internal wallet fixes this issue and now the only thing that fails is the last check in the test where we want to assert that the UTxO we just created is adopted and can be queried from cardano-node.
