-
Notifications
You must be signed in to change notification settings - Fork 87
Logbook 2024 H1
-
Ok, trying to figure out why the added check that the deposit script is spent is not working
-
Now I am trying to trace on-chain representation since I don't see off-chain any differences between the two
TxOutRefs
-
Ok, I found out that calling
toBuiltinData
before comparing the two values did the trick! Question is why we can't compare the two on-chain without turning it intoBuiltinData
? -
By writing the corresponding mutation I discovered the change is actually wrong since I can use any input in the redeemer and tests are green. Wat?
-
I reverted the recent changes so I can start again. Perhaps comparing hashes should work out?
-
So comparing hashes didn't work so I'll just leave this check as the last thing to do since other ones should be easier.
-
Next check in line is the signature validation. How to get the correct
$U\alpha$ ? What we include in the head is contained within the deposit datum. I should decode the datum and then use the information inside to produce the correct hash. -
Correct, this is also what the spec says (though that might be expensive)
-
Again I find it hard to debug when the validator fails since what I see in code seems correct but not having some easer way to trace on-chain code is making my life miserable.
-
Ok, I found the problem. Test snapshot is just plain wrong and doesn't contain anything to commit.
-
Another issue is that sometimes I wasn't able to find the deposit input properly. I think it had to do with the
resolveInputsUTxO
but now I changed the code to not rely on the input ordering and try to find the deposit by address so everything works again. -
This was actually the problem all along! Even the redeemer check from yesterday is ok now! Oh man, I am happy now
-
I don't understand why altering the order of these two conditions affect the increment tx healthyness?
&& claimedDepositIsSpent
&& checkSnapshotSignature
-
After writing the initial mutation that should check the deadline I can't seem to influence the deadline in the increment tx in such way to trigger the validator (deposit deadline seems to be set in the past, it is a negative number).
-
Perhaps I need to make sure the deposit deadline has a meaningful value first in order to resolve this test.
-
Creating the UTCTime from day, month, seconds did the trick but the input datum is for some reason invalid - can't be deserialized.
-
Ok, this happened because I was using the head input instead of deposit one. Now the datum can be decoded properly.
-
Since we are re-using the contestation period for the deadline, both in the http server when setting the deadline and in the handlers when constructing the increment, these values clash now. I opted in to set the deposit deadline as double the amount of contestation period. Hopefully this is enough time to be able to post the increment, if not we can always have a custom cmd line argument.
-
Now the deadline related mutations are green and I also updated the incremental commits documentation PR to reflect these new changes.
-
Next in line is the check for the head id. In the head contract we have the previous and the next datum to compare the headId. I opted in to put the head id in the
Claim
redeemer instead so that I have some value to compare when spending the output. -
I got both mutations I wanted green and I added a fixme to address needed changes in the spec for the Claim redeemer
-
Continuing to tackle the increment redeemer. There is immediately some confusion with the redeemer fields, seems like we need to keep the
TxOutRef
of a deposit we're trying to spend? -
I added one fixme to tackle the spec wording where
n
is mentioned in the first check that parameters are unchanged -
Now I noticed another discrepancyActually...I tell a lie, it is the deposit input.фincrement
vs.фdeposit
where the latter one is not introduced properly. -
This next check is interesting
Claimed deposit is spent
𝜙increment = 𝜙deposit
-
So this would mean I need to find the deposit input to increment transaction and then compare it to the data I have in the increment redeemer (
TxOutRef
) -
I still see deposit input first in the list, is the tx rendering borked? I see
sortBy (compare
onfst) (txIns content)
so we sort than print which could be the cause of my confusion. -
I don't immediately see why the check that deposit script input is spent is not working.
-
I got the request from Trym to time how much time it would take to open a head, then wait for couple of days and then start-chain from the moment the head was opened so we cut down on the chain congestion.
-
My plan is to use faucet keys we have and open a head on mainnet. Then stop the node and give it some slot/block from couple of days before and time how much time it takes to re-sync.
-
Question is would my aws instance survive two cardano-nodes at the same time? Let's see.
-
Downloading mitrhil db snapshot for mainnet
-
Fond a command to look for specific string in the output, hopefully it work for timing since we need to see
HeadIsOpened
eventually
time until websocat ws://localhost:4001 | grep -m 1 "HeadIsOpened"; do : ; done
-
Perhaps I would need to enlarge my aws instance since mainnet db requires a lot of disk space
-
I got the cardano node to run on the mainnet with usual quirks related to conway-genesis file and it's hash in the node config
-
I got the head opened as well but need to connect using websocat in order to be able to measure the time to catch up
-
I am realizing I need to re-map the ports differently since this aws server already runs one hydra-node and one cardano-node on preview
-
Now that I can connect with websockat I stole a small script to be able to measure the time until we see
HeadIsOpen
. This is the block before the head opening https://cexplorer.io/block/9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee and this is the slot/block hash
, "--start-chain-from", "137342970.9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee"
- I'll try to test it out now with one day difference to see the timing
-
I am starting by looking at the spec - perhaps there are some areas that we can already unmark from the red color since we implemented them?
-
On the second thought we unmarked everything last time once we had the implementation so I'll wait on this one.
-
Wrote the initial increment mutation that should test that increment happens before the deadline.
-
Funny that the deposit input is the first one, I was expecting the head input to be always the first one so need to see what is happening.
-
There is missing tx upper bound for the increment it seems, we need to set it's validity so it becomes invalid if it is equal or greater than the deadline.
-
Interesting problem - how to come up with a meaningful value for the upper validity slot for increment? We could use the current slot and calculate upper bound like we do for contest. I think this is the correct solution, we don't need to depend on saving the deadline in the local state so that we can use it for increment.
- I've fixed the test I was working on yesterday (I keep forgetting we don't issue any network messages on commits)
-Perhaps we should introduce a network message to signal to other parties
that deposits/recovers/increments are happening? Not sure if there is anything
related in the spec
-
The spec doesn't mention any network messages on deposit so all good.
-
Anyway I think we should always try to snapshot whatever we have pending for deposits and the logic on
AckSn
will discard if something is not valid. -
Continuing to write similar tests as the one we had for decommits to gain better understanding and perhaps find some quirks in the head logic.
-
One thing I found which is different from decommits is that a node can observe increment transaction but not have the txid stored locally since it is not propagated by the network message. Perhaps the commit messages should be propagated over the network
-
Other thing is we need to try to snapshot any pending deposit on each ReqTx. Similar to decommits, if there is something pending it should go into the next snapshot.
-
The fact that genSnapshot is so random and hardly meets our coverage drives me to a different strategy: Generate plausible sequences of snapshots and only pick from them when generating Decrement/Close etc. actions.
-
Bringing back
ProduceSnapshots
action, should it contain a list of snapshots or only a single one and have multiple such actions? -
Considered to remove snapshots from Model now after use, but this would make the overall test suite very optimistic. For example, we would not keep any snapshots with old versions after a decrement and never test outdated snapshots are rejected.
-
Realized that I need to improve the precondition of NewSnapshot to ensure it remains a plausible sequence of snapshots.
-
Turns out we need to keep track of the state version and inferring from knownSnapshots is not good.
-
At some point a fanout was invalid with a hash mismatch on the delta. Printing the corresponding
realWorldModelUTxO
shows an interestingTxOut
:TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "9a81d9534c3eed3e374a921fd06ead4a822018e6a0281ec28423ee67")) (StakeRefBase (ScriptHashObj (ScriptHash "7734a436653455cf24035622e1216c83676446d1088b47d589f81956"))))) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 10) (MultiAsset (fromList [])))) (TxOutDatumInline BabbageEraOnwardsConway (HashableScriptData "\EOT" (ScriptDataNumber 4))) ReferenceScriptNone
That is, it contains a stake credential! Could this be broken right now? Why don't we test with stake keys in other, more "unit" properties?
- I suspect that the fanout generator is also not generic enough?
- Could not confirm. Neither stake address nor datum was the issue.
-
For a fanout to have a mismatching hash, it could also be that the hash in the closed state is wrong. And inded, in the counter example we have a Close, Contest and then Fanout. Looking closer, the Contest was not resulting in a different delta UTxO hash!
-
The TxTraceSpec is always issuing contest transactions with ContestOutdated redeemers. Why?
-
Now shrinked this is a very intersting case:
do action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}} action $ Decrement {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ Close {actor = Carol, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ Contest {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}} action $ Fanout {utxo = fromList [(A,10),(B,10)], deltaUTxO = fromList [(C,10)]} pure ()
Notably, the
Close
andContest
here are already using outdated snapshots (theDecrement
of version0
already happened). Obviously, the fanout with(C,10)
is not allowed here. This is a model error and we should not expect this to succeed. -
As we will make our genSnapshot generation more model-driven, we should be able to get rid of some complexity in precondition / validFailingAction.
-
When checking coverage again, I realize we are not hitting
fanout
very often. Especially the one without utxo inside, i.e. all utxo was decremented. -
Not reducing the values and just decommitting the whole A-C makes us reach fanout with an empty head more likely.
-
Confused by the fact that when I reduce frequency of generated Decrements to
0
, the coverage still reports "has decrements"?
-
There is a leftover from the incremental-commits off-chain work which specifies that we need to add more interesting tests for our new commits
-
As a first step I wanted to separate the existing decommit tests in the
BehaviorSpec
and do something similar for the commits. -
I realize pretty quickly that we send client inputs for decommits but not for commits so perhaps I need to move the tests someplace else?
-
I think I could just use the mocked chain layer to directly post a deposit, let's see
-
Oh, ofcourse, you can't post deposit, only increment and recover. Perhaps I can post increment directly but that requires a lot of parameters and I am not too happy with this level of testing.
-
I implemented couple of missing validations for the new api endpoints
-
Added a test that on recover we check the head id is correct
-
The deposit deadline is not properly configured so I wanted to re-use the contestation deadline for the same purpose (and see later if users would like to configure it differently)
-
To do this I had to pass the environment to the http server
-
Now the recover tests are failing because of the deadline check which is now set further in the future depending on contestation deadline. They are fixed just by waiting for sufficient period of time to pass aka contestation deadline
-
Now I would like to actually record the deposit deadline and upon each recover warn the users beforehand that recover would fail unless we wait for deposit deadline to pass.
-
I am realizing that these changes conflict with the PR I drafted yesterday to remove some local state and now I need to actually re-add the deadline.
-
The test I am writing is supposed to prove you can do multiple deposits but the issue is, they actually go one by one at the time since snapshots are sequential. Is it safe to increment multiple at once if the state contains multiple UTxO?
-
Since increment is spending only one deposit output this needs to be sequential for now
-
When I try to do two sort of simultaniously I get
ReqSnCommitNotSettled
error which I need to investigate
-
I was looking through code trying to see how to use the deposit deadline in the best way to avoid stuck Head in case when recover spends the output we want to snapshot.
-
What I wanted to do when it comes to deposit/recover actions is to make sure the deadline is correctly set (we can probably define it in terms of contestation deadline + some buffer) and then it seems like the most appropriate place to check the deadline would be when we receive the client input.
-
we could check that in case of any pending deposit we want to recover, it's deadline has passed so that it is safe to do recover. Perhaps the good thing to add is to check that there is no snapshot in flight too and if possible compare the information in the snapshot to make sure the same utxo we want to recover is not in there.
-
This is a bit hand wavy but what we could do is attach a timestamp to each client input (requires a separate type to wrap
ClientInput
and add time information). Then in the head logic, when we receive the client input with a timestamp we could check if recover input time is larger than the set deadline and only then try to recover. -
What I don't like is the fact that we have to deal with time in the pure
HeadLogic
but at the same time we would just compare the time we already would receive together with the client input. -
What are the alternatives to this? I don't see many unless we make recover to be received just by HTTP call and keep the deadline around somehow so we can reject the wrong recovers immediately.
-
After drafting a nice test yesterday with SN I added on top the piece of code I already had to make everything green
-
PR review suggested some code changes which are nice so I did those but there weren't any major objections into merging this
-
I sent the docker image to Micrograx to test it out, hopefully it all works out so we can move on.
-
I continued working on this today, progressing to remove any local UTxO state we actually don't need.
-
Now it is time to use the
spendableUTxO
and find the correct UTxO to spend in increment/recover in there. -
Seems like using the
isScriptOutput
is the easiest way to track down the UTxO I need to spend -
After making everything to compile I don't see e2e tests green as I suspected. I think last time I looked at this I suspected that
adjustUTxO
is doing a bad job when it comes to these new transactions since they are related to L1 so we need to preserve UTxO as they were without adjusting. I'll test my hypothesis real quick. -
I was wrong about this, both increment and recover can find the deposit output but recover fails to post with
D05
- incorrect deposit hash, so debugging this rn. -
In the end it was just a condition in
HeadLogic
that was wrong, everything works now so it is time to wrap things up and make all tests green again. -
One note: I need to think about the deadline since we should incorporate this information we have to perhaps reject recovers that are bound to fail because of the deadline.
-
This is a follow up from the work on the incremental commits off-chain.
-
I kept a local map with the deposit UTxO I need to spend in the recover but this should not be needed since we should have all observed UTxO in the chain state already.
-
Looking at this user issue again. I noticed the rewarding redeemers are missing so I added them to the commit tx and I also see them after the balancing so this change should be correct but the error remains the same one I saw before
WithdrawalsNotInRewardsCERTS
-
If I just try to naively submit the user transaction the error is the same so perhaps the user transaction is invalid (it is missing the outputs anyway) but I am not sure if no outputs are causing this.
-
I also created cardano-cli test using the same commands as user did in the issue description to see if it leads to the same result and it acually did.
-
Looking at the ledger code it seems that there is a check to make sure any rewarding is matched with a proper certificate. The user transaction is missing certificates so I could perhaps try to add them programmaticaly and see if that helps?
-
The proper thing to do is to construct a valid tx that does rewarding and make sure these tests are green.
-
I am pretty sure on hydra end everything should be ok once I add the non-spending redeemers but need to confirm with the user if what they are sending is actually ok and should work since I can't submit a user tx without any error.
-
We looked at pr this morning together with the team and decided to merge it (yay!) and identify next steps in order to get it even more in shape
-
I discovered some weird stuff in the api schema so had to fix that and also add a changelog entry to remind us not to release this version by any means since it contains security holes big as... well major flaws with the validators returning true.
-
Now I can move on with my life and have future small PR to fix all I wanted to fix in there.
-
I finalized incremental commits off-chain branch and need to look at the user reported bug where our blueprint transaction doesn't allow withdraw redeemers.
-
I have the request json, script used to construct the withdraw script output so I'll start by first inspecting the request UTxO
-
There was already a test I drafted in a branch so looking to improve this test to detect possible bug we have
-
Seems like mocking the entire thing is time consuming, I'll just slap around quick e2e test since I have the request body available.
-
Ok so this is the error I see when using the user request:
SubmitTxValidationError (TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraConway (ApplyTxError (ConwayUtxowFailure (MissingRedeemers [(ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}}),ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")]) :| [ConwayUtxowFailure (UtxoFailure (UtxosFailure (CollectErrors [NoRedeemer (ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}})),BadTranslation (BabbageContextError (AlonzoContextError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})))))]))),ConwayUtxowFailure (UtxoFailure (ValueNotConservedUTxO (MaryValue (Coin 21466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))) (MaryValue (Coin 23466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))))),ConwayUtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})]))),ConwayCertsFailure (WithdrawalsNotInRewardsCERTS (fromList [(RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")},Coin 0)]))]))))
notably WithdrawalsNotInRewardsCERTS
is pointing me that this could be
correct reproduction even if utxo is perhaps invalid.
-
Consulting the ledger code I see that this error means Withdrawals that are missing or do not withdrawal the entire amount so hopefully once I fix the missing withdrawal redeemer I can say the bug is fixed.
-
I need to figure out where is the problem - is it in the commit tx building or in our wallet while balancing.
-
Ok seems like the problem is in the commit tx building where we try to re-map spending redeemers since we want to add one (initial) input.
-
Seems like I could just record whatever rewarding redeemers where there and add them on top after the spending redeemers are re-mapped.
-
It worked in the sense I can get passed the error I saw above and now I get the
ScriptFailedInWallet
which probably means that the wallet code is missing to properly adjust the reward redeemers too. -
Nice findings - I would like to have a general function that re-maps all of the redeemers so we are covered for any case there can be in the future.
-
Did some code review to kick off the day
-
Majority of work is done already and now it is the time to cleanup and make all of the tests green.
-
It turns out all tests are green a part from the logging/api schema validation - the most annoying tests to make pass
-
I realized that I am missing to finish up the tutorial also so this is something to do first and then fix the tests
-
Of course I found a bug by doing a tutorial! We need to do this more often. I discovered that we always serve past deposits in a list since the projection is never deleting any. Fixing this and adding a test to make sure it works.
-
Now all that is left is to tackle the schema errors it seems since the tutorial is updated.
-
Yesterday I created the test case that will drive the implementation of blueprint tx in the deposit construction.
-
I noticed the
hydra-tx
is not compiling because of some changes in the observations so I'll make that compile somehow. Not sure if we even want this executable since it is not serving it's purpose really. If you managed to draft deposit/recover using this executable then the tx is not balanced so the UX is not good at all. -
We looked at some minor problems with Noon to solve the new endpoint that lists the deposits so that thing is done (still missing to document it in our api schema)
-
Wrote something about incremental commits in the monthly report
-
Continuing with the blueprint support for the incremental commits - I should look at how I did commit before and do similar stuff in here too.
-
Adding the blueprint support for deposits was done smoothly, main problems were finding the types between cardnano-api and ledger-api but overall nice - the tests are green.
-
Looking at the PR comments and it seems the biggest part is to change
TxIn
toTxId
for recovering so maybe I get that out of the way before starting the final cleanup -
Ok, this proved to be easy enough. I was wondering if it is safe to assume that any deposit
TxId
can be turned intoTxIn
by always using the zero index? -
Now for the cleanup
-
Before diving in to make the recover work with head logic instead of drafting directly from the api code I wanted to fix the observation tests first now that we think our observations are correct.
-
This made me need to return more information when it comes to increment transaction since the head UTxO is not enough to observe this transaction.
-
Now it is the question if I should make all the tests green again or do that after recover is implemented differently. Since nobody will look into this soonish I'll just make the recover changes first and then make all things green again since I don't want to do any duplicated work.
-
I am not sure why I though this step would have a lot of code changes. I propagated all the changes and now see again
D04
which is the deadline validator check. -
I set the recover deadline to whatever was set as a deadline in deposit with a caveat that I do convert to UTCTime using
posixToUTCTime
so perhaps the conversion is wrong or I just need to wait in the e2e test? -
It would also be nice if the deadline is expressed in slots since that way I could emit errors even from the
HeadLogic
. -
Let me summarize the situation right now: deposit deadline is set in the api code (users do not influence it at all), recover deadline is set to whatever we already have in the state as our deposit deadline for certain
TxIn
(we keep the map ofTxIn
containing additional data), so obviously recover deadline is the same as the deposit deadline so the validator check is triggered wrong. To solve this I could pass either expect the users to give me a deadline (not very good UX), I could create new deadline as soon as the recover endpoint is hit (more feasible), I could also look at the existing deadline and make sure it is increased in recover (can lead to tx rejection by the cardano-node if the slot is in the future) -
If I just add some time to the saved deposit deadline the test passes so I realize that I need a way of providing the recover deadline and my plan is to take the currentSlot directly from
HeadLogic
upon receiving the recover request and then convert the slot to UTCTime in the handlers upon constructing the recover transaction -
This seems like a correct thing to do here - construct a deadline from the current slot.
-
Ok, the test is green, moving on.
-
I need another endpoint to list the pending deposits - this is very easy to implement now that we record the pending deposits in the local state. I could draft a test and have one of the guys implement this to help me out while I continue with the rest of the changes needed for the off-chain part of the commits (like suport blueprint transactions)
-
Adding support for the blueprint committing feels like important and also most complex part of the work that is left.
-
Ok, so today I'll get the observations in place correctly before I move on to fixing the recover workflow so that it goes through the Head logic completely instead of drafting the recover directly from the api code.
-
Removing the
depositScriptUTxO
fromIncrementTx
. I find it a bit hard to envision what exactly do I need in order to build theincrement
transaction successfully. If I don't have the correct UTxO for deposit script then I need to find it in the local UTxO state (since this is where UTxO ends up if it's being observed) but then I need to pick from this deposit UTxO (potentially many outputs) the one I am actually trying to deposit. -
Feels natural to include
TxIn
in theDepositObservation
so that I know exactly which deposit I observed so that I can use the sameTxIn
to find the proper UTxO to increment. -
I think I see the whole workflow in my head but it will take some sweet compilation time to get to a place where everything compiles again and works. But on the other hand I can say that the deposits/increments are done correctly then and I can focus on fixing the recover workflow. Let's dig in!
-
Lost a lot of time looking into why deposit is not observed after my changes only to realize I removed the constructor from
observeHeadTx
🤦 -
The condition to make sure deposit datum contains correct information was wrong. The observations tests didn't seem to care and they were green but the e2e tests proved we are not observing the deposit tx correctly.
-
My focus today is just bad. If all observations and e2e work I will not go into digging why did these changes make sense since I need to move faster.
-
I decided to keep
depositScriptUTxO
in the deposit observation since I didn't find the convenient way to find the deposit script UTxO to spend in the local UTxO. It seems a bit weird since If I observed the deposit this means that it should be part ofspendableUTxO
no? -
Instead I propagate the needed values (so deadline, depositTxIn etc) so that
OnDepositTx
can bring them in to theHeadLogic
. -
Now I needed to extend the
IsTx
typeclass with the new associated typeTxInType
since I want to keepMap TxIn (UTxO, UTxO, UTCTime)
locally where first utxo is the one you wanted to deposit, second one is the script utxo that is to be spent in the increment and the map key is there so we can have multiple deposits around. Deadline is just the deposit deadline, we might need that one as well if nothing else just to show the information when it is ok to recover the funds. -
Ok, after tackling all related code upon adding this new associated type I am moving through the head logic with the new �
PendingDeposit
alias that should hold all information we need to do increment/recover later on. -
I propagated changes through the head logic and now would like to use
TxIn
to find the deposits for increment/recover -
I also propagated
TxIn
inIncrement
andOnIncrementTx
to be able do easier find the deposit I want to remove. -
Next step is to start with the api changes in recover (do not call handler code from the api but go though the head logic) and that will reveal easier way to delete recovered deposits by using the new available
TxIn
.
- When handling the error of
etcdctl put
the question arises: what ifbroadcast
fails? How should theHeadLogic
react on this? - Not allow
broadcast
to fail and buffer not succesfully sent things instead. - For some reason
withProcessTerm
now correctly stopsetcd
.. well, fine. - Difference between resilience against minority cluster writes and crash-recovery is really minimal. i.e.
TBQueue
vs. a "persistent queue" - Storing the last known revision in a file seems to be sufficient to avoid re-delivering already seen messages. But do we need this? Our application should be fairly resilient against duplication of messages. On the other hand, writing a file on a best-effort basis should not be too much of a problem (this is not write ahead!)
- With these latest changes temporary package losses of up to
100
percent are handled just fine! The only reason why benchmarks would stop is because of the timeout of confirmations (20 seconds), but if the package losses are intermittent it's just fine - Using pumba, we can also randomly select nodes which perceive a
100
percent package loss every couple of seconds.
-
Continuing the work after yesterday's successful demo
-
The problem I had yesterday with recovers not working is related to the fact that you can't recover from any node (currently). Only the node that crafted the original deposit has in it's state the needed utxo that needs to match with the recover.
-
There are many changes that need my attention but I'll focus on the observations first
-
I moved the recover observation to hydra-tx package and removed deposit and recover from tests that make sure head transactions are observed. Now they have their dedicated tests
-
I got the recover observations working but experience
ScriptErrorRedeemerPointsToUnknownScriptHash
for index 0 which is unexpected here -
Turns out this is just a consequence of
D04
which is the script deadline error -
I made sure to resolve fixme in the deposit observation which mentions that we need to check that the deposit datum contains exactly the deposit transaction inputs
-
Removed
utxo
field from theDepositObservation
since it is not needed (deposit TxIn will be convenient here tho) and propagated changes -
This broke the increment creation since we used this deposit scipt utxo to easily find the deposit we want to spend. Should be simple to fix this.
-
Next work will be probably to track multiple observed deposits in the local state and then just pick one existing one on recover
-
We got the new user issue where they say that the withdraw redeemers are stripped once the wallet balances the transaction.
-
I want to write a test to exercise this theory and I think I could just use the
WalletSpec
to create a transaction with some withdraw redeemers, then callcoverFee
and then take a look at the redeemers and if they stayed the same. -
Upon balancing we adjust the spending redeemers but the others should be intact.
-
I stubbed out the initial test case and noticed that the tx generator doesn't actually produce any redeemers so this is the next step to do.
-
Main takeaways:
- changelog update missing
- inline api docs on post draft commit
- Deposit should work with a blueprint tx!
-
I continued working on the branch: simplifying the api code to re-use existing types we had in place and making the recover use the path instead of query string
-
Wrapping things up so we can have a review soon.
-
I also want to try to decommit from the tui today with our fresh Head
-
Need to address user issue and try to reproduce the issue mentioned with cardano-cli (about the datum hash mismatch)
-
I had a problem where I couldn't ssh into my aws account and it turns out if you have too many ssh keys this sometimes happens. As a workaround I had to provide additional flag
-o "IdentitiesOnly yes"
-
Continuing to fix the red tests - it seems like the logging tests are especially a PITA since it is a bit hard to figure out why we get failures sometimes.
-
We got another user issue related to our blueprint transctions which mentiones that if you try to commit to a Head using blueprint transaction the withdraw redeemers are dropped from the transaction.
-
Need to read upon how withdraw zero trick exactly works but so far I know that you can reduce execution costs if you specify the script purpose to be rewarding instead of spending.
-
I'll run the demo just so I can get to an open Head and try to use the request body in the issue to reproduce the issue on my end.
-
Yesterday evening after work I realized what could be the problem with the e2e test for the recover. As I mention we try to balance the tx twice, I should only return the transaction from the handlers and then make the hydra-node post in on client input.
-
Ok, I was right and now I got the recover e2e green! Yee-haw!
-
I'll include the recover workflow also as part of this pr which makes me happy.
-
Question arizes: should I also observe the recover transaction and then have a way for a cleanup in the local hydra state? I think so since we can't rely to do a cleanup on posting only.
-
For sure, even our protocol.md diagram says so.
-
I'll use the opportunity to clean up unused variables etc. while I add the recover observation.
-
One thing is: how to get the head id from recover transaction? Currently there seems to be no way which means we would need to check the recovered UTxO against whatever we have in the state related to deposits and if the two UTxO match then we could say recover is approved/finalized.
-
I ended up looking for head id in the resolved recover inputs, in the deposit datum and this worked nicely.
-
Tests are green and now it is time for a major cleanup.
-
I almost came to the end of the recover implementation but got
uncaught exception: JSONException
which was surprising. -
Debugging a bit to see where this happens and it seems sometimes on the api level we get this error but I don't see why exactly would this be happening.
-
Realizing I've changed what I return from the recover endpoint - it is no longer the transaction but just the string
OK
. Now the recover transaction is posted by the hydra-node and not the user in the end. -
After making sure all pieces are in the correct place so that I can recover a deposit I get
ErrNoFuelUTxOFound
error when trying to post recover transaction. -
Very weird since my wallet UTxO has plenty of ada.
-
I found out that this is happening in
mkChange
function that is supposed to calculate the change and error out in case inputs are lower or equal to the outputs but there is a fat fixme in there that says we need to account for minimum UTxO value too. -
Realizing this is just the outcome of some other problem I have (most probably in the Handlers when we construct and balance the recover transaction)
-
Recover is trying to spend the deposit output and produce as outputs whatever was deposited (outputs present in the deposit datum)
-
I need to think more deeply about the recover tx and how does it play with the internal wallet since I need to check if all inputs are properly resolved and how come there is
ErrNotEnoughFunds
error coming out of the wallet -
It is time to gain some sanity again I think. One question is how come I didn't see this error before when working with the recover transaction? The answer might be that I prevously passed in the UTxO user wants to commit to the
draftRecoverTx
and now I rely on the datum in the deposit output. -
In the wallet code it seems that the input balance is lower than the output balance by around 6 ada:
> 24687860 - 24681788
6,072.0000000000
What am I not accounting for here?
-
Could it be that the UTxO's are duplicated and I need to remove the deposit and commit UTxO from spendable UTxO before giving the tx to the wallet for balancing?
-
Good thing is that the amount seems to be always the same!
-
I tried several things and I think the bottom line is that we are balancing the recover transaction and then post it which should be doing the balance all over again. In my mind this operation should be idepontent but perhaps it is not?
-
If I just try to submit the recover transaction directly after assembling it I don't see any errors but I fail to query the wallet address to make sure the recover worked.
-
I really feel under the weather right now so I'll leave dealing with recovers for tomorrow or next time while I try to make the rest of the codebase into some revievable state for the team to look at.
-
We did a nice session this morning to see the current state of the PR and discover the next steps
-
These are the steps I wanted to outline that would be needed for the users to test out this new feature:
### Prepare the UTxO to deposit
- How to make the deadline configurable for our internal purposes like testing?
### How to issue a http request to kick off the deposit process
- Can we copy/paste the json utxo representation directly from one of the explorers?
### Sign and submit the deposit transaction
- cardano-cli example most probably
### Query the Head to see the deposited UTxO present on L2
- Issue `GetUTxO` call to view the L2 UTxO state
## How to recover deposited funds
### Follow the deposit steps
- How to demonstrate recover if hydra-node will pick up the deposit as soon as it is observed?
- Play around with the network so one participant doesn't see the new snapshot request?
### Once we make sure deposit is not observed we can issue a recover request
- How to make sure user keeps the committed UTxO representation around so that they are able to
issue the api request with the same json?
- They also need to provide the txin which complicates things? How could we make it more simple, could they query the hydra-node
to get this information?
-
Plan right now is to get the PR into shape as much as possible and outline in the PR what is outstanding which will need to be implemented as series of next PRs.
-
I have a good understanding where the main problems are where are we at currently so this should not be an issue at all.
-
As a concrete next step I'll try to make the recover e2e test pass by shutting down one of the nodes to that the increment is not posted so that I can exercise the recover workflow nicely (right now the recover tx get's rejected by the cardano-node because of the lower validity bound).
-
If I don't get it to work in a reasonable time I will leave the complete recover feature to be something we think about and implement as one of the next steps since in the end we do need to think more about the recover and the optimal way of doing it.
-
Even before any of this I should not expect the users to provide the deposit script UTxO (actually any UTxO since we should be able to have it around in our chain state) and take in
TxId
instead ofTxIn
at the recover endpoint. Also use the path instead of query string (just remembering all of the stuff we talked about this morning). -
I removed the undeeded UTxOs from the recover endpoint request and altered the e2e test to spin up two nodes, observe Head opened and commit approved and then shut down one of the nodes in order to be able to do recover instead of observing the increment.
-
Looking at the recover code - so you need to know the deadline to be able to construct the deposit datum so that one can spend the deposit output but how can you know the deadline if that is something hydra-node decides on?
-
The e2e recover tests are failing with the validator error related to invalid commit hash so currently debugging the UTxOs to see where is the discrepancy (commit utxo I am sending to hydra-node api and the local utxo we get from observing head transactions)
-
Realizing I just need to decode the datum of the deposit script output in order to get all parameters I need for the recover.
-
I looked at the increment observation tests and it turns out they are flaky. So not all of them fail but sometimes we experience some weird behavior and I decided to leave this problem for later on since the e2e is green and this should be a problem somewhere in how we wire things together (increment depends on deposit output also)
-
The most relevant next chunk of work should be connecting the recover endpoint to the handlers.
-
I slowly progressed one thing following another to make the e2e test that will prove that recover transaction worked.
-
This took some time - I decided to aslo encode the query string (
TxIn
) as base 16 string since the hash in the rendered transaction input doesn't play nice with query strings. -
Now I finally get the
unknown input
error on the recover transaction so progress! -
After making sure I use the correct
UTxO
for recover request (utxoFromTx depositTx
) I am gettingErrScriptExecutionFailed
D04
which is related to deposit deadline. Yeee-haw our validator is kicking in. I'll make sure to pass in the proper deadline. How to make the users life easier in terms of these recover inputs they need to provide? -
I commented out the validator check for the dealine to see if the locked commits are matching and I'll worry later on about the deadline. Oh nice...I get the
D05
so the locked commits are also not valid. Debug time! -
Looking at the issue again I realize users don't need to provide the deadline at all so removing the field from the request type.
-
If I just remove true from the recover validator I am getting the
NoFuelUTxOFound
error. I think we modeled the recover as if we will just run it as a standalone cli tool but to have hydra-node wallet balance these transactions we need more. What I think will work is to provide the UTxO which is locked at the deposit script so that we can satisfy the validator. Then we would also need the deposit script UTxO since we need to spend it so it must be given to the hydra-node wallet. -
Success. Validator doesn't run currently but at least our internal wallet is happy. Now turning on the validator checks so that I can construct a proper valid request and see the recover working. Feeling pretty close to success.
-
Ok, adding the appropriate slot lower bound made the deadline check pass. I need to make sure these values make sense.
-
The check for locked outputs is naturally ok once I've provided the correct UTxO to the recover tx so that's satisfying.
-
As the final step for this test let's make sure the deposited UTxO is returned back to the user and is not in the Head.
-
For some reason the test is hanging - can the user actually submit the tx node created?
-
I added the recover client input since I we need the hydra-node signature on the recover tx so it is easier if the node just does it.
-
Adding the
RecoverApproved
message so users can have some sort of insight in what is happening inside of Head. -
I feel burned out a bit, there are a LOT of changes needed.
-
After fixing the
Authenticate
network module to not drop messages, I ran into an issue whereoffline
mode would not open the head when configured with other--hydra-verification-key
parties. -
Fixing manually run
offline
mode nodes to communicate- Need to simulate other parties'
OnCommitTx
- Still seeing
MessageDropped
? - Disabled signature verification for now
- Sending the
ReqSn
now seems not to arrive on the other end. Use distinct keys per node (again). - Seeing
InvalidMultisignature
now:
BTW: at first the
InvalidMultisignature
content was tripping me now because it only contains a singlevkeys
when one signature fails- Each node is seeing the other node's snapshot signature in
AckSn
as invalid. I suspect something's wrong with offline-mode. - Re-generating keys did not help.
- Found it: the
headId
in the snapshot isoffline-alice
andoffline-bob
so we can't have networked offline heads because of that.
- Need to simulate other parties'
-
Side note: Is
cabal bench hydra-cluster
single mode broken? -
Lots of our tests depend on
PeerConnected
messages. How would we deal with "connectivity" information in case ofetcd
? It's not really that we are connected to node x or y, but rather whether we are connected to the majority or not. -
Etcd config
initial-cluster
requires a consistent list of peers across all instances. Right now I derive it from the given--host
,--port
and--peer
cli options, but this is very fragile. -
Problem: submitting messages is not possible while we are not connected to the majority or leader election is ongoing?
-
After giving
etcd
a bit time in the beginning the benchmarks work reliably. However, since I switched towatch
it got slower? Maybe using the command line parsing is not the fastest way to do this (and it depends on the internal loop ofetcdctl watch
).
-
Continuing where I left off yesterday - see how to fix the increment tx posting after changes in the deposit observation.
-
It seems like I need to re-gain some sanity here so let's break down things.
-
I made changes to the deposit observation so now what we return from the observation is not just the deposit UTxO but also what we locked inside of the deposit script so the code around observing deposits/ posting the increment needed to change to accomodate for these changes.
-
Now that e2e is green again and I also added the
GetUTxO
call in the end to assert deposited funds ended up in the head on L2 it is time to consolidate and try to get somebody involved with this work since there are too many things in scope and we want to get this work merged asap. -
One area in which I need to grow is to learn to ask for help! Right now I think the fastest thing to do is keep coding since I know I will lose time explaining what needs to be done but at the same time this puts the pressure on me to deliver all in time.
-
So what is outstanding is to implement recover e2e, implement api endpoint to list pending deposits, remove increment parts (observation mainly) but leave the transaction construction since we want to keep uninplemented validators to return False if we want to merge this thing. Also write a tutorial explaining the steps to deposit/recover.
-
I've looked at the increment observation tests and it seems like we will need to find a way to alter these tests so that we can return some UTxO that is working nicely with increment (needs to contain both head and deposit scripts). Main thing I see is we can't find the deposit datum in the observed UTxO
- After changing the deposit observation (reminder: still need to implement increment observation tests ) the e2e test is broken now since what I return from the observation is input UTxO locked in the deposit script not the actual UTxO we saw coming in from L1 so there is some work needed in order to get this right but nothing too hard to do.
-
As I realized before the observations are not done correctly confirmed by SN. They should be pretty similar to the commit ones since they are even using the same type
Commit
. -
I've made sure to implement the deposit observation in the same manner as commit one (since they are pretty much the same) but didn't have the strength to continue further (too tired) so I'll make it work on monday.
-
After adding the increment mutation I want to make sure that the increment tx is valid as a starting point.
-
Did a small change to provide the correct UTxO as the argument to
incrementTx
. What I need here is the output from a deposit tx. Right now I get:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["PT5"])),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorMissingScript (ScriptWitnessIndexTxIn 1) (ResolvablePointers ShelleyBasedEraConway (fromList [(ConwaySpending (AsIx {unAsIx = 0}),(ConwaySpending (AsItem {unAsItem = TxIn (TxId {unTxId = SafeHash "913afe731ffea84a50b0b
so it seems like I didn't attach correctly a deposit script appropriately (index 1 here) and the head script (index 0 here) failed with PT5 (which is expected since validator returns false).
-
After returning
True
from a increment validator I can see that the head script got attached properly while the deposit is still missing (need to look at the deposit datum which is missing) -
Found out this was because of two issues:
- I used
mkScriptReference
and we are not referencing the deposit script so it should bemkScriptWitness
- I was attaching the headScript instead of deposit script derp
- I used
-
I quickly added a test to make sure we don't observe deposits from a different head
-
Now I am able to connect the deposit snapshot signing and posting of the increment tx as a result of it. Posting from head logic and connecting the handlers to be able to actually post the increment tx.
-
Wrote some Handler code to be able to post increment on deposit observation. Also added a type for observing the increment. We need this so that we can say the commit is finalized once the increment is observed.
-
I can try the e2e workflow now to make sure everything works up to observing the increment. It seems that deposit was not observed back after hitting the api so I investigate.
-
Turns out that after balancing we end up with a deposit that has two inputs and two outputs. One of these belongs to the hydra-node wallet as it's the one responsible to pay the fees to we need it's input and also we end up with the output that holds the change from balancing the deposit tx.
-
Debugging to see why we don't observe back the deposit. So it turns out we need a
utxoFromTx
on a deposit tx observation in order to be able to find the correct script output. -
What I notice is that we never issue AckSn but that is not so important right now.
-
Figured it out: I shouldn't include the the deposit tx in
localTxs
onReqSn
. This got the head stuck by waiting to apply the deposit transaction which is never applicable. (perhaps I am misunderstanding something since my explanation is fragile) -
We get to the next error which happens on posting the actual increment:
ConwayUtxowFailure (NotAllowedSupplementalDatums (fromList [SafeHash \"3135e9c31e91873b8b3a7d2922038e934785abd68401589bc67f40591af137c3\"
-
This happened because the increment tx construction was wrong and I should have used
InlineScriptDatum
when constructing the deposit witness. -
Progressing further since now I get to see the
CommitApproved
message and all that is left for the e2e to work is the increment observation! Nice, I am happy things are progressing nice and fast although there is really a lot of changes needed. -
Adding the increment observation now.
-
After needed changes I get to observe increment transaction but the committedTxOuts are not matching with the local deposit utxo. This should be easy to fix.
-
I've decided to return the UTxO directly from the observaton and compare it to the local one.
-
BAM! e2e is green now!
- Need to change semantics of
broadcast
as it should only "deliver" once the majority of the network saw a message. Going to give the callback a name too. - While
NetworkCallback{deliver}
makes things more consistent with literature and easier to read in the components, its also more verbose. Need to ask the team about this trade-off. - When disabling the
broadcast
shortcut, theNodeSpec
tests failed first, but only because it was asserting on additional message being submitted to ourselves. The mocked, recording network is not short-cutting and does not need to, so simply adding some more received messages and asserting less is fine here. - The
BehaviorSpec
tests were eating the whole memory until I removed the filter for own party in thebroadcast
implementation (to follow the expectation). Probably some busy loop wait for a message which never came. - After fixing the
ModelSpec
to also broadcast to self (increateMockNetwork
), I wondered.. which test is actually covering that short-cutting (or non-existence of it)? - Turns out only
hydra-cluster
end-to-end tests are testing this integration. Anyhow, making the lowest level network component (Hydra.Network.Ouroboros
) self-deliver after enqueuing a fire-forgt should resolve the previous behavior. - Not seeing the network layer operational right now (hard to tell through only
hydra-cluster
tests). I suspect it's because the network stack is filtering when seing a message delivered at the lowest layer.
-
Ok so after adding a field to the
Snapshot
that would hold the information on what will be committed to a Head it is time to move further and make the HeadLogic correct upon observing a deposit transaction. -
Looking at the issue it seems that deposit observation was a separate task but I did it already so I updated the issue.
-
Added some code to wait on unresolved decommit before issuing a
ReqSn
on deposit observation. -
Looking into the spec to figure out if there is any validation needed before including a depositted UTxO into a Head. Seems like there is no validation necessary which makes sense. If you managed to lock a utxo with a deposit tranasaction then this UTxO can't be spent twice so it is safe to include it in the Head UTxO.
-
I added check upon observing
ReqSn
where I need to make sure there is no other de/commit in flight before including the deposit UTxO into an active UTxO set and asking for a snapshot signature. -
Also added the
commitUTxO
to the localCoordinatedHeadState
to keep track of it. -
Seems like the time has come to actually construct the increment transaction from the observed deposit (and recover also). Perhaps it is better to make sure all tests are green first and implement recover workflow. The increment transaction is most interesting work but we need to build everything else around it since this would be the last step in this PR.
-
Made some logging tests green and now looking into failing deposit observations in hydra-chain-observer and hydra-node. Basic observation seems to work but plugging in the deposit generation + observation seems to fail.
-
Fixing this required me to pass in also the
utxoFromTx
so that the deposit can be observed since head UTxO doesn't contain the necessary datum as these transactions are not tied to the Head ones. -
Making almost all green (a part from one test in HTTPServer) paved the way directly into increment transaction construction and observation (I could do the observation in the separate PR).
-
Wondering if we merge this (incoplete) branch to master are we harming the regular hydra-node operation in any way? I'd assume not since users should only rely on the released versions anyway.
-
Detecting a place in code in the
HeadLogic
where the posting of the increment transaction should be triggered - most probably on snapshot ackgnowledgement. -
I was right, we can plug the increment posting easily on
AckSn
. Let's build the transaction first and then think about the plugNplay :) -
Doing this soon I realize we will need to add the Deposit to the script registry. BUT on the other hand I have the deposit UTxO so I can find the deposit input.
-
I tried running the end-to-end test for the incremental commits and get
PPViewHashesDoNotMath
which is surprising. -
Seems like our wallet code is assuming we are always having scripts in the Head transactions but deposit is different. It produces the script output but it doesn't run any scripts.
-
Tracing some code in the Hydra wallet module to figure out why we can't have a transactions without any scripts.
-
I did a conditional attaching of script integrity hashes with
scriptIntegrityHashTxL
and notice we also dorecomputeIntegrityHash
after a tx construction again. -
In case this hash is
SNothing
I just return already constructed tx body so hopefully if there are no scripts the wallet would still be able to construct the correct transaction. -
If I just remove
recomputeIntegrityHash
I got success on the deposit but the question is if we actually need this function for other transactions. Why did we need this in the first place? -
Decommit works just fine without
recomputeIntegrityHash
let's see about the other transactions but they should also be fine I think. -
So it turns out that
recomputeIntegrityHash
was a leftover that needed to be removed and all of the sudden the deposit tx is valid again. Nice! -
Moving on to the observation code now.
-
Writing simple observation didn't seem hard at all. Let; see if it actually works.
-
It worked after some tweaks - we look at the input UTxO and don't try to resolve the inputs as we are only interested if we can decode the datum to the correct one. Should we look at the deadline here too?
-
Moving into the changes in the HeadLogic now. I need to add some code to request a snapshot once a deposit tx is observed.
-
I stubbed out a function to be called on observed Deposit, added new field to hold commit utxo data in the
ReqSn
-
Added
utxoToCommit
to theSnapshot
andalphaUTxOHash
to theClose
datum. This required me to build a lot of modules. Tired now, continuing tomorrow.
-
Moving forward, I got the hydra-tx changes so I want to see the deposit end-to-end workflow working up to the API level.
-
Connected the hydra-tx/depositTx to our API and now doing the same for
recoverTx
-
Ok, the recover is connected too. I am itching to try out if deposit could be imported into a wallet but I need to work further :)
-
Continuing where I left off - making the script output that contains the datum hash so to prove that we either have a problem (most likely) or hydra decommit/fanout work as expected.
-
Realized that the
createOutputAtAddress
is just providing a pub key witness (usingmakeShelleyKeyWitness
). Somehow I need to also add theScriptWitness
but not sure exactly how in this setup and why is this needed now? Shouldn't I add the witness when trying to spend? ... Ok yeah. The error is coming from a hydra-node that tries to spend this script output so makes sense. -
It is actually funny since I was the one implementing the commit feature. I need to send a transaction that contains the script witness for the script I am trying to spend :)
-
Giving it a show and the test is green! hmmm.....
-
Failing test related to snapshot signature verification. This is surprising since I didn't touch the related code.
-
Trying out with removing the validation of some snapshot fields to see which field trips the validation.
-
It is the
utxoToDecommit
field that causes this. -
I can get the valid signature if I make sure that I also produce hash for
mempty
in case this field is not present and I don't see that any code changed at all which is surprising. -
It has to be something related to hashing of the empty UTxO by using
serializeData $ toBuiltinData mempty
. Previously we hadMaybe UTxO
as the argument and now we haveMaybe Hash
but I am not sure if I should spend time exploring this. -
Making the signature accept a
Hash
instead ofMaybe Hash
and fixing all the call places. This should introduce changes to our on-chain code mostly while we would still keepMaybe UTxO
in theSnapshot
. -
This yields some changes I would need to think about :
case
contestRedeemer of
Head.ContestCurrent{} -> -- TODO: think of the consequences of this
toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
_ -> toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
-
so the check
isNothing deltaUTxOHash
should be replaced with the check for empty hashhashTxOuts mempty
. -
Close mutations are affected by these changes.
H30
is happening now in the mutations forCloseUsed
-
Turns out in the
CloseOutdated
redeemer we need to make sure thatutxoToDecommit
is not present. TheCloseOutdated
means we are closing while the decommit is pending but I am not entirely sure why thenutxoToDecommit
needs to beNothing
? -
This highlights the fact that this knowledge needs to be captured somewhere (it was in my head). Most likely spec has the answer to this - checking.
-
The spec uses different terminolgy
Initial, UnusedDec, UsedDec
soUnusedDec
reminds me of ourCloseOutdated
but this is not the case where we expectutxoToDecommit
to be bottom! It is actually the filename of the mutation test that points to the correct redeemer on-chain! -
We need to make the connection between mutations and the spec/on-chain code obvious to avoid spending much time on this in the future and make it obvious which is which!
-
Working on incremental commits will hightlight this since we will need to add more close redeemers for the close!
- I realized I called deposit drafting
draftIncrementalCommitTx
so renaming that and connecting the chain handle to our API server. - Next step would be to actually construct and serve the deposit transaction but since I still don't have the code to do it I will move to the observation.
- I will not be able to complete the OnDepositTx observation off course but I can hook things up.
- After adding
OnDepositTx
I had to add a corresponding type toChainTransition
but should this really be here? Stubbed out the observation code with undefined's and errors here and there - Stubbed out the on-chain increment validator too and now since I got the green light on the PR I'll start using the new code and finally kick off incremental commits for real!
- Continuing with the off-chain changes for incremental commits
- Separating the
recover
flow to use REST DELETE soPOST /commit
route will be responsible fordeposit
flow as well as normal commits -
Commit
client input now takes aUTxO
instead oftx
- For the next stage I would need the deposit PR merged since I'd like to create and balance the deposit tx and return it to the user for signing/submitting.
- Could quickly specify
DELETE /commit
route for recovering pending deposit - Question arises: Should the hydra-node pay for the
recover
tx balancing or we should serve the tx to the user to balance/sign/submit? - It is a bit better UX to make the hydra-node do it since we don't have an easy way (yet) to balance the tx using cardano-cli and we are not sure if importing tx and balancing in the user wallet would even work (eternl can import the tx but not sure of other wallets)
- Let's make the node balance the transactions since we mentioned a plan already that we could have a feature to make only the requester's hydra-node pay for the transactions once they are the next leader.
- Tackling initial code to handle
DELETE /commits/tx-id
so that the user is able to also recover the deposit
-
Moving to this since I got a review and I need this work to continue on incremental commits
-
Not too bad, I can get this done quickly it seems
-
I should get the deadline and headid from the decoded datum of a recover UTxO
-
Recover mutations seem to be missing here also so I'll need to spend some time to add them
-
One mutation should be one that mutates the deadline of the recover transaction. I expect to see
DepositDeadlineNotReached
D04
. -
In order to do that I need to mutate the input datum deadline.
-
If I just remove the lower bound validity I should get
DepositNoLowerBoundDefined
D03
-
It took some sweet time to compile the mutation that changes the input datum deadline and I still don't get the wanted failure. This happens because of slot/time conversion so I am looking into this right now
-
I noticed I was using the same deadline for deposit and recover so that's one problem. I got the mutation working but broke the recover transaction so that's a bit puzzling. Working with time is always a headache especially in the pure Haskell setting like this.
-
Debugging the current seed:
input datum deadline is 7364703000
recover datum deadline is 7364703001
tx lower validity slot is 7364703
and we get D04
which means we tried to recover before deadline has passed or
said differently recover lower validity slot is not after the deadline.
- Ok, I was re-using the recover slot so found a bug. I really like logging the work since it helps me immensely.
- I could start working on implementing first the offchain parts of incremental commits since deposit/recover PR is now in review. I'll check with Noon where did they left off since they started looking at this initially.
- Ended up looking at haddoc builds and our website haskell documentation. Turns out you can't build haddocs if they are in the testlib (at least it seems that way)
- Ok, hydra-tx initial work is merged, deposit/recover validators and tx building is PR'd so I'll continue further and see where things are standing on the initial commit branch
- First I'll make everything compile again and then focus on the mermaid graphs and make changes which should be well specified already in our Hydra spec
- I removed the mermaid diagram in protocol.md which was I suppose added as a possible alternative and kept the one we are currently implementing.
- Did a small refactor to resolve a TODO in HTTPServerSpec and now looking into issue itself to make sure I don't miss something important before diving in.
- Added the new Chain handle to tackle the new request (could reuse the draftCommitTx too but wanted to be verbose at the beginning at least) and I look around the API server code to determine what changes are needed.
- Seems like we need two more types in order to support deposit/recover drafting. Possible code solution:
...
| IncrementalCommitDepositRequest
{ utxo :: UTxOType tx
, deadline :: UTCTime
}
| IncrementalCommitRecoverRequest
{ utxo :: UTxOType tx
, deposittedTxIn :: Text -- TODO: This should be a TxIn but in the API server we are not yet using the Tx (Cardano) type.
, deadline :: UTCTime
, recoverStart :: Natural
}
-
Slightly problematic is that in the API server level we are still abstract over the transaction type so using for example
TxIn
as the argument torecover
should be done differently. We can always just try to parse the request field into the type we expect off course. -
I wanted to explore first to get understanding and so that we all agree on the execution plan in order to minimize implementation time.
-
Will we call hydra-tx directly from the API server? Do we even need new
Chain
handle?
-
I wanted to add number of recover outputs in the recover redeemer
-
While doing this I realize the recover script witness is not properly attached
-
To fix the healthcheck for recover tx I need to do exactly what I mentioned couple of days ago - build depositTx and obtain the resulting utxo so I can create arguments to the recover tx from it.
-
Ok after all this is done my mutations still reveal the issue
D05
which means the locked utxo hash is not matching with the outputs. -
I realized I actually create a script output from a recover transaction. ouch. What I should do is try to decode the datum off-chain and produce whatever was locked in the datum.
-
I like current work on this as I feel like I am gaining better understanding but it can also be that I was too distracted when I implemented this originally?
-
Looking at what we did to unlock the committed outputs - logic should be exactly the same
-
Ok, so I ended up decoding the datum from the user provided TxOut and had a minor bug with the number of hashed outputs therefore the mutations turned out to be red
-
I have stubbed out the deposit and recover transactions and now it is time to do the same for the increment one.
-
Working on increment transaction should reveal missing parts in the deposit/recover (if any) and allow us to write mutation tests to test out the validators.
-
I drafted increment tx for the mutation tests but it is not yet valid:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left
(ScriptErrorTranslationError (RedeemerPointerPointsToNothing (AlonzoSpending
(AsIx {unAsIx = 1}))))),(ScriptWitnessIndexTxIn 1,Left
(ScriptErrorRedeemerPointsToUnknownScriptHash (ScriptWitnessIndexTxIn 1)))]
-
It seems I didn't attach the deposit script properly but from what I see all is good
-
Since the deposit datum needs to match in the increment transaction I could assemble a deposit tx and grab a UTxO from it to pass it to the increment transaction. Maybe this is the cause of increment not being valid?
-
Ok, this seems to produce nicer errors
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Right (ExecutionUnits
{executionSteps = 209565180, executionMemory =
672406})),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError
An error has occurred: The machine terminated because of an error, either from
a built-in function or from an explicit use of 'error'.) ["C02","PT5"]))]
-
By the way
C01
means the PT token is missing and if I disregard this for now I get to seeD02
error which means deposit doesn't have upper bound defined. Nice! we are making progress in terms of deposit tx now. Mutation tests like this are really valuable since they allow us to make sure all parts of the assembled transactions are as we expect. -
If I want to add upper validity for the deposit transaction I need a slot and currently we accept UTCTime in hydra-tx cmd line arguments.
-
If I choose to accept slots how hard it would be to build a deposit datum?
-
I couldn't find appropriate functions for now to go from
SlotNo
->POSIXTime
since this type is what get's set in the deposit datum. -
Turns out I was looking at outdated tx traces 🤦 The spec PR contains the new changes related to these validators.
- Starting to work on https://github.com/cardano-scaling/hydra/issues/1574
- We want to have a deposit (both
$\mu_deposit$ and$\v_deposit$ ) and refund validators in order to be able to commit to a Head or refund to the user in case of unsuccessfull commit - I am looking at the tx trace graphs and mermaid diagrams for these workflows
-
$\nu_deposit$ is parametarized by the seed (outputs user wants to commit) and the$\v_deposit$ validator hash - It seems like it would be easiest to start with
$\v_deposit$ and specify needed mutations first but in order to specify mutations we need off-chain tx construction together with the on-chain code so I'll start with drafting the needed types/functions for$\v_deposit$ instead and then$\mu_deposit$ - I added checks for before/after deadline for the Claim/Recover redeemers, check that head id is correct and deposit token is burned (not complete yet)
- Another trial of magic-nix which uses the github internal cache as a binary cache for
nix
. - Magic nix does not offer out-of-the-box any public cache interaction and we would need to make this work together with cachix. In another exploration, @locallycompact encountered problems by having both enabled: https://github.com/DeterminateSystems/magic-nix-cache/issues/78
- However, before tackling this I wondered what the build time improvement would be?
- Plan: Getting a first single package built and rebuilt with both
magic-nix
orcachix
. - Magic-nix:
- Using
accept-flake-config
to seed magic cache from cachix. - Rerun of
hydra-node
build with primed cache: https://github.com/cardano-scaling/hydra/actions/runs/10418269282/job/28854308115 -> build step only:1m04s
, overall2m49s
(still uploading) - Most time is spent on evaluating the derivation (IFD)
- Post action uploads are rate limited - does a typical rebuild result in enough uploads?
- Seems like magix-nix post action is re-uploading similar store paths on each rerun -> overall build time counts:
3m14s
Are our builds non-deterministic due to IFD? - Rebuild hydra-node (changed version number) with magic nix: https://github.com/cardano-scaling/hydra/actions/runs/10418479282 ->
4m45s
and no pushed paths pushed? - Rerun of the same action also resulted in a rebuild (indeed no paths were pushed) ->
4:19s
, also no paths pushed again
- Using
- Cachix:
- Switching over to using
cachix-action
https://github.com/cardano-scaling/hydra/actions/runs/10419403396 -> rebuild of hydra-node in4m47s
- Rerun of cachix build https://github.com/cardano-scaling/hydra/actions/runs/10419403396/job/28857732485 ->
1m28s
,1m50s
- Dedicated rebuild of hydra-node with cachix https://github.com/cardano-scaling/hydra/actions/runs/10419717683/job/28858340501 ->
1m25s
(accidental cache hit) - Now with a proper, new version for rebuild with cachix: https://github.com/cardano-scaling/hydra/actions/runs/10420234054/job/28859845924 ->
5m04s
- Rerun after this https://github.com/cardano-scaling/hydra/actions/runs/10420234054/job/28860297642 ->
1m39s
- Switching over to using
- Conclusion:
- Magic nix is slightly faster in fetching (from github cache than download from cachix), but also spends more time in pushing paths after builds
- Also saw some paths pushed too often, or not at all (e.g. the deliberate rebuild of
hydra-node
was never cached) - It does not provide public access to the cached derivations
- Not worth it
-
When looking at the
BehaviorSpec
tests related to the fix we wanted to do I noticed actor is requesting a decommit with UTxO they don't own. -
Another problem ( and this is th original issue we started to look at) is can process transactions while decommit pending Here the problem is that we still have around the decommit transaction which was snapshotted but not observed (so it is still both in the new snapshot and the confirmed one) and we don't have a way of detecting if the next snapshot (L2 tx in this case) should ignore the pending decommit and just sign the snapshot.
-
I am tempted to add a field to
ReqSn
to detect if the reqest for snapshot is just L2 tx and in this case not check for decommit tx in the snapshot. -
Would this approach somehow influence also requests for snapshots that have some decommit tx? Not sure at this point but I don't think so.
-
So it is possible to have multiple snapshots with the same decommit tx in case there is a pending decommit but this decommit tx can't be altered until the decommit is settled. Only then you can request a new decommit and the local txs would still continue to work with the pending decommit.
- Browsed https://github.com/dao-xyz/peerbit/ a bit: A replicating
document storage framework that uses
libp2p
under the hood to discover peers and establish transport. - Peerbit applications seem to be structured around
Documents
that are replicated across peers. https://github.com/dao-xyz/peerbit/blob/master/packages/programs/data/document/document/src/program.ts#L89 - Eeach document has an and
index
and aSharedLog
that seems to concern itself with replication to "replicas". - Modifications on documents are
AppendOperations
replicated through the shared log https://github.com/dao-xyz/peerbit/blob/master/packages/programs/data/document/document/src/program.ts#L395 - More details on the sharding approach can be found here: https://peerbit.org/#/topics/sharding/sharding
- The shared log ultimately uses lbp2p pubsub to reach replicas and submit messages https://github.com/dao-xyz/peerbit/blob/5205167d4d1be4176741aa08458a6841f9b32c81/packages/programs/rpc/src/controller.ts#L284
- The
RemoteBlocks
abstraction suggests it does something similar like ouroboros BlockFetch https://github.com/dao-xyz/peerbit/tree/5205167d4d1be4176741aa08458a6841f9b32c81/packages/transport/blocks
- Fix I did to make this work enables us to not apply decommit transaction twice by looking at the previous confirmed snapshot.
- We may want to exclude
utxoToDecommit
on the next snapshot if this is the case. - I will draw transaction traces on the miro board to see possible scenarios
- There is another problem with the pr: by adding the
threadDelay
to make sure we observe transactions on the next block we broke some logging test. - Perhaps I can insert this thread delay only if the caller says so?
- Ended up wrapping the
withSimulatedChainAndNetwork
so it takes extra parameter that determines if there should be a delay when posting/observing a tx. - Other main issue is that we are not sure what are the consequences of
keeping/removing
utxoToDecommit
in the following snapshot/s. - It seems that we want to make sure that we do not sign snapshot if the
version is the same as the confirmed one and the
utxoToDecommit
is different. In this case we should reject thereqSn
request and in case where version andutxoToDecommit
are the same we just want to skip applying it. - There are tests already that help us reason about these changes and I'll focus next on trying to come up with a design for both spec and code that we care to maintain. Currently it feels fragile and it seems we can't predict the exact behaviour of decommit feature since we came a long way just to realize this problem right now.
- Looking at can only process one decommit at once in the
BehaviorSpec
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":13,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":13,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"inputId":13,"tag":"EndInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":14,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"input":{"chainEvent":{"newChainState":{"slot":7},"observedTx":{"distributedOutputs":[42],"headId":"31323334","newVersion":0,"tag":"OnDecrementTx"},"tag":"Observation"},"tag":"ChainInput"},"inputId":14,"tag":"BeginInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effectId":0,"inputId":14,"tag":"EndEffect"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"inputId":14,"tag":"EndInput"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"outcome":{"error":{"message":"decrement observed but no decommit pending","tag":"AssertionFailed"},"tag":"Error"},"tag":"LogicOutcome"}
- Seems like another issue we have is to somehow loose the local decommit tx before the observation happens?
- I see 4 observations of
OnDecommitTx
so on first occurrence we delete the local decommit tx but afterwards we experience the error? - Then afterwards we see a new
DecommitRequested
and aReqSn
:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"effects":[{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"}],"stateChanges":[{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"newLocalUTxO":[],"tag":"DecommitRecorded"}],"tag":"Continue"},"tag":"LogicOutcome"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"},"effectId":0,"inputId":15,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":15,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"message":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"snapshotNumber":2,"snapshotVersion":0,"tag":"ReqSn","transactionIds":[]},"tag":"NetworkEffect"},"effectId":1,"inputId":15,"tag":"BeginEffect"}
- And eventual failure because of not bumped snapshot version:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"lastSeenSv":0,"requestedSv":0,"tag":"ReqSvNumberInvalid"},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
-
I start with a failing test case already where we try to decommit, wait to see the decommit approved and then do a new tx and expect it to be snapshotted correctly
-
Snapshot number 1 is confirmed with utxo = 2 and utxoToDecommit = 42
-
Then we see TxValid/ReqTx by both nodes after ReqSn one node can't apply next snapshot:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"error":{"reason":"cannot apply transaction"},"requestedSn":2,"tag":"SnapshotDoesNotApply","txid":1},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
- So we have a utxoToDecommit in the next snapshot and it is not applicable since the utxoToDecommit was already removed from the confirmedUTxO but we also didn't observe the decommit tx.
- How would we know when the next snapshot comes if we already applied decommit tx or not?
- One idea I had is to check before trying to apply decommit tx and see if in the local confirmed snapshot we already have the same utxo and if that's the case then we would not try to apply the decommit transaction again.
- This works but we need to think if this strategy will work in all different scenarios we can have with L2 transactions while decommit tx is still kept locally/not observed.
-
Turns out the ledger rules are actually preventing empty inputs/outputs in some transaction. In case you don't provide any inputs you get
TxBodyEmptyTxIns
since it doesn't make sense to accept a transaction with empty inputs and in case there are no outputs theValueNotConservedUTxO
is thrown. -
I'll add one test that will assert this is true for decommit transaction and assert on the error so we can call this non-issue done.
- Exploring
hydra-doom
watch mode - Start by adding a command line flag to individually enable
-send-hydra
and-recv-hydra
- Looking around for command line arguments and found the -wss and -connect args from doom websockets.. should we integrate as a net_module too?
- The
doom-workers
repo is a nice, small, vanilla JS application to take inspiration from (good shortcuts likedisplay
) - When only watching with
-hydra-recv
I realized that the datum decoding was actually not working. - Disabling the demo makes it nice and idle if
?watch
is set - After figuring out encoding and decoding the redeemer, submitting
ticcmd_t
pieces this was quite easy. Now the question is.. how would we get a synchronized view/start of the game? How is this working in multiplayer?
- Everything looks good to me. Nice job, I am glad this is finally done.
- I introduced new snapshot type in our transaction traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764594997541131&cot=14
- There can be
Normal
,Inc
orDec
type of snapshots each signaling their purpose. Normal ones are reserved for new L2 transactions.Inc
snapshot includes some new UTxO into theHead
andDec
takes something back to L1. - For the spec changes I decided on the letter
$\kappa$ to represent this new snapshot type for no specific reason$\kapa \cup Normal \cup Inc \cup Dec$ - The tricky part is to explain all of the possible decisions we can make in close transaction based on this snapshot type and the type of a close redeemer.
- In the end it wasn't that bad to update the close transaction. I just had to specify 4 different close current/outdated with respect to
Inc
andDec
snapshot types and also update the figure to include the snapshot type$\kappa$ in the close redeemer.
- I updated the tx traces we added yesteday to depict the flow with the increment/decrement transactions
- I think we gained some clarity together with Franco
- Updated the drawing with the new snapshot type I propose and the close redeemer addition to specify this type
- Now close needs to distinguish what to do based on the snapshot type:
Normal, Inc or Dec
and the redeemer type:CloseCurrent, CloseOutdated, CloseInitial
- It is time to update the spec with these proposed changes
- Changes in spec mainly need to consist of explaining the new snapshot type where appropriate and use the new type in the increment/decrement transaction.
- Validators need to contain check for conditions I explained in the tx traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764595074776662&cot=14 which take into account the new snapshot type together with close redeemer type
- This snapshot type will be propagated through the close redeemer too
- Seems hard to find information on when the head was initialized on-chain?
- I used the slot/block when I published hydra scripts for that but how would you find it otherwise?
Answer: We could search for tx metadata. All hydra init transactions contain
HydraV1/InitTx
. - Do we need to expand on the documentation for these corner cases in order to provide more information to our users?
- After putting the close redeemer casing, I had also aligned the "outdated" case signature verification to use version - 1 -> this made tests fail!?
- Using
version
instead ofversion - 1
makes mutation tests green. This is wrong? - EndToEnd tests now also fail with "MustNotChangeVersion" -> this is connected!
- Fixing the version used on close to the last known open state version (
offChainVersion
), makes E2E pass - While the "can decommit utxo" was green right away, "can close with pending decommit utxo" was red first. After adding a delay of 2 secs after submitting the decrement it was green -> race condition in
hydra-node
when trying to close while decrement was not yet observed back. I think this is expected and we need to manage the UX here (e.g. try close again). - In any case, the pending commit E2E test is redundant.. it's not a strictly a pending decommit and the test is the same as the one above
- After mutation and e2e tests are fine, model tests still choke on new
MustNotChangeVersion
- Found an original bug (or current inconsistency?) in HeadLogic: version is incremented whenever the snapshot leader sees the decrement, but not to what the chain observation includes.. could be a race condition (model tests fail)
- When trying to tie the knot correctly in
onDecrementTx
, more and more abstract functions are needed. What is the most common denominator? - Testing correct observation of decrement is not easy as all our tests use "pure transactions" and only e2e tests spot misalignment on "real" transactions (with fees paid, change output etc.)
- Removing snapshot number check from decrement: should motivate not having number in open state datum. Running mutation tests, tx trace and model tests after dropping it.
- Only
UseDifferentSnapshotNumber
failed, and fails with a different reason:SignatureVerificationFailed
. - This is expected as the snapshot number to verify in
decrement
txs is still taken from the datum instead of the redeemer. Let's change that now. - Also an interesting
TxTrace
failure with a singleDecrement
of snapshot number0
. Was expected to fail and does not fail now (is this a problem?) - Sub-typing all datums and redeemers likely makes sense. At what cost?
- When fixing the TxTraces to not expect failure on decrement with snapshot 0, I run into another issue where
contestTx
is called with version0
or otherwise too old version, resulting in a "should not happen" case. - I realize there are pros and cons about
utxoToDecommit
anddeltaUTxOHash
being aMaybe UTxO
andMaybe Hash
respectively.
-
Maybe it can be worthwhile exploring the idea of how to trick the Head protocol into decommitting twice.
-
So we would ask for a decommit after which there would be no further snapshots.
-
Then we would close but the contestation should try to contest with the snapshot we used for decommit.
-
Would the fanout re-distribute already decommitted funds again?
-
Franco wrote an e2e test that wanted to send a decommit request, wait to see it approved but then close instead.
-
The test was failing because of
SignatureVerificationFailed
and it was happening because of the version missmatch. -
I fixed this by updating the version check:
- snapshotVersion = alreadyBumpedVersion - 1
+ snapshotVersion = max (version - 1) 0
-
This way we don't go below 0 which was causing the snapshot validation to fail.
-
This worked and got us to another interesting problem. Close transaction was failing with:
InsufficientCollateral (DeltaCoin 0) (Coin 2588312)``
-
This is interesting since the collateral input had around 21ADA available which should be more than enough.
-
I was looking at the cardano-node logs and what seems to happen is that the posted decommit tx got into the mempool just before the close one. The close tx was then failing with the error above but this doesn't explain the error I was seeing. What I would have expected is the close tx to fail with
BadInputsUTxO
. -
Our suspicion is that since the hydra-node has never observed the DecrementTx taking place on-chain, our local view of the wallet UTXO is invalid. This is likely causing the collateral error we are experiencing.
-
We decided to leave exploring this for now since it seemed like a race condition between the decommit and close tx but still I'd like to get to the bottom of it.
-
I wanted to prepare a small demo for the team and demonstrate that decommits are working as expected.
-
In the process of decommitting I had small problems submitting a decommit tx since the balancing is on user side so setting the fees is smallish annoyance.
-
After successfully decommitting (a single utxo of 10ADA) now the head is still open but there are no funds inside.
-
Posting a close yields:
An error happened while trying to post a transaction on-chain: FailedToPostTx {failureReason = "TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError (UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [AlonzoSpending (AsIx {unAsIx = 0})])) :| [UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"0e96eb70e55ac2cfa0b9d93aa5ef3fd57dff54de704d72b7d355896581e70a85\")) (SJust (SafeHash \"2862039cf16f77fc21cb3673a4cb6e9185636c9b6a3e519f3ec8416a4af469e5\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue (Coin 0) (MultiAsset (fromList []))) (MaryValue (Coin 9982912420) (MultiAsset ( fromList [(PolicyID {policyID = ScriptHash \"ff0fc67587997fc3330fb29dff6c388166baaf0d62f934882deee38d\"},fromList [(\"4879647261486561645631\",1),(\"f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 2)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (DeltaCoin 0) (Coin 2586728))))])))"}
-
So we get InsufficientCollateral errors and now I want to see why would hydra-node's internal wallet yield this error.
-
If there are no funds in the Head this should not be related to what hydra-node internal wallet owns.
-
Ok, the error above seems to be just the outcome of trying to Close multiple times without waiting for a close tx to be observed BUT fanout yields two UTxO with the same amount!
{
"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2#1": {
"address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
"datum": null,
"datumhash": null,
"inlineDatum": null,
"referenceScript": null,
"value": {
"lovelace": 9836259
}
},
"cddf3345a486396a8516418244c54161aab511efb99234580836114fa0f5b6cc#0": {
"address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
"datum": null,
"datumhash": null,
"inlineDatum": null,
"referenceScript": null,
"value": {
"lovelace": 9836259
}
}
}
which means we decommitted 98362259 lovelace, then closed with the same
ConfirmedSnapshot
which had the utxoToDecommit
field still populated and
that ended up creating another output with the same amount! A BUG!
-
We went through the decommit model for testing and determined we can't just randomly produce some utxo to snapshot. We need to maintain the utxo set of the head if we want to be able to decommit something but not alter the head value.
-
I did a spike yesterday and tried to evolve the head utxo set in the model but decided to drop it since I noticed even before this change there were some flaky tests.
-
Now I want to make sure everything is green all the time before I try to add decommits again.
-
I ended up removing the
Augmented
snapshot number constructor. We didn't account for them in thevalidFailingAction
and it was causing weird behavior in the presence of negative testing. Now all non-decrement actions are green and I need to come up with a way of evolving the head utxo and make sure there is some value to decrement before actually attempting to decrement (TxBodyOutputNegative
errors). -
I enabled
Decrement
actions again and made sure that the head utxo value contains the initial utxo value at the start. Then, when decrementing I make sure to pick some utxo from thespendableUTxO
and decrement it. This way all is green but we also need to make sure to test actions with various snapshots. -
This is not so straight forward to do and I need to think about the easiest way to do this.
-
Perhaps keep the seen snapshots in the model and occasionally pick one of the old ones?
-
Are the negative actions we produce enough to say the decrement is working properly?
-
Seems like everything is green just because we
adjustUTxO
and not really remove anything from the head utxo. -
I want to add a check to make sure the value in the Head is reduced right after a decrement.
- Lots of pain with emscripten and typescript, but forgot to keep track of log items... anyhow:
- Websocket.addEventListeners were leaking heavily, with removeEventListener things seem fine. I wonder though, is there a way to re-use "subscription infra" on the frontend and put events into a queue of some sorts?
- When running the scenario for a while, the log of the hydra-node grow heavily (should control log density). So I removed the file from disk (the file descriptor likely was still existing) .. and perceived performance got a bump!?
-
Last thing I remember is that we experienced
CannotFindHeadOutput
errors sometimes when running theTxTraceSpec
tests. There is some flakyness here but most of the time the tests are green which is not something I would expect. We wanted to see failures in close/fanout related to Uw (U omega) addition/checks. -
Sprinkled some tracing in the close/contest/fanout tx creation to see what kind of UTxO is there and detect why Head output is not found.
"Close: 9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace\n52f744b6ac#71 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14" "Contest: 1d3367a1b0#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace" "Fanout: f5110cd941#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
-
From the traces above I don't see clearly the head output with some PT's
-
I disabled the
Decommit
action generation and the same problem still exists- so this is not tied to the changes I added for decommit action. I'll try to go back few commits to assert tests were green before decommit snapshot addition.
-
Going back just leads to missing decommit utxo in the snapshot errors.
-
Tracing in the
State
module suprisingly reveals the head UTxO is actually there??? -
Ok, removing shrinking with
qc-max-shrinks=0
reveals the problem again.
Assertion failed (after 4 tests):
do action $ Close {actor = Alice, snapshot = Augmented 3}
failingAction $ Fanout {snapshot = Normal 3}
action $ Fanout {snapshot = Augmented 3}
pure ()
Model {headState = Closed, latestSnapshot = Augmented 3, alreadyContested = []}
Fanout {snapshot = Augmented 3}
Failed to construct transaction: CannotFindHeadOutputToFanout
-
Seems like we always seem to produce something to decrement (decommitUTxO in the snapshot can be
Just mempty
) which doesn't make sense (and there was a comment in the PR related to semantics ofJust mempty
vsNothing
). Let's fix this first since I think it is tied to missing Head utxo problem above. -
After directly producing the snapshot with something to decrement we get the problem of negative output
TxBodyOutputNegative
which is expected. We just generate snapshots without taking care the actual UTxO is part of the Head UTxO. -
I think the next step is to make sure whatever we decrement needs to be part of the Head UTxO.
-
After this change I see
H09
orHeadValueIsNotPreserved
error in the decrement tx. -
This means that the head input value does not match the head out value since we added decommit outputs.
-
If I update the UTxO state in the run model with the new UTxO containing the decommit outputs then I get
H24
which isFannedOutUtxoHashNotEqualToClosedUtxoHash
. -
This points to the fact that we don't do a good job of maintaining the UTxO set between close/fanout in our specs. But we wanted to keep it minimal so that was fine so far, maybe it is time to add more UTxO modelling in our specs.
-
We seem to think that the cardano-cli is using the TextEnvelope transaction format. At least this is what we expressed in one of our tests (accepts transactions produced via cardano-cli).
-
This test was generating an arbitrary transaction and then serialise it to the text envelope format and check if we can parse it back.
-
This test now fails because the tx type we expect is not correct. The transaction type depends on the fact that tx is signed or not ("Un/Witnessed Tx BabbageEra") but the
serialiseToTextEnvelope
produces just "Tx BabbageEra" which makes our test red. -
I tried assembling a tx using cardano-cli and noticed that what we get is the "Ledger CDDL" format instead and proper string for the tx type ("Un/Witnessed Tx BabbageEra").
-
So it seems there is some discrepancy between what we expect and what we actually get as the cardano-cli tx format.
-
Progress. I can see the blockfrost submit error with the instructions provided by MLabs.
-
Seems like the correct hash for the commit tx is detected (
12e5af821e4510d6651e57185b4dae19e8bd72bf90a8c1e6cd053606cbc46514
) but for whatever reason it doesn't hash to61f765d1abd17495366a286ab2da9dbcb95d34729928f0f2117f1e5ad9f9e57b
which was expected hash.
-
Last thing I did in the new model tests for decommits is to add the actual UTxO to decommit to our modelling of the Snapshots.
-
This led to H35/34 validator errors which means the Snapshot signature is invalid and in turn it means whatever outputs wanted to decommit are not complete in the Snapshot and/or they are not properly signed.
-
Tracing the produced Snapshot with something to decommit reveals that decommit output is actually present in the decommit transaction. Does this mean the snapshot is not properly signed?
-
Indeed this is the case. We create a snapshot and the signatures but don't set
utxoToDecommit
. Afterwards we set theutxoToDecommit
before callingdecrement
function but the signatures are invalid at that point. -
After re-signing the proper snapshot with decommit output we go further and the model spec fails with invalid Decrement action which is expected to fail.
-
Turns out the problem was that I was removing the decommit outputs from the utxo set and this is not what we want to do since the snapshot utxo is just generated and was never actually committed.
-
Now we come to a problem where head output was not found for contest/fanout txs.
-
Our users MLabs are having issues with the commit tx produced by hydra-node. The issue is that the auxiliary data hash is not correct.
-
Our current tests are all green and we explored the wallet code in order to figure out at which point the auxiliary data hash becomes invalid but with no success.
-
There are both property tests and the tests that actually complete the whole workflow and return the signed commit tx to the user but nothing reveals the issue.
-
The last attemp was to recalculate the aux hash in the conversion from/to ledger tx but this turned out to not be a proper fix.
-
After this I tried to assert the following for different combinations of:
- Aux data and hash is valid when create tx using
commitTx
function againstdraftCommitTx_
- Aux data and hash is valid when create tx using
draftCommitTx_
function against signedcommitTx
tx - Aux data and hash is valid when we encode/decode tx to make sure that the aux data is not lost in the process and all the tests are green ...
- Aux data and hash is valid when create tx using
-
Is this somehow blockfrost specific? I know MLabs are using blockfrost to submit this transaction and so maybe this is worth exploring.
-
They have the workaround by just recalculating the aux hash since they have a way to re-sign the hydra transaction but this is not a good solution.
-
I think further steps are to try to reproduce the issue together with MLabs and see if we can find the root cause.
-
When working on reproducing the contest bug we found out that our mock chain layer is not validating transactions properly.
-
When calling the
evaluateTx
we getEither EvaluationError EvaluationReport
but we never check ifEvaluationReport
contains someScriptExecutionError
-
Another problem we found is that
addNewBlockToChain
updates the new slot and block but keeps theUTxO
same. -
We added check when trying to
submitTx
to see if the tx we are trying to submit is actually valid, but, with the changes above we observeH03
error which isReimbursedOutputsDontMatch
. -
We error out in the
checkAbort
with this error in case the hash of the committed UTxO doesn't match with our hashed outputs in the abort. -
So since the error points that there is perhaps something wrong in the way we simulate commits I added check in the
performCommit
to make sure when we see theCommitted
output, the UTxO inside matches with what we actually committed. I think we were observing other node's commits and that led toH03
we were seeing. -
Now we get a completely different failure (seed is
590033147
):
TransactionInvalid (TransactionValidityTranslationError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c"}) (TxIx 5))))
-
The only thing is, when printing a tx I actually see the input
2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c
with Ix 5 so WAT is happening. -
I see that we need to invoke
toApi
on our utxo in theevaluateTransactionExecutionUnits
and this is justcoerce
so it shouldn't matter but I added roundtrip tests anyway just to regain some sanity.
- Worked on the model and some back and forth, succeeded to issue close
transactions of the
initialUTxO
available in the mock chain (surprised that it is kept around and made available asfutureCommits
?) - The model did not fail and wee seemingly can contest and fanout without problems. Let's look back at the logs of the bug issue
- State changed events: Both nodes see the
HeadClosed
at slot38911656
with deadline2024-01-18T08:48:47Z
- After four blocks with
TickObserved
, theHeadIsReadyToFanout
- Node 1 stores a
ChainRolledBack
state changed event, seemingly to slot38911656
(which would be exactly the point where the head closed) - Node 2 has two rollback events, one to slot
38911660
(after close) and then to38911656
(at close) - In the logs, node 1 observes at
2024-01-18T08:47:36.188861Z
in slot38911656
aOnCloseTx
, at2024-01-18T08:47:40.410448Z
in slot38911660
aOnContestTx
with snapshot1
, and at2024-01-18T09:09:22.762856Z
in slot38911660
anotherOnContestTx
of snapshot1
(Arnaud mentioned he tried to wipe the state, re-observe and try fanout again) - Node 1 tried to post twice the
CloseTx
, first on2024-01-18T08:47:04.257801Z
with success producing tx94a7f0b86ee448dedf3667114f968dd825055bee4c90379a4659d8563b2e7d4a
, the second attempt at2024-01-18T08:47:28.565181Z
failed withTxValidationErrorInMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError [UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [RdmrPtr Spend 0])),UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"1119bb152dc2c11c54389752f0d6c8398912582a73d2517570e9898c1818bb3e\")) (SJust (SafeHash \"7bf7ccdce09da7e95d17ed5faf048b52f3d77a9c96928dd7656ad4e1647059b1\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue 0 (MultiAsset (fromList []))) (MaryValue 89629254 (MultiAsset (fromList [(PolicyID {policyID = ScriptHash \"1f8f4076f72a0427ebe14952eb0709c046a49083baef749920cf6bfa\"},fromList [(\"1ad7cb51c9e2d6250bd80395a5c920f6f628adc4b1bd057a81c9be98\",1),(\"dc9f50cac6e292e1aeaa438c647010f7f6f4fc85010092ff6cb21cf3\",1)]),(PolicyID {policyID = ScriptHash \"3ee6157268c7a3b7d21111b6772bc06aed0f425ef64b285223b308fc\"},fromList [(\"2d2da703f0e1fb682f997f121e3f90084d12c11203dc8c1314da16d3\",1),(\"4879647261486561645631\",1),(\"c312067b108b2ddd1a50ad1e757720570100fad3f14d6f88fd64ed42\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"3b6ded925ce15cffc3f47ebee62242e75bb57b63d7dad943397ebe55112c7094\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"b3704d8d7d7067c4b6910e4dd84b2074ffa4713027c14d81f18928ae3700d40d\"}) (TxIx 1)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (Coin 0) (Coin 5293200))))]))
- Node 1 did post a
ContestTx
at2024-01-18T08:47:36.189167Z
with id495878f0639aa2aed384af4b72bcdea058b339e6fd5fc6f63b5e7238d7297baf
- Then node 1 did post a
FanoutTx
at2024-01-18T08:49:18.378833Z
which failed validation with error codeH25
- All further attempts to fanout from node 1 failed for the same reason (2024-01-18T08:50:07.49598Z, 2024-01-18T08:58:38.158696Z, 2024-01-18T09:03:42.313769Z, 2024-01-18T09:17:42.869217Z)
- Node 2 did close the head with the initial snapshot at
2024-01-18T08:46:49.728882Z
with tx id5a4ed22b4e8137118eff07f26b366e031d880e66ebf11b3fafa130b50fc95441
on slot38911656
- All fanout attempts of node 2 failed with error code
H24
, on2024-01-18T08:49:18.323083Z
,2024-01-18T09:04:42.427032Z
and2024-01-18T09:18:44.892685Z
- Node 2 observed
OnCloseTx
at2024-01-18T08:47:36.18Z
on slot38911656
,OnContestTx
at2024-01-18T08:47:40.478408Z
on slot38911660
and the sameContestTx
again at2024-01-18T09:09:26.563872Z
- Unclear what actually caused it
- Situation: Tx requestor did not sign snapshot, the others (including snapshot leader) did
- Consequence: No re-requesting of snapshot, original requestor can't do anything now?
- Can we test this?
- Could / should simulate this in tests (e.g. by deliberately using wrong hydra keys in BehaviorSpec)
- Manipulating
state
is hard, especially as we also haveacks
andnetwork-messages
which need to be consistent- Why are we storing non-confirmed data? e.g. things that aggregate into
localUTxO
andlocalTxs
- Why are we storing non-confirmed data? e.g. things that aggregate into
- Coordinated reset https://github.com/input-output-hk/hydra/issues/1284?
- Not loading in-flight information from persistence would have the same effect
- Gossip of signed snapshots is related to this as well (and would not require persisting network messages?)
- Use the chain to check point a snapshot to coordinatedly (safe) reset point?
- Multiple
ReqSn
for same snapshot (upon new transactions "in the mempool"?)- Can't recover from non-consensus
- Remove
localUTxO
to at least void "inconsistent" local views and instead do that only upon Snapshot request / validation? - This whole thing feels super ad-hoc
- Concrete next step: Sasha pays for closing and initiates a new head
-
Tried removing the commit witness in order to test out if the produced cbor is compliant with the HW/CIP-21 but we get
ScriptFailedInWallet
when trying to cover the fee using internal wallet. -
Not sure if I really want to pursue this further but at the same time it is interesting to see the needed changes.
-
I might just do this in my free time. One option is to make the user pay for the fees instead of hydra-node wallet.
-
Trying to figure out how to produce commit tx from hydra-node that is compatible with HW.
-
There is no option to import the tx into trezor-suite and simply test that it works.
-
Pasting here the answer from the Eternl guys just for the reference:
The issue comes down to the body map ordering like I said in the beginning of
this thread. I then said that it was a requirement to have all maps sorted
canonically according to CIP21. This is the issue we face with Eternl as well.
Eternl parses the input cbor, extracts the body cbor raw as is, ie as its
sorted in original cbor.
This raw body cbor is hashed and signed.
Eternl then needs to re-construct a final tx, for this, CSL is used to
de-serialize the tx, inject the witness Eternl creates and then outputs the new
tx cbor.
But when this is done, the body map is sorted in canonical order and thus the
body doesn't match the original one any more.
So the witness created by Etenl is correct, but the tx cbor returned contain a
different structure for the body
Possible solutions
- Make sure that tx created follow CIP-21
- After creating hydra tx, pass it through cardano-hw-cli transaction transform
to reconstruct body to follow CIP-21. Must be done before any witnesses are
added.
- Change Eternl to put tx back together using the original body cbor and not CSL parsed body.
-
cardano-hw-cli
tool is not really acceptable to us since we should be able to figure out how to produce appropriate tx using cardano-api. Seems like what we produce is just not compatible with CIP-21.
-
Continuing work on incremental decommits I wanted to add the off-chain observation code.
-
Looking at contest tx which should be similar to decrement in a sense that it consumes Head output and produces another one.
-
I added the observation code without too much problems but need to revisit to make sure I didn't skip some expected checks.
-
Now our test fails because of
NotEnoughFuel
which makes sense since we are actually setting fees to zero inmkSimpleTx
. -
Increasing the amount of seeded lovelace for hydra internal walled fixes this issue and now the only thing that fails is the last check in the test where we want to assert that the UTxO we just created is adopted and can be queried from cardano-node.