
re-org safety: watch asset transactions and re-create proofs when necessary #419

Merged · 11 commits · Aug 17, 2023

Conversation

guggero
Member

@guggero guggero commented Jul 26, 2023

Depends on #424; the first 5 commits aren't part of this PR anymore.

Fixes #317.

The PR is fully ready for review!

TODOs:

  • Add re-org watcher to custodian to watch received assets
  • Add re-org watcher to freighter to watch sent assets
  • Allow minting proofs to be updated in the universe/federation after re-org
  • Integration tests for above mentioned use cases
  • Unit test re-org watcher


Contributor

@ffranr ffranr left a comment
I haven't finished reviewing, but it looks good so far. I think you could roll the first two commits into a separate PR and get those merged as-is. I'll try out the new review tool with this PR.

@guggero
Member Author

guggero commented Jul 27, 2023

> I haven't finished reviewing, but it looks good so far. I think you could roll the first two commits into a separate PR and get those merged as-is. I'll try out the new review tool with this PR.

Good idea, done: #424

@guggero guggero force-pushed the reorg-safe branch 2 times, most recently from 027128f to 10c4486, on July 28, 2023 at 15:39
@guggero guggero marked this pull request as ready for review on July 28, 2023 at 15:40
@guggero guggero force-pushed the reorg-safe branch 2 times, most recently from d53d81e to 4118058, on July 28, 2023 at 16:58
tapgarden/interface.go (resolved)
tapcfg/config.go (resolved)
tapgarden/re-org_watcher.go (resolved)
tapgarden/re-org_watcher.go (resolved)
tapgarden/re-org_watcher.go (outdated, resolved)
proof/archive.go (outdated, resolved)
tapgarden/planter.go (outdated, resolved)
tapgarden/custodian.go (outdated, resolved)
tapdb/universe.go (outdated, resolved)
universe/interface.go (outdated, resolved)
@dstadulis dstadulis added the v0.3 label Jul 31, 2023
tapgarden/caretaker.go (outdated, resolved)
Collaborator

@jharveyb jharveyb left a comment
Looks good! The callback-in-each-subsystem pattern is really neat.

I don't think I saw re-broadcast logic for the actual anchor TX after a re-org; is that handled by the backing lnd, or did I just miss it?

Contributor

@ffranr ffranr left a comment

This is great. I've finished reviewing the whole PR and couldn't see any further issues that haven't already been commented on.

The only dynamics that concern me are those relating to multiple actions (mint, send, recv) occurring in close succession. But I think this PR in its current state is still a big improvement and should probably be merged regardless.

@guggero
Member Author

guggero commented Aug 4, 2023

I've updated the proof watching logic and also added a new itest to verify the new behavior.

Unfortunately, the way we store proofs is a bit weird, so this requires some explanation.
Assume we have a chain of proofs (A mints, sends the full amount to itself, then sends some to B, receiving change) of an asset with ID xyz:
ScriptKey_A_1 -> ScriptKey_A_2 -> ScriptKey_A_3/ScriptKey_B_3, where ScriptKey_x stands for a script key belonging to node x and _y stands for the block in which that proof was confirmed.

If node A minted the asset (ScriptKey_A_1) and transferred it to itself (ScriptKey_A_2), then sent part of it out to node B (ScriptKey_B_3, with ScriptKey_A_3 for the change), the two nodes have the following proofs on disk (and in the DB):

node A:
  proofs/xyz/ScriptKey_A_1.assetproof (num_proofs=1)
  proofs/xyz/ScriptKey_A_2.assetproof (num_proofs=2)
  proofs/xyz/ScriptKey_A_3.assetproof (num_proofs=3)

node B:
  proofs/xyz/ScriptKey_B_3.assetproof (num_proofs=3)

Now further assume that blocks 2 and 3 are not yet fully confirmed (<6 confs) and a re-org happens that removes both anchor transactions from the chain; then we need to update all proofs for blocks 2 and 3.

So for block 2 we would need to update the following files:

node A:
  proofs/xyz/ScriptKey_A_2.assetproof (replace the last proof)
  proofs/xyz/ScriptKey_A_3.assetproof (replace the second-to-last proof)

node B:
  proofs/xyz/ScriptKey_B_3.assetproof (replace the second-to-last proof)

And for block 3 we would need to update the following files:

node A:
  proofs/xyz/ScriptKey_A_3.assetproof (replace last proof)

node B:
  proofs/xyz/ScriptKey_B_3.assetproof (replace last proof)

The issue is that if the anchor transactions for states 2 and 3 both confirm in the same block after the re-org, we don't know in what order the confirmation notifications will arrive. So however we order things, the proof files in which we need to update more than one proof will be temporarily invalid until both proofs have been updated.

Currently we do this by just using a dummy header verifier (see TODO in ReplaceProofInBlob).

I think this is fine for now, as the proofs will eventually be valid again once all confirmation notifications have been acted upon. I don't think it's worth attempting a full dependency tracking and resolution mechanism to address this (it would require quite a bit of refactoring in the proof watcher).

Curious to hear your thoughts though.
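To make the replacement step above concrete, here is a minimal sketch. `ProofFile` and `ReplaceProofAt` are illustrative stand-ins, not the actual tapd API: the real proof.File carries full transition proofs and verification, whereas this model reduces each proof to its confirmation block height.

```go
package main

import "fmt"

// Proof is a simplified stand-in for a single transition proof in a
// proof file; here it only records the block the anchor TX confirmed in.
type Proof struct {
	Block uint32
}

// ProofFile mirrors the append-only chain of proofs stored per script key.
type ProofFile struct {
	Proofs []Proof
}

// ReplaceProofAt swaps out the proof at the given index, as the re-org
// watcher must do when an anchor TX re-confirms in a new block. Index
// len-1 is the "last proof", len-2 the "second-to-last", and so on.
func (f *ProofFile) ReplaceProofAt(idx int, p Proof) error {
	if idx < 0 || idx >= len(f.Proofs) {
		return fmt.Errorf("index %d out of range", idx)
	}
	f.Proofs[idx] = p
	return nil
}

func main() {
	// Node A's proof file for ScriptKey_A_3: three proofs confirmed
	// in blocks 1, 2 and 3 (see the chain described above).
	file := &ProofFile{
		Proofs: []Proof{{Block: 1}, {Block: 2}, {Block: 3}},
	}

	// After the re-org, both anchor TXs re-confirm in block 4. The
	// two confirmation notifications can arrive in either order, so
	// the file is temporarily inconsistent until both replacements
	// have landed, which is why a dummy header verifier is needed.
	file.ReplaceProofAt(len(file.Proofs)-1, Proof{Block: 4}) // state 3
	file.ReplaceProofAt(len(file.Proofs)-2, Proof{Block: 4}) // state 2

	fmt.Println(file.Proofs) // [{1} {4} {4}]
}
```

The out-of-order problem is visible in the sketch: after only the first `ReplaceProofAt` call, the chain reads blocks 1, 2, 4, which no verifier would accept.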

@guggero guggero force-pushed the reorg-safe branch 2 times, most recently from 15e2131 to 5ab7cc1 Compare August 4, 2023 13:36
@guggero
Member Author

guggero commented Aug 9, 2023

@jharveyb

> I'm a bit concerned about reading all proofs on startup, but AFAICT that will come from the DB and not files on disk?

The proofs are queried from both the file system and the DB, and the first backend that has a given proof returns it. It's a bit unfortunate that we have to do this. If this becomes a bottleneck, we could also store the proof's block height in the DB so we can filter more efficiently.

> Would be cool to also test the wallet re-broadcast in this situation, but not needed in this PR.

I checked again, and because we're using btcwallet's TX publish mechanism, each transaction we create in tapd is automatically re-broadcast by the wallet (now on every new incoming block, I think). Since that's part of lnd's responsibility, I don't think we need an extra test in this repo.

> Do we need to check that these blocks are actually connected?

That's also part of lnd's validation logic. All re-orgs are validated by the chain backend first before lnd sends out notifications. So again, I don't think we need to check this in tapd.

@ffranr

> Are the universe stats updated if an issuance is re-orged?

Yes, once the issuer detects a re-org, it will update the universe(s) it initially sent the issuance proof to. And the universe will do an upsertAssetGen() which includes the block height of the new proof.

> Also, do we restrict universe syncing to proofs which are at adequate depth?

No. But I added logic in the proof watcher to address this:
Say an asset is minted by Alice in block 100, Bob syncs his daemon with Alice's universe, and a re-org happens. Alice will update the universe and let it know the proof changed. Bob doesn't know that, and doesn't really need to care. But when Bob now receives an asset from Alice with the updated issuance proof, it is valid. And the next time Bob syncs with the universe, he notices the roots have changed and re-syncs the proof as well.
So that shouldn't really be an issue (one can always manually re-sync with the universe if necessary).
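The root-comparison step in Bob's next sync can be sketched as follows. This is a simplified illustration under stated assumptions: `Root` and `syncChanged` are hypothetical names, not the actual tapd universe API, and a real universe root is a Merkle commitment rather than a plain string.

```go
package main

import "fmt"

// Root is a simplified stand-in for a universe root: a commitment to
// all proofs a universe server holds for one asset ID.
type Root string

// syncChanged compares local and remote universe roots per asset ID
// and returns the IDs whose proofs need re-syncing, mirroring the
// "roots have changed, re-sync the proof" step described above.
func syncChanged(local, remote map[string]Root) []string {
	var changed []string
	for id, remoteRoot := range remote {
		if local[id] != remoteRoot {
			changed = append(changed, id)
		}
	}
	return changed
}

func main() {
	// Before the re-org, Bob's local roots match Alice's universe.
	local := map[string]Root{"xyz": "root-block-100"}

	// After Alice's issuance re-confirms in a different block, her
	// universe root for asset xyz changes (via upsertAssetGen).
	remote := map[string]Root{"xyz": "root-block-101"}

	fmt.Println(syncChanged(local, remote)) // [xyz]
}
```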

Collaborator

@jharveyb jharveyb left a comment
Reviewed all commit messages.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @ffranr, @guggero, @positiveblue, and @Roasbeef)

@guggero thanks for double-checking on the btcwallet and lnd behavior in this situation!

Re: proof querying, yes tagging proofs with block height sounds helpful, since the current locator scheme provides no sense of time / ordering.

Also, for the archiver proof reading, would the file backend terminate early if the DB returns first, or complete with the result dropped? Wondering if slow proof reads would slow down DB reads for the same proof or set of proofs (can we short-circuit proof reads, essentially). No changes needed in this PR.

@guggero
Member Author

guggero commented Aug 10, 2023

> Also, for the archiver proof reading, would the file backend terminate early if the DB returns first, or complete with the result dropped? Wondering if slow proof reads would slow down DB reads for the same proof or set of proofs (can we short-circuit proof reads, essentially). No changes needed in this PR.

Reading a proof goes through each backend sequentially, and the first one that returns a non-error result is used; the rest are skipped. Since we add the DB first and then the file archive, the DB will always be asked first. If we feel that might slow things down, we could switch the order of the backends. But that might have other impacts, so I'd do it in a separate PR.
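The sequential lookup can be sketched like this. `MultiArchiver`, `Archiver` and `mapArchive` are illustrative names chosen for this example, not the actual tapd types, which live in the proof package.

```go
package main

import (
	"errors"
	"fmt"
)

// Archiver is a minimal stand-in for a proof backend (tapd registers a
// DB-backed archive and a file-backed archive).
type Archiver interface {
	FetchProof(locator string) ([]byte, error)
}

var ErrProofNotFound = errors.New("proof not found")

// MultiArchiver queries each backend in order and returns the first
// non-error result, skipping the remaining backends. This matches the
// sequential lookup described above: later backends are never asked
// once an earlier one answers.
type MultiArchiver struct {
	Backends []Archiver
}

func (m *MultiArchiver) FetchProof(locator string) ([]byte, error) {
	for _, b := range m.Backends {
		proof, err := b.FetchProof(locator)
		if err == nil {
			return proof, nil
		}
	}
	return nil, ErrProofNotFound
}

// mapArchive is a toy in-memory backend for demonstration.
type mapArchive map[string][]byte

func (m mapArchive) FetchProof(locator string) ([]byte, error) {
	if p, ok := m[locator]; ok {
		return p, nil
	}
	return nil, ErrProofNotFound
}

func main() {
	db := mapArchive{"xyz/ScriptKey_A_1": []byte("db-proof")}
	files := mapArchive{
		"xyz/ScriptKey_A_1": []byte("file-proof"),
		"xyz/ScriptKey_B_3": []byte("file-only"),
	}

	// DB registered first, so it wins when both backends have a proof.
	archive := &MultiArchiver{Backends: []Archiver{db, files}}

	p, _ := archive.FetchProof("xyz/ScriptKey_A_1")
	fmt.Println(string(p)) // db-proof: the DB answered first

	p, _ = archive.FetchProof("xyz/ScriptKey_B_3")
	fmt.Println(string(p)) // file-only: fell through to the file archive
}
```

Reordering the `Backends` slice is all it would take to prefer the file archive, which is why swapping the order is a one-line change with potentially wider impact.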

@guggero
Member Author

guggero commented Aug 10, 2023

The itest ran into:

test_harness.go:142: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: received confirmation for unknown anchor TX 06d6e9abd34ef40a31c7b660e6b6112b44a3b1332a9b62ad4f40f0ff0d272a2b

Will take a look.

@guggero
Member Author

guggero commented Aug 10, 2023

> The itest ran into:
>
> test_harness.go:142: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: received confirmation for unknown anchor TX 06d6e9abd34ef40a31c7b660e6b6112b44a3b1332a9b62ad4f40f0ff0d272a2b
>
> Will take a look.

Okay, it looks like this was a flake; I wasn't able to reproduce it locally. But I removed the error and turned it into a log statement, since this might happen in some edge cases around restarts, or when a block and the conf notification come in at the same time (if we process the block update first and the conf notification afterwards).

PR should be good to go now!

proof/file.go (outdated, resolved)
proof/file.go (resolved)
server.go (outdated, resolved)
return
}

err = w.MaybeWatch(f, w.DefaultUpdateCallback())
Member
Re the earlier suggestion: if we only pass the items that actually are in "danger" of a re-org, then we wouldn't need this check. We'd also be able to avoid reading all the proof data above (for all the open assets, that is).

Member Author
I think even with the optimized query, where we only fetch the assets that aren't yet buried, we'll want to inspect the full proof file, as there might be ancestor transactions that haven't fully confirmed yet.

"of the chain, will update proof "+
"with next confirmation", txHash)

// We continue to watch the transaction until
Member
Ahh ok, I think that's the behavior I missed before: we'll get another confirmation notification in the event of a re-org. Given that, why do we even need the re-org channel in the first place?

Member Author
For now, just for logging, which was helpful for debugging the itest. In the future, we could react to it.

@Roasbeef
Member

Needs a rebase after the utxo locking PR was merged!

Member

@Roasbeef Roasbeef left a comment

LGTM 🏒

@guggero guggero added this pull request to the merge queue Aug 17, 2023
Merged via the queue into main with commit 6412264 Aug 17, 2023
13 checks passed
@guggero guggero deleted the reorg-safe branch August 17, 2023 09:12

Successfully merging this pull request may close these issues.

Ensure daemon is robust to chain re-orgs
5 participants