Unsynced DAO state forcing to restart Bisq in order to fix sync #6073

Closed
w0000000t opened this issue Feb 23, 2022 · 8 comments

Comments

@w0000000t
Contributor

w0000000t commented Feb 23, 2022

In today's weekly support call, the issue of "untakeable BSQ swap offers" reported by @jmacxx was discussed further. He explained that there is a limit on the amount of DAO data that peers are "allowed" to download, and that swaps significantly increase data traffic. As a result, a peer often fails to download all the available data and ends up out of consensus. This, in turn, prevents a swap maker from accepting takers, or a swap taker from being accepted by makers.
The solution would be to resync the DAO state. The "annoying popup" that was previously disabled is being reinstated for this purpose, at the risk of reviving the flow of support requests about that popup.

My understanding is that restarting the application lets Bisq "resume" downloading the data that was missing from the previous sync attempt, thus restoring the sync state. When a significant amount of data is missing, the popup will prompt several consecutive restarts until sync is finally achieved.
Given that, why does the application need to be restarted? Could Bisq limit the amount of data downloaded in one go and then, at a later moment (after N minutes, for example), resume downloading the next batch of allowed data until sync is complete, transparently to the user and without requiring a restart?
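To make the question concrete, here is a minimal sketch of the batched approach, assuming a hypothetical `BlockSource` that returns at most a fixed number of blocks per request. None of these names are Bisq APIs, and the tip height 727000 is made up for the demonstration; only the 719240 start height and the 6000-block cap come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Bisq code: keep fetching capped batches until the
// local height reaches the tip, without restarting the application.
final class BatchedSync {
    /** Hypothetical peer: returns heights above fromHeight, at most limit of them. */
    interface BlockSource {
        List<Integer> request(int fromHeight, int limit);
    }

    /** Repeats capped requests until a batch comes back empty; returns the final height. */
    static int syncToTip(BlockSource source, int startHeight, int batchLimit) {
        int height = startHeight;
        while (true) {
            List<Integer> batch = source.request(height, batchLimit);
            if (batch.isEmpty()) {
                return height; // nothing newer is available: we are in sync
            }
            height = batch.get(batch.size() - 1);
            // A real client would wait here (e.g. N minutes) before asking again.
        }
    }

    /** In-memory source with a fixed chain tip, used only for the demonstration. */
    static BlockSource fixedTip(int tip) {
        return (from, limit) -> {
            List<Integer> out = new ArrayList<>();
            for (int h = from + 1; h <= tip && out.size() < limit; h++) {
                out.add(h);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        // 727000 is an assumed tip; the first batch alone would stop at 725240.
        System.out.println(syncToTip(fixedTip(727000), 719240, 6000));
    }
}
```

The point of the sketch is only that the "next batch" request can be driven by a loop or timer inside the running application rather than by a restart.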

Alternatively, if the above is not technically feasible, it would be nice to let the user manage the "annoying unsynced DAO popup".
For example, it could briefly explain what happened, that it might affect the user's ability to participate in swaps (in which case Bisq must be restarted until sync succeeds), and that the error can be ignored by users who are not interested in swaps. It could also offer a "do not show this again" checkbox.

@ghost

ghost commented Feb 23, 2022

The significant data increase of account age witness records was a bug, fixed by #5974. So that may have settled down by now, and it could be that we're back to the previous "normal" rate of consensus errors.

The reminder popup was coded to show at most once per Bisq session; if we added a "do not show again" option, it would defeat the purpose, which is to get all nodes in consensus. I agree, though, that the wording could be made nicer to explain cause and effect, as was done in a related way here: #6063

The other questions @chimp1984 is best suited to answer.

@chimp1984
Contributor

Agree with @jmacxx
The data limitation issues should be fixed by now. The logs would show whether that is still an issue, but I highly doubt it.
I think there is a bug in the snapshot handling and/or persistence for DAO data. By separating DAO blocks and DaoState we introduced more risk that DAO data gets out of sync (before, it was all in the DaoState, which carried lower risk but became a scalability problem).

The bug is not trivial to find. I would suggest adding a lot of logs to that code area, which will hopefully help reveal where the issue is caused. A very critical code review of all those code paths might help as well.
I am offline for the next few days.

@ghost

ghost commented Mar 4, 2022

[edit - removed most of this status report because some of the results turned out to be caused accidentally as a side effect of some diagnostic code I had inserted in an attempt to observe some statuses. ]

"DAO state chain not connecting with the new data" (turned out to be a false alarm)

Other observations:

  • If you compare a known good hashchain against those of peers, going further back, you see a variable pattern of several hundred matching, then not matching, then matching hashes. The non-matching parts are always ones where the peer self-generated the hash, and the matching parts are ones where the hash came from a seednode.
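The kind of comparison described above can be sketched as a run-length pass over two hash lists. This is an illustration only: `HashChainDiff` and its string output format are made up here, and the hashes are placeholders rather than real DAO state hashes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group positions of two hash chains into maximal runs
// that either all match or all differ.
final class HashChainDiff {
    /** Returns entries like "0..1 match" or "2..3 differ", in chain order. */
    static List<String> runs(List<String> reference, List<String> peer) {
        int n = Math.min(reference.size(), peer.size());
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= n; i++) {
            boolean prev = reference.get(i - 1).equals(peer.get(i - 1));
            // Close the current run at the end of the chain or when the state flips.
            if (i == n || reference.get(i).equals(peer.get(i)) != prev) {
                result.add(start + ".." + (i - 1) + (prev ? " match" : " differ"));
                start = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> good = List.of("a", "b", "c", "d", "e");
        List<String> peer = List.of("a", "b", "x", "y", "e");
        runs(good, peer).forEach(System.out::println);
    }
}
```

Tagging each position with the origin of the peer's hash (self-generated or received from a seednode) would be the natural next step, but that information is not modelled here.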

Still investigating.

@ghost ghost mentioned this issue Mar 4, 2022
@ghost

ghost commented Mar 12, 2022

One reproducible error came to light from a user who reported an issue in the support channel. Installing Bisq as a new user about 6 weeks after the latest release will consistently produce a DAO state which is out of consensus. Steps to reproduce:

  • Delete the local/share/Bisq directory (i.e. starting with a completely empty data directory).
  • Run the Bisq release from the previous month (at the time of writing, v1.8.2).
  • 6000 BsqBlocks are received from the seednode advancing the height from 719240 to 725240.
  • No subsequent request for BsqBlocks is made, and the chain remains un-synced indefinitely.

This is due to L237 in LiteNode.java, which does not request subsequent blocks if the BitcoinJ chain is still downloading.
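One possible direction, shown purely as a sketch: remember the skipped request and send it once the chain download completes, instead of dropping it. This is not the actual LiteNode code; the class, its methods, and the height 725241 used in the demonstration are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual LiteNode code: a block request that
// arrives while the wallet chain is still downloading is deferred and sent
// when the download completes, rather than being skipped for good.
final class DeferredBlockRequester {
    private final List<Integer> sentRequests = new ArrayList<>();
    private boolean chainDownloadComplete;
    private Integer pendingFromHeight; // null when nothing is deferred

    void requestBlocksFrom(int fromHeight) {
        if (chainDownloadComplete) {
            sentRequests.add(fromHeight);   // stands in for a network request
        } else {
            pendingFromHeight = fromHeight; // defer rather than skip
        }
    }

    void onChainDownloadComplete() {
        chainDownloadComplete = true;
        if (pendingFromHeight != null) {
            sentRequests.add(pendingFromHeight);
            pendingFromHeight = null;
        }
    }

    List<Integer> sentRequests() {
        return sentRequests;
    }

    public static void main(String[] args) {
        DeferredBlockRequester requester = new DeferredBlockRequester();
        requester.requestBlocksFrom(725241);   // chain still downloading: deferred
        requester.onChainDownloadComplete();   // deferred request is sent now
        System.out.println(requester.sentRequests());
    }
}
```

Whether a deferral like this fits the real code depends on how LiteNode is notified of download completion, which this sketch does not attempt to reproduce.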


If you allow the BitcoinJ chain to complete its sync and then perform the same test, a different error presents itself:

  • With a fully synced wallet, stop Bisq and delete the Bisq/btc_mainnet/db directory.
  • Run the Bisq release from the previous month (at the time of writing, v1.8.2).
  • 6000 BsqBlocks are received from the seednode advancing the height from 719240 to 725240.
  • The remainder of the BsqBlocks are requested and received advancing the height to current.
  • Lots of red warnings appear in the log when processing the BsqBlocks.
  • The DAO network status indicates it is out of sync with seednodes and needs to be rebuilt.

The errors seem to indicate that the BlindVoteStore data file received from the seednode is missing data.

We have a blindVoteTx but we do not have the corresponding blindVote payload
We could not find a list which matches the majority so we cannot calculate the vote result. Please restart and resync the DAO state.

I think this may be because the seednode had to truncate the GetDataResponse payload due to too many AccountAgeWitness and/or BlindVotePayload records.
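A toy illustration of that hypothesis, under the assumption that a response is filled in order until a size cap is hit: records past the cap are silently dropped, and a later cross-check finds transactions with no payload. The class and method names are invented for this sketch, and string length stands in for serialized size.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a size-capped response and the resulting missing payloads.
final class TruncatedResponseDemo {
    /** Keeps records in order until adding the next one would exceed maxBytes. */
    static List<String> truncate(List<String> records, int maxBytes) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String record : records) {
            int size = record.length(); // stand-in for the serialized size
            if (used + size > maxBytes) {
                break; // everything from here on is dropped
            }
            kept.add(record);
            used += size;
        }
        return kept;
    }

    /** Tx ids for which no payload was received. */
    static Set<String> missingPayloads(Set<String> blindVoteTxIds, List<String> payloadTxIds) {
        Set<String> missing = new LinkedHashSet<>(blindVoteTxIds);
        missing.removeAll(payloadTxIds);
        return missing;
    }

    public static void main(String[] args) {
        List<String> all = List.of("tx1", "tx2", "tx3");
        List<String> received = truncate(all, 7);
        System.out.println(missingPayloads(new LinkedHashSet<>(all), received));
    }
}
```

A non-empty result from `missingPayloads` corresponds to the situation in the first log message above: a blindVoteTx is known but its payload never arrived.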

1552038 bytes : BlindVoteStore size first time
1595935 bytes : BlindVoteStore size second and subsequent times.

The same error can be produced without deleting the whole data directory, by deleting just AccountAgeWitness and BlindVotePayload.

@ripcurlx
Contributor

And it doesn't matter how often you restart the node to get everything in-sync?

@ghost

ghost commented Mar 15, 2022

And it doesn't matter how often you restart the node to get everything in-sync?

In the tests I've done so far, it has not gone back into sync after restarting, only when the user explicitly rebuilds the DAO state. I tried waiting 30 blocks in case the snapshot process would somehow fix it, but it did not. The DAO state of the v1.8.2 install loaded as in the example above is missing 4 param changes and 675 spent BSQ tx.

About 33% of all network nodes currently have their most recent DAO hashes not in consensus.
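For reference, a figure like this can be derived by taking the most common hash as the consensus value and counting everyone else. The sketch below assumes exactly that definition, which may differ from how the number above was obtained; the class name and the sample hashes are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: share of nodes whose latest hash differs from the
// most common hash among all reported hashes.
final class ConsensusShare {
    static double outOfConsensusPercent(List<String> latestHashes) {
        if (latestHashes.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String hash : latestHashes) {
            counts.merge(hash, 1, Integer::sum);
        }
        int majority = 0;
        for (int count : counts.values()) {
            majority = Math.max(majority, count);
        }
        return 100.0 * (latestHashes.size() - majority) / latestHashes.size();
    }

    public static void main(String[] args) {
        // Two nodes agree, one differs.
        System.out.println(outOfConsensusPercent(List.of("a", "a", "b")));
    }
}
```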

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed because of inactivity. Feel free to reopen it if you think it is still relevant.

@github-actions bot closed this as not planned on Sep 21, 2023