Node loses consensus, but does not regain it #1700

Open
lynnbendixsen opened this issue Sep 28, 2021 · 4 comments

Comments

@lynnbendixsen
Contributor

lynnbendixsen commented Sep 28, 2021

Note: The summary might need to be changed to add "for a very long time". I just saw a case that might be similar, where it took 3 hours to regain consensus. (I will include more logs from Uphold if/when they arrive.)

Environment:
Indicio Indy Networks running indy-node version 1.12.4

Steps to replicate: (the following general behavior was seen in log files for 2 nodes that lost consensus on 2 different networks, a week apart.)

  1. Node loses connectivity to f+1 nodes (4 or 5 nodes, on Indicio networks) for 30 seconds or more.
  2. Nodes reconnect within 5 minutes.
  3. Node reports "Out of Consensus" and sends request for VIEW change every 5 minutes.
  4. Node does not return to consensus within 45 minutes (in the Anonyome node example), so the node was restarted, at which point it returned to consensus and resumed normal operation.
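
As a rough check on the "4 or 5 nodes" figure in step 1, here is a minimal sketch of the usual BFT sizing arithmetic. It assumes the standard relation n >= 3f + 1 and a quorum of n - f nodes (counting the node itself). The pool sizes 10 and 13 are illustrative assumptions, not numbers taken from the Indicio networks, and the function names are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1 for a pool of n nodes."""
    return (n - 1) // 3


def peers_lost_to_break_quorum(n: int) -> int:
    """Smallest number of unreachable peers that leaves a node
    seeing fewer than n - f nodes (itself included)."""
    # The node still sees n - k nodes after losing k peers, so it
    # drops below the n - f quorum as soon as k exceeds f.
    return max_faulty(n) + 1


# Pool sizes here are assumptions chosen for illustration only.
for n in (10, 13):
    f = max_faulty(n)
    lost = peers_lost_to_break_quorum(n)
    print(f"n={n}: f={f}, quorum={n - f}, peers lost to break quorum={lost}")
```

Under these assumptions, a node in a pool of 10 to 13 nodes can no longer see a quorum once it loses 4 or 5 peers.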

Expected Behavior: Node returns to consensus quickly after conditions which caused it to go out are restored.

Actual Behavior: Node did not return to consensus for a long time (at least 45 minutes in one case).

Notes: In Anonyome log, start at about 2021-09-17 15:32:53 to see above behaviour. OOC messages (reporting) started at about 2021-09-17 15:41:00 plus or minus 1 minute.
In Opsnode-dn log, start at about 2021-09-10 01:59:15, and OOC occurred at about 2021-09-10 02:23:00.
The Uphold log does not show a similar pattern (the node only disconnected from one node, the primary, at 15:24:15, and regained that connection at 2021-09-28 15:24:54), but it went out of consensus at about 2021-09-28 15:45:00 and returned to consensus about 3 hours later, at about 18:43:00. This case is possibly unrelated, but it also took a very long time to return to consensus.
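
To make the timings in these notes easier to compare, here is a small sketch that computes the elapsed time between the timestamps quoted above. The timestamps are the approximate values from the notes (they are not parsed from the logs), and the labels and helper name are only illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timestamps written in FMT."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Approximate timestamps copied from the notes above.
events = {
    "Anonyome, start of pattern to OOC reporting": (
        "2021-09-17 15:32:53", "2021-09-17 15:41:00"),
    "Opsnode-dn, start of pattern to OOC": (
        "2021-09-10 01:59:15", "2021-09-10 02:23:00"),
    "Uphold, OOC to return to consensus": (
        "2021-09-28 15:45:00", "2021-09-28 18:43:00"),
}

for label, (start, end) in events.items():
    print(f"{label}: {elapsed(start, end)}")
```

Because the source timestamps are given as "about", the resulting durations are estimates as well.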

@WadeBarnes
Member

@lynnbendixsen, is this still an issue? Any further insight?

@lynnbendixsen
Contributor Author

I have not seen this specific issue for a while, but I also don't allow nodes to stay out of consensus if I can help it. I don't know of any changes to indy-node that would have repaired the issue.
The "rapid view change requests" part of this report still happens pretty regularly for me.
