-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace panic with error in snap sync #3176
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
panic!("Gossibsub protocol is disabled."); | ||
panic!("Gossipsub protocol is disabled."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other place where we panic in async code. Is it a logic error, or should we replace it with returning an error to the (async) caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hard one. On one hand this is not a local decision that can be made in this context, on the other hand if we get to this point, it is a very fundamental application composition that I would say justified crashing the whole app since a lot of things will not work properly if this is actually the case.
Ultimately though we should probably simply remove gossipsub until we actually use it (it can be enabled or disabled during instantiation of the networking and it is currently disabled in 100% of cases and was for a very long time).
This change contradicts the previous PR: #3044 |
It doesn't really. It is still an error, previous version just assumed that no segments means they don't exist, while in practice it might be caused by network issues. |
We don't treat the new error as critical on the upper level, do we? |
It'll still get stuck I think, so shouldn't make a lot of difference in practice. |
Not sure I follow. What I mean by the initial remark is that we have rolled back the PR which improved UX and if we want to remove the panic then we should propagate and/or handle the error correctly. |
It ended up being worse UX due to hanging. I think what we should do is to not eat an error in |
I mean the same when I suggest error propagation. |
Panicking in async code causes a race condition which hangs the node roughly half the time on my machine. (The other half it exits with an error.)
As part of this race, the Ctrl-C handler is dropped, because it is an async future running in the same executor. So the user can't use Ctrl-C to exit.
The only way I found to exit the process was
kill -KILL
. A SIGTERM was ignored.Close #3175.
Code contributor checklist: