Node hangs when started on broken network, after worker tries to shut down #3175
Did you try Ctrl+C after the above panic? We'll want to fix that panic for sure, but I suspect it is the reason Ctrl+C didn't work: it probably broke the signal handler somehow.
The panic is in snap sync, saying we don't have enough archived history. (The DB folder should be empty, because I used

This appears to be a race condition: the second time I ran it, the node exited with an error:
I understand that the panic happened. I'm asking about when you were pressing Ctrl+C: before or after the panic?
After the panic. Our signal handler is an async future run by the tokio runtime, so as soon as the runtime starts to shut down, it stops working. (Runtimes drop pending tasks when they shut down, then wait for running tasks to yield.) That's why in PR #3170 I launched a Ctrl-C handler in another runtime. Maybe something we could think about.
In general we don't use
We could build our runtimes with
Actually yes, we should probably do that. The runtime is typically created by Substrate and not customizable, but we create it using much lower-level APIs and we have full control over it. I think we should use that for all apps.
Oh, actually, that method is currently only supported in the
We could install a panic hook that exits the process instead?
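A hedged sketch of that suggestion, using only the standard library: `std::panic::set_hook` can chain the default hook (so the usual panic message and backtrace still print) and then terminate the whole process, so a panicked task can never leave the runtime wedged. The function name here is illustrative, not from the codebase:

```rust
use std::panic;
use std::process;

// Illustrative sketch: install a panic hook that prints the normal panic
// report and then terminates the whole process, instead of letting a
// single task's panic leave the runtime in a half-shut-down state.
fn install_exit_on_panic_hook() {
    // Capture the previously installed (default) hook so its output is kept.
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Print the usual panic message and backtrace first.
        default_hook(info);
        // Then take the whole process down with a nonzero exit code.
        process::exit(1);
    }));
}
```

This would be called once early in `main`, before any runtime is started; after that, a panic on any thread exits the process instead of hanging it.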
And it is also unstable. I really don't like these implicit and statically uncheckable Tokio APIs. They are easy to write, but blow up at runtime, which is exactly the thing I'm able to avoid in Rust most of the time, except when dealing with Tokio and Tracing. Exiting the process on panic is fine by me.
On macOS 13.7 on M1 Max, I ran the following command:
(Yes, I know this network is partly shut down!)
Using the binaries from:
https://github.com/autonomys/subspace/actions/runs/11418607109
This caused a node hang after snap sync panicked and the worker tried to shut down. After the panic, the node didn't exit when I pressed Ctrl-C, or when I sent it SIGTERM with `kill`. The logs were:
I'll re-run to get a full stack trace, and also see which threads/stacks are still active.