IPFS Frequently Hangs #8409
I can't see anything strongly related to the last occurrence.
Ah, sorry, those are the instructions for the latest master (next release). Here are the instructions for v0.9.1: https://github.com/ipfs/go-ipfs/blob/v0.9.1/docs/debug-guide.md, and the script is at https://github.com/ipfs/go-ipfs/blob/v0.9.1/bin/collect-profiles.sh.
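For reference, a minimal sketch of running that script (it talks to the daemon's HTTP API at its default address, so it assumes a responsive daemon):

```sh
# Fetch and run the v0.9.1 profile-collection script.
# It asks the daemon's HTTP API for goroutine/heap/CPU profiles,
# so the daemon must be responsive for this to produce anything.
wget https://raw.githubusercontent.com/ipfs/go-ipfs/v0.9.1/bin/collect-profiles.sh
chmod +x collect-profiles.sh
./collect-profiles.sh
```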
Thanks, but I think this has the same problem of requiring the daemon to be responsive, as it is just asking the daemon to produce the dumps. Furthermore, I am serving the API via a unix socket for access control, so the old script wouldn't be helpful even if ipfs were responsive.
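For context, serving the API over a unix socket is configured via the `Addresses.API` multiaddr; a sketch (the socket path here is hypothetical, not the reporter's actual setup):

```sh
# Point the daemon API at a unix socket instead of the default TCP address.
# The path below is an example only.
ipfs config Addresses.API /unix/var/run/ipfs/api.sock
```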
That is bizarre. I've seen this kind of thing before, but it usually means IPFS is out of memory (which doesn't look like the case here). Next time this happens, could you kill go-ipfs with a SIGQUIT and capture STDOUT+STDERR? That'll kill IPFS and dump Go stack traces. I'd also check for any suspicious messages in dmesg.
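A sketch of capturing that dump, assuming the daemon runs in the foreground with output redirected (process name and paths are assumptions):

```sh
# Start the daemon with stdout+stderr captured to a file.
ipfs daemon > ipfs.log 2>&1 &

# When the hang occurs, SIGQUIT makes the Go runtime print all
# goroutine stacks to stderr and then exit.
kill -QUIT "$(pgrep -x ipfs)"
```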
I checked and there was no dmesg output from that time. Is there any particular reason you are interested in dmesg? Seems like an odd place to look for problems. I'll try to send SIGQUIT next time this occurs.
Just a wild guess, but maybe check journalctl and specifically look for OOM-killer activity; the node might not be using all the RAM, but the system might be preventing it from doing so. Also, does your fstab put any size limits on tmpfs filesystems?
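A minimal sketch of those checks (assuming a systemd system; the commands are generic, not quoted from the thread):

```sh
# Look for OOM-killer activity in the kernel log.
journalctl -k | grep -iE 'oom|out of memory'

# Check for explicit size limits on tmpfs mounts.
grep tmpfs /etc/fstab
findmnt -t tmpfs
```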
Nothing. Also, this system is almost always lightly loaded on RAM (<4 of 16 GiB used). Although, IIUC, go-ipfs is just one process, so if it were OOM-killed it shouldn't hang, just be shut down.
> I checked and there was no dmesg output from that time. Is there any particular reason you are interested in dmesg? Seems like an odd place to look for problems.

"Complete lockups" can mean something is stuck in the kernel or hardware. I wanted to rule that out.

> Nothing. Also, this system is almost always lightly loaded on RAM (<4 of 16 GiB used). Although, IIUC, go-ipfs is just one process, so if it were OOM-killed it shouldn't hang, just be shut down.

Ah, no. Linux is terrible about that. The entire system will hang for a while first. I'd suggest installing something like earlyoom.
The rest of the system is operating fine, so unless go-ipfs is somehow particularly sensitive, that seems an unlikely cause.
I killed it with SIGQUIT and got call traces. Oddly, generating these traces took a looooooong time, about 12 hours. Other processes on the system were very responsive, and while there were a small number of minor page faults, I can't see anything that would explain this slowness. Furthermore, the network traffic and CPU usage didn't stop, so it seems like some background process was still going on and maybe somehow slowing down the trace handler? The graphs look similar; however, interestingly, the CPU usage is trending down. I'm curious what will happen to those bursts of CPU usage if I leave it hung for 48 hours or so. They do appear to be flattening out, but it is not completely clear.
Ok, so, it looks like you're receiving a lot of DHT provider puts. The strange thing is that all of these operations seem to have been stalled for 10 hours. The worker that's supposed to handle them is processing a provider get, but that get doesn't appear to be stalled. The simplest workaround is to turn on DHT "client" mode (set `Routing.Type` to `dhtclient`).
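That workaround would look something like this (a sketch; `Routing.Type` is the standard go-ipfs setting for DHT client mode):

```sh
# Put the node in DHT client mode so it stops serving DHT records,
# then restart the daemon for the change to take effect.
ipfs config Routing.Type dhtclient
```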
Ok, I'll try setting that config and see if it mitigates the issue.
I haven't seen this hang in a couple of days, so it does seem that the mitigation is working.
@Stebalien is this related to libp2p/go-libp2p-kad-dht#729?
That and timing out/giving up, yes.
Following up, have you tried newer versions of kubo (aka go-ipfs)?
I haven't removed the override and have shut down my IPFS node for now.
Oops, seems like we needed more information for this issue; please comment with more details or this issue will be closed in 7 days.
This issue was closed because it is missing author input. |
Checklist
Installation method
third-party binary
Version
Config
Note, `ipfs config show` hangs when this problem is occurring. This was obtained by:
Description
After running for a while, go-ipfs hangs.
I tried following the steps here: https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md
All of the following hang.
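For reference, that debug guide collects its dumps along these lines (a sketch; 5001 is the default API port, which is an assumption here since the reporter serves the API over a unix socket):

```sh
# Each of these goes through the daemon's HTTP API, which is why
# they all hang when the daemon is unresponsive.
curl -o ipfs.stacks  'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'
curl -o ipfs.heap    'http://127.0.0.1:5001/debug/pprof/heap'
curl -o ipfs.cpuprof 'http://127.0.0.1:5001/debug/pprof/profile'
```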
I tried getting backtraces with GDB, but the results only contain the Go runtime. Let me know if you want a copy of those.
My monitoring shows an interesting symptom. This issue can reliably be spotted by all of the memory transitioning from file and slab to anon, while the CPU and network traffic become very irregular, occurring only in bursts.
Killing the process normally doesn't work; it must be killed with SIGKILL.