High memory usage in public nodes #486

Open
scottyeager opened this issue Oct 25, 2024 · 1 comment

Comments

@scottyeager

I'm not sure if this is related to #423, but the graphs from the relevant time periods look fairly different, so this felt worth a new issue. Both of the public nodes in Finland appear to be in an OOM reaping loop, getting restarted every 10-15 minutes.

The situation for other nodes is generally better but many have similar issues. Singapore has a similar sawtooth pattern on memory usage and has currently been up for about 1.5 hours. The nodes in Germany have both settled into a steady state of seemingly high memory usage after periods of memory use spikes and what look like OOM kills. The node in India has been up for almost two days but was showing a similar pattern before that.

Perhaps of particular interest, the nodes in Belgium both appear to have stopped forwarding packets altogether with strong correlation to their own memory spikes over the last 10 days.

I started investigating after having trouble connecting to remote hosts over Mycelium today. Not sure if this is directly related, but I noticed a large number of messages in my laptop's Mycelium logs (hundreds per second) indicating routes lost and acquired from the nodes in Finland. That seems to have subsided now, and connectivity from my laptop has improved.

@LeeSmet
Contributor

LeeSmet commented Oct 25, 2024

The high memory usage is indeed new. It seems that memory usage doesn't drop to expected levels after the queue of inbound messages is cleared. This is something which will need some debugging. Note that when 3.15 is released on mainnet, the zos nodes should update to a new mycelium version which reduces protocol traffic and should significantly improve the situation.

The Belgian nodes are running a modified binary which rate limits inbound connections, to allow a steady build-up over time. Unfortunately, it seems certain connections are unstable, so they lose connections at roughly the same rate as they accept them now. For these nodes, it can take some time before you manage to connect to them in the current situation. This rate limiting is something which needs some tuning (that's also why it's only these nodes that have it).
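To illustrate the general idea of rate limiting inbound connections, here is a minimal token bucket sketch. It is not taken from the modified binary: the class name `InboundRateLimiter`, the method `try_accept`, and the `rate` and `burst` parameters are all hypothetical, and the actual mechanism on the Belgian nodes may work differently.

```python
import time


class InboundRateLimiter:
    """Token bucket limiter (hypothetical sketch, not the mycelium code).

    Allows `rate` new inbound connections per second on average,
    with short bursts of up to `burst` connections.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst must be >= 1")
        self.rate = float(rate)
        self.burst = float(burst)
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = float(burst)  # start with a full bucket
        self._last = clock()

    def try_accept(self):
        """Return True if a new inbound connection may be accepted now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill tokens for the elapsed time, capped at the bucket size.
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

With a limiter like this, tuning comes down to picking `rate` and `burst`: if the rate is close to the rate at which unstable peers drop off, the number of established connections barely grows, which matches the behaviour described above.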

As a side note, these crappy connections, which get reset all the time (presumably because the underlying network is unstable), are also pretty bad: they generate additional protocol traffic and waste some CPU time on the peer, which has to retract the node and then add it again after the reconnect.
