celestia node grabbing excessive RAM #3129
Comments
We were recently debugging a case reported by @mycodecrafting where a node similarly grabbed a lot of RAM but wasn't using it (as per profiles), and the kernel could still reclaim that memory, as we proved in an experiment. In htop we saw it taking 25 GB, but once we launched another memory-heavy process, the node quickly shrank to around 1 GB. As we are not aware of any other leaks, I would like to first exclude the above. The only difference I see is that in your case the node gets killed or OOMed, while in the above everything was OK, but I need more information on how it gets killed, like logs from k8s. Additionally, I need profiles from the node, which will definitively confirm whether this issue is related to #3107.
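For context, a heap profile is what separates "the process holds memory the kernel can still reclaim" from a genuine leak. Below is a minimal, generic Go sketch of capturing such a profile and reading the runtime's own counters, which distinguish the live heap from memory the runtime holds but has released back to the OS. It uses only the standard library and is not a description of celestia-node's actual profiling flags, which may be exposed differently (e.g. over an HTTP pprof endpoint).

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpHeapProfile writes a pprof heap profile to the given path.
func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Run a GC first so the profile reflects live objects, not garbage
	// that simply hasn't been collected yet.
	runtime.GC()
	return pprof.WriteHeapProfile(f)
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapAlloc is memory the program is actually using; Sys and HeapReleased
	// help explain why OS-visible RSS can be far larger than the live heap.
	fmt.Printf("HeapAlloc=%d MiB  Sys=%d MiB  HeapReleased=%d MiB\n",
		m.HeapAlloc>>20, m.Sys>>20, m.HeapReleased>>20)

	if err := dumpHeapProfile("heap.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "writing heap profile failed:", err)
	}
}
```

The resulting heap.pprof file can then be inspected with `go tool pprof heap.pprof`.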
If this is specific to launching new nodes that need to sync, app has a similar open issue that devops sees. Just an FYI, not sure if there could be any common issues between the reports.
@MSevey, are there similar reports for the node or only for app? |
Currently just app, to my knowledge. But I thought it might be useful to touch base with app to see if anything they looked into triggers new ideas here.
Closing, as I believe we identified this as a thundering herd that hit the running node.
Celestia Node version
0.12.3
OS
Alpine 3.18.4
Install tools
Using the ghcr docker container in k8s. This is deployed with our helm chart: https://github.com/astriaorg/dev-cluster/tree/main/charts/celestia-node, utilizing an override which provides a PVC for storage.
k8s statefulset file:
Others
We have played around with the resources allocated to the light node, recently raising the available RAM to 25 GB; the node consumes all available resources and is then killed to maintain safety. The init steps and commands can be seen in the above k8s StatefulSet.
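On the mitigation side, one generic way to keep a Go process from running into a container's memory limit is the runtime's soft memory limit (Go 1.19+), usually set via the GOMEMLIMIT environment variable or programmatically. The sketch below shows the programmatic form; whether celestia-node exposes or honors such a setting is an assumption here, and the 20 GiB figure is purely illustrative.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Leave headroom below the container's memory limit; the exact value
	// here is illustrative only.
	const limit int64 = 20 << 30 // 20 GiB
	prev := debug.SetMemoryLimit(limit)
	fmt.Printf("soft memory limit set to %d bytes (previous: %d)\n", limit, prev)
}
```

In k8s this is typically paired with a resources.limits.memory setting, so the runtime starts returning memory to the OS before the kubelet OOM-kills the pod.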
Steps to reproduce it
Deploy using the helm chart with a PVC configured. The node very consistently overconsumes memory.
Expected result
The node should not consume 25 GB of RAM and should ideally be functional on the stated minimum system requirements (500 MB?).
Actual result
A graph of the node's memory usage in GB, with each restart marked in a different color, is shown below.
[memory usage graph attached]
Relevant log output
No response
Notes
No response