Upon checking automation.pdap.io, I noticed that none of the jobs had run for over 24 hours. After snurfling around more closely, I realized that the "Built-In" node, which runs the Jenkins jobs, had stopped running because it had no storage space left.
More specifically, when I went to /var/lib/docker and ran du -sh overlay2 (which checks the size of the overlay2 directory in the docker directory), it showed 42 GB in use, with our node having only 50 GB of storage space in total.
After doing some additional online snurfling, I found I could quickly solve this by:
Running docker system prune --all --volumes --force, which removes all unused images, containers, networks, and volumes.
Restarting Jenkins by entering https://automation.pdap.io/safeRestart in the browser.
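The check-and-prune steps above could be wrapped in a small script so the stopgap is repeatable. This is only a sketch: the function names `dir_size_kb` and `docker_cleanup` and the `DRY_RUN` switch are made up for illustration, and only the `du` and `docker system prune` commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of any existing PDAP tooling.

# Print the size of a directory in kilobytes.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Stopgap cleanup: report overlay2 size, prune, then report again.
# $1 is the Docker data root (defaults to /var/lib/docker).
# Set DRY_RUN=1 to print the prune command instead of running it.
docker_cleanup() {
    docker_root="${1:-/var/lib/docker}"
    echo "overlay2 before: $(dir_size_kb "$docker_root/overlay2") KB"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: docker system prune --all --volumes --force"
    else
        # Removes all unused images, containers, networks, and volumes.
        docker system prune --all --volumes --force
    fi
    echo "overlay2 after: $(dir_size_kb "$docker_root/overlay2") KB"
}
```

Running `DRY_RUN=1 docker_cleanup` first shows what would happen without deleting anything. Jenkins still needs the safeRestart step afterwards.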
Once I did that, I reclaimed 37.93 GB of space. Neat! So in the future, whenever this happens, we can do that again as a stopgap measure.
Of course, ideally we don't have to do this at all. Unfortunately, at the moment it's not clear what the problem is, and we also don't have a way of being alerted if the node goes offline. So we should solve both of those:
Requirements
Set up a means of sending a notification when the Jenkins Built-In Node goes offline (or when a storage threshold on the droplets is exceeded).
Figure out and resolve the Docker storage leak problem.
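For the storage-threshold half of the first requirement, one possible starting point is a check that could run from cron on the droplet. This is a rough sketch, not a chosen design: the function names and the default threshold of 80% are assumptions, and the actual notification channel is left open (the script only prints an alert and returns a non-zero status).

```shell
#!/bin/sh
# Hypothetical helpers; the 80% default threshold is an arbitrary assumption.

# Print the percentage of the filesystem holding $1 that is in use
# (integer, without the % sign).
disk_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the threshold, 1 otherwise.
# $1 is the path to check, $2 the threshold in percent.
check_disk() {
    path="${1:-/}"
    threshold="${2:-80}"
    used=$(disk_usage_pct "$path")
    if [ "$used" -ge "$threshold" ]; then
        echo "ALERT: $path is ${used}% full (threshold ${threshold}%)" >&2
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

A cron entry could then call `check_disk /var/lib/docker 80` and send a message through whatever channel we settle on when it fails. Detecting that the Jenkins node itself is offline would need a separate check.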
Additional Resources
This docker thread discusses the issue and postulates some possible causes.
One of those possible causes is the images themselves writing log files or other data to disk, which is not being properly cleaned up.
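To test the log-file theory, something like the following could list the largest container log files. It assumes Docker's default json-file logging driver, which stores each container's stdout/stderr log under /var/lib/docker/containers as a file ending in -json.log; the function name `largest_logs` is made up for illustration. Files that a container writes inside its own filesystem would show up in overlay2 instead and would not be caught by this.

```shell
#!/bin/sh
# Hypothetical helper: list the largest Docker container log files.
# $1 is the containers directory (assumes the default json-file driver),
# $2 is how many entries to show (defaults to 10).
# Output: size in KB, then path, largest first.
largest_logs() {
    root="${1:-/var/lib/docker/containers}"
    find "$root" -name '*-json.log' -exec du -k {} + | sort -rn | head -n "${2:-10}"
}
```

For the overlay2 side, `docker ps --size` shows how much each running container has written to its writable layer.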