Limit number of workers to prevent system OOM #580
Comments
Perhaps there should also be a timeout for the worker thread in lxcfs, after which it should return EIO to the application making the FUSE call. That would prevent the libfuse+kernel deadlock even if we do end up with lots of stuck lxcfs threads.
It seems I was mistaken: the OOM was primarily due to consuming applications behaving badly when their reads were stuck, and I wrongly thought more RAM was being consumed by lxcfs itself. So while a limit would probably still be ideal, it is less urgent than I believed. A timeout may be sensible; however, depending on where the deadlock occurs, it may not be possible to act on it.
@mihalicyn can lxcfs time out if the worker thread does not return within a specified time, and return EIO or similar to the caller?
@lathiat @nkshirsagar yep, that's a good idea. I'll think about that, of course. Upd: libfuse versions >= 3.12.0 has [...]. In the snap environment Ubuntu Focal is used, so we have libfuse 3.9.0.
It seems there is no limit to the number of concurrent workers for LXCFS requests.
When lxcfs requests are slow for any reason (deadlocked, slowed by high load, or some other cause) and many such requests keep coming in, lxcfs can consume thousands of threads and tens to hundreds of GB of memory and crash the entire system, as seen while working on #471 and #579.
I suggest that we need a limit, even if a fairly high one, to prevent this from happening. This should include non-debug level logging of when the limit is hit.
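As a rough illustration of the suggestion, the sketch below caps the number of in-flight requests with an atomic counter and logs to stderr when the cap is hit. It is an assumption-laden example, not lxcfs code: `MAX_INFLIGHT`, `request_enter`, and `request_exit` are hypothetical names, and the tiny limit of 2 is chosen only to make the behaviour easy to see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cap; a real limit would be much higher and configurable. */
#define MAX_INFLIGHT 2

static atomic_int inflight = 0;

/* Try to reserve a slot. Returns false (and logs) when the limit is hit. */
bool request_enter(void)
{
	int cur = atomic_load(&inflight);

	/* On CAS failure, cur is reloaded with the current value and we retry. */
	while (cur < MAX_INFLIGHT) {
		if (atomic_compare_exchange_weak(&inflight, &cur, cur + 1))
			return true;
	}

	/* Logged unconditionally, i.e. not only at debug level. */
	fprintf(stderr, "in-flight request limit (%d) reached\n", MAX_INFLIGHT);
	return false;
}

/* Release a slot previously reserved by request_enter(). */
void request_exit(void)
{
	int prev = atomic_fetch_sub(&inflight, 1);
	assert(prev > 0);
	(void)prev;
}
```

A request handler would call `request_enter()` first and return an error such as `-EBUSY` or `-EIO` when it fails, and call `request_exit()` on every path after a successful enter. Rejecting new requests is only one possible policy; blocking until a slot frees up is another.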