-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: give kinda helpful message if too many open files #1110
base: main
Are you sure you want to change the base?
feature: give kinda helpful message if too many open files #1110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me for parallel_attempts
, what are your thoughts on adding a similar guard in generators/base.py
related to parallel_requests
as well?
In theory, if both were set the error would bubble up from the generator sub-processes however since parallel_requests
is independent a generator that requires a single request per call could produce a similar error when parallel_attempts
was not set.
At the same time I wonder about the value of catching OSError like this, are we going down a path that will require additional handlers for various resource limitation errors across supported operating systems?
Consider the command used to test this, run on a Windows installl with only 4GB of RAM can raise:
File "C:\Users\Win10x64\miniconda3\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\Win10x64\miniconda3\Lib\multiprocessing\context.py", line 337, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Win10x64\miniconda3\Lib\multiprocessing\popen_spawn_win32.py", line 75, in __init__
hp, ht, pid, tid = _winapi.CreateProcess(
^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete
Amendments:
Validation:
-- i hope the windows message is alright - i don't have a great idea of how this goes wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for extending to parallel_requests
this looks ready.
pool_size = min( | ||
generations_this_call, | ||
_config.system.parallel_requests, | ||
_config.system.max_workers, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct access to config here suggest we should have a helper that owns process or thread pools that references _config.system
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the churn, final validation identified that the new system.max_workers
is not taking overrides into account.
garak.site.yaml:
system:
max_workers: 2000
python -m garak: error: argument --parallel_attempts: Parallel worker count capped at 500 (config.system.max_workers)
iworkers = int(workers) | ||
if iworkers <= 0: | ||
raise argparse.ArgumentTypeError("Need >0 workers (int)" % workers) | ||
if iworkers > _config.system.max_workers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing show this is not really configurable as _config
has not yet loaded garak.site.yaml
and also has not loaded --config
supplied file if passed on cli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch, thanks, will amend and I guess write a test for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, without undue gymnastics, we can either:
- have a hardcoded, non-configurable cap on CLI param validation (in where?
garak.cli
isn't universal, but otohargparse
only applies within this module; top-level values ingarak._config
is the opposite of the direction we're trying to move in) - drop CLI
max_workers
validation but let the config-based one be enforced
Having params that are only configurable in garak.core.yaml
isn't a good option
wdyt? are there other options that make sense? currently leaning toward (2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do the cli provided validation call after config is loaded and before starting to instantiate things say around here?
Lines 369 to 371 in 27d4554
# base config complete | |
OS can get upset if parallel_attempts goes too high. Give a clearer error message about this.
Verification
List the steps needed to make sure this thing works
garak -m test -p test.Test --parallel_attempts 1000
, the new error should pop up on CLI and in log. If it doesn't, try a higher number, or reduce ulimit.