-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use job script on disk for LSF driver and use stdin for PBS #7396
Conversation
6b6a28c
to
91fff17
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7396 +/- ##
==========================================
+ Coverage 84.48% 85.19% +0.71%
==========================================
Files 384 384
Lines 22857 22877 +20
Branches 884 876 -8
==========================================
+ Hits 19310 19491 +181
+ Misses 3435 3277 -158
+ Partials 112 109 -3 ☔ View full report in Codecov by Sentry. |
2f898a6
to
e9088d3
Compare
e9088d3
to
b312116
Compare
Onprem testing:
|
8e13e55
to
5067a06
Compare
Some clearification:
Currently, the on-prem /global/bin/bsub cannot take the job script on stdin, which leaves the two goals above impossible to attain. |
ab861f8
to
0c5db19
Compare
c019c08
to
50063af
Compare
cca6304
to
ca13b08
Compare
Integration tests with --openpbs on Azure and --lsf onprem has been manually verified (error there will not be visible until next komodo build) |
ca13b08
to
33f7472
Compare
src/ert/scheduler/openpbs_driver.py
Outdated
f"cd {shlex.quote(str(runpath))}\n" | ||
f"exec -a {shlex.quote(executable)} {executable} {shlex.join(args)}\n" | ||
) | ||
script_path = runpath / "job_script.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use the same tempfile.NamedTemporaryFile
pattern like in lsf_driver since when runpath=None
this will not work or ?
Actually where is job_script.sh
used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it is not used any longer after changing this PR to use stdin
33f7472
to
b80eef6
Compare
pass | ||
if str(tmp_path).startswith("/tmp"): | ||
print( | ||
"Please use --basetemp option to pytest, PBS tests needs a shared disk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Please use --basetemp option to pytest, PBS tests needs a shared disk" | |
"Please use --basetemp option to pytest, the real PBS cluster needs a shared disk" |
pass | ||
if str(tmp_path).startswith("/tmp"): | ||
print( | ||
"Please use --basetemp option to pytest, LSF tests needs a shared disk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Please use --basetemp option to pytest, LSF tests needs a shared disk" | |
"Please use --basetemp option to pytest, the real LSF cluster needs a shared disk" |
with open("poly.ert", mode="a+", encoding="utf-8") as f: | ||
f.write("QUEUE_SYSTEM TORQUE\nNUM_REALIZATIONS 2") | ||
f.write("QUEUE_SYSTEM TORQUE\nNUM_REALIZATIONS 2\n") | ||
if queue_name is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if queue_name is not None: | |
if (queue_name: = os.getenv("_ERT_TESTS_DEFAULT_QUEUE_NAME")) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get one less line also for the other tests since queue_name
is used only in the if statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used a fixture as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small comments. It looks very good otherwise! :)
@@ -19,12 +20,19 @@ def mock_openpbs(pytestconfig, monkeypatch, tmp_path): | |||
mock_bin(monkeypatch, tmp_path) | |||
|
|||
|
|||
@pytest.mark.timeout(30) | |||
@pytest.fixture() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
For OpenPBS the explicit job script only exists in memory and is provided to qsub via stdin. For LSF, the bsub command seemingly cannot read stdin, and the job script is placed on runpath with a unique filename that will be left behind. This enables more complex jobs (e.g. redirection) submitted to the drivers submit() function. Align Python LSF driver with C-driver in supplying the runpath as the last argument. This is needed to pass a legacy test. Note that legacy LSF and legacy Torque drivers differ in this behaviour, the latter does not supply runpath. The type of the runpath object supplied is changed from str to pathlib.Path
c8047ca
to
cb90b19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @berland 👍 🚀
Let PBS and LSF make explicit job scripts for test robustness
For OpenPBS the explicit job script only exists in memory and
is provided to qsub via stdin.
For LSF, the bsub command seemingly cannot read stdin, and the
job script is placed on runpath with a unique filename that
will be left behind.
This enables more complex jobs (e.g. redirection) submitted to the
drivers submit() function.
Align Python LSF driver with C-driver in supplying the runpath as the
last argument. This is needed to pass a legacy test. Note that legacy
LSF and legacy Torque drivers differ in this behaviour, the latter does
not supply runpath.
The type of the runpath object supplied is changed from str to
pathlib.Path
Issue
Resolves #7327
Approach
Short description of the approach
(Screenshot of new behavior in GUI if applicable)
When applicable