Add STOP_ON_FAIL option for wf jobs #6101
Conversation
I think we should flip the boolean value and call it
Not sure about this flip; from discussions we agreed that the default would be to continue on error, so IMO making it explicit when to stop makes more sense than making it explicit when to continue.
Codecov Report
@@            Coverage Diff             @@
##             main    #6101      +/-   ##
==========================================
+ Coverage   83.39%   83.53%   +0.14%
==========================================
  Files         343      343
  Lines       20827    20777      -50
  Branches      944      938       -6
==========================================
- Hits        17368    17357      -11
+ Misses       3170     3129      -41
- Partials      289      291       +2
... and 39 files with indirect coverage changes
Looks like a good change! Tried running it in the GUI though, and it just hangs for about a minute before I get a popup saying that the workflow failed (I changed to PRE_SIMULATION). The simulation window then changes to Done, but still appears to be waiting.
The running time stopped counting after 2 seconds, but I did not get the popup for about a minute.
""" | ||
LOAD_WORKFLOW_JOB failing_job failjob | ||
LOAD_WORKFLOW dump_failing_workflow wffail | ||
HOOK_WORKFLOW wffail POST_SIMULATION |
Could change this to PRE_SIMULATION to avoid spinning up the ensemble evaluator, or perhaps you want it to be POST to check that we exit the tracking loop? If so, that might be worth a comment.
Will look a bit more into this; when running tests it also hangs on PRE_SIMULATION, because it waits 60 seconds for the websocket client to connect.
I think this should maybe be another issue. I tried some things with threading, like setting the monitor thread to be a daemon, but there is still something else that keeps it hanging.
When the thread named ert_cli_simulation_thread is spawned, it spawns some other threads too.
Should maybe be a separate issue/thing to look into, or something to postpone until the experiment server milestone, which will refactor this away anyway?
parsed = ert_parser(None, args=[ENSEMBLE_EXPERIMENT_MODE, "poly.ert"])
Could use TEST_RUN_MODE to avoid running all realizations.
A lot of workflows are using the plugin system to register, but as far as I can tell, this is not really exposed through that API? https://ert.readthedocs.io/en/latest/experimental/plugin_system.html#workflow-jobs
Thanks for the good work! When I ran it from the GUI, though, I noticed that I got a stack trace in my terminal, which I would not expect. The terminal gave me the error message, but the box that popped up in the GUI did not.
I used
LGTM! Nice improvement! Remember to clean up commits before merging.
Should this also be published in a bigger "note" to our users that this is now possible? There have been multiple requests regarding this, right?
Yes, this should be part of the release notes! But can we update the documentation for workflows in this PR to explain the new behavior?
LGTM! 🚢
Issue
Resolves #2386
Approach
Add a STOP_ON_FAIL keyword to the workflow job parser and halt execution if the stop_on_fail attribute (defaults to False) of a workflow job is True.
STOP_ON_FAIL=True can be added to ert scripts like this:
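The original example for ert scripts was not captured here; below is a minimal sketch of what opting in from a Python ert script might look like. `ErtScript` is stubbed so the snippet is self-contained, and the class names and the exact attribute spelling are assumptions based on this PR's description, not ert's actual API.

```python
class ErtScript:
    """Stand-in for ert's ErtScript base class (stub for illustration)."""

    # Assumed default: a failing job does not stop the workflow.
    stop_on_fail = False

    def run(self, *args):
        raise NotImplementedError


class FailingJob(ErtScript):
    # Opt in: if this job fails, the whole workflow should halt.
    stop_on_fail = True

    def run(self, *args):
        raise RuntimeError("job failed")
```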
Or to .sh scripts via a comment marker, or in the workflow job declaration file, like this:
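The declaration-file example was not captured here either; a hedged sketch of what it might look like follows. The script name is hypothetical, and the placement of STOP_ON_FAIL alongside the usual EXECUTABLE keyword is an assumption:

```
EXECUTABLE failing_script.sh
STOP_ON_FAIL True
```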
In all cases, this defaults to False. If STOP_ON_FAIL is declared in multiple places, the declaration in the workflow job file has the highest priority.
(Screenshot of new behavior in GUI if applicable)
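The halting behavior described above can be sketched as a simple runner loop. This is a hypothetical simplification for illustration, not ert's actual workflow runner; jobs are modeled as (callable, stop_on_fail) pairs and a job "fails" by raising:

```python
def run_workflow(jobs):
    """Run jobs in order; halt when a job marked stop_on_fail fails.

    Returns the results gathered up to the point of any halt, with
    None recorded for a job that raised.
    """
    results = []
    for job, stop_on_fail in jobs:
        try:
            results.append(job())
        except Exception:
            results.append(None)
            if stop_on_fail:
                break  # STOP_ON_FAIL=True: abort the rest of the workflow
            # default (False): continue on error, per the agreed default
    return results
```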
Pre review checklist
Ground Rules),
and changes to existing code have good test coverage.
Pre merge checklist