Taq 145 interrupt resume #113
While starting to write a README section for QUARK users who may be confronted with an interrupted QUARK run, I realized that QUARK should probably write an overall status summary to the console at the very end.
In general the functionality works when following the instruction_demo, and the documentation in the README and on Read the Docs is very helpful. Please align the code with PR #112 and provide a way to detect CTRL-C and mark it as an interrupt, so that the user can resume the run later.
From my point of view the pull request is now ready to merge. (I can't find how to remove the "draft" status.)
Functional, hence I approve. However, the keyboard interrupt recovery is not yet working as I would expect. I will open an issue as a basis for an upcoming PR to fix that.
As already mentioned quite some time ago, we need support for submitting quantum jobs asynchronously to a server. We have now implemented a solution in our QUARK clone. To get the asynchronous mode running we also had to extend main, BenchmarkManager and ConfigManager. These changes can be described as introducing an "interrupt/resume" mechanism to the BenchmarkManager. The asynchronous job submission is a special application of the more general "interrupt/resume" mechanism and is not part of this merge request. (If you are interested, this might become a separate pull request some time later.)
Here is an explanation of the interrupt/resume mechanism:
QUARK modules may return an instruction to the benchmark manager as the first entry in the return value of preprocess and postprocess.
Currently the following instructions are supported:
PROCEED
INTERRUPT
PROCEED: if the benchmark manager gets a PROCEED (or no instruction at all), it continues with the regular QUARK workflow. If the benchmark manager can finish the current job without getting an INTERRUPT or an exception, it adds "quark_job_status"=FINISHED to the metrics.
INTERRUPT: if the benchmark manager gets an INTERRUPT, it stops the current QUARK workflow, adds "quark_job_status"=INTERRUPTED to the metrics, saves all the metrics written so far to the BenchmarkRecord and continues with the configuration/repetition loop.
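The handling of the two instructions can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `run_module_step` and `run_job`, the `(instruction, output, metrics)` return shape and the plain dict used for metrics are assumptions made for the example. Only the `Instruction` values and the "quark_job_status" key come from the description above.

```python
from enum import Enum


class Instruction(Enum):
    PROCEED = 0
    INTERRUPT = 1


def run_module_step(step, input_data, metrics):
    """Call one pre/postprocess step (hypothetical signature) and record its metrics."""
    # Assumption: the step returns (instruction, output, step_metrics).
    instruction, output, step_metrics = step(input_data)
    metrics.update(step_metrics)
    return instruction, output


def run_job(steps, input_data):
    """Run all steps of one job; stop early when a step returns INTERRUPT."""
    metrics = {}
    data = input_data
    for step in steps:
        instruction, data = run_module_step(step, data, metrics)
        if instruction is Instruction.INTERRUPT:
            # Keep everything collected so far and mark the job as interrupted.
            metrics["quark_job_status"] = "INTERRUPTED"
            return metrics
    metrics["quark_job_status"] = "FINISHED"
    return metrics
```

In this sketch the caller would store the returned metrics and move on to the next configuration/repetition regardless of the status.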
QUARK resume-mode:
After running QUARK in its regular mode, QUARK can be run again on the same results directory in resume mode by specifying the existing results directory with the --resume-dir option. This can be done repeatedly for the same results directory.
If QUARK is called in resume mode, the benchmark manager runs the configuration/repetition loop as in the regular mode. For every iteration index (actually a pair of indices) it:
reads the corresponding entry from the old results.json;
if this entry contains quark_job_status:FINISHED, stores the old results entry in the new results and continues with the next loop item;
otherwise passes the module-specific part of this entry (or an empty dict if there is no matching information available) as the keyword argument "previous_job_info" to the current module.
It is then up to the specific module what to do with this information.
Note that this gives every module a small data store that persists across resume runs.
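The resume loop described above could look roughly like this. It is a simplified sketch under stated assumptions: `resume_loop` and its parameters are hypothetical names, the old results are modelled as a dict keyed by the index pair, and the whole old entry stands in for the "module-specific part" that the real benchmark manager would extract per module.

```python
def resume_loop(job_keys, old_results, run_job):
    """Re-run only the jobs that are not yet FINISHED.

    job_keys:    iterable of (config_index, repetition_index) pairs
    old_results: dict mapping those pairs to entries of the old results.json
    run_job:     callable(key, previous_job_info) -> new results entry
    """
    new_results = []
    for key in job_keys:
        old_entry = old_results.get(key)
        if old_entry is not None and old_entry.get("quark_job_status") == "FINISHED":
            # Finished earlier: carry the old entry over unchanged.
            new_results.append(old_entry)
            continue
        # Not finished (or never started): hand the stored info to the module.
        previous_job_info = old_entry if old_entry is not None else {}
        new_results.append(run_job(key, previous_job_info))
    return new_results
```

Because finished entries are copied over as they are, calling such a loop repeatedly on the same results only repeats work for jobs that are still open.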
compatibility:
The new wrapper functions BenchmarkManager.preprocess/postprocess ensure that modules which do not use the new instruction mechanism behave as if they had returned Instruction.PROCEED.
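One way such a compatibility wrapper could normalize module return values is shown below. The function name `normalize_result` and the tuple-based return convention are assumptions for illustration; the actual BenchmarkManager.preprocess/postprocess wrappers may be implemented differently.

```python
from enum import Enum


class Instruction(Enum):
    PROCEED = 0
    INTERRUPT = 1


def normalize_result(result):
    """Prepend Instruction.PROCEED when a module returned no instruction."""
    if isinstance(result, tuple) and result and isinstance(result[0], Instruction):
        # New-style module: the instruction is already the first entry.
        return result
    if isinstance(result, tuple):
        # Old-style module returning a tuple, e.g. (output, time_to_solve).
        return (Instruction.PROCEED, *result)
    # Old-style module returning a single value.
    return (Instruction.PROCEED, result)
```

With this kind of normalization, the rest of the benchmark manager only has to deal with the new return shape.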
use cases:
asynchronous job submission
The use case that motivated this mechanism is the submission of quantum jobs to a QPU provider and the collection of the results, possibly a long time after submission (hours or even days). The INTERRUPT/resume mechanism allows the submission and the result collection to be done in separate QUARK runs.
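To illustrate the idea (the asynchronous submission itself is not part of this PR), a module could use "previous_job_info" along these lines. Everything here except the `Instruction` values and the `previous_job_info` keyword is hypothetical: `FakeProvider` is a stand-in for a real QPU provider client, and the "job_id" key and return shape are assumptions.

```python
from enum import Enum


class Instruction(Enum):
    PROCEED = 0
    INTERRUPT = 1


class FakeProvider:
    """Hypothetical in-memory stand-in for a QPU provider client."""

    def __init__(self):
        self._jobs = {}

    def submit(self, circuit):
        job_id = f"job-{len(self._jobs)}"
        self._jobs[job_id] = {"circuit": circuit, "done": False}
        return job_id

    def complete(self, job_id):
        # Simulates the provider finishing the job some time later.
        self._jobs[job_id]["done"] = True

    def result(self, job_id):
        job = self._jobs[job_id]
        return f"result of {job['circuit']}" if job["done"] else None


def preprocess(provider, circuit, previous_job_info=None):
    """Submit on the first run; try to collect the result on resume runs."""
    info = previous_job_info or {}
    job_id = info.get("job_id")
    if job_id is None:
        # First run: submit and interrupt, remembering the job id.
        job_id = provider.submit(circuit)
        return Instruction.INTERRUPT, None, {"job_id": job_id}
    result = provider.result(job_id)
    if result is None:
        # Still pending: interrupt again and keep the job id for the next resume.
        return Instruction.INTERRUPT, None, {"job_id": job_id}
    return Instruction.PROCEED, result, {"job_id": job_id}
```

The first QUARK run would end with quark_job_status INTERRUPTED for this job, and a later run with --resume-dir would pick up the stored job id and either wait again or proceed.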
interrupt with CTRL-C and continue with resume