
Add option to perform a dryrun with some rocoto commands #114

Open
aerorahul opened this issue Dec 9, 2024 · 7 comments

@aerorahul

The ability to perform a dry run without executing the underlying rocoto command would be valuable. One use case: running rocotorun with a dry-run option to obtain the batch card without actually submitting the job, which would let the user visually validate the batch card.

An initial effort toward achieving this has been made here.
Is this the right track?

@christopherwharrop-noaa

Thanks @aerorahul for your report and initial work. The existing design makes a dry-run submission a little tricky. There are some subtleties that need to be accounted for regarding how jobs get submitted within a detached daemon process, and how a job is added to the database before the submission attempt even occurs (so that submit failures, delays, etc. can be tracked).

The tuple returned by submit is expected to be the jobid and the output of the submit command. If the submission succeeds, the jobid is a valid jobid and the output is the usual output of the command, which is parsed to retrieve the jobid. If it fails, the jobid is nil and the output is the error message.

It might be better to create a new method just for dry-run submissions and to add logic in the various boot, run, rewind, etc. commands to handle it based on whether the dry-run option is active. There is a lot of room for improvement in how all of this is designed and handled.
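For illustration, a minimal sketch of that return contract (the method name, the sbatch call, and the regex here are assumptions for the Slurm case, not Rocoto's actual code):

```ruby
# Sketch of the [jobid, output] contract described above; illustrative only.
def submit_sketch(script_path)
  output = `sbatch #{script_path} 2>&1`
  if $?.success?
    jobid = output[/Submitted batch job (\d+)/, 1] # parse the scheduler's jobid
    [jobid, output]                                # success: valid jobid + command output
  else
    [nil, output]                                  # failure: nil jobid + error message
  end
end
```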

@aerorahul

Thanks @christopherwharrop-noaa.
If I create a submit_dryrun, I would have to duplicate the contents of submit, such as the creation of the temporary file. Correct?
I will give it a try, but it seems quite involved and requires an understanding of Rocoto's internal design.
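A hedged sketch of one way to avoid that duplication: keep a single method and branch just before the actual submission, rather than writing a separate submit_dryrun (names here are hypothetical, not Rocoto's internals):

```ruby
require "tempfile"

# Hypothetical sketch: build the script exactly as a normal submit would,
# but return early before anything is handed to the scheduler.
def submit(script_text, dryrun: false)
  file = Tempfile.new(["rocoto_submit", ".sh"])
  file.write(script_text)
  file.close

  # Dry run: no jobid, and the output points at the generated script.
  return [nil, "DRYRUN: submit script written to #{file.path}"] if dryrun

  output = `sbatch #{file.path} 2>&1`
  jobid = output[/Submitted batch job (\d+)/, 1]
  [jobid, output]
end
```

That would keep the temporary-file handling in one place rather than copying it into a second method.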

@christopherwharrop-noaa

Let me think about it more deeply. There might be a simpler way that I'm just not thinking of. Of course, any user can already get the submit script using the -v option, but not as a dry run: a live job submission will be attempted. I understand how that can be detrimental when you are trying to debug, because you don't want a submission that is valid from Slurm's point of view, but wrong, to occur while you are still building the workflow.

@aerorahul

The approach we have been using/brainstorming is inelegant and extremely hacky:

1. rocotorun -v 10 .... or rocotoboot -v 10 ...
2. get the jobid
3. scancel jobid

As you note, it likely breaks the provenance of the Rocoto database, and I am sure there are unintended consequences.

@christopherwharrop-noaa

I think your request for a dry-run feature (or whatever we want to call it) to get the script that Rocoto will submit for a particular task is totally reasonable. I strongly suspect this is something many others would find useful. One other thing, though, is that whatever is implemented has to work for PBSPro and any other supported batch system; realistically, those are the only ones in use right now.

The thing that makes it weird is Rocoto's way of submitting jobs asynchronously in a daemon spawned by the main process. That daemon server process often lives on after the main rocotorun process has terminated, and it is the thing that builds the submit script. I think we just need to make all parts aware of whether a dry run is active or not. Some of the plumbing that happens just before job submission attempts are made needs modification so that, in dry-run mode, it does not do things like store the submit attempt in the database.
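A rough sketch of what that guard might look like (all names here are hypothetical; the real flow involves the detached daemon and the workflow database):

```ruby
# Hypothetical plumbing sketch: every bookkeeping step checks the same flag,
# so a dry run builds and shows the script without touching the database.
def attempt_submission(task, db, batch_system, dryrun: false)
  db.record_submit_attempt(task) unless dryrun   # normal mode: track the attempt

  jobid, output = batch_system.submit(task, dryrun: dryrun)

  if dryrun
    puts output                                  # just report the would-be submission
  elsif jobid
    db.record_jobid(task, jobid)                 # live job: store jobid for tracking
  else
    db.record_submit_failure(task, output)       # failure: store error for retries
  end
end
```

The point being that the daemon-side code and the database writes would all consult the same flag rather than each path deciding on its own.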

@aerorahul

Thanks for explaining the work involved.
I started with Slurm just to get the conversation going and gauge interest.

@aerorahul

@christopherwharrop-noaa In the branch mentioned above, I added the dry run for the other batch systems in the same spirit as Slurm.

If you can help me with the part you mentioned:

> I think we just need to make all parts aware of whether a dry run is active or not. Some of the plumbing that happens just before job submission attempts are made needs modification so that, in dry-run mode, it does not do things like store the submit attempt in the database.

I would greatly appreciate your help.
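For what it's worth, here is a sketch of how the flag could be threaded in from the command line (purely illustrative; the option name and the engine call are assumptions):

```ruby
require "optparse"

# Hypothetical option parsing for a dry-run flag; Rocoto's actual CLI
# handling may look quite different.
options = { dryrun: false }
OptionParser.new do |opts|
  opts.banner = "Usage: rocotorun [options]"
  opts.on("--dryrun", "Build submit scripts without submitting them") do
    options[:dryrun] = true
  end
end.parse!

# The flag would then be handed to whatever drives submissions, e.g.
# engine = WorkflowEngine.new(dryrun: options[:dryrun])
puts "dry-run mode: #{options[:dryrun]}"
```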
