-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: preemption #25
feat: preemption #25
Conversation
Signed-off-by: vsoch <[email protected]>
still having an issue with not finding the gs flags Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snakemake has now preemption parameters built in the main codebase. Hence, I don't understand why you define those options in this executor in addition. Can you clarify? Why don't you use self.workflow.remote_execution_settings.preemptible_rules.is_preemptible(job.rule.name)
?
I used the exact logic from the previous snakemake life sciences module, so if that is the case, it wasn't updated to support this design. Can you point me to a plugin that is using preemtible correctly? Also note that I'm blocked from working on this not because of any issue with that, but because the snakemake command is telling me I'm missing a family of storage "gs" args. |
There isn't any yet. But in principle it boils down to check whether the job is supposed to be preemptible with |
Signed-off-by: vsoch <[email protected]>
@johanneskoester when I remove all my custom logic (and allow snakemake to install plugins it needs) I reproduce the error - it is looking for both gcs and gs. This is installing snakemake from the main branch. I can add the install of the gcs plugin, but then I'll be where I was before, looking for the gs plugin. |
ah! Just found it in an environment locally - will test removing here and seeing if that somehow (magically) removes it from the remote. That doesn't make sense, but if snakemake is getting the plugins from my local call, it actually would! Will report back. |
Looks like there is a bug with preemptible_rules in cli.py:
Fix here: snakemake/snakemake#2616 |
we need to use the snakemake provided preemtible args and not write custom ones here. Signed-off-by: vsoch <[email protected]>
echo "I am batch index ${BATCH_TASK_INDEX}" | ||
export PATH=/opt/conda/bin:${PATH} | ||
if [ $BATCH_TASK_INDEX = 0 ] && [ ! -d "/opt/conda" ] ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure about best practice here - if we should install snakemake just to the main executor (running the workflow) or as I've done, allow installing to all. For the other jobs (that ran OK) I was just installing to index 0. We can run a test again after this and see if the previous are still working OK. Intuitively unless there is an obvious shared filesystem (there isn't at /opt) I would want snakemake installed across nodes.
policy = self.get_allocation_policy(job) | ||
|
||
# If we have preemption for the job and retry, update task retries with it | ||
retries = self.workflow.remote_execution_settings.preemptible_retries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task has retries, and I figure that parameter corresponds to "preemtible retries" when that is set. Note that without it, it's just the number of retries regardless of preemptible.
Signed-off-by: vsoch <[email protected]>
the logic for checking the accuracy of the preemptible rules (that if one is set but not the others, and raising a workflow error if this is off) should belong upstream as this is no longer a parameter owned by googlebatch. Signed-off-by: vsoch <[email protected]>
Co-authored-by: Johannes Köster <[email protected]>
🤖 I have created a release *beep* *boop* --- ## [0.3.0](v0.2.0...v0.3.0) (2024-02-15) ### Features * add simple example with cos ([#22](#22)) ([2951454](2951454)) * add support for Google Batch GPUs ([#26](#26)) ([f2af21c](f2af21c)) * expose network policy interfaces ([#28](#28)) ([41c8d44](41c8d44)) * preemption ([#25](#25)) ([d6913a1](d6913a1)) * support for boot disk type, size, and image ([#27](#27)) ([d5de5a1](d5de5a1)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This is a WIP because there is a bug #24 with what looks to be the google storage plugin installed. Ping @johanneskoester I need to get #24 fixed before can proceed with more features here. Happy New Year!