-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: preemption #25
Merged
+100
−53
Merged
feat: preemption #25
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
f09a21e
feat: preemption
vsoch b874811
updates
vsoch 4a0eb48
clean up install of snakemake assets
vsoch 985fa73
Update install-snek.sh
vsoch f5471ea
Update install-snek.sh
vsoch b6258f9
save state
vsoch 02f2ca3
bugfixes and updates to preemptible
vsoch 2fbb0b1
assume job.rules produces strings of the job name
vsoch f68e66a
clean up is preemptible function
vsoch e4c5a23
Update snakemake_executor_plugin_googlebatch/executor.py
vsoch File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -331,4 +331,4 @@ rule hello_world: | |
googlebatch_snippets="mpi,myscript.sh" | ||
shell: | ||
"..." | ||
``` | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Snakemake Google Batch Examples | ||
|
||
- [hello-world](hello-world): The most basic hello world example with multiple langauges | ||
- [hello-world-intel-mpi](hello-world-intel-mpi): Run an example MPI job to calculate Pi with intel MPI | ||
- [hello-world-cos](hello-world-cos): The same using the container operating system (COS) | ||
- [hello-world-preemption](hello-world-preemption): Example asking for preemptible jobs for all rules. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Hello World | ||
|
||
> with preemtible instances | ||
Here is an example specifying to use an s3 bucket and asking for preemptible instances. | ||
|
||
```bash | ||
snakemake --jobs 1 --executor googlebatch --googlebatch-region us-central1 --googlebatch-project llnl-flux --default-storage-provider s3 --default-storage-prefix s3://snakemake-testing-llnl --preemptible-rules | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# By convention, the first pseudorule should be called "all" | ||
# We're using the expand() function to create multiple targets | ||
rule all: | ||
input: | ||
expand( | ||
"{greeting}/world.txt", | ||
greeting = ['hello', 'hola'], | ||
), | ||
|
||
# First real rule, this is using a wildcard called "greeting" | ||
rule multilingual_hello_world: | ||
output: | ||
"{greeting}/world.txt", | ||
shell: | ||
""" | ||
mkdir -p "{wildcards.greeting}" | ||
sleep 5 | ||
echo "{wildcards.greeting}, World!" > {output} | ||
""" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#!/bin/bash | ||
|
||
url=$1 | ||
|
||
tmp=$(mktemp -d -t snek-install-XXXX) | ||
rm -rf ${tmp} | ||
|
||
git clone --depth 1 ${url} ${tmp} | ||
cd ${tmp} | ||
/opt/conda/bin/python -m pip install . | ||
cd - | ||
rm -rf ${tmp} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -110,6 +110,19 @@ def validate(self, job): | |
f"Job {job} has container without image_family batch-cos*." | ||
) | ||
|
||
def is_preemptible(self, job): | ||
johanneskoester marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
Determine if a job is preemptible. | ||
|
||
The logic for determining if the set is valid should belong upstream. | ||
""" | ||
is_p = self.workflow.remote_execution_settings.preemptible_rules.is_preemptible | ||
if job.is_group(): | ||
preemptible = all(is_p(rule) for rule in job.rules) | ||
johanneskoester marked this conversation as resolved.
Show resolved
Hide resolved
|
||
else: | ||
preemptible = is_p(job.rule.name) | ||
return preemptible | ||
|
||
def get_command_writer(self, job): | ||
""" | ||
Get a command writer for a job. | ||
|
@@ -178,6 +191,9 @@ def run_job(self, job: JobExecutorInterface): | |
setup.script = batch_v1.Runnable.Script() | ||
setup.script.text = setup_command | ||
|
||
# Placement policy | ||
# https://cloud.google.com/python/docs/reference/batch/latest/google.cloud.batch_v1.types.AllocationPolicy.PlacementPolicy | ||
|
||
# This will ensure all nodes finish first | ||
barrier = batch_v1.Runnable() | ||
barrier.barrier = batch_v1.Runnable.Barrier() | ||
|
@@ -202,8 +218,15 @@ def run_job(self, job: JobExecutorInterface): | |
group.permissive_ssh = True | ||
|
||
# This includes instances (machine type) boot disk and policy | ||
# Also preemtion | ||
policy = self.get_allocation_policy(job) | ||
|
||
# If we have preemption for the job and retry, update task retries with it | ||
retries = self.workflow.remote_execution_settings.preemptible_retries | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The task has retries, and I figure that parameter corresponds to "preemtible retries" when that is set. Note that without it, it's just the number of retries regardless of preemptible. |
||
if self.is_preemptible(job) and retries: | ||
self.logger.debug(f"Updating preemptible retries to {retries}") | ||
task.max_retry_count = retries | ||
|
||
batchjob = batch_v1.Job() | ||
batchjob.task_groups = [group] | ||
batchjob.allocation_policy = policy | ||
|
@@ -262,6 +285,12 @@ def get_allocation_policy(self, job): | |
policy.machine_type = machine_type | ||
policy.boot_disk = boot_disk | ||
|
||
# Do we want preemptible? | ||
# https://github.com/googleapis/googleapis/blob/master/google/cloud/batch/v1/job.proto#L479 and # noqa | ||
# https://github.com/googleapis/google-cloud-python/blob/main/packages/google-cloud-batch/google/cloud/batch_v1/types/job.py#L672 # noqa | ||
if self.is_preemptible(job): | ||
policy.provisioning_model = 3 | ||
|
||
instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate() | ||
instances.policy = policy | ||
allocation_policy.instances = [instances] | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure about best practice here - if we should install snakemake just to the main executor (running the workflow) or as I've done, allow installing to all. For the other jobs (that ran OK) I was just installing to index 0. We can run a test again after this and see if the previous are still working OK. Intuitively unless there is an obvious shared filesystem (there isn't at /opt) I would want snakemake installed across nodes.