Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow using on-demand instead of spot only #622

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

samos123
Copy link
Contributor

@samos123 samos123 commented Aug 3, 2024

This fixes #621

@samos123 samos123 marked this pull request as ready for review August 12, 2024 17:29
@samos123
Copy link
Contributor Author

@markblee could you review please? I need this in order to test on internal on-demand capacity.

@samos123
Copy link
Contributor Author

@Ethanlm would love your thoughts too

@@ -508,8 +516,9 @@ def _build_pod(self) -> Nested[Any]:
logging.info("Found tier=%s in env. Using reservation=%s", tier, cfg.reservation)
selector.update({"cloud.google.com/reservation-name": cfg.reservation})
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using tier == "1" for spot and tier == "2" for on-demand? You can configure the quota file s.t. there's only on-demand quota like

[[total_resources]]
v5litepod = 0

[[total_resources]]
v5litepod = 0

[[total_resources]]
v5litepod = X

...

Which corresponds to 0 quota for tier 0/1 and X quota on tier 2. More generally we can also make the logic for tier -> selector mapping configurable, for more flexible schemes. WDYT?

Copy link
Contributor

@ruomingp ruomingp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +519 to +521
if not cfg.enable_ondemand:
logging.info("Found tier=%s in env. Using spot quota", tier)
selector.update({"cloud.google.com/gke-spot": "true"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unittest that can be updated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to use TPU on GKE using on-demand quota
3 participants