guide: GH resuming workflow #207

Open
casperdcl opened this issue Mar 19, 2022 · 4 comments
Labels
documentation (Markdown files), epic (Collection of sub-issues), p1-important (High priority)

Comments

@casperdcl
Contributor

casperdcl commented Mar 19, 2022

Add an example of a long-running, self-hosted workflow to https://cml.dev/doc/cml-with-dvc (or another suitable page):

  1. A GitHub Actions job launches a "self-hosted" GCP/AWS runner using cml runner --reuse --labels=cml, and probably --cloud-spot.
  2. The rest of the workflow runs on that "self-hosted" runner, using runs-on: [self-hosted, cml] and timeout-minutes: 50400 (35 days).
  3. If the GitHub Actions job is about to time out, CML restarts the workflow.
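A minimal sketch of steps 1 and 2, written in Python to match the rest of the thread: it builds the two-job workflow as a dict and prints it as JSON, which is, for practical purposes, valid YAML and so can be saved as a workflow .yml file. The job names, the workflow_dispatch trigger, the --cloud=aws choice and the train.py script are illustrative assumptions; a working setup also needs cloud credentials and a repository token in the environment, which are left out here.

```python
import json

# Flags from step 1; --cloud=aws is an assumed choice (GCP is the other
# option mentioned). Credentials/tokens the runner needs are omitted.
RUNNER_CMD = "cml runner --cloud=aws --cloud-spot --reuse --labels=cml"

TIMEOUT_MINUTES = 50400  # 50400 / 60 / 24 = 35 days


def build_workflow() -> dict:
    """Return the two-job workflow from steps 1 and 2 as a plain dict."""
    return {
        "name": "long-running-training",  # hypothetical name
        "on": ["workflow_dispatch"],
        "jobs": {
            # Step 1: a GitHub-hosted job launches the self-hosted spot runner.
            "launch-runner": {
                "runs-on": "ubuntu-latest",
                "steps": [
                    {"uses": "actions/checkout@v3"},
                    {"uses": "iterative/setup-cml@v1"},
                    {"run": RUNNER_CMD},
                ],
            },
            # Step 2: training runs on the runner labelled "cml".
            "train": {
                "needs": "launch-runner",
                "runs-on": ["self-hosted", "cml"],
                "timeout-minutes": TIMEOUT_MINUTES,
                "steps": [
                    {"uses": "actions/checkout@v3"},
                    {"run": "python train.py"},  # hypothetical training script
                ],
            },
        },
    }


if __name__ == "__main__":
    # JSON output doubles as YAML for a .github/workflows/*.yml file
    print(json.dumps(build_workflow(), indent=2))
```

Step 3 is CML's own restart behaviour and is not modelled in this sketch. Keeping step 1 on a GitHub-hosted runner means only the training job occupies the spot instance.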
@casperdcl casperdcl added the documentation and p1-important labels on Mar 19, 2022
@casperdcl
Contributor Author

more musings (for cml runner --cloud-spot):

from pathlib import Path

import dvclive

live = dvclive.Live(resume=True)  # resume the step counter from existing logs
# `Model`, `X` and `Y` are placeholders for your framework's model and data
model = Model(load="model.pkl" if Path("model.pkl").exists() else None)
while (epoch := live.get_step()) < 100:
    history = model.fit(X, Y, epochs=1)
    if epoch % 10 == 0:  # at most 10 epochs are lost when CML respawns a spot instance
        model.save("model.pkl")
    live.log("loss", history['loss'])
    live.next_step()
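The musing above keeps dvclive's step counter and model.pkl as separate pieces of state, so after a respawn the loop may resume at a later step than the checkpoint it loads, skipping the epochs in between rather than retraining them. As a dependency-free sketch of one alternative, the code below stores the epoch counter inside the checkpoint itself. The toy one-number "model", the JSON checkpoint file and the Preempted exception are hypothetical stand-ins, not dvclive or CML APIs.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

SAVE_EVERY = 10  # checkpoint interval, matching the snippet above


class Preempted(Exception):
    """Hypothetical stand-in for the spot instance being reclaimed."""


def train(ckpt: Path, total_epochs: int = 100, preempt_at: Optional[int] = None) -> dict:
    # Resume from the last checkpoint if there is one, otherwise start fresh.
    if ckpt.exists():
        state = json.loads(ckpt.read_text())
    else:
        state = {"epoch": 0, "weight": 0.0}
    while state["epoch"] < total_epochs:
        if state["epoch"] == preempt_at:
            raise Preempted(f"preempted at epoch {state['epoch']}")
        state["weight"] += 1.0  # toy stand-in for one epoch of model.fit()
        state["epoch"] += 1
        if state["epoch"] % SAVE_EVERY == 0:
            # Epoch counter and weights are saved together, atomically:
            # write a temp file, then rename it over the old checkpoint.
            tmp = ckpt.with_suffix(".tmp")
            tmp.write_text(json.dumps(state))
            os.replace(tmp, ckpt)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "model.json"
        try:
            train(ckpt, preempt_at=25)  # first "instance" dies mid-run
        except Preempted as e:
            print(e)
        # The respawned "instance" reloads epoch 20 and retrains 21-25.
        print(train(ckpt))
```

Lowering SAVE_EVERY reduces the work redone after a respawn, at the cost of more frequent checkpoint writes.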

@jorgeorpinel
Contributor

Out of curiosity, what makes this p1? Are there lots of support cases that this guide could prevent, or that could be redirected to it? Thanks

@casperdcl
Contributor Author

Lots of support requests over YEARS; super overdue.

@omesser
Copy link

omesser commented Apr 13, 2023

Deprioritized and frozen. Removing from the CML project board for now.

@dacbd dacbd removed their assignment Apr 14, 2023
4 participants