
fix: Increase GHA workflow timeout to 35 days from 72h #1067

Merged: 5 commits merged on Jul 7, 2022.
README.md: 4 changes (2 additions, 2 deletions)

@@ -411,7 +411,7 @@ environment variables for passing your cloud service credentials to the
workflow.

Note that `cml runner` will also automatically restart your jobs (whether from a
-[GitHub Actions 72-hour timeout](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits)
+[GitHub Actions 35-day workflow timeout](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits)
or a
[AWS EC2 spot instance interruption](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html)).

@@ -438,7 +438,7 @@ jobs:
   train-model:
     needs: deploy-runner
     runs-on: [self-hosted, cml-gpu]
-    timeout-minutes: 4320 # 72h
+    timeout-minutes: 50400 # 35 days
     container:
       image: docker://iterativeai/cml:0-dvc2-base1-gpu
       options: --gpus all
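The new `timeout-minutes` value is the 35-day limit expressed in minutes, replacing the old 72-hour value. A minimal JavaScript sketch of the conversion; the helpers `hoursToMinutes` and `daysToMinutes` are hypothetical and not part of CML:

```javascript
// Unit constants used to express GitHub Actions limits in minutes.
const MINUTES_PER_HOUR = 60;
const HOURS_PER_DAY = 24;

// Hypothetical helper: hours -> minutes.
function hoursToMinutes(hours) {
  return hours * MINUTES_PER_HOUR;
}

// Hypothetical helper: days -> minutes.
function daysToMinutes(days) {
  return hoursToMinutes(days * HOURS_PER_DAY);
}

console.log(hoursToMinutes(72)); // 4320 (old `timeout-minutes` value)
console.log(daysToMinutes(35)); // 50400 (new `timeout-minutes` value)
```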

bin/cml/runner.js: 10 changes (8 additions, 2 deletions)

@@ -15,7 +15,7 @@ let RUNNER;
let RUNNER_SHUTTING_DOWN = false;
let RUNNER_TIMER = 0;
const RUNNER_JOBS_RUNNING = [];
-const GH_5_MIN_TIMEOUT = (72 * 60 - 5) * 60 * 1000;
+const GH_5_MIN_TIMEOUT = (35 * 24 * 60 - 5) * 60 * 1000;

const shutdown = async (opts) => {
if (RUNNER_SHUTTING_DOWN) return;
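`GH_5_MIN_TIMEOUT` is the GitHub limit minus a 5-minute margin, converted to milliseconds. The sketch below reproduces that arithmetic with a hypothetical helper `limitMinusMarginMs` and compares the result with the largest delay Node.js timers accept (2^31 - 1 ms, about 24.8 days):

```javascript
// Safety margin: act 5 minutes before GitHub's limit is reached.
const MARGIN_MINUTES = 5;

// Hypothetical helper mirroring the diff's expression:
// (limit in minutes - margin) converted to milliseconds.
function limitMinusMarginMs(limitMinutes, marginMinutes = MARGIN_MINUTES) {
  return (limitMinutes - marginMinutes) * 60 * 1000;
}

// Largest delay Node.js setTimeout/setInterval accept.
const MAX_TIMER_DELAY_MS = 2 ** 31 - 1;

const oldThreshold = limitMinusMarginMs(72 * 60); // 72 hours
const newThreshold = limitMinusMarginMs(35 * 24 * 60); // 35 days

console.log(oldThreshold); // 258900000
console.log(newThreshold); // 3023700000
console.log(newThreshold > MAX_TIMER_DELAY_MS); // true
```

The new threshold no longer fits in a Node.js timer: given a delay above 2147483647 ms, `setTimeout` and `setInterval` fall back to a 1 ms delay. The runner appears unaffected because the threshold is only compared against elapsed job time inside a periodic watcher (see the `clearInterval(watcherSeventyTwo)` call in the `runLocal` hunk), so the large value is never used as a timer delay.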

@@ -319,7 +319,7 @@ const runLocal = async (opts) => {
new Date().getTime() - new Date(job.date).getTime() >
GH_5_MIN_TIMEOUT
) {
-shutdown({ ...opts, reason: 'timeout:72h' });
+shutdown({ ...opts, reason: 'timeout:35days' });
clearInterval(watcherSeventyTwo);
}
});
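This hunk only changes the shutdown reason, but the surrounding logic is a periodic check of each running job's start time against `GH_5_MIN_TIMEOUT`. A minimal sketch of that pattern, assuming jobs are objects with a `date` field as in the diff; `findExpiredJob`, `watchJobTimeouts`, and the `onTimeout` callback are hypothetical names, not CML's API:

```javascript
// Returns the first job running longer than `limitMs` at time `nowMs`,
// or undefined if none has expired. `job.date` is the job's start time.
function findExpiredJob(jobs, nowMs, limitMs) {
  return jobs.find((job) => nowMs - new Date(job.date).getTime() > limitMs);
}

// Polls `jobs` every `checkEveryMs`. On the first expired job it stops
// polling, then calls `onTimeout` exactly once. Returns the interval handle.
function watchJobTimeouts(jobs, limitMs, onTimeout, checkEveryMs = 60 * 1000) {
  const watcher = setInterval(() => {
    const expired = findExpiredJob(jobs, Date.now(), limitMs);
    if (expired) {
      clearInterval(watcher); // stop before the callback so it can't fire twice
      onTimeout(expired);
    }
  }, checkEveryMs);
  return watcher;
}

// Usage: a job that started 36 days ago exceeds the 35-day-minus-5-minute limit.
const LIMIT_MS = (35 * 24 * 60 - 5) * 60 * 1000;
const jobs = [{ id: 'a', date: new Date(Date.now() - 36 * 24 * 3600 * 1000).toISOString() }];
watchJobTimeouts(jobs, LIMIT_MS, (job) => {
  console.log(`shutting down, reason: timeout:35days (job ${job.id})`);
}, 10);
```

Clearing the interval before invoking the callback ensures a single call even if the callback is slow or throws; in the diff, the `RUNNER_SHUTTING_DOWN` guard in `shutdown` appears to serve a similar purpose.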

@@ -401,6 +401,12 @@ const run = async (opts) => {
}
}

+if (driver === 'github') {
+winston.warn(
+'Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically.'
+);
+}
+
winston.info(`Preparing workdir ${workdir}...`);
await fs.mkdir(workdir, { recursive: true });
await fs.chmod(workdir, '766');