jobs/build: stop waiting for multi-arch jobs to take lock
The main reason we added that wait was that, in the new "rerun build-arch
and release jobs" path, there was a higher likelihood that the release
job could in theory take the locks before the build-arch jobs. But with
0664cd6 ("jobs/build: wait when re-running mArch jobs"), this is no
longer a concern.

There's still the theoretical possibility the race happens even in
the regular path (especially when `EARLY_ARCH_JOBS` is unset), but (1)
something must be really slow in the multi-arch jobs for that to happen
(in which case, it might end up taking more than our 5-minute timeout
anyway) and (2) the worst case is that we release without that arch
before it's built, which is salvageable (by rerunning the release job).

So overall, IMO maintaining this code is not worth the complexity. We
can always bring it back and adjust the timeout if this is a recurring
issue.
jlebon committed Feb 23, 2023
1 parent 0664cd6 commit b813111
Showing 1 changed file with 0 additions and 32 deletions.
32 changes: 0 additions & 32 deletions jobs/build.Jenkinsfile
@@ -1,5 +1,4 @@
 import org.yaml.snakeyaml.Yaml;
-import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException;
 
 node {
     checkout scm
@@ -522,16 +521,6 @@ def run_multiarch_jobs(arches, src_commit, version, cosa_img, wait) {
                 string(name: 'PIPECFG_HOTFIX_REPO', value: params.PIPECFG_HOTFIX_REPO),
                 string(name: 'PIPECFG_HOTFIX_REF', value: params.PIPECFG_HOTFIX_REF)
             ]
-            if (!wait) {
-                // Wait until the locks taken by the `build-arch` jobs are taken
-                // before continuing. This closes a potential race in which once we
-                // trigger the `release` job afterwards, it could end up taking the
-                // locks before the multi-arch jobs.
-                // This really should never take more than 5 minutes. Having a
-                // timeout ensures we don't wait for a long time if we somehow
-                // missed the transition.
-                wait_until_locked_or_continue("release-${version}-${arch}", 5)
-            }
         }]}
     }
 }
@@ -552,24 +541,3 @@ def run_release_job(buildID) {
         ]
     }
 }
-
-// XXX: generalize and put in coreos-ci-lib eventually
-def wait_until_locked_or_continue(resource, timeout_mins) {
-    try {
-        timeout(time: timeout_mins, unit: 'MINUTES') {
-            waitUntil {
-                lock(resource: resource, skipIfLocked: true) {
-                    return false
-                }
-                return true
-            }
-        }
-    } catch (FlowInterruptedException e) {
-        // If the lock was still not taken, then something went wrong. For
-        // example, the job might've failed during the initial `git clone`. The
-        // timeout is to ensure we don't wait forever and here we continue to
-        // try to at least release for the arches that did succeed. We may be
-        // able to salvage the failed arch in the next run.
-        echo "Timed out waiting for lock ${resource} to be taken. Continuing..."
-    }
-}
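
For readers unfamiliar with the Jenkins steps involved, the intent of the removed `wait_until_locked_or_continue` helper can be modeled roughly in plain Python. This is only an illustrative sketch, not pipeline code: `threading.Lock` stands in for the lockable resource, a non-blocking `acquire` stands in for `lock(..., skipIfLocked: true)`, and the names `demo`, `build_arch_job`, and `poll_secs` are hypothetical.

```python
import threading
import time


def wait_until_locked_or_continue(lock, timeout_secs, poll_secs=0.01):
    """Poll until `lock` is held by someone else; give up after the timeout.

    Returns True if the lock was observed as taken, False on timeout.
    `poll_secs` is an assumed polling interval, not a value from the commit.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        # Non-blocking probe, loosely analogous to skipIfLocked: true.
        if lock.acquire(blocking=False):
            # We got the lock, so nobody else holds it yet: release and retry.
            lock.release()
            time.sleep(poll_secs)
        else:
            # Someone else (the simulated build-arch job) holds the lock.
            return True
    # Timed out: the caller continues anyway, as the removed helper did.
    return False


def demo(job_delay_secs, timeout_secs):
    """Simulate a build-arch job that takes the lock after `job_delay_secs`."""
    lock = threading.Lock()
    job_may_finish = threading.Event()

    def build_arch_job():
        time.sleep(job_delay_secs)   # e.g. time spent before the lock is taken
        with lock:
            job_may_finish.wait()    # hold the lock until the demo is done

    worker = threading.Thread(target=build_arch_job)
    worker.start()
    try:
        return wait_until_locked_or_continue(lock, timeout_secs)
    finally:
        job_may_finish.set()
        worker.join()
```

In this model, `demo(0.05, 2.0)` returns `True` because the simulated job takes the lock well within the timeout, while `demo(0.5, 0.05)` returns `False`, which corresponds to the "timed out, continuing" branch. The second case is the slow multi-arch scenario the commit message describes, where the wait would not have helped anyway.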
