fix(crates): Detect stale upload caches and retry #98

jan-auer · 2020-06-09T10:11:15Z

The publishing errors also happen locally. When publishing a crate with --no-verify, it seems that the crate caches can go out of date and cargo will fail to publish. We can detect this and retry publishing with exponential backoff.

I even ran into this when publishing manually with sufficient time between the workspace crates. The error always looks like this:

cargo publish --no-verify
    Updating crates.io index
   Packaging symbolic v7.3.4 (/private/tmp/symbolic)
error: failed to prepare local package for uploading

Caused by:
  failed to select a version for the requirement `symbolic-symcache = "^7.3.4"`
  candidate versions found which didn't match: 7.3.3, 7.3.2, 7.3.1, ...
  location searched: crates.io index
required by package `symbolic v7.3.4 (/private/tmp/symbolic)`

It is unclear what the root cause of this is. When cargo updates its index, it runs a "fast path" request against the GitHub API which seems to get cached. Waiting for 20-30 seconds seems sufficient to invalidate those caches.
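A minimal sketch of the detection step, assuming the check is a plain substring match on cargo's error output. The value of `VERSION_ERROR` is not shown in this thread; the string below is taken from the cargo output above and is an assumption, as is the helper name `isStaleIndexError`.

```typescript
// Assumed marker: this phrase appears in the cargo error output quoted above.
// The actual VERSION_ERROR constant in src/targets/crates.ts may differ.
const VERSION_ERROR = 'failed to select a version for the requirement';

// Hypothetical helper: returns true if the cargo output looks like the
// stale-index failure rather than some other publishing error.
function isStaleIndexError(output: string): boolean {
  return output.includes(VERSION_ERROR);
}

// Excerpt of the error output quoted above.
const sample = [
  'error: failed to prepare local package for uploading',
  '',
  'Caused by:',
  '  failed to select a version for the requirement `symbolic-symcache = "^7.3.4"`',
].join('\n');

console.log(isStaleIndexError(sample)); // true
console.log(isStaleIndexError('some unrelated error')); // false
```

Only errors that match would be retried; anything else should surface immediately.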

Fixes #67

jan-auer · 2020-06-09T10:11:42Z

I'm open to suggestions regarding tests :)

tonyo

Let's try 🤷‍♂️

tonyo · 2020-06-09T10:23:32Z

src/targets/crates.ts

+        return await spawnProcess(CARGO_BIN, args, { env });
+      } catch (e) {
+        if (i == 0 && e.message.includes(VERSION_ERROR)) {
+          logger.debug('Potential stale cache detected, trying again...');


Suggested change

-          logger.debug('Potential stale cache detected, trying again...');
+          logger.warn('Potential stale cache detected, trying again...');

To make it more visible.

Just double-checking - are you certain? I'd rather swallow this and emit an error only if it occurs again. Usually the entire crate publishing stage is silent, so it might be a bit surprising if a warning suddenly pops up and then nothing follows it.

It's just to detect these cases in case the user forgot to set log level to debug.

> Usually the entire crate publishing stage is silent

We can log something else to make it less lonely?

OK, let me add some info logs and also the crate name in the warning.

Updated, how do you like the new messages?

Bbbjutiful 👍

src/targets/crates.ts

Co-authored-by: Anton Ovchinnikov <[email protected]>

jan-auer · 2020-06-09T10:33:18Z

I'd like to verify this once with an actual release. I'll check back in once it worked and then we can merge this.

BYK · 2020-06-09T12:59:49Z

src/targets/crates.ts

+    };
+
+    logger.info(`Publishing ${crate.name}`);
+    for (let i = 0; i <= 1; i++) {


I think this for loop construct makes it harder to read the code than repeating the return await spawnProcess(CARGO_BIN, args, { env }); part. If you don't want to repeat that, we can abstract that out into its own function.

Or at least we should define MAX_RETRIES=1 and do i <= MAX_RETRIES.

Good point, I’ll repeat for now.

jan-auer · 2020-06-09T14:17:33Z

Unfortunately, that still did not help. Craft detects the condition successfully, but I'm still getting the error. I am not sure what the proper resolution is, because as soon as I try it manually, it just magically works.

You can see that it even pulls a new version of the "crates.io index". Potentially increase the timeout?

...
ℹ info [crates] › Publishing symbolic
⚠ warn [crates] › Potential stale cache detected, trying again...
✖ error Error: Process "cargo" errored with code 101

  STDOUT: cargo:


  STDERR:cargo:     Updating crates.io index
  cargo:    Packaging symbolic v7.3.5 (/var/folders/7p/zvqpr1r53zv_b6tzlz7tw2vw0000gn/T/craft-crates-hJ3Y7K)
  cargo: error: failed to prepare local package for uploading
  cargo:
  cargo: Caused by:
  cargo:   failed to select a version for the requirement `symbolic-common = "^7.3.5"`
  cargo:   candidate versions found which didn't match: 7.3.4, 7.3.3, 7.3.2, ...
  cargo:   location searched: crates.io index
  cargo: required by package `symbolic v7.3.5 (/var/folders/7p/zvqpr1r53zv_b6tzlz7tw2vw0000gn/T/craft-crates-hJ3Y7K)`
  cargo:

src/targets/crates.ts

BYK · 2020-06-09T14:31:11Z

src/targets/crates.ts

+      if (e.message.includes(VERSION_ERROR)) {
+        logger.warn(`Potential stale cache detected, trying again...`);
+        await sleepAsync(RETRY_DELAY_MS);
+        return spawnProcess(CARGO_BIN, args, { env });


hah, skipping await, very clever optimization :)

It's closer to the original. However, I think I'll have to revert this and really retry multiple times + increase the timeout. No idea what the root cause of this is.

Maybe one of these helps:

Updating local registry rust-lang/cargo#8273

Please provide a subcommand to refresh the crates.io index rust-lang/cargo#3377 (comment)

Thanks, I tried that as well. Unfortunately, you can see in the above output that the crates index was in fact refreshed, but still didn’t contain the new version.

This leads me to believe that it's server-side caches that interfere with this, and the only way around would be larger delays and more retries.

What if we kept polling the API directly until we get a response (i.e. https://crates.io/api/v1/crates/symbolic/7.3.5)?

There are two challenges with this:

The crates API is viewed as an implementation detail as far as I could see in the cargo issues. I'd rather stick to what cargo offers

For publishing a package, we would have to first resolve all dependencies, and then check the API for every dependency. This is much more code than I'd hoped for.

Even then, it doesn't guarantee us that cargo properly refreshes the index.
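To illustrate the second challenge, here is a rough sketch of what polling would need as input: one API URL per resolved dependency. The URL shape is taken from the example above; the `Dependency` type, the `versionUrl` and `versionUrls` helpers, and the dependency list are hypothetical, and this PR does not implement anything like it.

```typescript
// Hypothetical shape of an already-resolved dependency.
interface Dependency {
  name: string;
  version: string;
}

// Builds the crates.io URL for one exact crate version, following the
// pattern of the example URL in this thread.
function versionUrl(dep: Dependency): string {
  const name = encodeURIComponent(dep.name);
  const version = encodeURIComponent(dep.version);
  return `https://crates.io/api/v1/crates/${name}/${version}`;
}

// Every dependency needs its own check before publishing can start.
function versionUrls(deps: Dependency[]): string[] {
  return deps.map(versionUrl);
}

// Illustrative list only; the real set would come from resolving the workspace.
const deps: Dependency[] = [
  { name: 'symbolic-common', version: '7.3.5' },
  { name: 'symbolic-symcache', version: '7.3.5' },
];
console.log(versionUrls(deps));
```

The sketch leaves out the dependency resolution itself, which is the part that would add most of the extra code.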

jan-auer · 2020-06-10T15:18:52Z

I ran another test with this implementation and got the same result now:

ℹ info [crates] › Publishing symbolic
⚠ warn [crates] › Potential stale cache detected, trying again in 3s...
⚠ warn [crates] › Potential stale cache detected, trying again in 3s...
⚠ warn [crates] › Potential stale cache detected, trying again in 3s...
⚠ warn [crates] › Potential stale cache detected, trying again in 3s...
✖ error Error: Process "cargo" errored with code 101

  STDERR:cargo:     Updating crates.io index
  cargo:    Packaging symbolic v7.3.6 (/var/folders/7p/zvqpr1r53zv_b6tzlz7tw2vw0000gn/T/craft-crates-FkEZbQ)
  cargo: error: failed to prepare local package for uploading
...

It is clearly visible how cargo updates the index, and then still doesn't find the crate after 12 seconds. This happens predictably, and I even reproduced this with a manual release. This can only be something we're missing.

I'm wondering if we should just remove the --no-verify and take into account that it might take some time to build and verify the crates.

* master: (23 commits)
  feat(crates): Add noDevDeps option (#112)
  fix: Write to cache in base artifact provider
  fix: Logger scopes for gcs and artifact providers
  build(ci): Have better defaults for CI environments (#110)
  fix(gha): Use single quotes for string literals (#108)
  ref(gha): Remove ENV inputs, add no-merge and keep-branch (#107)
  fix(docker): Fix CARGO_HOME and RUST_HOME since GHA changes HOME (#106)
  docs: Add missing CHANGELOG entry for cargo upgrade (#105)
  build(docker): Upgrade cargo to a recent version (#104)
  fix(gha): Remove no-merge and keep-branch temporarily
  fix(gha): Try to skip empty args using null
  fix(gha): Remove defaults on craft arguments
  fix(gha): Only pass publish args to publish
  feat(gha): Add GitHub Action for Craft (#103)
  docs: Fix `changelogPolicy` enum (#102)
  build(docker): Add a `craft` binary into the Docker image (#101)
  docs: Fix `artifactProvider` example (#100)
  release: 0.10.0
  meta: Update Changelog
  build(gcb): Add a public Docker image (#99)
  ...

jan-auer · 2020-06-18T17:34:29Z

Ok, it does what it should do now. Can I haz a new approve before marge, please?

Here is fully trustworthy proof:

ℹ info [crates] › Publishing symbolic-common
ℹ info [crates] › Publishing symbolic-sourcemap
ℹ info [crates] › Publishing symbolic-unreal
ℹ info [crates] › Publishing symbolic-debuginfo
ℹ info [crates] › Publishing symbolic-demangle
ℹ info [crates] › Publishing symbolic-proguard
ℹ info [crates] › Publishing symbolic-minidump
ℹ info [crates] › Publishing symbolic-symcache
ℹ info [crates] › Publishing symbolic
⚠ warn [crates] › Publish failed, trying again in 2s...
⚠ warn [crates] › Publish failed, trying again in 4s...
⚠ warn [crates] › Publish failed, trying again in 8s...
⚠ warn [crates] › Publish failed, trying again in 16s...
ℹ info [crates] › Crates release complete
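The delays in this log double each time. As a worked example, assuming an initial delay of 2s, a factor of 2, and five retries (the function name `backoffDelays` and the retry count of five are assumptions, not taken from the diff), the schedule adds up to the 62s mentioned in the commit title:

```typescript
// Returns the wait times (in seconds) before each retry:
// initial, initial * factor, initial * factor^2, ...
function backoffDelays(initial: number, factor: number, retries: number): number[] {
  const delays: number[] = [];
  let delay = initial;
  for (let i = 0; i < retries; i++) {
    delays.push(delay);
    delay *= factor;
  }
  return delays;
}

const delays = backoffDelays(2, 2, 5);
const total = delays.reduce((sum, d) => sum + d, 0);
console.log(delays); // [ 2, 4, 8, 16, 32 ]
console.log(total); // 62
```

In the run above the publish succeeded after the 16s wait, so a 32s wait was never needed.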

tonyo

🎖️

BYK · 2020-06-18T17:50:40Z

src/targets/crates.ts

+      try {
+        return await spawnProcess(CARGO_BIN, args, { env });
+      } catch (e) {
+        if (i < MAX_ATTEMPTS && e.message.includes(VERSION_ERROR)) {


This differs from the for loop's condition, intentional? I find having two checks a bit confusing btw. Feels like this should be a while loop. How about a slightly different approach?

const MAX_WAIT_SECS = 60;
let totalWait = 0;
let delay = RETRY_DELAY_SECS;
let error;
do {
  if (error) {
    if (error.message.includes(VERSION_ERROR)) {
      logger.warn(`Publish failed due to potentially stale cache. Trying again in ${delay}s...`);
      await sleepAsync(delay * 1000);
      totalWait += delay;
      delay *= RETRY_EXP_FACTOR;
    } else {
      break;
    }
  }
  try {
    return await spawnProcess(CARGO_BIN, args, { env });
  } catch (err) {
    error = err;
  }
} while (totalWait <= MAX_WAIT_SECS);
throw error;

Yes, this is intentional. The loop needs to take another turn so that we get into the else branch. In your implementation, you do not throw the error in the last iteration.

I thought about a total wait time (which could be written as a for loop, too), but rather wanted a fixed number of retries. And then doing RETRY_DELAY_SECS * Math.pow(RETRY_EXP_FACTOR, MAX_ATTEMPTS) seemed too cumbersome.
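The fixed-attempt variant described here could look roughly like the sketch below. It is not the code from the diff: `publishWithRetry`, the injected `run` and `sleep` parameters, and all constant values are assumptions chosen so the example is self-contained. The point is the two separate checks: the loop bound allows one final turn, and the inner condition decides whether that turn retries or rethrows.

```typescript
const VERSION_ERROR = 'failed to select a version for the requirement'; // assumed value
const MAX_ATTEMPTS = 5; // assumed number of retries
const RETRY_DELAY_SECS = 2; // assumed initial delay
const RETRY_EXP_FACTOR = 2; // assumed backoff factor

// `run` stands in for spawning cargo, `sleep` for sleepAsync (both injected
// here so the sketch can be exercised without a real process or real waiting).
async function publishWithRetry(
  run: () => Promise<string>,
  sleep: (ms: number) => Promise<void>
): Promise<string> {
  let delay = RETRY_DELAY_SECS;
  // `<=` gives one extra turn so the last failure reaches the `else` branch.
  for (let i = 0; i <= MAX_ATTEMPTS; i++) {
    try {
      return await run();
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      if (i < MAX_ATTEMPTS && message.includes(VERSION_ERROR)) {
        await sleep(delay * 1000);
        delay *= RETRY_EXP_FACTOR;
      } else {
        throw e; // out of retries, or an unrelated error
      }
    }
  }
  // Not reachable: the final turn always returns or throws.
  throw new Error('unreachable');
}

// Usage with fakes: fail twice with the stale-index error, then succeed.
let calls = 0;
const waited: number[] = [];
const fakeRun = async (): Promise<string> => {
  calls++;
  if (calls <= 2) throw new Error(VERSION_ERROR);
  return 'published';
};
const fakeSleep = async (ms: number): Promise<void> => {
  waited.push(ms);
};

publishWithRetry(fakeRun, fakeSleep).then(result => {
  console.log(result, waited); // published [ 2000, 4000 ]
});
```

With a fixed retry count the total wait is implied by the constants rather than tracked at runtime, which is the trade-off against the `totalWait` approach suggested above.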

> In your implementation, you do not throw the error in the last iteration.

~~Yes I do as I do not check the count or total wait in the if separately.~~

I need more sleep/coffee.

BYK

<3

fix(crates): Detect stale upload caches and retry

ea9781b

jan-auer requested a review from tonyo June 9, 2020 10:11

jan-auer self-assigned this Jun 9, 2020

tonyo approved these changes Jun 9, 2020


fix: Use strict equals

7d99de9

Co-authored-by: Anton Ovchinnikov <[email protected]>

ref(crates): Add more logging

ebc0ed9

BYK reviewed Jun 9, 2020


ref(crates): Simplify retry logic

4cec5fa

BYK approved these changes Jun 9, 2020


ref: Retry a couple of times

86f43aa

jan-auer added 2 commits June 18, 2020 19:26

ref(crates): Attempt exponential backoff up to 62s

5995907

jan-auer requested review from BYK and tonyo June 18, 2020 17:34

tonyo approved these changes Jun 18, 2020


BYK reviewed Jun 18, 2020


BYK approved these changes Jun 18, 2020


jan-auer merged commit c9d32ea into master Jun 18, 2020

jan-auer deleted the fix/crates-cache-retry branch June 18, 2020 18:13
