
Go module for v2.8.0 changed content/checksum #4397

Closed · marians opened this issue Oct 31, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

marians commented Oct 31, 2024

We have a Go project with github.com/Azure/azure-service-operator/v2 v2.8.0 pinned in go.mod, with its checksum recorded in go.sum.

When building the project in a CI pipeline (which probably starts with an empty Go module cache), go mod tidy exits with the following error message:

verifying github.com/Azure/azure-service-operator/[email protected]: checksum mismatch
        downloaded: h1:VeXvLrgMD3/LEbyuSddDTcnGR0CK+YE2vKRvx1tiY4k=
        go.sum:     h1:BcyB8LvRmtgVIIUaXwWIJz5eHvknyno0qq5LkDuvM/s=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.

For more information, see 'go help module-auth'.

My understanding based on go help module-auth is that this is to help prevent supply chain attacks and injection of malicious code.

So for this particular case, I would like to know: how can we know that the version with checksum h1:VeXvLrgMD3/LEbyuSddDTcnGR0CK+YE2vKRvx1tiY4k= is legit?

If it's legit, what's the cause of this change? In general, my thinking is that a Go module released with a specific version should not be modified. Instead, a new release should be published.
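
One way I can think of to check what the public checksum database has on record for this exact version is to query the sum.golang.org lookup endpoint. The little program below is only an illustrative sketch (go mod verify and go mod download consult the same database automatically), but it makes the recorded h1: value easy to compare against our go.sum:

// sumdb_lookup.go: ask the public Go checksum database (sum.golang.org) which
// hashes it has on record for a specific module version. Rough sketch only.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"golang.org/x/mod/module"
)

func main() {
	// Uppercase letters in module paths are escaped in sumdb URLs (Azure -> !azure).
	escaped, err := module.EscapePath("github.com/Azure/azure-service-operator/v2")
	if err != nil {
		log.Fatal(err)
	}

	url := fmt.Sprintf("https://sum.golang.org/lookup/%s@v2.8.0", escaped)
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// The response includes the go.sum lines the database recorded for this
	// version (plus a signed tree head); compare the h1: value with go.sum.
	fmt.Print(string(body))
}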

matthchr (Member) commented Oct 31, 2024

Did this error only start happening recently? Do you happen to know on what date you originally pulled and got the h1:BcyB8LvRmtgVIIUaXwWIJz5eHvknyno0qq5LkDuvM/s= hash?

I do see, even on my local fork, that it looks like we re-tagged this release (I don't pull tags very often, so I hadn't noticed this until now):

 ! [rejected]            v2.8.0                       -> v2.8.0  (would clobber existing tag)

The old tag seems to have pointed to this commit: 17474f2, created at Tue Jun 25 06:48:09 2024 +0000

The new tag points to 38446a3, created at Tue Jun 25 15:43:11 2024 +0000 (about 9 hours later).

Is it possible that you originally pulled v2.8.0 sometime between Tue Jun 25 06:48:09 2024 +0000 and Tue Jun 25 15:43:11 2024 +0000, and haven't rebuilt since?
If so, there's no security issue here; we re-tagged the release not long after it was initially tagged.
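
For context, the h1: value in go.sum is computed over the contents of the module zip, so two tags pointing at different commits will generally produce different sums even when the code change is tiny. If you want to double-check locally, you can recompute the hash of a downloaded module zip yourself; the snippet below is a rough sketch (the zip path is illustrative, taken from wherever go mod download -json reports the "Zip" file in your module cache):

// rehash.go: recompute the "h1:" hash that go.sum records for a module, from a
// module zip in the local cache. Illustrative sketch; the path below is a
// placeholder for the "Zip" path reported by
// `go mod download -json github.com/Azure/azure-service-operator/[email protected]`.
package main

import (
	"fmt"
	"log"

	"golang.org/x/mod/sumdb/dirhash"
)

func main() {
	// Path to the downloaded module zip (illustrative placeholder).
	zipPath := "/path/to/gomodcache/cache/download/github.com/!azure/azure-service-operator/v2/@v/v2.8.0.zip"

	// HashZip with Hash1 produces the same "h1:..." string that appears in go.sum,
	// so the output can be compared directly against the hashes in the error above.
	sum, err := dirhash.HashZip(zipPath, dirhash.Hash1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(sum)
}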

You're generally right that once released, a tag shouldn't be updated. To understand what happened here and why we did re-tag, you'll need to understand a bit about ASO's release process:

  1. We tag a commit, which kicks off a GitHub job to build a container image and publish it. This image, along with various YAMLs, is included in the GitHub release associated with the tag.
  2. Once the container image is published, a PR is auto-generated that updates the Helm chart to add a new version.

If something in the job in step 1 fails, we have a tag but no container image. This is what happened for v2.8.0 - we had some issues with the job configuration that resulted in a failed container image publish. At that point we were left with a v2.8.0 tag and no v2.8.0 container image or YAML. So we fixed the issue, deleted and re-created the tag, and produced a v2.8.0 tag that was one commit newer, which then had a container image (and Helm chart, etc.) published for it.

The reason we re-tag in a case like this is that without a container image to process the YAML, the v2.8.0 tag isn't functional. We feel that the most critical deliverables of an ASO release are the actual container + deployment + CRDs (none of which existed for the initial, failed v2.8.0 release because the job failed), so we optimized for that and re-tagged.

We could've left the v2.8.0 tag in place and released a v2.8.1, but then there would never have been a v2.8.0 image or v2.8.0 CRDs, only v2.8.1 ones.

TL;DR: It's our fault, although I'm not sure that an alternative solution that avoided this checksum issue would've resulted in a better experience for users.

matthchr added the question (Further information is requested) and waiting-on-user-response (Waiting on more information from the original user before progressing) labels and removed the needs-triage 🔍 label on Oct 31, 2024
matthchr (Member) commented Oct 31, 2024

Now that we have the experimental release pipeline, which exercises our release path regularly, it's much less likely that we'll hit a publishing issue with the official release, so I suspect that in practice this problem won't happen again.

In fact, I'll update our release procedure to make sure we explicitly check that the experimental release has passed for the latest commit. This doesn't 100% guarantee we won't hit a release issue, but it makes one very unlikely compared to where we were for v2.8.0.

marians (Author) commented Nov 4, 2024

Thanks for the comprehensive explanations! We have now mitigated the problem on our side by upgrading to v2.9.0. Feel free to close the issue.

matthchr closed this as completed on Nov 4, 2024
github-project-automation bot moved this from Backlog to Recently Completed in Azure Service Operator Roadmap on Nov 4, 2024
matthchr removed the waiting-on-user-response label on Nov 4, 2024