Pristine and verifiable source releases #3565

Open
pmatilai opened this issue Feb 12, 2025 · 10 comments
Labels
build (Build-system related), release (Release creation)
Milestone

Comments

@pmatilai
Member

RPM source releases have always been something that one needs to build. This antipattern was the norm in the autotools world, and we no longer need it with CMake. But because of that autotools history, we have traditionally bundled pre-built documentation in the source releases because, why not. Shipping unreproducible content in source tarballs is an ill fit for today's world.

We want our source releases to be bit-for-bit identical to what you get straight out of git, with zero build steps to generate content, as defined by a git tag. We still want a stable archive of that content generated and hosted on rpm.org, because GH archive creation could change any day and render checksums unverifiable. That archive should be generated in a "hermetic" environment rather than on a developer's workstation. But that "canonical tarball" is really just a convenience: the contents can be trivially diffed to confirm a 100% match, a test that our traditional tarballs do not pass at all.

@pmatilai pmatilai added this to RPM Feb 12, 2025
@github-project-automation github-project-automation bot moved this to Backlog in RPM Feb 12, 2025
@pmatilai pmatilai added build (Build-system related) and release (Release creation) labels Feb 12, 2025
@cgwalters
Contributor

cgwalters commented Feb 12, 2025

Note that last I heard, git makes no promises that the output of git archive will forever be reproducible either, although I don't think it has changed in practice. IIRC github changed their archive generation a while ago, then backed off from it.

But I did create https://github.com/cgwalters/git-evtag/ which is partly meant to address some of this problem domain from the other direction: ensuring that a git tag has the same security properties as a tarball.

> hosted on rpm.org

Sure, why not, though of course github releases support attached artifacts, and for e.g. bootc we generate a git archive as an artifact (alongside a Rust vendor snapshot) attached to the github "release", so one doesn't need to host out of band to have 100% fixed tarballs on github.

(I would still say though that IMO, distributions like Fedora should encourage fetching directly from git and not use tarballs at all...which is something that RPM is somewhat in a position to help encourage, but that's a bigger discussion...)

@dmnks
Contributor

dmnks commented Feb 12, 2025

> [...] for e.g. bootc we generate a git archive as an artifact (alongside a Rust vendor snapshot) attached to the github "release"

Do you generate & upload those via a GH action or is that done manually?

@cgwalters
Contributor

Currently, manually. However it would make total sense to do it via actions and is especially valuable now with things like https://docs.github.com/en/actions/security-for-github-actions/using-artifact-attestations/using-artifact-attestations-to-establish-provenance-for-builds

@dmnks
Contributor

dmnks commented Feb 12, 2025

Thanks (for the link, too)!

Indeed. We're currently thinking of doing that with an action that triggers on release creation (done manually via the web UI or via gh release create perhaps), creates a tarball from the given tag (basically with git archive) and attaches it to the just-created release as an additional asset (with gh release upload).
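
For illustration only, the two steps described above (archive the tag, then attach the result to the release) could be sketched roughly as follows. This is not the actual workflow from the PoC: the helper names `release_commands` and `publish`, the tag `v1.2.3` and the `example-1.2.3` prefix are all hypothetical, and the sketch assumes that `git` and an authenticated `gh` are on the PATH and that the release already exists.

```python
import shlex
import subprocess


def release_commands(tag: str, prefix: str, tarball: str) -> list[list[str]]:
    """Return the argv lists for archiving `tag` and attaching the tarball.

    All three arguments are placeholders chosen by the caller; nothing here
    reflects rpm's real tag or tarball naming.
    """
    return [
        # Create a gzip-compressed tarball of the tree the tag points at,
        # with every path placed under "<prefix>/".
        ["git", "archive", "--format=tar.gz", f"--prefix={prefix}/",
         "-o", tarball, tag],
        # Attach the tarball as an extra asset to the existing release.
        ["gh", "release", "upload", tag, tarball],
    ]


def publish(tag: str, prefix: str, tarball: str, dry_run: bool = True) -> None:
    """Print each command, and run it unless dry_run is set."""
    for argv in release_commands(tag, prefix, tarball):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands that would be executed.
    publish("v1.2.3", "example-1.2.3", "example-1.2.3.tar.gz")
```

In a real GitHub Actions setup the same two commands would more likely live directly in the workflow's yaml file; the Python wrapper is only there to make the sequence explicit.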

I've seen this done in your other project (podman), actually 😄, and did a quick PoC in my fork of rpm, it works as expected.

This is based on the assumption that we don't want to rely on the default tarballs provided in a GH release as it looks like they might be done on the server dynamically (upon downloading), which I guess is also the reason you're doing the above.

@dmnks
Contributor

dmnks commented Feb 12, 2025

> it looks like they might be done on the server dynamically (upon downloading)

OK, this is probably not the case since I'm getting the same checksum on multiple downloads of the same tarball here, but still, not sure if we could rely on it never changing (for the given release).

@dmnks
Contributor

dmnks commented Feb 12, 2025

> Note that last I heard, git makes no promises that the output of git archive will forever be reproducible

Now I realize this is the reason for you attaching a custom tarball, i.e. to make that reproducible. (Default GH tarballs are most likely static, after all, but not necessarily reproducible.) Makes sense.

@pmatilai
Member Author

pmatilai commented Feb 13, 2025

> Note that last I heard, git makes no promises that the output of git archive will forever be reproducible either, although I don't think it has changed in practice. IIRC github changed their archive generation a while ago, then backed off from it.

Well, I said as much in the description:
> We want our source releases to be bit-for-bit identical to what you get straight out of git, with zero build steps to generate content, as defined by a git tag. We still want a stable archive of that content generated and hosted on rpm.org, because GH archive creation could change any day and render checksums unverifiable.

The exact bytes that git archive emits may change at some unknown point in the future, making the archive itself non-reproducible, but the actual contents will still match bit-for-bit, and that's what ultimately matters. And we don't have that now, because the source releases contain some amount of built data.
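
To make the distinction between "same archive" and "same contents" concrete, here is a minimal sketch using only the Python standard library. It is an illustration rather than anything rpm actually uses: `content_digests` and `make_tar` are hypothetical helpers, the file tree is made up, and only regular file contents are compared (modes, symlinks and other metadata are ignored).

```python
import hashlib
import io
import tarfile


def content_digests(tar_bytes: bytes) -> dict[str, str]:
    """Map each regular file in a tar archive to the SHA-256 of its content."""
    digests = {}
    # "r:*" lets tarfile detect the compression (none, gzip, ...) itself.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def make_tar(files: dict[str, bytes], mtime: int, compress: bool) -> bytes:
    """Build an in-memory archive; mtime and compression change the raw bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up tree standing in for a tagged checkout.
    tree = {
        "pkg-1.0/README": b"hello\n",
        "pkg-1.0/main.c": b"int main(void) { return 0; }\n",
    }
    a = make_tar(tree, mtime=0, compress=False)
    b = make_tar(tree, mtime=1700000000, compress=True)
    print("archives identical:", a == b)
    print("contents identical:", content_digests(a) == content_digests(b))
```

The two archives differ as byte streams (different timestamps and compression), yet the per-file digests agree, which is the property that survives a change in archive generation.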

@dmnks
Contributor

dmnks commented Feb 13, 2025

> I'm getting the same checksum on multiple downloads of the same tarball here

Having slept on it, I realized this doesn't mean anything; even if GitHub generated the archive on-the-fly for every request, git archive (which it reportedly uses underneath) would still produce the same bit-by-bit archive every time, of course.

> not sure if we could rely on it never changing (for the given release)

According to this LWN article (and the associated GitHub blog post), this is indeed not guaranteed:

> GitHub doesn’t guarantee the stability of checksums for automatically generated archives. These are marked with the words “Source code (zip)” and “Source code (tar.gz)” on the Releases tab. If you need to rely on a consistent checksum, you may upload archives directly to GitHub Releases. These are guaranteed not to change.

Thus, we just need to continue producing our own tarballs, even if we start doing GitHub releases.
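
For context, this is roughly what a downstream consumer relies on when a checksum is published next to an uploaded tarball. The sketch below is only an assumption about how such a check might look: `sha256_of` and `matches_published` are hypothetical names, and the choice of SHA-256 is an assumption, not something specified in this thread.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

A check like this only stays meaningful if the file behind the URL never changes, which is why it needs an uploaded asset rather than an automatically generated archive.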

@dmnks
Contributor

dmnks commented Feb 14, 2025

Just stumbled upon a follow-up blog post by GitHub that answers and clarifies all of my above questions and ponderings: https://github.blog/open-source/git/update-on-the-future-stability-of-source-code-archives-and-hashes/

@dmnks
Contributor

dmnks commented Feb 14, 2025

Implemented the above via #3576 as it's pretty much independent of the other subtasks (e.g. making the tarball pristine). Once we do those, we'll just update the yaml file accordingly (if needed at all, we'll probably keep the make dist target anyway).

@pmatilai pmatilai added this to the 6.0.0 beta milestone Feb 19, 2025
Projects
Status: Backlog
Development

No branches or pull requests

3 participants