Add support for hashed mode to Poudriere #751

Open
wants to merge 1 commit into
base: master

allanjude
Member

Requires the related patches to pkg and the ports tree

@bdrewery
Member

Can you also give more context for what this is?

@allanjude
Member Author

> Can you also give more context for what this is?

When combined with freebsd/pkg#1829

This creates a pkg repo that looks like this:

# ls -al /usr/local/poudriere/data/packages/121amd64-default/All/
total 12021
drwxr-xr-x  2 root    wheel       16 Apr 13 20:06 ./
drwxr-xr-x  4 root    wheel        9 Apr 13 20:06 ../
-rw-r--r--  1 nobody  wheel   161820 Apr 13 19:27 gettext-runtime-0.20.1+ec3887d3a2.txz
lrwxr-xr-x  1 nobody  wheel       37 Apr 13 19:27 gettext-runtime-0.20.1.txz@ -> gettext-runtime-0.20.1+ec3887d3a2.txz
-rw-r--r--  1 nobody  wheel  2527876 Apr 13 19:29 gettext-tools-0.20.1_1+7101c9ffe5.txz
lrwxr-xr-x  1 nobody  wheel       37 Apr 13 19:29 gettext-tools-0.20.1_1.txz@ -> gettext-tools-0.20.1_1+7101c9ffe5.txz
-rw-r--r--  1 nobody  wheel     5828 Apr 13 19:19 indexinfo-0.3.1+1cd9c1a735.txz
lrwxr-xr-x  1 nobody  wheel       30 Apr 13 19:19 indexinfo-0.3.1.txz@ -> indexinfo-0.3.1+1cd9c1a735.txz
-rw-r--r--  1 nobody  wheel   387088 Apr 13 19:27 libtextstyle-0.20.1+6c117ad74e.txz
lrwxr-xr-x  1 nobody  wheel       34 Apr 13 19:27 libtextstyle-0.20.1.txz@ -> libtextstyle-0.20.1+6c117ad74e.txz
-rw-r--r--  1 nobody  wheel   236640 Apr 13 20:04 nano-4.8+74c3c14712.txz
lrwxr-xr-x  1 nobody  wheel       23 Apr 13 20:04 nano-4.8.txz@ -> nano-4.8+74c3c14712.txz
-rw-r--r--  1 nobody  wheel  8787688 Apr 13 20:06 pkg-1.13.99.7.l+46fd66c8a7.txz
lrwxr-xr-x  1 nobody  wheel       30 Apr 13 20:06 pkg-1.13.99.7.l.txz@ -> pkg-1.13.99.7.l+46fd66c8a7.txz
-rw-r--r--  1 nobody  wheel    35452 Apr 13 19:54 zxfer-1.1.7+2e39bad872.txz
lrwxr-xr-x  1 nobody  wheel       26 Apr 13 19:54 zxfer-1.1.7.txz@ -> zxfer-1.1.7+2e39bad872.txz

So when you install a package, it fetches the file with the hash in the URL:

# pkg install zxfer
Updating Test repository catalogue...
Test repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        zxfer: 1.1.7

Number of packages to be installed: 1

35 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching zxfer-1.1.7+2e39bad872.txz: 100%   35 KiB  35.5kB/s    00:01
Checking integrity... done (0 conflicting)
[1/1] Installing zxfer-1.1.7...
[1/1] Extracting zxfer-1.1.7: 100%

This will allow package repositories to be served from CDNs and web caches, since the unique hash in the filename will avoid the need for cache invalidation on the actual package files, and a short lifetime on the pkg meta files is all that would be required.
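
As a rough illustration of that caching split, here is a minimal sketch of how a deploy script or origin server hook might pick a Cache-Control value per file. The function name `cache_control_for`, the check for a 10-hex-digit `+hash` suffix (modelled on the listing above), and the specific lifetimes are all assumptions for illustration; none of this is provided by poudriere or pkg.

```sh
# Hypothetical helper: print a Cache-Control value for a repo file.
# Assumption: hashed packages end in "+<10 hex digits>.txz", as in the
# directory listing above. The lifetimes are example values only.
cache_control_for() {
    if printf '%s\n' "${1##*/}" | grep -Eq '\+[0-9a-f]{10}\.txz$'; then
        # The name changes whenever the contents change, so cache for long.
        echo "public, max-age=31536000, immutable"
    else
        # Meta files and plain names must be refreshed soon after a publish.
        echo "public, max-age=60"
    fi
}
```

For example, `cache_control_for All/zxfer-1.1.7+2e39bad872.txz` selects the long-lived policy, while `cache_control_for packagesite.txz` selects the short one.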

@allanjude
Member Author

@bapt with freebsd/pkg@36dfb48 merged into pkg, I've refreshed this patch to add a -H flag to poudriere bulk, which builds a repo using the hashed mode.

It currently implies --symlink as well, because without it poudriere does not yet find the already-built packages during an incremental build.

@allanjude
Member Author

The change committed to pkg differs from the original proposal (creating hashed filenames during pkg create). The version merged into pkg is for pkg repo, which does all the work in one step at the end and so requires far fewer changes to poudriere.

@darkfiberiru

darkfiberiru commented Sep 16, 2020

@allanjude Can we also have a poudriere.conf knob? Below is from my attempt. I will write and test an additional commit that works with yours to do that.

# Have pkg create hashed versions of the pkg filenames, with symlinks named
# after the original pkg names. The packagesite.yaml file will point to the
# hashed versions of these files. Hashed pkg filenames allow users to lazily
# synchronise packages without conflicting with the current packages,
# for example using rsync or CDNs. Once the packages are synced, the much
# smaller meta files can then be synced, allowing a near-atomic update of the
# repo. On a caching CDN this means purging 2-5 files instead of every pkg
# that has been updated.
#PKG_HASH=no

@darkfiberiru

@allanjude As discussed out of band, I will try to put together a version of the patch that includes the -H flag or a poudriere.conf option, and then open a new PR or reopen #786.

@igalic
Contributor

igalic commented Dec 21, 2020

@darkfiberiru do you still have plans to pick this up again?
It seems Allan is perpetually busy with something else.

@allanjude force-pushed the hashed_pkgs branch 2 times, most recently from a59ca49 to 30c4b23 (October 31, 2021 19:10)
@allanjude
Member Author

One thing I noticed: with the new default config, the pkgs are owned by 'nobody', but the symlinks to the hashed versions are owned by root.
Does this mean pkg repo should be run as nobody, or do we just need to do a chown after pkg repo?

I notice that packagesite etc. are not owned by nobody.

Creates the repo with hash-based filenames to allow use of a CDN

Setting `PKG_HASH="yes"` in poudriere.conf will build a repo where
all of the packages are in All/Hashed/ and the repo manifest points there.

It also creates a set of symlinks in the All/ directory, but these
are purely for poudriere itself, to find dependencies. The symlinks
should NOT be published, only the Hashed/ directory is required.

This mechanism ensures that the package files themselves can be
cached by a CDN as the filename will change if the contents ever
differ.

The repo metadata files (those outside of All/) should be set
to have a very low cache expiration, so that when a new package
set is published they are updated and reflect the new packages.

Technically this feature also allows "previous" versions of packages
to continue to be available via the CDN, but that is a side-effect
not a purposeful feature.

Sponsored-by: Klara, Inc.
Sponsored-by: TitanHQ
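
The publishing order implied by the commit message above (package files first, metadata last, symlinks never) could look roughly like the following. This is only a sketch under stated assumptions: `publish_repo` is a hypothetical helper, plain `cp` stands in for whatever sync tool is really used, and the layout is taken from the description above (packages in All/Hashed/, metadata files directly under the repo root).

```sh
# Hypothetical two-phase publish. $1 is the poudriere package directory,
# $2 is the directory that the CDN origin serves.
publish_repo() {
    _src=$1
    _dst=$2
    mkdir -p "${_dst}/All/Hashed" || return 1
    # Phase 1: copy hashed package files that are not published yet.
    # Existing names are left alone, so clients that still hold the old
    # metadata keep working. The symlinks in All/ are deliberately skipped.
    for _f in "${_src}"/All/Hashed/*; do
        if [ -f "${_f}" ] && [ ! -e "${_dst}/All/Hashed/${_f##*/}" ]; then
            cp -p "${_f}" "${_dst}/All/Hashed/" || return 1
        fi
    done
    # Phase 2: copy the small metadata files, which switches clients over
    # to the new package set.
    for _f in "${_src}"/*; do
        if [ -f "${_f}" ]; then
            cp -p "${_f}" "${_dst}/" || return 1
        fi
    done
}
```

Copying metadata last is what makes the update close to atomic from a client's point of view: every package a new manifest refers to is already in place before the manifest appears.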
@allanjude
Member Author

I have refreshed this and solved the issues that were reported (Poudriere was trying to delete the Hashed directory because it thought it was an orphaned package).

@bdrewery
Member

bdrewery commented Jul 9, 2024

Looking at this today.

Member

@bdrewery left a comment

Quick changes requested


# Remount rw
# mount_nullfs does not support mount -u
umount ${UMOUNT_NONBUSY} ${MASTERMNT}/packages || \
Member

See 43cca93
s/${UMOUNT_NONBUSY}/-n/

Member

Although I think you can just use remount_packages.

@@ -9542,6 +9560,11 @@ build_repo() {
sign_pkg pubkey "${PACKAGES:?}/Latest/pkg.${PKG_EXT}"
fi
fi

# Remount ro
umount ${UMOUNT_NONBUSY} ${MASTERMNT}/packages || \
Member

UMOUNT_NONBUSY to -n here too. Or better remount_packages -o ro.

@@ -417,6 +421,8 @@ delete_pkg_xargs() {
# Delete the package and the depsfile since this package is being deleted,
# which will force it to be recreated
{
# If ${pkg} is a symlink, delete the target as well
[ -L "${pkg}" ] && echo $(realpath "${pkg}")
Member

Please avoid && here: if it ends up as the last command in a function or block, that block returns non-zero whenever the test fails, which aborts the script under set -e. We avoid it as a pattern because it sneaks in eventually.

# sh -c 'set -o pipefail; set -e; dolink() { [ -L / ] && true; }; dolink; echo done'; echo $?
1
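
A minimal sketch of the difference, assuming the only goal is to print the target when the argument is a symlink. Both function names are hypothetical and exist only for this comparison; they are not part of the patch.

```sh
# Pattern to avoid: when "$1" is not a symlink the test fails, and because
# it is the last command, the function itself returns non-zero.
print_target_and() {
    [ -L "$1" ] && realpath "$1"
}

# Safer pattern: an if statement whose condition is false and that has no
# else branch returns 0, so a caller running under set -e is not aborted.
print_target_if() {
    if [ -L "$1" ]; then
        realpath "$1"
    fi
}
```

Applied to the line under review, that would mean wrapping the realpath call in an if block rather than chaining it with &&.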

@@ -28,7 +28,7 @@
.\"
.\" Note: The date here should be updated whenever a non-trivial
.\" change is made to the manual page.
.Dd July 5, 2022
.Dd September 26, 2022
Member

Please update this to a more recent date, given how long the PR has been idle.

mkdir -p ${MASTERMNT}/tmp/packages
if [ -n "${PKG_REPO_SIGNING_KEY}" ]; then
msg "Signing repository with key: ${PKG_REPO_SIGNING_KEY}"
install -m 0400 "${PKG_REPO_SIGNING_KEY}" \
"${MASTERMNT:?}/tmp/repo.key"
injail ${PKG_BIN:?} repo \
${PKG_REPO_FLAGS} \
Member

One of my goals is for set -u to work. Please use ${PKG_REPO_FLAGS-} here and in the places that follow.
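
To show what the `-` form buys under set -u, here is a small sketch. `count_repo_args` is a hypothetical helper that only counts the words a command line would receive, and the flag values used with it are placeholders, not real pkg options.

```sh
# Hypothetical helper: report how many arguments a command would receive.
# The body runs in a subshell so that set -u does not leak into the caller.
count_repo_args() (
    set -u
    # ${PKG_REPO_FLAGS-} expands to nothing when the variable is unset,
    # instead of aborting with an unset-variable error. It is left unquoted
    # on purpose, so an empty value adds no argument and several flags
    # split into separate words.
    set -- repo ${PKG_REPO_FLAGS-} /packages
    echo "$#"
)
```

With PKG_REPO_FLAGS unset this prints 2; with PKG_REPO_FLAGS="-a -b" it prints 4. Writing a plain ${PKG_REPO_FLAGS} in the same place would make the subshell fail as soon as the variable is unset.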
