COPY --link still invalidates cache of successor COPY --link #5297

Open
hholst80 opened this issue Sep 4, 2024 · 4 comments

Comments

@hholst80

hholst80 commented Sep 4, 2024

I am building an image by assembling it from parts using COPY.

The final assembly is simply a list of COPY --link --from=... statements like this:

FROM ironforge.sh/stage1:$tag AS prefinal
COPY --link --from=ironforge.sh/stage2/m4:$tag / /
COPY --link --from=ironforge.sh/stage2/ncurses:$tag / /
COPY --link --from=ironforge.sh/stage2/bash:$tag / /
COPY --link --from=ironforge.sh/stage2/coreutils:$tag / /
COPY --link --from=ironforge.sh/stage2/diffutils:$tag / /
COPY --link --from=ironforge.sh/stage2/file:$tag / /
COPY --link --from=ironforge.sh/stage2/findutils:$tag / /
COPY --link --from=ironforge.sh/stage2/gawk:$tag / /
COPY --link --from=ironforge.sh/stage2/grep:$tag / /
COPY --link --from=ironforge.sh/stage2/gzip:$tag / /
COPY --link --from=ironforge.sh/stage2/make:$tag / /
COPY --link --from=ironforge.sh/stage2/patch:$tag / /
COPY --link --from=ironforge.sh/stage2/sed:$tag / /
COPY --link --from=ironforge.sh/stage2/tar:$tag / /
COPY --link --from=ironforge.sh/stage2/xz:$tag / /
COPY --link --from=ironforge.sh/stage2/binutils:$tag / /
COPY --link --from=ironforge.sh/stage2/gcc:$tag / /
COPY --link --from=ironforge.sh/stage2/skel:$tag / /

Suppose I now change the findutils image. I then see this in the build log:

=> CACHED FROM ironforge.sh/stage2/gawk:latest                            0.0s
 => CACHED FROM ironforge.sh/stage2/grep:latest                            0.0s
 => CACHED [prefinal  2/19] COPY --link --from=ironforge.sh/stage2/m4:lat  0.0s
 => CACHED [prefinal  3/19] COPY --link --from=ironforge.sh/stage2/ncurse  0.0s
 => CACHED [prefinal  4/19] COPY --link --from=ironforge.sh/stage2/bash:l  0.0s
 => CACHED [prefinal  5/19] COPY --link --from=ironforge.sh/stage2/coreut  0.0s
 => CACHED [prefinal  6/19] COPY --link --from=ironforge.sh/stage2/diffut  0.0s
 => CACHED [prefinal  7/19] COPY --link --from=ironforge.sh/stage2/file:l  0.0s
 => [prefinal  8/19] COPY --link --from=ironforge.sh/stage2/findutils:lat  0.1s
 => [prefinal  9/19] COPY --link --from=ironforge.sh/stage2/gawk:latest /  0.1s
 => [prefinal 10/19] COPY --link --from=ironforge.sh/stage2/grep:latest /  0.1s
 => [prefinal 11/19] COPY --link --from=ironforge.sh/stage2/gzip:latest /  0.1s
 => [prefinal 12/19] COPY --link --from=ironforge.sh/stage2/make:latest /  0.1s
 => [prefinal 13/19] COPY --link --from=ironforge.sh/stage2/patch:latest   0.1s
 => [prefinal 14/19] COPY --link --from=ironforge.sh/stage2/sed:latest /   0.1s
 => [prefinal 15/19] COPY --link --from=ironforge.sh/stage2/tar:latest /   0.1s
 => [prefinal 16/19] COPY --link --from=ironforge.sh/stage2/xz:latest / /  0.1s
 => [prefinal 17/19] COPY --link --from=ironforge.sh/stage2/binutils:late  0.1s
 => [prefinal 18/19] COPY --link --from=ironforge.sh/stage2/gcc:latest /   0.1s
 => [prefinal 19/19] COPY --link --from=ironforge.sh/stage2/skel:latest /  0.1s
 => [stage2 1/1] COPY --from=prefinal /mnt/lfs/ /                          2.1s

Why would the COPY commands below the findutils one be invalidated? They are simply going to be merged at the end of the build stage, so I don't see why the order would matter.

@tonistiigi
Member

Do you have commands to replicate this from start to finish?

Which installation of BuildKit was this? Note that BuildKit embedded in dockerd without the containerd image store (https://docs.docker.com/desktop/containerd/) does not support COPY --link cache semantics and performs an old-style copy instead.
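A minimal sketch for checking which image store the daemon uses, assuming the approach from the Docker documentation linked above: docker info -f '{{ .DriverStatus }}' reports io.containerd.snapshotter.v1 when the containerd image store is enabled. The helper name uses_containerd_store is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): succeeds if the DriverStatus
# text passed as $1 mentions the containerd snapshotter.
uses_containerd_store() {
  case "$1" in
    *io.containerd.snapshotter*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs a running Docker daemon):
#   status=$(docker info -f '{{ .DriverStatus }}')
#   if uses_containerd_store "$status"; then
#     echo "containerd image store is enabled"
#   else
#     echo "containerd image store is not enabled"
#   fi
```

If the check fails, the old-style copy described above would explain why every later COPY step runs again.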

@hholst80
Author

hholst80 commented Sep 16, 2024

Sorry @tonistiigi for the super-slow feedback on my side.

A short repro:

mkdir foo bar
touch foo/foo bar/bar
cat > Dockerfile <<HERE
FROM scratch
COPY --link --from=foo / /
COPY --link --from=bar / /
HERE

Build and populate the cache:

docker build --progress plain --build-context foo=foo --build-context bar=bar .

Now change the foo context:

touch foo/baz

This will invalidate both COPY statements in the build:

docker build --progress plain --build-context foo=foo --build-context bar=bar .
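To compare the two builds above more objectively, the logs can be saved and the reused steps counted. This is only a sketch: it assumes that the plain progress output marks a reused step with a line ending in CACHED, and count_cached is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the steps reported as CACHED in a saved
# plain-progress build log ($1 is the log file).
count_cached() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so "|| true" keeps the helper's exit status clean.
  grep -c ' CACHED$' "$1" || true
}

# Usage (progress output is captured together with stdout):
#   docker build --progress plain --build-context foo=foo --build-context bar=bar . > first.log 2>&1
#   touch foo/baz
#   docker build --progress plain --build-context foo=foo --build-context bar=bar . > second.log 2>&1
#   echo "cached before: $(count_cached first.log), after: $(count_cached second.log)"
```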

@tonistiigi
Member

The COPY --link step internally executes a copy step and a merge step, which the progress bar combines into one. In your case the copy would be cached, but the merge still needs to run (it shows "merging 0.0s done" if it does). The difference is that merging only assembles an array of layers: its performance does not depend on the layer size, and the data does not need to be available locally for this operation.

E.g., if you change the Dockerfile to:

FROM alpine AS slow
RUN sleep 10
FROM scratch
COPY --link --from=foo / /
COPY --link --from=slow /etc/passwd /

Run it once, then look up the cache record for sleep 10 in the output of buildx du --verbose. E.g.:

ID:		6koxlyl52m8vr01zpj0pi4ekj
Parent:		pavgcfk49twndmxttnrdmq24j
Created at:	2024-09-16 20:20:08.057463974 +0000 UTC
Mutable:	true
Reclaimable:	true
Shared:		false
Size:		0B
Description:	mount / from exec /bin/sh -c sleep 10
Usage count:	1
Last used:	9 seconds ago
Type:		regular

Delete this record with docker buildx prune --filter 'id=6koxlyl52m8vr01zpj0pi4ekj', touch a file in foo, and run the build again. The sleep will not run again even though there is no cache for it, because the layer from the previous COPY --link is used directly and the files do not need to be copied again.
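The record lookup in this experiment can be scripted instead of copying the ID by hand. The sketch below assumes the buildx du --verbose output keeps the "ID:" and "Description:" field layout shown in the record above; find_cache_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `buildx du --verbose` output on stdin and print
# the ID of every record whose Description contains the text given as $1.
find_cache_id() {
  awk -v pat="$1" '
    $1 == "ID:" { id = $2 }                            # remember the current record ID
    $1 == "Description:" && index($0, pat) { print id }
  '
}

# Usage (needs Docker with buildx; prune asks for confirmation):
#   id=$(docker buildx du --verbose | find_cache_id 'sleep 10')
#   docker buildx prune --filter "id=$id"
#   touch foo/again
#   docker build --progress plain --build-context foo=foo .
```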

@hholst80
Author

So steps 8-19 are actually ONE thing happening, taking 0.1 seconds in total rather than 0.1 seconds per step? That is a big difference, because those 0.1 seconds add up if I have 10 to 100 stages or contexts in my build.
