Merge pull request #3010 from amrmahdi/pranavp/azblob-rebase
[remotecache] Add Azure Blob Storage support
tonistiigi authored Sep 2, 2022
2 parents f1f9bde + 522573b commit 9114527
Showing 257 changed files with 41,551 additions and 210 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/build.yml
@@ -154,6 +154,29 @@ jobs:
        run: |
          hack/s3_test/run_test.sh
  test-azblob:
    runs-on: ubuntu-20.04
    needs:
      - base
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Expose GitHub Runtime
        uses: crazy-max/ghaction-github-runtime@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          version: ${{ env.BUILDX_VERSION }}
          driver-opts: image=${{ env.REPO_SLUG_ORIGIN }}
          buildkitd-flags: --debug
      -
        name: Test
        run: |
          hack/azblob_test/run_test.sh
  test-os:
    runs-on: ${{ matrix.os }}
    strategy:
45 changes: 45 additions & 0 deletions README.md
@@ -64,6 +64,7 @@ Join `#buildkit` channel on [Docker Community Slack](http://dockr.ly/slack)
- [Local directory](#local-directory-1)
- [GitHub Actions cache (experimental)](#github-actions-cache-experimental)
- [S3 cache (experimental)](#s3-cache-experimental)
- [Azure Blob Storage cache (experimental)](#azure-blob-storage-cache-experimental)
- [Consistent hashing](#consistent-hashing)
- [Metadata](#metadata)
- [Systemd socket activation](#systemd-socket-activation)
@@ -496,6 +497,50 @@ Other options are:
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on s3 (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default `buildkit`)

#### Azure Blob Storage cache (experimental)

```bash
buildctl build ... \
  --output type=image,name=docker.io/username/image,push=true \
  --export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image
```

The following attributes are required:
* `account_url`: the Azure Blob Storage account URL (if unset, taken from `$BUILDKIT_AZURE_STORAGE_ACCOUNT_URL`)

Storage locations:
* blobs: `<account_url>/<container>/<prefix><blobs_prefix>/<sha256>`, default: `<account_url>/<container>/blobs/<sha256>`
* manifests: `<account_url>/<container>/<prefix><manifests_prefix>/<name>`, default: `<account_url>/<container>/manifests/<name>`
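
For example, with the account URL from the snippet above, the default container, and `name=my_image`, the cache would be laid out like this (illustrative):

```
https://myaccount.blob.core.windows.net/buildkit-cache/blobs/<sha256>
https://myaccount.blob.core.windows.net/buildkit-cache/manifests/my_image
```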

Azure Blob Storage configuration:
* `container`: the Azure Blob Storage container name (defaults to `$BUILDKIT_AZURE_STORAGE_CONTAINER` if set, otherwise `buildkit-cache`)
* `blobs_prefix`: global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix`: global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)

Azure Blob Storage authentication:

Two options are supported:

* Credentials sourced from the environment variables supported by the [Azure SDK for Go](https://docs.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication). The configuration must be available to the buildkit daemon, not to the client.
* Secret access key: set the `secret_access_key` attribute to the primary or secondary [account key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage) of your Azure Blob Storage account.
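
For example, exporting with an account key might look like this (a sketch; `$AZBLOB_ACCOUNT_KEY` is a placeholder variable, not an attribute from this change):

```bash
buildctl build ... \
  --output type=image,name=docker.io/username/image,push=true \
  --export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,secret_access_key=$AZBLOB_ACCOUNT_KEY,name=my_image
```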

`--export-cache` options:
* `type=azblob`
* `mode=<min|max>`: specify cache layers to export (default: `min`)
* `min`: only export layers for the resulting image
* `max`: export all the layers of all intermediate steps
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `name=<manifest>`: specify name of the manifest to use (default: `buildkit`)
* Multiple manifest names can be specified at the same time, separated by `;`. The standard use case is to use the git sha1 as the name and the branch name as a duplicate, then load both with two `--import-cache` options (see the sketch after the `--import-cache` list below).

`--import-cache` options:
* `type=azblob`
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `blobs_prefix=<prefix>`: set global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default: `buildkit`)
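
A sketch of the multiple-names pattern described above, using a commit sha and a branch name as manifest names (`$GIT_SHA` and `$GIT_BRANCH` are placeholder variables; the quotes keep the shell from interpreting `;`):

```bash
buildctl build ... \
  --output type=image,name=docker.io/username/image,push=true \
  --export-cache "type=azblob,account_url=https://myaccount.blob.core.windows.net,name=$GIT_SHA;$GIT_BRANCH" \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=$GIT_SHA \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=$GIT_BRANCH
```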

### Consistent hashing

If you have multiple BuildKit daemon instances and don't want to use a registry for sharing cache across the cluster,
209 changes: 209 additions & 0 deletions cache/remotecache/azblob/exporter.go
@@ -0,0 +1,209 @@
package azblob

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
	"github.com/containerd/containerd/content"
	"github.com/moby/buildkit/cache/remotecache"
	v1 "github.com/moby/buildkit/cache/remotecache/v1"
	"github.com/moby/buildkit/session"
	"github.com/moby/buildkit/solver"
	"github.com/moby/buildkit/util/compression"
	"github.com/moby/buildkit/util/progress"
	digest "github.com/opencontainers/go-digest"
	"github.com/pkg/errors"
	"github.com/sirupsen/logrus"
)

// ResolveCacheExporterFunc for "azblob" cache exporter.
func ResolveCacheExporterFunc() remotecache.ResolveCacheExporterFunc {
	return func(ctx context.Context, g session.Group, attrs map[string]string) (remotecache.Exporter, error) {
		config, err := getConfig(attrs)
		if err != nil {
			return nil, fmt.Errorf("failed to create azblob config: %v", err)
		}

		containerClient, err := createContainerClient(ctx, config)
		if err != nil {
			return nil, fmt.Errorf("failed to create container client: %v", err)
		}

		cc := v1.NewCacheChains()
		return &exporter{
			CacheExporterTarget: cc,
			chains:              cc,
			containerClient:     containerClient,
			config:              config,
		}, nil
	}
}

var _ remotecache.Exporter = &exporter{}

type exporter struct {
	solver.CacheExporterTarget
	chains          *v1.CacheChains
	containerClient *azblob.ContainerClient
	config          *Config
}

func (ce *exporter) Finalize(ctx context.Context) (map[string]string, error) {
	config, descs, err := ce.chains.Marshal(ctx)
	if err != nil {
		return nil, err
	}

	for i, l := range config.Layers {
		dgstPair, ok := descs[l.Blob]
		if !ok {
			return nil, errors.Errorf("missing blob %s", l.Blob)
		}
		if dgstPair.Descriptor.Annotations == nil {
			return nil, errors.Errorf("invalid descriptor without annotations")
		}
		v, ok := dgstPair.Descriptor.Annotations["containerd.io/uncompressed"]
		if !ok {
			return nil, errors.Errorf("invalid descriptor without uncompressed annotation")
		}
		diffID, err := digest.Parse(v)
		if err != nil {
			return nil, errors.Wrapf(err, "failed to parse uncompressed annotation")
		}

		key := blobKey(ce.config, dgstPair.Descriptor.Digest.String())

		exists, err := blobExists(ctx, ce.containerClient, key)
		if err != nil {
			return nil, err
		}

		logrus.Debugf("layer %s exists = %t", key, exists)

		if !exists {
			layerDone := progress.OneOff(ctx, fmt.Sprintf("writing layer %s", l.Blob))
			ra, err := dgstPair.Provider.ReaderAt(ctx, dgstPair.Descriptor)
			if err != nil {
				return nil, layerDone(fmt.Errorf("failed to get reader for %s: %v", dgstPair.Descriptor.Digest, err))
			}
			if err := ce.uploadBlobIfNotExists(ctx, key, content.NewReader(ra)); err != nil {
				return nil, layerDone(err)
			}
			layerDone(nil)
		}

		la := &v1.LayerAnnotations{
			DiffID:    diffID,
			Size:      dgstPair.Descriptor.Size,
			MediaType: dgstPair.Descriptor.MediaType,
		}
		if v, ok := dgstPair.Descriptor.Annotations["buildkit/createdat"]; ok {
			var t time.Time
			if err := (&t).UnmarshalText([]byte(v)); err != nil {
				return nil, err
			}
			la.CreatedAt = t.UTC()
		}
		config.Layers[i].Annotations = la
	}

	dt, err := json.Marshal(config)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal config: %v", err)
	}

	for _, name := range ce.config.Names {
		if innerError := ce.uploadManifest(ctx, manifestKey(ce.config, name), bytesToReadSeekCloser(dt)); innerError != nil {
			return nil, errors.Errorf("error writing manifest %s: %v", name, innerError)
		}
	}

	return nil, nil
}

func (ce *exporter) Config() remotecache.Config {
	return remotecache.Config{
		Compression: compression.New(compression.Default),
	}
}

// For uploading manifests, use the Upload API, which follows "last writer wins" semantics.
// This is slightly slower than the UploadStream call but is safe to call concurrently from multiple threads. Refer to:
// https://github.com/Azure/azure-sdk-for-go/issues/18490#issuecomment-1170806877
func (ce *exporter) uploadManifest(ctx context.Context, manifestKey string, reader io.ReadSeekCloser) error {
	defer reader.Close()
	blobClient, err := ce.containerClient.NewBlockBlobClient(manifestKey)
	if err != nil {
		return errors.Wrap(err, "error creating block blob client")
	}

	ctx, cnclFn := context.WithTimeout(ctx, time.Minute*5)
	defer cnclFn()

	_, err = blobClient.Upload(ctx, reader, &azblob.BlockBlobUploadOptions{})
	if err != nil {
		return errors.Wrapf(err, "failed to upload blob %s", manifestKey)
	}

	return nil
}

// For uploading blobs, use UploadStream with an access condition that performs the upload only if the blob
// does not already exist. Since blobs are content-addressable, this is the right behavior for blobs, and it gives
// a performance improvement over the Upload API used for uploading manifests.
func (ce *exporter) uploadBlobIfNotExists(ctx context.Context, blobKey string, reader io.Reader) error {
	blobClient, err := ce.containerClient.NewBlockBlobClient(blobKey)
	if err != nil {
		return errors.Wrap(err, "error creating block blob client")
	}

	uploadCtx, cnclFn := context.WithTimeout(ctx, time.Minute*5)
	defer cnclFn()

	// Only upload if the blob doesn't exist: If-None-Match with ETagAny makes the
	// write fail when any blob is already present under this key.
	eTagAny := azblob.ETagAny
	_, err = blobClient.UploadStream(uploadCtx, reader, azblob.UploadStreamOptions{
		BufferSize: IOChunkSize,
		MaxBuffers: IOConcurrency,
		BlobAccessConditions: &azblob.BlobAccessConditions{
			ModifiedAccessConditions: &azblob.ModifiedAccessConditions{
				IfNoneMatch: &eTagAny,
			},
		},
	})

	if err == nil {
		return nil
	}

	// A "blob already exists" error means another writer won the race; the blob is
	// content-addressed, so that result is as good as a successful upload.
	var se *azblob.StorageError
	if errors.As(err, &se) && se.ErrorCode == azblob.StorageErrorCodeBlobAlreadyExists {
		return nil
	}

	return errors.Wrapf(err, "failed to upload blob %s", blobKey)
}

var _ io.ReadSeekCloser = &readSeekCloser{}

type readSeekCloser struct {
	io.Reader
	io.Seeker
	io.Closer
}

func bytesToReadSeekCloser(dt []byte) io.ReadSeekCloser {
	bytesReader := bytes.NewReader(dt)
	return &readSeekCloser{
		Reader: bytesReader,
		Seeker: bytesReader,
		Closer: io.NopCloser(bytesReader),
	}
}
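
As a usage sketch (not part of this diff; the wiring below is assumed rather than taken from buildkitd, and `cacheExporterFuncs` is a hypothetical helper), the daemon side would map the `azblob` cache type name to the resolver defined in this file:

```go
package main

import (
	"github.com/moby/buildkit/cache/remotecache"
	azblob "github.com/moby/buildkit/cache/remotecache/azblob"
)

// cacheExporterFuncs maps the type name used in --export-cache type=azblob
// to the exporter resolver added by this change.
func cacheExporterFuncs() map[string]remotecache.ResolveCacheExporterFunc {
	return map[string]remotecache.ResolveCacheExporterFunc{
		"azblob": azblob.ResolveCacheExporterFunc(),
	}
}
```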