artifact(download): skip non-zip files #1874

Open
wants to merge 1 commit into
base: main
51 changes: 51 additions & 0 deletions packages/artifact/__tests__/download-artifact.test.ts
@@ -104,6 +104,18 @@ const cleanup = async (): Promise<void> => {
const mockGetArtifactSuccess = jest.fn(() => {
const message = new http.IncomingMessage(new net.Socket())
message.statusCode = 200
message.headers['content-type'] = 'zip'
message.push(fs.readFileSync(fixtures.exampleArtifact.path))
message.push(null)
return {
message
}
})

const mockGetArtifactGzip = jest.fn(() => {
const message = new http.IncomingMessage(new net.Socket())
message.statusCode = 200
message.headers['content-type'] = 'application/gzip'
message.push(fs.readFileSync(fixtures.exampleArtifact.path))
message.push(null)
return {
@@ -124,6 +136,7 @@ const mockGetArtifactFailure = jest.fn(() => {
const mockGetArtifactMalicious = jest.fn(() => {
const message = new http.IncomingMessage(new net.Socket())
message.statusCode = 200
message.headers['content-type'] = 'zip'
message.push(fs.readFileSync(path.join(__dirname, 'fixtures', 'evil.zip'))) // evil.zip contains files that are formatted x/../../etc/hosts
message.push(null)
return {
@@ -178,6 +191,7 @@ describe('download-artifact', () => {
)
expectExtractedArchive(fixtures.workspaceDir)
expect(response.downloadPath).toBe(fixtures.workspaceDir)
expect(response.skipped).toBe(false)
})

it('should not allow path traversal from malicious artifacts', async () => {
@@ -231,6 +245,7 @@ describe('download-artifact', () => {
).toBe(true)

expect(response.downloadPath).toBe(fixtures.workspaceDir)
expect(response.skipped).toBe(false)
})

it('should successfully download an artifact to user defined path', async () => {
@@ -280,6 +295,7 @@ describe('download-artifact', () => {
)
expectExtractedArchive(customPath)
expect(response.downloadPath).toBe(customPath)
expect(response.skipped).toBe(false)
})

it('should fail if download artifact API does not respond with location', async () => {
@@ -316,6 +332,7 @@ describe('download-artifact', () => {
// mock http client to delay response data by 30s
const msg = new http.IncomingMessage(new net.Socket())
msg.statusCode = 200
msg.headers['content-type'] = 'zip'

const mockGet = jest.fn(async () => {
return new Promise((resolve, reject) => {
@@ -444,7 +461,39 @@ describe('download-artifact', () => {
)
expect(mockGetArtifactSuccess).toHaveBeenCalledTimes(1)
expect(response.downloadPath).toBe(fixtures.workspaceDir)
expect(response.skipped).toBe(false)
}, 28000)

it('should skip if artifact does not have the right content type', async () => {
const downloadArtifactMock = github.getOctokit(fixtures.token).rest
.actions.downloadArtifact as MockedDownloadArtifact
downloadArtifactMock.mockResolvedValueOnce({
headers: {
location: fixtures.blobStorageUrl
},
status: 302,
url: '',
data: Buffer.from('')
})

const mockHttpClient = (HttpClient as jest.Mock).mockImplementation(
() => {
return {
get: mockGetArtifactGzip
}
}
)

const response = await downloadArtifactPublic(
fixtures.artifactID,
fixtures.repositoryOwner,
fixtures.repositoryName,
fixtures.token
)

expect(mockHttpClient).toHaveBeenCalledWith(getUserAgentString())
expect(response.skipped).toBe(true)
})
})

describe('internal', () => {
@@ -499,6 +548,7 @@ describe('download-artifact', () => {

expectExtractedArchive(fixtures.workspaceDir)
expect(response.downloadPath).toBe(fixtures.workspaceDir)
expect(response.skipped).toBe(false)
expect(mockHttpClient).toHaveBeenCalledWith(getUserAgentString())
expect(mockListArtifacts).toHaveBeenCalledWith({
idFilter: {
@@ -550,6 +600,7 @@ describe('download-artifact', () => {

expectExtractedArchive(customPath)
expect(response.downloadPath).toBe(customPath)
expect(response.skipped).toBe(false)
expect(mockHttpClient).toHaveBeenCalledWith(getUserAgentString())
expect(mockListArtifacts).toHaveBeenCalledWith({
idFilter: {
38 changes: 24 additions & 14 deletions packages/artifact/src/internal/download/download-artifact.ts
@@ -37,12 +37,11 @@ async function exists(path: string): Promise<boolean> {
}
}

-async function streamExtract(url: string, directory: string): Promise<void> {
+async function streamExtract(url: string, directory: string): Promise<boolean> {
let retryCount = 0
while (retryCount < 5) {
try {
-await streamExtractExternal(url, directory)
-return
+return await streamExtractExternal(url, directory)
} catch (error) {
retryCount++
core.debug(
@@ -59,18 +58,23 @@ async function streamExtract(url: string, directory: string): Promise<void> {
export async function streamExtractExternal(
url: string,
directory: string
-): Promise<void> {
+): Promise<boolean> {
const client = new httpClient.HttpClient(getUserAgentString())
const response = await client.get(url)
if (response.message.statusCode !== 200) {
throw new Error(
`Unexpected HTTP response from blob storage: ${response.message.statusCode} ${response.message.statusMessage}`
)
+} else if (response.message.headers['content-type'] !== 'zip') {
+core.debug(
+`Invalid content-type: ${response.message.headers['content-type']}, skipping download`
+)
+return false
}

const timeout = 30 * 1000 // 30 seconds

-return new Promise((resolve, reject) => {
+return new Promise<boolean>((resolve, reject) => {
const timerFn = (): void => {
response.message.destroy(
new Error(`Blob storage chunk did not respond in ${timeout}ms`)
@@ -92,7 +96,7 @@ export async function streamExtractExternal(
.pipe(unzip.Extract({path: directory}))
.on('close', () => {
clearTimeout(timer)
-resolve()
+resolve(true)
})
.on('error', (error: Error) => {
reject(error)
@@ -140,13 +144,16 @@ export async function downloadArtifactPublic(

try {
core.info(`Starting download of artifact to: ${downloadPath}`)
-await streamExtract(location, downloadPath)
-core.info(`Artifact download completed successfully.`)
+if (await streamExtract(location, downloadPath)) {
+core.info(`Artifact download completed successfully.`)
+return {downloadPath, skipped: false}
+} else {
+core.info(`Artifact download skipped.`)
Member

If a user requests a specific artifact to download and that artifact can't be downloaded, why should that silently fail?

Shouldn't that be an error?

Member

I'm reluctant to change the default behavior here as it could be a breaking change for users.

Author

@crazy-max Nov 14, 2024

If I understand correctly, the API only expects the zip content type for extraction when used with actions/download-artifact@v4:

.pipe(unzip.Extract({path: directory}))

It currently breaks workflows when artifacts with any other content type are downloaded.

I guess we could create an error type if content type does not match so actions/download-artifact can catch it?
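
A minimal sketch of that idea, assuming a hypothetical `InvalidContentTypeError` class (it does not exist in @actions/artifact today) and stand-in helpers named `assertZipContentType` and `classifyDownload`. It only illustrates how a caller could tell a content-type mismatch apart from other failures:

```typescript
// Hypothetical error type; not part of @actions/artifact.
class InvalidContentTypeError extends Error {
  readonly contentType: string | undefined

  constructor(contentType: string | undefined) {
    super(`Unsupported artifact content-type: ${contentType ?? 'unknown'}`)
    this.name = 'InvalidContentTypeError'
    this.contentType = contentType
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, InvalidContentTypeError.prototype)
  }
}

// Stand-in for the content-type check added to streamExtractExternal.
function assertZipContentType(contentType: string | undefined): void {
  if (contentType !== 'zip') {
    throw new InvalidContentTypeError(contentType)
  }
}

// Example caller: skip only on the specific error, rethrow everything else.
function classifyDownload(
  contentType: string | undefined
): 'extracted' | 'skipped' {
  try {
    assertZipContentType(contentType)
    return 'extracted'
  } catch (error) {
    if (error instanceof InvalidContentTypeError) {
      return 'skipped'
    }
    throw error
  }
}
```

As far as the diff shows, the retry loop in `streamExtract` catches every error, so a real change would presumably need to rethrow this error type instead of retrying it.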

Author

I thought silently skipping unrelated artifacts not uploaded with actions/upload-artifact would be best so it doesn't require any changes in actions/download-artifact.

Contributor

Would it be better to use the pattern input to filter out any artifacts that are not created by the upload-artifact action in this case? (credit to @joshmgross for this suggestion as well!)

Author

@crazy-max Jan 10, 2025

@yacaovsnc

> Maybe an exclude pattern such as !**/*.gzip if all docker uploaded artifacts are gzips?

It is uploaded as .dockerbuild file: https://github.com/docker/build-push-action/actions/runs/12707375438/job/35422185205#step:6:22

But azure blob storage or GitHub upload backend enforces .zip extension somehow when downloaded from Summary page: https://github.com/docker/build-push-action/actions/runs/12707375438/artifacts/2412083382

$ file docker~build-push-action~QFMCC3.dockerbuild.zip 
docker~build-push-action~QFMCC3.dockerbuild.zip: gzip compressed data, original size modulo 2^32 311808
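
The `file` output above comes from inspecting the leading bytes of the download rather than its name. A rough sketch of the same check, assuming a hypothetical `sniffArchiveKind` helper that is not part of the toolkit: gzip streams begin with the bytes 0x1f 0x8b, while zip archives begin with `PK` followed by 0x03 0x04 (or 0x05 0x06 for an empty archive).

```typescript
type ArchiveKind = 'zip' | 'gzip' | 'unknown'

// Hypothetical helper: classify a download by its first few bytes.
function sniffArchiveKind(head: Uint8Array): ArchiveKind {
  // gzip members start with the magic bytes 0x1f 0x8b.
  if (head.length >= 2 && head[0] === 0x1f && head[1] === 0x8b) {
    return 'gzip'
  }
  // zip files start with "PK" (0x50 0x4b) followed by a record signature:
  // 0x03 0x04 for a local file header, 0x05 0x06 for an empty archive.
  if (head.length >= 4 && head[0] === 0x50 && head[1] === 0x4b) {
    const localHeader = head[2] === 0x03 && head[3] === 0x04
    const emptyArchive = head[2] === 0x05 && head[3] === 0x06
    if (localHeader || emptyArchive) {
      return 'zip'
    }
  }
  return 'unknown'
}
```

This only looks at the file signature; it says nothing about whether the rest of the archive is well formed.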

> Or if the artifacts have a common name pattern, it can also be matched? Is there a pattern for docker uploaded artifacts?

I don't think that's reliable. Checking the content type is best, in my opinion.

> Also, if there are customers who rely on downloading docker uploaded artifacts, we are breaking them either way. Would it be possible to actually upload zips? I understand this may be difficult too.

Zip compression is less efficient than gzip and may produce larger files for the same data. Zip also does not handle Unix-specific metadata (e.g., permissions, ownership, symbolic links). We could encapsulate our tarball within a zip file, but that sounds hacky, and doing so would break Docker Desktop users trying to import builds with an unknown format. I defer to @colinhemmings @thompson-shaun.

What's wrong with excluding artifacts that don't have the expected content-type during extraction? If someone else uses the API to upload another content-type, people using actions/download-artifact would have the same issue.

Contributor

> What's wrong with excluding artifacts that don't have the expected content-type during extraction? If someone else uses the API to upload another content-type, people using actions/download-artifact would have the same issue.

I think the issue is that it blurs the line between a real exception and an "expected" exception. Consider the same scenario where a user uses the API to upload, and uses download-artifact to verify the content. Everything could be working until someone accidentally modifies the upload step to point to the wrong file; with this change, it will be hard for those users to catch the mistake.

pattern exists to include and exclude files, and to me it fits the current situation and could solve our issue without any code change. An exclude pattern of !**/*.dockerbuild seems like it could do the job?

Author

@crazy-max Jan 10, 2025

> I think the issue is that it blurs the line between a real exception and an "expected" exception. Consider the same scenario where a user uses the API to upload, and uses download-artifact to verify the content.

The API only expects zip content type for extraction when used with actions/download-artifact@v4:

.pipe(unzip.Extract({path: directory}))

So I'm not sure why this would be expected. I think this is an oversight in the implementation to skip unsupported content-types.

> pattern exists to include and exclude files, and to me it fits the current situation and could solve our issue without any code change. An exclude pattern of !**/*.dockerbuild seems like it could do the job?

So what you mean is having this pattern set as default in actions/download-artifact?: https://github.com/actions/download-artifact/blob/533298bc57c27f112a2c04a74a04a4d43e2866fd/action.yml#L11-L13

If in the future we change the filename it would break people using actions/download-artifact. The problem arises from actions/download-artifact downloading all artifacts if no name/pattern is defined.

Maybe I'm missing something, but I think this needs code changes in your toolkit or in actions/download-artifact.

Contributor

> So what you mean is having this pattern set as default in actions/download-artifact?

No, customers must manually update their workflow to specify the exclusion pattern. We need to provide guidance on how to properly configure the download step to skip over docker produced artifacts.
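
To illustrate the effect of such an exclusion, here is a simplified sketch. It is not how the `pattern` input is implemented; `excludeBySuffix` is a hypothetical helper that filters by name suffix, and the artifact names other than the `.dockerbuild` one are made up:

```typescript
// Hypothetical helper: drop artifacts whose name ends with an excluded suffix.
function excludeBySuffix(
  names: string[],
  excludedSuffixes: string[]
): string[] {
  return names.filter(
    name => !excludedSuffixes.some(suffix => name.endsWith(suffix))
  )
}

// Illustrative artifact list for a single workflow run.
const artifacts: string[] = [
  'build-logs',
  'coverage',
  'docker~build-push-action~QFMCC3.dockerbuild'
]

// Only artifacts that are expected to be zips remain in the download list.
const toDownload = excludeBySuffix(artifacts, ['.dockerbuild'])
```

The trade-off raised in this thread still applies: a name-based filter breaks if the uploading action ever changes its naming scheme.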

Member

> So I'm not sure why this would be expected. I think this is an oversight in the implementation to skip unsupported content-types.

I suspect the disagreement here is fundamentally - "should the 1st party artifact actions be compatible with artifacts created by other actions?"

The current implementation says no - thus users should avoid downloading artifacts that weren't uploaded by actions/upload-artifact.

The behavior change to skip non-zip files does not make them compatible, but it would at least avoid some friction due to incompatibility.

I'm still not convinced skipping these downloads is the right approach though - changing an explicit failure to a silent one could surprise users and increase friction.
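
One way to keep failures explicit on the caller side, sketched under the assumption that the response carries the `skipped` flag proposed in this PR. `requireDownloaded` is a hypothetical helper, and the interface is re-declared locally so the snippet stands alone:

```typescript
// Local copy of the response shape from this PR's interfaces.ts change.
interface DownloadArtifactResponse {
  downloadPath?: string
  skipped?: boolean
}

// Hypothetical caller-side policy: a specifically requested artifact that was
// skipped is treated as an error instead of passing silently.
function requireDownloaded(
  artifactName: string,
  response: DownloadArtifactResponse
): string {
  if (response.skipped) {
    throw new Error(
      `Artifact "${artifactName}" was skipped because it is not a zip archive`
    )
  }
  if (!response.downloadPath) {
    throw new Error(`Artifact "${artifactName}" has no download path`)
  }
  return response.downloadPath
}
```

A caller that downloads every artifact in a run could log and continue instead, which is the behavior the PR aims for.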

+return {downloadPath, skipped: true}
+}
} catch (error) {
throw new Error(`Unable to download and extract artifact: ${error.message}`)
}

-return {downloadPath}
}

export async function downloadArtifactInternal(
@@ -192,13 +199,16 @@ export async function downloadArtifactInternal(

try {
core.info(`Starting download of artifact to: ${downloadPath}`)
-await streamExtract(signedUrl, downloadPath)
-core.info(`Artifact download completed successfully.`)
+if (await streamExtract(signedUrl, downloadPath)) {
+core.info(`Artifact download completed successfully.`)
+return {downloadPath, skipped: false}
+} else {
+core.info(`Artifact download skipped.`)
+return {downloadPath, skipped: true}
+}
} catch (error) {
throw new Error(`Unable to download and extract artifact: ${error.message}`)
}

-return {downloadPath}
}

async function resolveOrCreateDirectory(
5 changes: 5 additions & 0 deletions packages/artifact/src/internal/shared/interfaces.ts
@@ -86,6 +86,11 @@ export interface DownloadArtifactResponse {
* The path where the artifact was downloaded to
*/
downloadPath?: string

/**
* If the artifact download was skipped
*/
skipped?: boolean
}

/**