integration: increase timeout to mitigate TestBasicGhaCacheImportExport flakiness #5575

Open
wants to merge 1 commit into
base: master

MorrisLaw

This PR increases the timeout for the TestBasicGhaCacheImportExport integration test to mitigate the flakiness described in the linked issue. Instead of an "ExtraTimeout" of 15 minutes (3 * maxTimeout, where maxTimeout is 5 minutes), the test now gets up to 30 minutes by raising the multiplier to 6. In theory that should be more than enough time for the test to complete if the flake really is caused by rate limiting or timing (i.e. "We encounter this one when a bunch of jobs are running on CI and therefore we got rate-limited in GHA cache backend with this test.").

Needing more time than that would likely call for a deeper investigation into how to handle the rate limiting properly.

Related to #5494

@tonistiigi
Member

We might need some other way to handle this test. Setting the timeout to 30min would likely mean that the whole suite times out before the test does.

@tonistiigi
Member

#5494 predates the per-test ExtraTimeout and is an example of the whole suite hitting its timeout.

@MorrisLaw
Author

@tonistiigi ok, thanks for the context. Would this mean adding a new timeout, or is there a particular area of the code I should be looking at instead?

@tonistiigi
Member

tonistiigi commented Dec 12, 2024

The timeouts we have for full test suites are in https://github.com/moby/buildkit/blob/master/.github/workflows/.test.yml#L104 .

I'm not sure what the best approach is or how we can outsmart the GitHub limits. Maybe it would be better to move the GHA test to its own suite, so that it at least has a full 30min to itself and doesn't fail other suites if it goes over the limit. @crazy-max Any ideas?

@crazy-max
Member

Hmm, yeah, this integration test is problematic because we are subject to GHA rate limiting. I would really like to replicate their backend, but there are some discrepancies with the azblob API. So yeah, it is probably better to isolate this test in its own suite.

@MorrisLaw
Author

Ok, so the consensus seems to be to add a separate test suite for this specific test to mitigate the timeout issues. There are some pieces of the testing structure I'm not familiar with. Is there an example PR, file, or documentation that shows how to add a new test suite? @crazy-max @tonistiigi
