forked from linkerd/linkerd
Add a retry budget to flaky tests (linkerd#1872)
A number of Linkerd's end-to-end tests are known to sometimes fail spuriously on CI (linkerd#1224, linkerd#1225, linkerd#1504, etc). If a test known to be flaky fails, the build is typically restarted, sometimes repeatedly if the test fails twice in a row. This is problematic for the following reasons:

+ This lengthens the CI feedback loop considerably, as we need to recompile Linkerd and re-run *all* the other tests, as well as re-running the failed test.
+ Furthermore, rerunning the whole test suite also creates an opportunity for *other* flaky tests to also fail, prolonging the process.
+ Spurious failures can decrease overall confidence in the test suite.
+ If a new contributor makes a minor change in a pull request and a flaky test fails, the new contributor typically doesn't know which test failures are spurious, and may assume that their changes somehow caused an unrelated part of the codebase to break. This is an uncomfortable experience, particularly if it takes a few hours for someone "in the know" to notice and restart the build.

This PR adds a retry budget to tests known to be flaky. It builds upon ScalaTest's [`Retries`](http://doc.scalatest.org/3.0.0/index.html#org.scalatest.Retries) trait, but adds the notion of a retry *budget*, rather than retrying tests only a single time. This is probably necessary as we've observed some of the flaky tests to fail spuriously multiple times in a row.

My rationale behind this is as follows: obviously we'd all like to write tests that never fail spuriously. With that said, it's proved very difficult to identify the cause behind these flaky test failures, so it doesn't seem like this problem will be fixed easily any time soon.

Alternatively, we could consider skipping the known flaky tests entirely on CI. However, a pretty large number of our E2E tests have been observed to fail spuriously at least occasionally, and I, for one, would be uncomfortable skipping such a large portion of the test suite on CI.

Finally, note that the retry budget behaviour will result in more or less the same outcome as our current manual restarting of failed flaky tests, but will shorten the feedback loop significantly, and won't result in broken builds due to flaky test failures.

Fixes linkerd#1225.
Fixes linkerd#1504.

Signed-off-by: Eliza Weisman <[email protected]>
Showing 8 changed files with 92 additions and 57 deletions.
6 changes: 3 additions & 3 deletions
grpc/interop/src/test/scala/io/buoyant/grpc/interop/NetworkedEndToEndTest.scala
27 changes: 27 additions & 0 deletions
test-util/src/main/scala/io/buoyant/test/BudgetedRetries.scala
@@ -0,0 +1,27 @@

```scala
package io.buoyant.test

import org.scalatest.{Canceled, Failed, Outcome, Retries}

/**
 * Mixin trait for tests to support a retry budget.
 */
trait BudgetedRetries extends FunSuite with Retries {

  /**
   * The number of retries permitted before a test is failed.
   *
   * Tests that mix in `BudgetedRetries`
   */
  def retries = 4

  override def withFixture(test: NoArgTest) =
    if (isRetryable(test)) withRetries(test, retries)
    else super.withFixture(test)

  private[this] def withRetries(test: NoArgTest, remaining: Int): Outcome =
    super.withFixture(test) match {
      case Failed(_) | Canceled(_) if remaining == 1 => super.withFixture(test)
      case Failed(_) | Canceled(_) => withRetries(test, remaining - 1)
      case other => other
    }
}
```
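The retry-budget idea in `withRetries` can be modeled without ScalaTest. The following is a minimal sketch, not the code from this commit: `RetryBudgetSketch`, `runWithBudget`, and the simplified `Outcome`, `Passed`, and `Failed` types are hypothetical names introduced for illustration. It assumes the same semantics as above: a failing test is re-run until it passes or the budget of retries is spent, so a budget of 4 allows at most 5 runs.

```scala
// Standalone model of a retry budget. `Outcome`, `Passed`, and `Failed` are
// simplified stand-ins for illustration, not ScalaTest's types.
object RetryBudgetSketch {
  sealed trait Outcome
  case object Passed extends Outcome
  final case class Failed(reason: String) extends Outcome

  /**
   * Runs `attempt` once, then re-runs it after each failure until it passes
   * or `budget` retries have been used. Returns the last outcome together
   * with the total number of attempts made.
   */
  def runWithBudget(budget: Int)(attempt: () => Outcome): (Outcome, Int) = {
    require(budget >= 0, "retry budget must be non-negative")

    @annotation.tailrec
    def loop(remaining: Int, attempts: Int): (Outcome, Int) =
      attempt() match {
        // A failure with budget left consumes one retry and runs again.
        case Failed(_) if remaining > 0 => loop(remaining - 1, attempts + 1)
        // A pass, or a failure with no budget left, is the final outcome.
        case outcome => (outcome, attempts)
      }

    loop(budget, 1)
  }

  def main(args: Array[String]): Unit = {
    // A hypothetical flaky test that fails twice before passing.
    var calls = 0
    val flaky: () => Outcome = () => {
      calls += 1
      if (calls <= 2) Failed(s"spurious failure $calls") else Passed
    }
    val (outcome, attempts) = runWithBudget(4)(flaky)
    println(s"$outcome after $attempts attempts")

    // A test that always fails exhausts the budget: 1 initial run + 4 retries.
    val (lastOutcome, totalAttempts) = runWithBudget(4)(() => Failed("always"))
    println(s"$lastOutcome after $totalAttempts attempts")
  }
}
```

In the trait above, only tests for which ScalaTest's `isRetryable` returns true go through the retry path; per the ScalaTest `Retries` documentation, that means tests tagged with `org.scalatest.tagobjects.Retryable`. All other tests run exactly once.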