
Estimate the search space #175

Open
symbiont-stevan-andjelkovic opened this issue Mar 19, 2021 · 3 comments
@symbiont-stevan-andjelkovic

In order to display progress, and to gauge how well our lineage-driven optimisations are working, we should try to estimate the search space.

For a first draft, which also includes notes about the tricky parts, see #174.

@symbiont-stevan-andjelkovic symbiont-stevan-andjelkovic added this to the v0.1 milestone Mar 19, 2021
@symbiont-stevan-andjelkovic commented Mar 26, 2021

In PR #184 we added an explanation of how to estimate the fault space of random
search, given a failure specification and one concrete test run.

There are several ways to improve on this.

For starters we need to figure out how many interesting concrete programs there are
to begin with. Property-based testing typically focuses on this aspect, i.e.
generating random test cases, but because in a distributed system even a very small
test case, e.g. a write followed by a read, involves so much non-determinism, this
aspect is less important than exploring the non-determinism.
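
As a back-of-the-envelope illustration of this part of the space, here is a minimal
sketch that counts syntactically distinct test cases; the operation/key/value counts
and the `programSpace` name are made-up parameters for the example, not anything
from our test suite.

```haskell
-- A minimal sketch, assuming a test case is a fixed-length sequence of
-- operations drawn independently from a small alphabet; all counts below are
-- hypothetical.
programSpace :: Integer -> Integer -> Integer -> Integer -> Integer
programSpace opKinds keys values depth = (opKinds * keys * values) ^ depth

-- Example: 2 operations (read/write), 2 keys, 2 values, depth 3 gives
-- 8 ^ 3 = 512 syntactically distinct programs (before any equivalences).
example :: Integer
example = programSpace 2 2 2 3
```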

It would also be interesting to estimate what the fault space is if the random
search is augmented with information about the network trace (i.e. which events
actually happened in the concrete run) and memory of which faults it has already
tried in previous runs. This would make it closer to what lineage-driven fault
injection (LDFI) does, but with very simple constraints (don't try what you've
already tried, as opposed to trying to eliminate possibilities based on what we
know). The tricky part here is that as we introduce faults, they might trigger
retries which spawn new messages that were not in the original network trace. We
could perhaps make an initial estimate assuming that no new messages are spawned,
and then in addition calculate how many new messages are spawned per run on
average and how much they enlarge the fault space as we run the tests.
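
A very rough sketch of what such an estimate could look like, assuming the only
faults are omissions of delivery events that appear in the recorded network trace
plus node crashes (crash times are ignored); `faultSpace` and its parameter names
are hypothetical and do not correspond to the failure specification fields in the
code base.

```haskell
-- n choose k.
choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Pick up to maxOmissions of the delivery events in the trace to drop, and up
-- to maxCrashes of the nodes to crash (ignoring when the crashes happen).
faultSpace :: Integer -> Integer -> Integer -> Integer -> Integer
faultSpace traceEvents maxOmissions nodes maxCrashes =
  sum [ choose traceEvents i | i <- [0 .. maxOmissions] ]
    * sum [ choose nodes j | j <- [0 .. maxCrashes] ]

-- With memory of previously tried fault sets the search samples without
-- replacement, so it exhausts a space of size s in exactly s runs, whereas
-- plain random search needs roughly s * harmonic s runs to have seen every
-- fault set at least once (coupon-collector bound).
expectedRunsWithReplacement :: Integer -> Double
expectedRunsWithReplacement s =
  fromInteger s * sum [ 1 / fromIntegral k | k <- [1 .. s] ]
```

The without-replacement variant is the one closest to the "don't try what you've
already tried" constraint described above.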

To make the estimate closer to what actually happens during LDFI we would need
to take all the constraints it produces into account, which seems hard. A
possible alternative is to get some statistics from the SAT solver itself about
its estimated search space. This would enable us to compare how efficient LDFI
is versus random search, or even one version of LDFI versus another with slightly
tweaked constraints.

It would also be interesting, but hard, to take symmetries or equivalence
classes of states into account, e.g. does it really matter which of the 4
follower nodes we choose to crash?
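
As a toy illustration of how much a single symmetry could shrink the space,
assuming the four followers really are interchangeable for the property under
test (the numbers are made up for the example):

```haskell
-- Toy example: crashing any one of 4 interchangeable followers.
crashChoices, crashClasses :: Integer
crashChoices = 4  -- raw choices in the fault space
crashClasses = 1  -- equivalence classes if the followers are interchangeable

-- The crash dimension of the fault space shrinks by this factor.
symmetryFactor :: Integer
symmetryFactor = crashChoices `div` crashClasses  -- = 4
```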

A simpler problem would be to run a sweep search where we start with a small
failure specification and iteratively make it bigger until we reach some
timeout, and basically see how far we get as a measure of how efficient the
search is for a bug-free program. By analogy, this would be closer to a
benchmark, while properly estimating the search space is closer to calculating
the complexity of the algorithm.
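
A sketch of the sweep, assuming a hypothetical `runTests` action that runs the
whole suite under a given failure specification, and a failure specification that
merely bounds the number of omissions and crashes; none of these names come from
the code base.

```haskell
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

data FailureSpec = FailureSpec { maxOmissions :: Int, maxCrashes :: Int }
  deriving Show

-- Placeholder: run the whole test suite under the given failure specification.
runTests :: FailureSpec -> IO ()
runTests _spec = pure ()

-- Grow the failure specification until the time budget runs out and report
-- the last specification whose run completed.
sweep :: NominalDiffTime -> IO FailureSpec
sweep budget = do
  start <- getCurrentTime
  let go spec = do
        runTests spec
        now <- getCurrentTime
        if diffUTCTime now start > budget
          then pure spec
          else go spec { maxOmissions = maxOmissions spec + 1
                       , maxCrashes   = maxCrashes   spec + 1 }
  go (FailureSpec 0 0)
```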

@symbiont-stevan-andjelkovic

For an explanation of how LDFI cuts down the fault space and how to estimate the number of runs needed see #196.

@symbiont-stevan-andjelkovic

For starters we need to figure out how many interesting concrete programs there are to begin with.

One possible way to compute this would be to identify test case equivalence classes up to a certain depth, e.g. write(k1, v1) || write(k2, v2); ... is equivalent to write(k2, v2) || write(k1, v1); .... This could perhaps be done by generating a new random test case, running it to produce a history and a model, hashing the response part of the history together with the model, and repeating; all test cases with the same hash are in the same equivalence class.
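
A sketch of how the hashing could be wired up, assuming hypothetical `gen` and
`run` actions (generate a random test case; execute it and return the response
part of the history together with the resulting model) and `Hashable` instances
for both; none of these names come from the code base.

```haskell
import           Data.Hashable   (Hashable, hash)
import qualified Data.Map.Strict as Map

-- Group n randomly generated test cases into equivalence classes keyed by the
-- hash of the (responses, model) pair produced by running them.
classify :: (Hashable response, Hashable model)
         => Int                                  -- how many test cases to try
         -> IO testCase                          -- generate a random test case
         -> (testCase -> IO (response, model))   -- run it
         -> IO (Map.Map Int [testCase])
classify n gen run = go n Map.empty
  where
    go 0 classes = pure classes
    go k classes = do
      tc                 <- gen
      (responses, model) <- run tc
      let key = hash (hash responses, hash model)
      go (k - 1) (Map.insertWith (++) key [tc] classes)
```

All test cases that end up under the same key are then candidates for the same
equivalence class (modulo hash collisions).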
