-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance with temporary variables #7013
Comments
Short answer : use an SMT solver as a back-end. Improving "constant" propagation can probably fix this case but I am not sure that it will fix everything like this. Long answer: If we use
Let's break down what is happening a bit: A. Symex aims to generate one equation per assignment. Each assignment creates a new solver variable for the value of a program variable after the assignment. This makes it easy to generate the trace but can (normally a small amount but in rare cases, a lot) make the formula bigger than strictly necessary. B. Rewriting is applied when it is being generated. This is what allows us to determine when C. To help improve the rewriting, we have a step known (misleadingly) as constant propagation. According to a heuristic it will sometimes replace a solver variable with it's value when building the assignment. For example:
will not even generate any VCs because of constant propagation. D. When the SAT back-end bit-blasts expressions, it caches them based on a structural hash. This can lead to simplifications when the same expression is used twice. So what is happening in the first case is that the assignments generate two different variables, the back-end bit-blasts the multiply for one assignment and then uses that cached one for the second, the equality simplifies away and the whole thing is trivial. In the second case the assignments are generated but the values are not constant propagated, so the multiplies are structurally different and it can't be simplified. Due to $REASONS, the SAT solver can't handle this expression and it times out. Proper solution : use an SMT solver. They will compute a union-find closure of equivalent variables, substitute and rewrite. That's enough to handle all of this kind of issue. Hacky work-around : Have a look at the cbmc/src/goto-symex/goto_state.h Line 64 in d69761e
Get direct assignments of the form x = y to add rename(y) as the value of x . Be prepared to hunt some strange downstream bugs and to have to do a substantial amount of benchmarking to see what it does to performance. Talk to @tautschnig before attempting.
Sounds like a good idea but is a lot : Add the union find and substitute (and rewrite) loop to the back-end solver. Hope that helps. |
For simple cases like this where a fresh variable is assigned only once and is not dirty wouldn't it be safe to propagate/substitute it in a single forward pass on the goto program, statically before SymEx ? |
@remi-delmas-3000 in principle it should work but in practice I would not advise it because: A. Adding extra passes, esp. those that mean
C. ... which is what goto-analyzer --vsd --simplify does but more importantly what is done during symex. The information you need to tell when things are safe is already there during.
If you want to do something like that I would recommend going the route of strengthening what |
In addition to what @martin-cs said: I am contemplating adding a circuit cache that should pick up the equality despite the temporary variables. But ultimately multiplication is fundamentally hard, and it will always be easy to construct an example that is very hard to solve. |
Perhaps mostly for entertainment: backwards substitution also solves problems like this trivially. |
"Cache expensive circuits for given input bvts" in #7001 now implements such a circuit cache, which indeed makes the provided example trivial to solve. |
I have a similar problem. CBMC version: 5.43.0
the variable 'IN' is initialized by '__VERIFIER_nondet_float()';
And this is not initialized version,
which the variable 'IN' is not initialized.
After applying the '--slice-formula' option, the reuslt was:
|
Notes as just discussed out-of-band:
|
with kissat v4.0.1 this is solved in under two seconds thanks to clausal congruence closure and sweeping
#include <assert.h>
int main() {
int x;
__CPROVER_assume(-46340 <= x && x <= 46340);
int y;
__CPROVER_assume(-46340 <= y && y <= 46340);
int t1 = x;
int t2 = y;
int o1 = t1 * t2;
int t3 = x;
int t4 = y;
int o2 = t3 * t4;
assert(o1 == o2);
return 0;
}
|
Impressive! |
The same applies to the upstream Kani example:
|
CBMC version: 5.61.0
Operating system: Ubuntu 20.04
CBMC manages to solve the following program quickly:
But if I inject temporary variables:
it runs for >10 minutes without terminating.
Are there possible optimizations/reductions that would allow discovering that the two expressions are equivalent?
Such an optimization would particularly be helpful for Kani because its codegen flow produces lots of temporary variables (e.g. due to SSA). This makes programs like this one model-checking/kani#1351 difficult to solve even though the Rust program itself does not involve temporary variables.
The text was updated successfully, but these errors were encountered: