Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

Merged
merged 2 commits into from
Jan 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/HolSmt/Z3_ProofReplay.sml
Original file line number Diff line number Diff line change
Expand Up @@ -798,7 +798,7 @@ local
handle Feedback.HOL_ERR _ =>

if term_contains_real_ty t then
profile "rewrite(11.1)(REAL_ARITH)" RealArith.REAL_ARITH t
profile "rewrite(11.1)(REAL_ARITH)" RealField.REAL_ARITH t
else
profile "rewrite(11.2)(ARITH_PROVE)" intLib.ARITH_PROVE t
in
Expand Down Expand Up @@ -836,7 +836,7 @@ local
(* this is just a heuristic - it is quite conceivable that a
term that contains type real is provable by integer
arithmetic *)
profile "th_lemma[arith](3.1)(real)" RealArith.REAL_ARITH t'
profile "th_lemma[arith](3.1)(real)" RealField.REAL_ARITH t'
else
profile "th_lemma[arith](3.2)(int)" intLib.ARITH_PROVE t'
val subst = List.map (fn (term, var) => {redex = var, residue = term})
Expand Down
17 changes: 7 additions & 10 deletions src/HolSmt/selftest.sml
Original file line number Diff line number Diff line change
Expand Up @@ -213,9 +213,6 @@ in

(* real *)

(* Z3 2.19 prints reals as integers in its proofs; I see no way to
reliably distinguish between them. *)

(``0r = 0r``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``1r = 1r``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``0r = 1r``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
Expand Down Expand Up @@ -450,9 +447,9 @@ in
(``1 * (x:real) = x``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``(x:real) * 42 = 42 * x``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),

(``(x:real) / 1 = x``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3(*, thm_Z3p*)]),
(``x > 0 ==> (x:real) / 42 < x``, [(*thm_AUTO,*) thm_CVC, thm_YO, thm_Z3(*, thm_Z3p*)]),
(``x < 0 ==> (x:real) / 42 > x``, [(*thm_AUTO,*) thm_CVC, thm_YO, thm_Z3(*, thm_Z3p*)]),
(``(x:real) / 1 = x``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``x > 0 ==> (x:real) / 42 < x``, [(*thm_AUTO,*) thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``x < 0 ==> (x:real) / 42 > x``, [(*thm_AUTO,*) thm_CVC, thm_YO, thm_Z3, thm_Z3p]),

(``abs (x:real) >= 0``, [thm_AUTO, thm_YO]),
(``(abs (x:real) = 0) = (x = 0)``, [thm_AUTO, thm_YO]),
Expand Down Expand Up @@ -551,10 +548,10 @@ in
(``(x:real) < x``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
(``(x:real) < y ==> 42 * x < 42 * y``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),

(``0n <= 1n``, [thm_AUTO, thm_YO]),
(``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
(``(x:num) <= x``, [thm_AUTO, thm_YO]),
(``(x:num) <= y ==> 2 * x <= 2 * y``, [thm_AUTO, thm_YO]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Is this problem ((x:num) <= y ==> 2 * x <= 2 * y) supposed to be solvable only by Yices and not others? I wonder why you have removed the above 4 cases, although the 2nd one (1n <= 0m) seems false.

Copy link
Contributor Author

@someplaceguy someplaceguy Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so there are a few things going on here:

I wonder why you have removed the above 4 cases

So, the 4 test cases are being removed because the same test cases are already being performed above:

  • The original test cases:

    HOL/src/HolSmt/selftest.sml

    Lines 486 to 489 in 5eb0e7b

    (``0n <= 1n``, [thm_AUTO, thm_YO]),
    (``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
    (``(x:num) <= x``, [thm_AUTO, thm_YO]),
    (``(x:num) <= y ==> 42 * x <= 42 * y``, [thm_AUTO, thm_YO]),
  • The duplicate ones:

    HOL/src/HolSmt/selftest.sml

    Lines 554 to 557 in 5eb0e7b

    (``0n <= 1n``, [thm_AUTO, thm_YO]),
    (``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
    (``(x:num) <= x``, [thm_AUTO, thm_YO]),
    (``(x:num) <= y ==> 2 * x <= 2 * y``, [thm_AUTO, thm_YO]),

The duplicate ones are even in the wrong section (reals instead of nums). I suspect the duplicate ones might have been copy/pasted from the nums for reference, but someone forgot to delete them afterwards.

Is this problem ((x:num) <= y ==> 2 * x <= 2 * y) supposed to be solvable only by Yices and not others?

Currently, yes. The HolSmt support for Yices 1 uses a different translation which supports natural numbers. However, SMT-LIB doesn't support natural numbers, only integers, so we currently translate the natural numbers into uninterpreted constants.

It is possible to translate the natural numbers to the integers in an equivalence-preserving way (like it was done in Isabelle or perhaps F*) but we don't do this yet. I intend to fix this in the future (if nobody else does it first), since this is an extremely important use-case for me.

However, I decided to work on proof reconstruction first (to become familiar with it) because I wanted to make sure that when we translate the natural numbers, that the proof reconstruction still works seamlessly -- i.e., I'd like to make sure that it also works well for the natural numbers without any ugly kludges or fragile workarounds, if possible.

the 2nd one (1n <= 0n) seems false.

It is indeed false. The HolSmt tests include tests for true statements (i.e. theorems) and false ones. When the SMT solvers can prove that the goal is false, they return sat instead of unsat (hence why that test case is tested with sat_CVC, sat_Z3 and sat_Z3p rather than thm_CVC, thm_Z3, and thm_Z3p). In this case, it's easy for them to prove false statements about the natural numbers because they are allowed to assign whatever they want to uninterpreted constants, thus it's easy to assign something that makes the statement false.

This means that when we are forced to use uninterpreted constants or functions (which still happens a lot due to HolSmt limitations), a sat answer may not always imply that the goal is false, but an unsat answer always implies that the goal is true.

Hopefully, in the future we won't need to rely on uninterpreted symbols so much, making a sat answer a much more reliable indicator that the goal is false.

In my to-do list, there's also a task which consists in clarifying this to the HolSmt user: when the SMT solver replies sat, I'd like HolSmt to indicate to the user whether the sat result may be spurious or not. And also, I'd like to make it clear that when a non-spurious sat result is encountered, it indicates that the goal is false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your answer. Now I found these duplicated test cases in other places of selftest.sml, so you did the right thing (of removing duplications). P.S. In my to-do list, I will figure out why those tests (dedicated for Yices 1.x) failed and try to fix them.

(``0r <= 1r``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``1r <= 0r``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
(``(x:real) <= x``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``(x:real) <= y ==> 42 * x <= 42 * y``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),

(``1r > 0r``, [thm_AUTO, thm_CVC, thm_YO, thm_Z3, thm_Z3p]),
(``0r > 1r``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
Expand Down
Loading