HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

someplaceguy · 2024-01-26T02:50:49Z

Sorry, while working on the newer Z3 proof reconstruction, I unexpectedly ran into another easy fix, hence this PR.

Namely, due to a comment in the self-tests I was under the impression that Z3 2.19 wasn't printing real constants correctly, but it turns out that it does (I suspect the issue may be with Z3 2.19.1 instead). Hence, it is possible to fix the proof reconstruction for the remaining real-related test cases.

After a quick investigation, it was easy to figure out the problem.

I will reproduce the initial part of the commit message here (see the commit for more details):

It turns out that the existing Z3 proof reconstruction code was using
RealArith.REAL_ARITH to replay linear arithmetic steps in proofs.

However, RealArith.REAL_ARITH only handles integral coefficients and
therefore proof steps such as (x: real) / 1r = x were failing.

To fix this, we simply switch to RealField.REAL_ARITH, which supports
rational coefficients.

This allows us to enable proof reconstruction for 3 more self-tests,
all related to the reals. This also means that all real-related tests
in the self-tests now work with proof reconstruction.

Another commit was also added that simply eliminates a few self-tests that were duplicated.

Edit: instead of removing the test cases, they were fixed to do what was almost surely intended, which is to test the reals in the same way the nums and ints are being tested.

Thanks!

someplaceguy · 2024-01-26T03:21:49Z

Oh, I just noticed that realLib.REAL_ARITH also exists:

  fun REAL_ARITH tm      = RealArith.OLD_REAL_ARITH tm
                           handle HOL_ERR _ => RealField.REAL_ARITH tm;

I'm sure you're aware of this, but for anyone who might not be aware, RealArith.OLD_REAL_ARITH is described in src/real/selftest.sml as:

The original version and old port by Hurd

I wonder why realLib's version first tries the old port and only if it fails, it tries the new one. Is it because the old port is faster (or solves different goals)?

If the realLib version is preferred, I can switch the code to use realLib.REAL_ARITH rather than RealField.REAL_ARITH directly.

binghe · 2024-01-26T14:13:03Z

The reason that realLib's REAL_ARITH first tries the old port and then the new one (after #1043), is because the old port can solve some (less interesting) inputs that the new port (same as HOL-Light's current version) cannot. To strictly align with HOL-Light's REAL_ARITH (for porting proofs from HOL-Light to HOL4), I chose to not (this is not actually true, as further explained in later comments, and the old port is 2x-3x faster) further enhance the new port to cover those (known) extra inputs.

When HolSmt was written, HOL4's REAL_ARITH was the old port, and the original author of HolSmt has only REAL_ARITH with integral coefficients. I'm glad to see that the new port of REAL_ARITH can also enrich the set of solvable cases of HolSmt.

binghe · 2024-01-26T14:22:52Z

Your current choice of using RealField.REAL_ARITH is good, I think. It's slightly slower but given the HolSmt code has no ability to actively choose different versions (by checking the coefficients), it has no choice but to always use the full version.

The interface provided by realLib is basically for users who doesn't care which version to use.

someplaceguy · 2024-01-26T14:23:58Z

Thank you for the background info, that clarifies things for me!

So it sounds like it'd be better to use realLib rather than RealField, since that would allow HolSmt to reconstruct more proofs.

I will fix the PR accordingly. Thanks!

Edit: I just saw your latest comment. Now I am unsure again!

binghe · 2024-01-26T14:31:06Z

Thank you for the background info, that clarifies things for me!

So it sounds like it'd be better to use realLib rather than RealField, since that would allow HolSmt to reconstruct more proofs.

I will fix the PR accordingly. Thanks!

Edit: I just saw your latest comment. Now I am unsure again!

There's no difference (on the number of solvable cases) for your two choices at this moment, because I don't think the extra problems that OLD_REAL_ARITH can solve, may appear inside HolSmt. Those "extra" problems solvable by OLD_REAL_ARITH are something like |- f (x + x) = f (2 * x), where f is an arbitrary uninterpreted function irrelevant to reals. Basically, by doing some extra rewritings, the new REAL_ARITH can also solve them.

someplaceguy · 2024-01-26T14:36:31Z

Ok, sorry, I was really confused for a moment.

Those "extra" problems solvable by OLD_REAL_ARITH are something like |- f (x + x) = f (2 * x), where f is an arbitrary uninterpreted function irrelevant to reals.

Yes, I think those kind of problems (such as the one you mentioned) are already being handled by other proof rules (like monotonicity, in this case), so I think RealField should be enough, as you say. And of course, if it turns out not to be the case, it can always be fixed later if necessary.

So I will leave the PR as it is then.

Thank you very much and sorry for the noise!

binghe · 2024-01-26T14:42:22Z

Ok, sorry, I was really confused for a moment.

Those "extra" problems solvable by OLD_REAL_ARITH are something like |- f (x + x) = f (2 * x), where f is an arbitrary uninterpreted function irrelevant to reals.

Yes, I think those kind of problems (such as the one you mentioned) are already being handled by other proof rules (like monotonicity, in this case), so I think RealField should be enough, as you say. And of course, if it turns out not to be the case, it can always be fixed later if necessary.

So I will leave the PR as it is then.

Thank you very much and sorry for the noise!

OK, I check the code, the known cases that HOL-Light's latest version cannot solve were something like |- closed {x | 1 * x = a} <=> closed {x | x = a}, found when building core library with the new port. Then I put some extra code into RealArith.sml (at line 2361) to make sure such cases can also be solved. But there's no way to know what else problems that the old port can do but the new one cannot, because their implementations are completely separate, no shared code at all.

Latest HOL-Light:

# REAL_ARITH `{x | 1 * x = a} = {x | x = a}`;;
Exception: Failure "linear_ineqs: no contradiction".

HOL4:

> RealArith.REAL_ARITH ``{x | 1 * x = a} = {x | x = a}``;
<<HOL message: more than one resolution of overloading was possible>>
val it = |- {x | 1 * x = a} = {x | x = a}: thm

HOL4 can do it only because I added one line (2361) in RealArith.sml with some historical notes:

    (* NOTE: this step of POLY_CONV helps by cutting off real arith terms
       hidding in propositional terms, e.g. ‘closed {x | 1 * x = a}’ will
       be simplified to ‘closed {x | x = a}’ before going to NNF_CONV.
       Some HOL4 proofs rely on this. *)
    TOP_DEPTH_CONV POLY_CONV THENC

This is mostly to make sure that unknown user proofs using the old REAL_ARITH continue to work, although the proof is easily fixable.

binghe · 2024-01-26T14:47:10Z

src/HolSmt/selftest.sml

-    (``0n <= 1n``, [thm_AUTO, thm_YO]),
-    (``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
-    (``(x:num) <= x``, [thm_AUTO, thm_YO]),
-    (``(x:num) <= y ==> 2 * x <= 2 * y``, [thm_AUTO, thm_YO]),


Hi. Is this problem ((x:num) <= y ==> 2 * x <= 2 * y) supposed to be solvable only by Yices and not others? I wonder why you have removed the above 4 cases, although the 2nd one (1n <= 0m) seems false.

Right, so there are a few things going on here:

I wonder why you have removed the above 4 cases

So, the 4 test cases are being removed because the same test cases are already being performed above:

The original test cases:

HOL/src/HolSmt/selftest.sml

Lines 486 to 489 in 5eb0e7b

(``0n <= 1n``, [thm_AUTO, thm_YO]),

(``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),

(``(x:num) <= x``, [thm_AUTO, thm_YO]),

(``(x:num) <= y ==> 42 * x <= 42 * y``, [thm_AUTO, thm_YO]),

The duplicate ones:

HOL/src/HolSmt/selftest.sml

Lines 554 to 557 in 5eb0e7b

(``0n <= 1n``, [thm_AUTO, thm_YO]),

(``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),

(``(x:num) <= x``, [thm_AUTO, thm_YO]),

(``(x:num) <= y ==> 2 * x <= 2 * y``, [thm_AUTO, thm_YO]),

The duplicate ones are even in the wrong section (reals instead of nums). I suspect the duplicate ones might have been copy/pasted from the nums for reference, but someone forgot to delete them afterwards.

Is this problem ((x:num) <= y ==> 2 * x <= 2 * y) supposed to be solvable only by Yices and not others?

Currently, yes. The HolSmt support for Yices 1 uses a different translation which supports natural numbers. However, SMT-LIB doesn't support natural numbers, only integers, so we currently translate the natural numbers into uninterpreted constants.

It is possible to translate the natural numbers to the integers in an equivalence-preserving way (like it was done in Isabelle or perhaps F*) but we don't do this yet. I intend to fix this in the future (if nobody else does it first), since this is an extremely important use-case for me.

However, I decided to work on proof reconstruction first (to become familiar with it) because I wanted to make sure that when we translate the natural numbers, that the proof reconstruction still works seamlessly -- i.e., I'd like to make sure that it also works well for the natural numbers without any ugly kludges or fragile workarounds, if possible.

the 2nd one (1n <= 0n) seems false.

It is indeed false. The HolSmt tests include tests for true statements (i.e. theorems) and false ones. When the SMT solvers can prove that the goal is false, they return sat instead of unsat (hence why that test case is tested with sat_CVC, sat_Z3 and sat_Z3p rather than thm_CVC, thm_Z3, and thm_Z3p). In this case, it's easy for them to prove false statements about the natural numbers because they are allowed to assign whatever they want to uninterpreted constants, thus it's easy to assign something that makes the statement false.

This means that when we are forced to use uninterpreted constants or functions (which still happens a lot due to HolSmt limitations), a sat answer may not always imply that the goal is false, but an unsat answer always implies that the goal is true.

Hopefully, in the future we won't need to rely on uninterpreted symbols so much, making a sat answer a much more reliable indicator that the goal is false.

In my to-do list, there's also a task which consists in clarifying this to the HolSmt user: when the SMT solver replies sat, I'd like HolSmt to indicate to the user whether the sat result may be spurious or not. And also, I'd like to make it clear that when a non-spurious sat result is encountered, it indicates that the goal is false.

Thanks for your answer. Now I found these duplicated test cases in other places of selftest.sml, so you did the right thing (of removing duplications). P.S. In my to-do list, I will figure out why those tests (dedicated for Yices 1.x) failed and try to fix them.

someplaceguy · 2024-01-26T20:45:02Z

Wait, I made a mistake with those test cases after all. The test cases are duplicated indeed, but now I think the correct solution is not to remove the duplicates, but rather to convert them to use reals instead of nums, so that the same pattern of tests is followed for both nums and reals.

I will fix this PR soon to do that.

Thank you for bringing this to my attention @binghe, I'd almost surely have missed this issue otherwise. It's easy to get lost in the sea of tests 😅

@x76

It turns out that the existing Z3 proof reconstruction code was using RealArith.REAL_ARITH to replay linear arithmetic steps in proofs. However, RealArith.REAL_ARITH only handles integral coefficients and therefore proof steps such as `(x: real) / 1r = x` were failing. To fix this, we simply switch to RealField.REAL_ARITH, which supports rational coefficients. This allows us to enable proof reconstruction for 3 more self-tests, all related to the reals. This also means that all real-related tests in the self-tests now work with proof reconstruction. The RealArith->RealField change in the "rewrite" rule handler fixes proof steps such as these: `(rewrite (= (/ v0 1.0) v0))` and allows us to reenable the first self-test. The identical change in the "th_lemma[arith]" rule handler fixes proof steps such as the following: ((_ |th-lemma| arith farkas -41/42 1) @x76 @x77 false) ... corresponding to: [.] |- ~(x <= 0r)), [.] |- 41r / 42r * x <= 0r], F ... which allows us to reenable proof reconstruction for the other two self-tests.

These test cases were almost surely intended to test the reals, since they are in the `reals` section and identical test cases covering the nums already exist in the `nums` section. Since cvc5 and Z3 can reason about the reals (unlike the nums, currently) we can enable them for these tests. This includes Z3 with proof reconstruction.

someplaceguy · 2024-01-26T21:49:11Z

Update: instead of eliminating the test cases, I've changed them to cover the reals now, which is almost surely what was originally intended: to test these inequalities in the reals in the same way that they are being tested in the nums and the ints.

Thanks!

mn200 · 2024-01-28T23:21:37Z

Thanks both!

binghe reviewed Jan 26, 2024

View reviewed changes

someplaceguy marked this pull request as draft January 26, 2024 20:39

someplaceguy added 2 commits January 26, 2024 21:43

someplaceguy force-pushed the u/holsmt-real branch from 6aba5bc to 7b32ff2 Compare January 26, 2024 21:43

someplaceguy marked this pull request as ready for review January 26, 2024 21:46

mn200 merged commit a423628 into HOL-Theorem-Prover:develop Jan 28, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

someplaceguy commented Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024

someplaceguy commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024

someplaceguy commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024 •

edited

Loading

binghe Jan 26, 2024

someplaceguy Jan 26, 2024 •

edited

Loading

binghe Jan 26, 2024

someplaceguy commented Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024

mn200 commented Jan 28, 2024

	(``0n <= 1n``, [thm_AUTO, thm_YO]),
	(``1n <= 0n``, [sat_CVC, sat_YO, sat_Z3, sat_Z3p]),
	(``(x:num) <= x``, [thm_AUTO, thm_YO]),
	(``(x:num) <= y ==> 42 * x <= 42 * y``, [thm_AUTO, thm_YO]),

HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

HolSmt: fix Z3 proof replay in real arithmetic with rational coefficients #1188

Conversation

someplaceguy commented Jan 26, 2024 • edited Loading

someplaceguy commented Jan 26, 2024 • edited Loading

binghe commented Jan 26, 2024 • edited Loading

binghe commented Jan 26, 2024

someplaceguy commented Jan 26, 2024 • edited Loading

binghe commented Jan 26, 2024

someplaceguy commented Jan 26, 2024 • edited Loading

binghe commented Jan 26, 2024 • edited Loading

binghe Jan 26, 2024

Choose a reason for hiding this comment

someplaceguy Jan 26, 2024 • edited Loading

Choose a reason for hiding this comment

binghe Jan 26, 2024

Choose a reason for hiding this comment

someplaceguy commented Jan 26, 2024 • edited Loading

someplaceguy commented Jan 26, 2024

mn200 commented Jan 28, 2024

someplaceguy commented Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024 •

edited

Loading

binghe commented Jan 26, 2024 •

edited

Loading

someplaceguy Jan 26, 2024 •

edited

Loading

someplaceguy commented Jan 26, 2024 •

edited

Loading