Subsumption and Subsumption Resolution via SAT solving #546

RobCoutel · 2024-04-19T12:35:46Z

In this pull request, we completely replace the code of the subsumption and subsumption resolution module with an implementation that encodes both problems as SAT instances and solves them with a SAT solver.

The papers:

2022: "First-Order Subsumption via SAT Solving." by Jakob Rath, Armin Biere and Laura Kovács
2023: "SAT-Based Subsumption Resolution" by Robin Coutelier, Jakob Rath, Michael Rawson and Laura Kovács
2024: "SAT Solving for Variants of First-Order Subsumption" by Robin Coutelier, Jakob Rath, Michael Rawson, Armin Biere and Laura Kovács

explain the details of the encodings as well as provide proofs of their soundness.
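
For readers who have not seen the two inferences before, the following is a minimal sketch of what they decide, restricted to ground (variable-free) clauses. It is not the encoding from the papers and not Vampire code: the `Literal`/`Clause` types and both function names are made up for illustration, clauses are treated as sets, and C++17 is assumed for `std::optional`. In the first-order case one additionally has to find a substitution that makes the literals match, and that search is what the SAT encoding performs.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Illustration only: a literal is a non-zero int, negation is arithmetic
// negation, and a clause is a list of literals (treated as a set).
using Literal = int;
using Clause = std::vector<Literal>;

static bool contains(const Clause& c, Literal l) {
  return std::find(c.begin(), c.end(), l) != c.end();
}

// Ground subsumption: C subsumes D if every literal of C occurs in D.
bool subsumes(const Clause& c, const Clause& d) {
  return std::all_of(c.begin(), c.end(),
                     [&](Literal l) { return contains(d, l); });
}

// Ground subsumption resolution: if some literal l of D has its complement
// in C and every other literal of C occurs in D \ {l}, then D can be
// replaced by the shorter clause D \ {l}.
std::optional<Clause> subsumptionResolution(const Clause& c, const Clause& d) {
  for (Literal l : d) {
    assert(l != 0);  // 0 is not a valid literal in this representation
    if (!contains(c, -l)) continue;
    bool restCovered = std::all_of(c.begin(), c.end(), [&](Literal k) {
      return k == -l || (k != l && contains(d, k));
    });
    if (!restCovered) continue;
    Clause conclusion;
    for (Literal k : d)
      if (k != l) conclusion.push_back(k);
    return conclusion;
  }
  return std::nullopt;  // no literal of D can be resolved away using C
}
```

With these definitions, `{1, 2}` subsumes `{1, 2, 3}`, and `subsumptionResolution({-1, 2}, {1, 2, 3})` yields `{2, 3}`.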

The most important parts of the PR are the following:

"SATSubsumption/subsat": A custom SAT solver implemented by Jakob supporting AMO constraints and substitution reasoning
"SATSubsumption/SATSubsumptionAndResolution.*": The implementation of clause at a time subsumption and subsumption resolution
"Inferences/ForwardSubsumptionAndResolution.cpp": The loop for forward subsumption and subsumption resolution was optimized to mutualize the setup of the SAT solver and the MatchSet used by the subsumption and subsumption resolution module
"Inferences/BackwardSubsumptionAndResolution.cpp": The loop was treated similarly as for forward subsumption and subsumption resolution
"UnitTests/tSATSubsumptionResolution.cpp": A set of unit tests to check the soundness of our encoding
"Saturation/SaturationAlgorithm.cpp": We plugged in the new forward and backward loops in the saturation algorithm.

JakobR · 2024-05-17T09:47:53Z

Oh damn... most of my comments were supposed to be answers to @MichaelRawson's comments, but it looks like github lost the connection somehow? Edit: please click on "View reviewed changes" above to see the context (next to "JakobR reviewed ...")

MichaelRawson · 2024-05-17T11:20:24Z

Thanks for the hard work, @JakobR ! That was quick. I've replied inline where still possible, I guess GitHub doesn't like the big merges so much. Others:

> I can remove MLMatcherStats, if you want.

No need, thanks for the explanation.

> Using uint64_t extra_j would probably work too. Do you want me to keep the comment (or a similar one)?

Keep the comment for now, it gives at least the intuition!

JakobR · 2024-05-21T14:49:41Z

I think this is done from my side.

I'm repeating one unresolved point buried in the inline comments, but I think it's a minor issue:

BackwardSubsumptionAndResolution: maybe get rid of _checked or _subsumedSet (one of them is likely redundant)

quickbeam123 · 2024-07-15T18:43:03Z

Guys, my experiments suggest that something does not quite work the way it should, at least not for my favourite (pseudo-)default strategy -sa discount -awr 10. Instead of suspecting your likely-well-optimised sat-based subsumption code itself, I want to question, at least for a second, the Loop ("for forward subsumption and subsumption resolution was optimized to mutualize...") and the heuristic choices it encapsulates. I think it might be making a heuristic commitment that is detrimental to the average performance (regardless of the speed of performing the individual subsumption checks; indeed subsumption is less important for discount and less important under avatar than without it).

The results of a -sa discount -awr 10 run, limited by -i 100000, over the FOF part of TPTP 9.0.0 are:

Sort by UNS
9344 ['problemsSTD_robin8250_dis10_i100000.pkl']
9347 ['problemsSTD_master7703_dis10_i100000.pkl']

which is close, but it's quite a bit worse at the 10000 mark:

8190 ['problemsSTD_robin8250_dis10_i100000.pkl']
8219 ['problemsSTD_master7703_dis10_i100000.pkl']

(I plan to generate a full cactus plot comparison soon).

However, the main suspicious thing is the number of activations needed by the two vampires to succeed on the 9307 solved (in 100000) by both, namely: 53283652 vs 53633200.

Before I start digging deeper, could you please sum up how "the Loop" differs from the previous solution we had in Vampire? Also, what would be your favourite platform for a bit of discussion on this? (Are you, e.g., getting notifications from our zulip?)

RobCoutel · 2024-07-16T16:20:23Z

Hi Martin,

> However, the main suspicious thing is the number of activations needed by the two vampires to succeed on the 9307 solved (in 100000) by both, namely: 53 283 652 vs 53 633 200.

I don't know what is meant here. Do you mean the number of subsumption resolution calls?
If yes, then it makes sense. See below the difference between the old and new loops.

> Before I start digging deeper, could you please sum up how "the Loop" differs from the previous solution we had in Vampire?

Previously, Vampire would first check all the subsumptions and then, if it failed, all the subsumption resolutions. However, since setting up the problems is expensive, we have changed the loop to perform both subsumption and subsumption resolution on each instance.

Old:

    For each candidate clause:
        Check subsumption
    For each candidate clause:
        Check subsumption resolution

New:

    For each candidate clause:
        Check subsumption
        Check subsumption resolution

The drawback is that if a clause is subsumed in the new version, we will perform useless subsumption resolution checks that the old loop does not.
The advantage is that we set up both problems simultaneously, saving on pruning, matching, and index time.
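
To make the trade-off concrete, here is a small simulation of the two loop shapes. It is only a sketch and not the Vampire implementation: `Candidate`, `Result`, `oldLoop` and `newLoop` are hypothetical names, each candidate simply carries the answers the real checks would compute, and the sketch assumes that the new loop remembers the first subsumption resolution it finds while it keeps looking for a subsumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a candidate clause: the flags say what the
// (expensive) checks would report for it.
struct Candidate {
  bool subsumes;
  bool resolves;
};

struct Result {
  bool subsumed = false;
  bool resolved = false;
  std::size_t setups = 0;    // per-candidate problem setups
  std::size_t srChecks = 0;  // subsumption resolution checks performed
};

// Old shape: one full pass for subsumption, then a second pass for
// subsumption resolution, each pass setting up every candidate again.
Result oldLoop(const std::vector<Candidate>& cands) {
  Result r;
  for (const Candidate& c : cands) {
    ++r.setups;
    if (c.subsumes) { r.subsumed = true; return r; }
  }
  for (const Candidate& c : cands) {
    ++r.setups;
    ++r.srChecks;
    if (c.resolves) { r.resolved = true; return r; }
  }
  return r;
}

// New shape: a single pass in which both checks share one setup.
Result newLoop(const std::vector<Candidate>& cands) {
  Result r;
  for (const Candidate& c : cands) {
    ++r.setups;
    if (c.subsumes) {
      r.subsumed = true;
      r.resolved = false;  // a subsumption makes earlier SR results moot
      return r;
    }
    if (!r.resolved) {  // assumption: stop SR checks after the first success
      ++r.srChecks;
      if (c.resolves) r.resolved = true;
    }
  }
  assert(!r.subsumed);  // reached only if no candidate subsumed the clause
  return r;
}
```

For three candidates where only the last one subsumes, `oldLoop` performs no subsumption resolution checks while `newLoop` performs two wasted ones. For two candidates where the second one allows a subsumption resolution, `oldLoop` needs four setups and `newLoop` only two.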

This could be a problem if a certain strategy gets a lot of subsumed clauses.

I will have a look to make sure nothing fishy is happening. But what is curious is that the optimization seemed to be very beneficial on otter.

Another thing that might change the results is that there might be more than one solution for subsumption resolution. Therefore the search is affected by the method employed.

> Also, what would be your favourite platform for a bit of discussion on this? (Are you, e.g., getting notifications from our zulip?)

GitHub and Zulip are both fine. But on Zulip I have 2 accounts (not super convenient, I know). I will answer faster on the one where I have a profile picture set. Zulip might be a bit more spontaneous.

Best,
Robin

easychair · 2024-07-16T21:35:57Z

Hi Robin, I am not sure I understood your message. Are you proposing to change the main loop because you think SAT-based subsumption and subsumption resolution should be made the only way to do these two operations in Vampire? Best, Andrei

quickbeam123 · 2024-07-18T03:40:26Z

I just deleted a line

static_assert(std::is_same<subsat::allocator_type<int>, STLAllocator<int>>::value, "unexpected subsat::allocator_type");

since we don't have any STLAllocator anymore. @JakobR , could that cause trouble?

RobCoutel · 2024-07-18T06:34:42Z

> Hi Robin, I am not sure I understood your message. Are you proposing to change the main loop because you think SAT-based subsumption and subsumption resolution should be made the only way to do these two operations in Vampire? Best, Andrei

Hello Andrei,

I do not touch the main loop of Vampire. I only changed the Forward Simplification loop for subsumption and subsumption resolution. This is where the optimization kicks in.

I removed the old subsumption and subsumption resolution code for maintainability. Our experiments show that the SAT method is 35% faster on average, with a much lower variance.

However, one thing that is not taken into account here is the runtime. We ran our experiments on a 60-second timeout with -sa otter. But the portfolio will run for shorter bursts. Our method could be worse in this case because its strength is scaling. But if Vampire only runs for a few seconds on a specific strategy, it is possible that this scaling effect is not observed.

We are looking into it with Martin.

Best,
Robin

JakobR · 2024-07-18T09:35:53Z

> I just deleted a line
> static_assert(std::is_same<subsat::allocator_type<int>, STLAllocator<int>>::value, "unexpected subsat::allocator_type");
> since we don't have any STLAllocator anymore. @JakobR , could that cause trouble?

Yes, this is fine to remove! (In an earlier version, it was possible to change the SAT solver's allocator. This assertion checked that when used within Vampire, the correct allocator is set.)
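
As a rough illustration of what such a guard does, the sketch below pins a configurable allocator alias at compile time. None of the names are taken from subsat or Vampire: `HostAllocator`, the `solver` namespace and the `SOLVER_ALLOCATOR` macro are invented for this example, and `std::allocator` stands in for the host project's allocator.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical host-project allocator (stand-in for the removed STLAllocator).
template <typename T>
using HostAllocator = std::allocator<T>;

namespace solver {

// Hypothetical configuration point: a build may override the allocator used
// by the embedded solver by defining SOLVER_ALLOCATOR before this header.
#ifndef SOLVER_ALLOCATOR
#define SOLVER_ALLOCATOR HostAllocator
#endif

template <typename T>
using allocator_type = SOLVER_ALLOCATOR<T>;

template <typename T>
using vector = std::vector<T, allocator_type<T>>;

}  // namespace solver

// Compile-time guard: the build fails if the solver was configured with
// anything other than the host allocator.
static_assert(
    std::is_same<solver::allocator_type<int>, HostAllocator<int>>::value,
    "unexpected solver::allocator_type");
```

Once the allocator can no longer be changed, or the type it was compared against no longer exists, the guard has nothing left to protect and can be dropped, which matches the reasoning above.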

JakobR added 30 commits February 17, 2022 16:19

Remove minisat

063dc92

token

bdcd2bd

cache bindings

07ba33c

seems to work

40022ae

don't leak

7cb7df6

bugfix and make it testable

2ebcfbf

wtf

58f6853

includes for gcc

3c513b0

warnings

3c54935

gcc compile error

071b259

hmm

adf461f

warning

0def54e

gcc compile error

7f4e014

tweaks

d6de884

do first run without setup

e7449d1

minor tweak: we don't *always* need to clear the solver

f741f86

there's a stupid bug in the slogs but I have no idea what it is :(

ed817f1

correct the resLitIdx before logging

561a461

update slog parser

37fb509

slog fix

bae5726

measure orig_setup

b4aa03a

more faithful MLMatcher usage

cc317e2

properly skip of already-handled instances

ea42c90

don't check S/SR in release mode

05d2845

slight optimization

22a823a

remove comment

cf4a030

do less runs

4ba46bc

try shorter clause (or rather, an exactly-one constraint)

3c51fa9

do another run with S enabled

37d03ed

fix

6483620

JakobR reviewed May 17, 2024

JakobR added 2 commits May 17, 2024 12:18

Use Vampire-style assertions

531b854

Restore Statistics.cpp from master branch

f1f33e1

quickbeam123 and others added 9 commits May 20, 2024 10:50

Merge branch 'master' into robin_c-sat-s-sr-for-master

beca034

prevent the following: SATSubsumption/subsat/././subsat_config.hpp:18:22: error: static assertion failed: VDEBUG and NDEBUG are not synchronized

bcd55ee

Move standalone subsat into the main CMakeLists.txt

59b780f

Get rid of the SUBSAT_STANDALONE flag

8b9bd4e

Disable SAT solver statistics by default

62aaa35

Simplify Constraint::header_bytes

a4f7188

Remove both extra_i and extra_j

4f56e29

Restore subsumption (resolution) statistics

172a2bd

Remove the now-unused EncodingMethod type

1f42144

JakobR approved these changes May 21, 2024

Merge branch 'master' into robin_c-sat-s-sr-for-master

01bad98

quickbeam123 force-pushed the robin_c-sat-s-sr-for-master branch from b26fa9f to d087a9a Compare July 18, 2024 08:38

Merge branch 'master' into robin_c-sat-s-sr-for-master

69ddc23

quickbeam123 force-pushed the robin_c-sat-s-sr-for-master branch from d087a9a to 69ddc23 Compare July 18, 2024 08:52

RobCoutel added 2 commits July 18, 2024 11:53

kicked out unnecessary _subsumedSet + adding doc

3a29254

reintroduced the lost optimization on the least matchable literal + got rid of goto

796148f

quickbeam123 merged commit ce340a0 into master Jul 19, 2024
1 check passed

quickbeam123 deleted the robin_c-sat-s-sr-for-master branch July 19, 2024 06:25