Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subsumption and Subsumption Resolution via SAT solving #546

Merged
merged 550 commits into from
Jul 19, 2024

Conversation

RobCoutel
Copy link
Contributor

In this pull request, we completely replace the code for the subsumption and subsumption resolution module by an implementation using a SAT solver to encode the problems.

The papers:

  • 2022: "First-Order Subsumption via SAT Solving." by Jakob Rath, Armin Biere and Laura Kovács
  • 2023: "SAT-Based Subsumption Resolution" by Robin Coutelier, Jakob Rath, Michael Rawson and Laura Kovács
  • 2024: "SAT Solving for Variants of First-Order Subsumption" by Robin Coutelier, Jakob Rath, Michael Rawson, Armin Biere and Laura Kovács

explain the details of the encodings as well as provide proofs of their soundness.

The most important parts of the PR are the following:

  • "SATSubsumption/subsat": A custom SAT solver implemented by Jakob supporting AMO constraints and substitution reasoning
  • "SATSubsumption/SATSubsumptionAndResolution.*": The implementation of clause at a time subsumption and subsumption resolution
  • "Inferences/ForwardSubsumptionAndResolution.cpp": The loop for forward subsumption and subsumption resolution was optimized to mutualize the setup of the SAT solver and the MatchSet used by the subsumption and subsumption resolution module
  • "Inferences/BackwardSubsumptionAndResolution.cpp": The loop was treated similarly as for forward subsumption and subsumption resolution
  • "UnitTests/tSATSubsumptionResolution.cpp": A set of unit tests to check the soundness of our encoding
  • "Saturation/SaturationAlgorithm.cpp": We plugged in the new forward and backward loops in the saturation algorithm.

Inferences/BackwardSubsumptionResolution.cpp Show resolved Hide resolved
Inferences/SubsumptionDemodulationHelper.hpp Outdated Show resolved Hide resolved
Kernel/Clause.hpp Outdated Show resolved Hide resolved
Kernel/MLMatcher.cpp Outdated Show resolved Hide resolved
Kernel/MLMatcher.cpp Outdated Show resolved Hide resolved
SATSubsumption/subsat/types.hpp Outdated Show resolved Hide resolved
SATSubsumption/subsat/subsat_config.hpp Outdated Show resolved Hide resolved
SATSubsumption/SATSubsumptionAndResolution.hpp Outdated Show resolved Hide resolved
SATSubsumption/SATSubsumptionAndResolution.hpp Outdated Show resolved Hide resolved
@JakobR
Copy link
Member

JakobR commented May 17, 2024

Oh damn... most of my comments were supposed to be answers to @MichaelRawson's comments, but it looks like github lost the connection somehow? Edit: please click on "View reviewed changes" above to see the context (next to "JakobR reviewed ...")

@MichaelRawson
Copy link
Contributor

Thanks for the hard work, @JakobR ! That was quick. I've replied inline where still possible, I guess GitHub doesn't like the big merges so much. Others:

I can remove MLMatcherStats, if you want.

No need, thanks for the explanation.

Using uint64_t extra_j would probably work too. Do you want me to keep the comment (or a similar one)?

Keep the comment for now, it gives at least the intuition!

@JakobR
Copy link
Member

JakobR commented May 21, 2024

I think this is done from my side.

I'm repeating one unresolved point buried in the inline comments, but I think it's a minor issue:

  • BackwardSubsumptionAndResolution: maybe get rid of _checked or _subsumedSet (one of them is likely redundant)

@quickbeam123
Copy link
Collaborator

quickbeam123 commented Jul 15, 2024

Guys, my experiments suggest that something does not quite work the way it should, at least not for my favourite (pseudo-)default strategy -sa discount -awr 10. Instead of suspecting your likely-well-optimised sat-based subsumption code itself, I want to question, at least for a second, the Loop ("for forward subsumption and subsumption resolution was optimized to mutualize...") and the heuristically choices it encapsulates. I think it might be making a heuristic commitment that is detrimental for the average performance (regardless of the speed of performing the individual subsumption checks; indeed subsumption is less important for discount and less important under avatar than without it).

The comparison for -i 100000 limited -sa discount -awr 10 run over the FOF part of TPTP 9.0.0 are:

Sort by UNS
9344 ['problemsSTD_robin8250_dis10_i100000.pkl']
9347 ['problemsSTD_master7703_dis10_i100000.pkl']

which is close, but's it's quite a bit worse at the 10000 mark:

8190 ['problemsSTD_robin8250_dis10_i100000.pkl']
8219 ['problemsSTD_master7703_dis10_i100000.pkl']

(I plan to generate a full cactus plot comparison soon).

However, the main suspicious thing is the number of activations needed by the two vampires to succeed on the 9307 solved (in 100000) by both, namely: 53283652 vs 53633200.

Before I start digging deeper, could you please sum up how "the Loop" differs from the previous solution we had in Vampire? Also, what would be your favourite platform for a bit of discussion on this? (Are you, e.g., getting notifications from our zulip?)

@RobCoutel
Copy link
Contributor Author

Hi Martin,

However, the main suspicious thing is the number of activations needed by the two vampires to succeed on the 9307 solved (in 100000) by both, namely: 53 283 652 vs 53 633 200.

I don't know what is meant here. Do you mean the number of subsumption resolution calls?
If yes, then it makes sense. See below the difference between the old and new loops.

Before I start digging deeper, could you please sum up how "the Loop" differs from the previous solution we had in Vampire?

Previously, Vampire would first check all the subsumptions and then, if it failed, all the subsumption resolutions. However, since setting up the problems is expensive, we have changed the loop to perform both subsumption and subsumption resolution on each instance.

Old:
For each candidate clause:
Check subsumption
For each candidate clause:
Check subsumption resolution

New:
For each candidate clause:
Check subsumption
Check subsumption resolution

The drawback is that if a clause is subsumed in the new version, we will perform useless subsumption resolution checks that the old loop does not.
The advantage is that we set up both problems simultaneously, saving on pruning, matching, and index time.

This could be a problem if a certain strategy gets a lot of subsumed clauses.

I will have a look to make sure nothing fishy is happening. But what is curious is that the optimization seemed to be very beneficial on otter.

Another thing that might change the results is that there might be more than one solution for subsumption resolution. Therefore the search is affected by the method employed.

Also, what would be your favourite platform for a bit of discussion on this? (Are you, e.g., getting notifications from our zulip?)

GitHub and Zulip are both fine. But on Zulip I have 2 accounts (not super convenient, I know). I will answer faster on the one where I have a profile picture set. Zulip might be a bit more spontaneous.

Best,
Robin

@easychair
Copy link
Contributor

easychair commented Jul 16, 2024 via email

@quickbeam123
Copy link
Collaborator

I just deleted a line

static_assert(std::is_same<subsat::allocator_type<int>, STLAllocator<int>>::value, "unexpected subsat::allocator_type");

since we don't have any STLAllocator anymore. @JakobR , could that cause trouble?

@RobCoutel
Copy link
Contributor Author

Hi Robin, I am not sure I understood your message. Are you proposing to change the main loop because you think SAT-based subsumption and subsumption resolution should be made the only way to do these two operations in Vampire? Best, Andrei

Hello Andrei,

I do not touch the main loop of Vampire. I only changed the Forward Simplification loop for subsumption and subsumption resolution. This is where the optimization kicks in.

I removed the old subsumption and subsumption resolution for code maintainability. Our experiments show that the SAT method is faster by a factor of 35% on average, with a much lower variance.
image

However, one thing that is not taken into account here is the runtime. We ran our experiments on a 60-second timeout with -sa otter. But the portfolio will run for shorter bursts. Our method could be worse in this case because its strength is scaling. But if Vampire only runs for a few seconds on a specific strategy, it is possible that this scaling effect is not observed.

We are looking into it with Martin.

Best,
Robin

@JakobR
Copy link
Member

JakobR commented Jul 18, 2024

I just deleted a line

static_assert(std::is_same<subsat::allocator_type<int>, STLAllocator<int>>::value, "unexpected subsat::allocator_type");

since we don't have any STLAllocator anymore. @JakobR , could that cause trouble?

Yes, this is fine to remove! (In an earlier version, it was possible to change the SAT solver's allocator. This assertion checked that when used within Vampire, the correct allocator is set.)

@quickbeam123 quickbeam123 merged commit ce340a0 into master Jul 19, 2024
1 check passed
@quickbeam123 quickbeam123 deleted the robin_c-sat-s-sr-for-master branch July 19, 2024 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants