Compact propositional encoding of OBJECT_SIZE(ptr) #7702

Merged

Conversation

Collaborator

@tautschnig tautschnig commented May 5, 2023

For each OBJECT_SIZE expression to be encoded by the propositional back-end we previously created a select statement over all objects, mapping a match to the object's size. This yielded a number of constraints proportional to the number of objects times the number of OBJECT_SIZE expressions.

For one particular benchmark at hand this meant 2047 OBJECT_SIZE expressions over 1669 objects. Post-processing exceeded 30GB of memory after several minutes of constructing constraints.

For most programs the number of distinct object sizes will be small. In the above benchmark there are only 8 distinct object sizes (over those 1669 objects). The new encoding therefore groups objects by size and forms, for each distinct size, a disjunction over the matching object identifiers. BDDs are used to compute compact representations of these disjunctions.

With this new approach, post-processing on the above benchmark completes in less than 10 seconds and memory never exceeds 3GB.

While at it, use the same trick for compacting the encoding of IS_DYNAMIC_OBJECT(ptr).
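To illustrate the grouping idea, here is a minimal, self-contained C++ sketch. It is not the actual CBMC implementation; the object table and identifiers are made up. Objects are bucketed by their size, so each OBJECT_SIZE encoding only needs one disjunction per distinct size, which a BDD can then compact further.

```cpp
// Hypothetical illustration of grouping objects by size; not CBMC code.
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

int main()
{
  // Made-up object table: object identifier -> size in bytes.
  const std::map<std::size_t, std::size_t> object_sizes = {
    {0, 8}, {1, 16}, {2, 8}, {3, 32}, {4, 16}};

  // Invert the table: size -> all object identifiers of that size.
  std::map<std::size_t, std::vector<std::size_t>> objects_by_size;
  for(const auto &entry : object_sizes)
    objects_by_size[entry.second].push_back(entry.first);

  // One constraint per distinct size now suffices, of the shape
  //   (obj == id_1 || obj == id_2 || ...) ==> OBJECT_SIZE(ptr) == size,
  // and the disjunction over identifiers can be compacted with a BDD.
  for(const auto &group : objects_by_size)
  {
    std::cout << "size " << group.first << ": objects";
    for(const std::size_t id : group.second)
      std::cout << ' ' << id;
    std::cout << '\n';
  }
}
```

With 1669 objects but only 8 distinct sizes, this replaces roughly 1669 cases per OBJECT_SIZE expression with 8 grouped constraints.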

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig added the aws Bugs or features of importance to AWS CBMC users label May 5, 2023
@tautschnig tautschnig requested a review from martin-cs as a code owner May 5, 2023 21:49
@tautschnig tautschnig self-assigned this May 5, 2023
@codecov

codecov bot commented May 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (74075ec) at 79.09% vs. head (c53b111) at 79.10%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7702      +/-   ##
===========================================
+ Coverage    79.09%   79.10%   +0.01%     
===========================================
  Files         1699     1699              
  Lines       196512   196575      +63     
===========================================
+ Hits        155428   155508      +80     
+ Misses       41084    41067      -17     


@tautschnig tautschnig force-pushed the cleanup/object-size-encoding branch 2 times, most recently from 0bfe1a9 to 1545504 on May 8, 2023 10:48
@tautschnig tautschnig removed their assignment May 8, 2023
@remi-delmas-3000
Collaborator

Experiments on a FreeRTOS proof that uses contracts (which in turn introduce a lot of r_ok instances) show a reduction too:

  • 10% fewer variables,
  • 30% fewer clauses,
  • peak SAT memory consumption decreased from 6 GB to 700 MB,
  • runtime decreased from 1300s to 500s.

@remi-delmas-3000
Collaborator

On other models I observed up to 3x regressions in runtime, so I think we should make this an opt-in feature with a command-line switch.

@tautschnig tautschnig self-assigned this May 15, 2023
@tautschnig
Collaborator Author

> On other models I observed up to 3x regressions in runtime, so I think we should make this an opt-in feature with a command-line switch.

For the record: this is the list_is_valid task from https://github.com/remi-delmas-3000/unbounded-proofs-with-cbmc, where CBMC has the following resource usage:

  • develop:
    • SAT solver CaDiCaL: 37 seconds, 1.8 GB of memory
    • SAT solver kissat: 43 seconds, 800 MB of memory
  • this branch (BDDs):
    • SAT solver CaDiCaL: 103 seconds, 2.2 GB of memory
    • SAT solver kissat: 40 seconds, 788 MB of memory

So I do acknowledge the slow-down with CaDiCaL on this benchmark (for the record, MiniSat is not able to solve these benchmarks in a reasonable amount of time), but I'd argue that in other cases the proposed change makes previously infeasible benchmarks solvable.

@tautschnig tautschnig removed their assignment May 17, 2023
Collaborator

@martin-cs martin-cs left a comment

Tentative yes. Concerns:

  • Does this need a more extensive benchmarking run?
  • I feel like this is not the only place this technique could or should be used. I have pondered doing this for constant arrays or even arrays in general. This makes me wonder whether we need a general interface for "compact encoding of a map" and whether it is worth putting some time (experimental or literature) into finding a good BDD ordering and encoding technique.
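As a purely hypothetical illustration of what a general "compact encoding of a map" interface could look like (this is not existing CBMC code, just a sketch of the idea), one could collect key/value pairs and hand back one group per distinct value, leaving it to the caller to encode each group as a single, possibly BDD-compacted, disjunction:

```cpp
// Hypothetical sketch of a generic "compact map encoding" helper; not CBMC code.
#include <functional>
#include <map>
#include <vector>

template <typename keyt, typename valuet>
class compact_map_encodert
{
public:
  // Record that `key` maps to `value`.
  void add(const keyt &key, const valuet &value)
  {
    groups[value].push_back(key);
  }

  // Call `emit` once per distinct value with all keys sharing that value; the
  // callback would build one (possibly BDD-compacted) disjunction over the
  // keys instead of one constraint per key.
  void for_each_group(
    const std::function<void(const valuet &, const std::vector<keyt> &)> &emit)
    const
  {
    for(const auto &group : groups)
      emit(group.first, group.second);
  }

private:
  std::map<valuet, std::vector<keyt>> groups;
};
```

Mapping object identifiers to their sizes through such an interface would reproduce the grouping used in this PR, and the same mechanism could in principle back a compact encoding of constant arrays.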

@martin-cs martin-cs removed their assignment Sep 18, 2023
@tautschnig tautschnig force-pushed the cleanup/object-size-encoding branch from 1545504 to c53b111 on November 15, 2023 16:20
@tautschnig
Collaborator Author

More benchmarking has now been done across 582 harnesses that use CBMC in GitHub's CI. Across these, develop at revision caf9768 requires 16451 seconds when using MiniSat as the back-end, while with the changes in this PR the total time is 13405 seconds. There are instances of slow-downs, but in those cases the combination of the changes proposed in PRs #7996, #8041, #8042 on top of this PR makes the overall execution faster again. This demonstrates that on such benchmarks the time required by MiniSat is highly sensitive to peculiarities of the encoding. The maximum observed slowdown is one case going from 173 seconds up to 394 seconds; the maximum speed-up is one case going from 1006 seconds down to 279 seconds.

On the same benchmarks and CBMC versions, but using CaDiCaL as the back-end, the total time is 29584 seconds with develop and 30759 seconds with this PR.

So it may still be worth investigating further optimisations of the encoding as proposed by @martin-cs, but the current state appears to be a reasonable starting point.

@tautschnig tautschnig merged commit 9220c3a into diffblue:develop Jan 22, 2024
@tautschnig tautschnig deleted the cleanup/object-size-encoding branch January 22, 2024 14:50