Improve history thread ID `may_create` #1561

sim642 · 2024-08-14T17:11:28Z

The may_create function for our history-based thread IDs from the ESOP paper is quite conservative.
After thinking about it for a while (in comparison to is_must_parent, I noticed many cases where may_create can be more precise and incrementally implemented them here.

I haven't constructed programs where all these thread IDs and their improvements would actually show up and matter, so these may be very obscure (or even infeasible for some other reasons I didn't notice).
Also, I'm not sure if it is now optimal, possibly not yet in the "both are non-unique" case.

TODO

Add unit tests.
Add regression tests.
Rename the two functions.
Make prefix list un-reversed? Being reversed is only useful for efficient "appending", but the prefixes should be very short. Reversing those lists for various predicates (which may be called many times) might be the more overwhelming inefficiency.
Benchmark for (unlikely) precision improvement or performance regression.
From GobCon: consider over-approximated $$C$$ implications.

…hortening

michael-schwarz · 2024-08-14T17:50:05Z

I wonder how often (if at all) any of this actually hits: As soon as the creator is not the must-parent, it's already game over:

analyzer/src/cdomains/mHP.ml

Lines 55 to 60 in b9caf63

    
           let definitely_not_started (current, created) other = 
        
             if (not (TID.is_must_parent current other)) then 
        
               false 
        
             else 
        
               let ident_or_may_be_created creator = TID.equal creator other || TID.may_create creator other in 
        
               if ConcDomain.ThreadSet.is_top created then

sim642 · 2024-08-14T18:30:50Z

Indeed, since those are our only usages of must_be_parent and may_create, they might not and I didn't consider that so far.
Although there we require current to be must parent, but creator is checked for may_create. The naming in the code is maybe a bit confusing, because creator (from $$C$$) is created by current.

I did a test run with coverage locally and (even to by surprise) all lines and branches are covered. But the actual usefulness of this will become clear when I try to construct regression test programs around these hypothetical improvements.

sim642 · 2024-08-15T10:35:40Z

I managed to simplify and optimize it to something reasonable by using a simple (but efficient) GobList.remove_common_prefix function. The old way of checking of p is a prefix of p' was

P.equal p (P.common_suffix p p')

which is very roundabout: it constructs the common prefix (via list reversals) just to check for equality (which goes through the entire list again).

In many cases the check

S.subset (S.union (S.of_list p) s) (S.union (S.of_list p') s')

does capture everything that's needed, but it's far from obvious that it's all that's needed (for optimality).
By using information from checks (e.g. s is empty, or p is prefix of p' or vice versa) and well-formedness (p has unique elements, and p and s are disjoint), a lot less needs to be actually checked – in some cases nothing at all!

I've now realized that the naming of the two functions is also a bit weird. I would rename:

is_must_parent → must_be_ancestor because in standard terminology parent means immediate,
may_create → may_be_ancestor to make naming uniform.

…read ID may_create

sim642 · 2024-08-15T13:00:20Z

I went through all the unit tests for the improved cases and tried to construct a real program for each one. Five of them have a corresponding regression test now that also checks for the improved precision from this. For others I documented why a real program for data race regression testing at least isn't possible.
There may be a use for may_create in the future that doesn't guarantee is_must_parent checks before it. And even for the time being, the change to may_create cannot even harm performance because in those cases the new logic isn't used anyway.

sim642 · 2024-08-21T08:44:19Z

A before vs after benchmark run on SV-COMP ConcurrencySafety with 60s timeout gives zero verdict differences and performance differences are also just within measurement noise.

As to the over-approximated $$C$$ sets, I don't see how it would matter: this predicate even doesn't use it. The only place in thread spawning where it does matter is thread creation in a loop, where main could directly create main, {foo} (not just main, foo). But as these thread IDs, main, {foo} could equivalently be created by main, foo recursively. Since both predicates are about ancestors (not immediate parents), this shouldn't matter. main is considered both must and may ancestor of main, {foo}.

michael-schwarz · 2024-12-15T15:43:02Z

I saw you have a pending thesis on verifying the correctness of this. Do we want to wait for this or should I attempt to review it such that we can then merge it?

sim642 · 2024-12-16T08:49:14Z

I offered a thesis topic on formally verifying this but there's no student working on it (and I'm not sure if there ever will be), so we shouldn't wait for that. I've left a lot of comments for the various cases along with tests, so I hope that's sufficiently clear and convincing that this should be right.

src/cdomain/value/cdomains/threadIdDomain.ml

michael-schwarz · 2024-12-16T14:50:36Z

tests/unit/cdomains/threadIdDomainTest.ml

+  assert_equal true (may_be_ancestor (main >> a >> b >> b) (main >> b >> b >> a));
+  assert_equal true (may_be_ancestor (main >> a >> b >> b) (main >> b >> a >> b));
+
+  (* 4f6a7637b8d0dc723fe382f94bed6c822cd4a2ce passes all... *)


Why this comment?

I think the reason was that there are at least two improvements after that commit (dbd4874 and c2c596f) which are supposed to improve precision, but they're not observed by the tests. That might possibly be related to the other unit test TODOs about certain cases being untestable by construction.
Since then I refactored the code, so perhaps it's no longer true. At least according to code coverage, all cases seem to be covered by something.

Then we can maybe just remove the comment?

michael-schwarz · 2024-12-16T14:51:01Z

tests/unit/cdomains/threadIdDomainTest.ml

+  assert_equal false (may_be_ancestor (main >> a >> b >> b) (main >> b >> b)); (* from prefix *) (* infeasible for race: definitely_not_started requires (main >> a or main >> a >> b), where this must happen, to be must parent for (main >> b >> b), which it is not *)
+  (* non-unique creates non-unique: removed elements and set must be in new set *)
+  (* assert_equal false (may_be_ancestor (main >> a >> b >> c >> c) (main >> a >> c >> c)); *)
+  (* TODO: cannot test due because by construction after prefix check? *)


What is the plan with this TODO?

michael-schwarz · 2024-12-16T14:51:11Z

tests/unit/cdomains/threadIdDomainTest.ml

+  assert_equal true (may_be_ancestor main (main >> a >> a));
+  assert_equal true (may_be_ancestor main (main >> a >> b >> b));
+  assert_equal true (may_be_ancestor (main >> a) (main >> a >> b >> b));
+  (* TODO: added elements condition always true by construction in tests? *)


What is the plan with this TODO?

Co-authored-by: Michael Schwarz <[email protected]>

sim642 added 6 commits August 14, 2024 19:18

Improve history thread ID may_create for unique threads

52055b1

Improve history thread ID may_create for incompatible prefixes

baa497a

Improve history thread ID may_create for both non-unique threads

4f6a763

Improve history thread ID may_create for first unique thread extension

dbd4874

Improve history thread ID may_create for first unique thread prefix s…

c2c596f

…hortening

Remove history thread ID may_create for first unique thread prefix TODO

bb2fa08

sim642 added feature precision relational Relational analyses (Apron, affeq, lin2var) labels Aug 14, 2024

michael-schwarz self-requested a review August 14, 2024 17:37

sim642 added 6 commits August 15, 2024 11:44

Add history thread ID domain unit tests

f3dda0e

Add TODOs to history thread ID may_create

be2b3e2

Do additional simplifications to history thread ID may_create

73a22d7

Optimize history thread IDs using GobList.remove_common_prefix

b6fd349

Simplify history thread ID may_create

31d22ef

Clean up history thread ID is_must_parent

5f1b296

sim642 added the performance Analysis time, memory usage label Aug 15, 2024

Fix GobList.remove_common_prefix doc indentation

9744751

sim642 added 2 commits August 15, 2024 15:42

Construct feasible races-mhp regression tests for improved history th…

8e54444

…read ID may_create

Fix comment in history thread ID is_must_parent

cc7a76a

sim642 force-pushed the threadid-history-may_create branch from aa3fb6d to cc7a76a Compare August 15, 2024 12:54

sim642 marked this pull request as ready for review August 15, 2024 13:00

sim642 self-assigned this Aug 20, 2024

sim642 added 2 commits August 20, 2024 15:13

Merge branch 'master' into threadid-history-may_create

de117f5

Rename to must_be_ancestor and may_be_ancestor

fc9ecfc

michael-schwarz reviewed Dec 16, 2024

View reviewed changes

Fix ThreadIdDomain comments from review

205d2b9

Co-authored-by: Michael Schwarz <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve history thread ID `may_create` #1561

Improve history thread ID `may_create` #1561

sim642 commented Aug 14, 2024 •

edited

Loading

michael-schwarz commented Aug 14, 2024

sim642 commented Aug 14, 2024

sim642 commented Aug 15, 2024

sim642 commented Aug 15, 2024

sim642 commented Aug 21, 2024

michael-schwarz commented Dec 15, 2024

sim642 commented Dec 16, 2024

michael-schwarz Dec 16, 2024

sim642 Dec 17, 2024

michael-schwarz Dec 17, 2024

michael-schwarz Dec 16, 2024

michael-schwarz Dec 16, 2024

Improve history thread ID may_create #1561

Are you sure you want to change the base?

Improve history thread ID may_create #1561

Conversation

sim642 commented Aug 14, 2024 • edited Loading

TODO

michael-schwarz commented Aug 14, 2024

sim642 commented Aug 14, 2024

sim642 commented Aug 15, 2024

sim642 commented Aug 15, 2024

sim642 commented Aug 21, 2024

michael-schwarz commented Dec 15, 2024

sim642 commented Dec 16, 2024

michael-schwarz Dec 16, 2024

Choose a reason for hiding this comment

sim642 Dec 17, 2024

Choose a reason for hiding this comment

michael-schwarz Dec 17, 2024

Choose a reason for hiding this comment

michael-schwarz Dec 16, 2024

Choose a reason for hiding this comment

michael-schwarz Dec 16, 2024

Choose a reason for hiding this comment

Improve history thread ID `may_create` #1561

Improve history thread ID `may_create` #1561

sim642 commented Aug 14, 2024 •

edited

Loading