Allocator alignment problems - remove it? #452

MichaelRawson · 2023-06-07T09:51:34Z

Not intended for merge.

Recently I started running a sanitizer for undefined behaviour (-fsanitize=undefined) and to my surprise there are relatively few issues, mostly easy to fix.

Unfortunately, one of them is not so easy: Vampire's allocator returns memory that is not sufficiently aligned for some objects. Allocator seems to align to pointer-sized (8 bytes on my machine) boundaries, but some types - for example std::ifstream - have larger alignment requirements. So far I've seen up to std::max_align_t (16 bytes), but even greater alignment is theoretically possible, so-called "over-aligned" types.

There are several possible approaches here:

Do nothing, maybe alignment isn't such a big issue. This has indeed worked so far, but compilers are getting smarter and assuming a greater amount of standards-compliance from code. I would not be surprised if compilers started emitting aligned loads where they are technically allowed to - which would be absolutely horrendous to track down, and then we'd have to fix Allocator anyway.
Fix Allocator by always aligning to std::max_align_t intervals. This is what the standard says it must do, but I suspect carries some performance hit as each allocation must take up at least 16 bytes. It will also not work with over-aligned types, should these be necessary in future. System allocators tend to support over-aligned types: the GNU C++ environment supports over-aligned types with -faligned-new, switched on by default from C++17.
Fix Allocator by working out what the alignment should be (or falling back to alignof(std::max_align_t)) and allocating with variable alignment. I tried this yesterday! Wiring up Vampire this way is possible if tedious, but changing Allocator to handle variably-aligned allocations seems near-impossible in the current implementation and would amount to a rewrite, possibly introducing significant slowdown along the way.
(ノಠ益ಠ)ノ彡┻━┻ - remove Allocator and use the system allocator instead. This is what this PR does - the minimal change required to fall back to the system allocator. If we go down this route we should also change some other things, such as removing the need for STLAllocator, vvector and friends, and possibly removing e.g. USE_ALLOCATOR from the whole source tree. Pros and cons discussed below.

Solutions (2) or (4) seem promising, but we will probably have to accept some slowdown in either case. (2) is more conservative, but as you can tell, I'm considering (4). This has disadvantages:

I'm led to believe that the initial motivation for a custom allocator was improved performance. System allocators have improved significantly in the meantime, but there is still a small (somewhat inconsistent) performance hit using the system allocator. One might argue that this is because the system allocator correctly aligns allocations. ;-)
We no longer track how much memory we've used. This seems like a bigger problem than it is: if we really need this (and I'm not sure we do) we could write a much thinner wrapper around the system allocator or write platform-specific code to query the operating system.

But also some advantages:

No undefined behaviour!
System allocators get better over time.
Compilers can increasingly recognise and transform heap allocation patterns. I expect that a custom allocator defeats some of these passes unless we can annotate Allocator suitably.
Can use the system allocator without playing the traditional game of BYPASSING_ALLOCATOR whack-a-mole. This is not such a big deal for seasoned Vampire hackers, but new contributors consistently run into this. This is not as simple as "just use Lib::Stack rather than std::vector" - calling almost anything in std might allocate if you are not careful, like I/O routines. We also have to patch or work around third-party code like MiniSat or Z3.
Smaller Vampire, no USE_ALLOCATOR oddities.

In summary, there is a bug wandering around, and it will most likely bite us eventually. In my view the best solution is to remove Allocator and use the system allocator, like the majority of other programs. There is a performance hit, but in exchange we get significant convenience and the system allocator is fully correct.

I'd like your thoughts on this.

To attempt to quantify the performance hit, I ran 10-second DISCOUNT runs on a random subset of TPTP until Michaela started complaining about the fan noise. Log attached - format is problem master system-allocator, problems where neither solved it are removed.

comparison.txt

quickbeam123 · 2023-06-07T10:16:54Z

@MichaelRawson, how did you get your second column vampire - was it simply via -DUSE_SYSTEM_ALLOCATION=1?

MichaelRawson · 2023-06-07T10:57:11Z

No, USE_SYSTEM_ALLOCATION seems to cause more performance regression than strictly necessary. I suspect inlining - the compiler can probably inline (some of) the system operator new across Vampire, but not stuff from Allocator.cpp.

To get the right-hand column, in this branch I manually removed Allocator and replaced it with a skeleton.

quickbeam123 · 2023-06-07T15:50:12Z

So in my randomized setup, a discount10 strategy running for 50000Mi would lose around 80 problems (from first-order TPTP) on average.

This was done just with the USE_SYSTEM_ALLOCATION=1 switch. But generally, I would be hesitant to endorse such a degradation "just to be" standard complying.

(Do we know what is the alignment important on the compiler side?)

Having said that, here is a thought. With the TermList/Term representation in Vampire, we are ourselves relying on Term pointers to be 4-byte aligned (and use the two lowest bits for tagging). Since you brought up the issue with no alignment guarantee by Allocator, I suspect we could break vampire ourselves at the moment, by simply allocating a single char/byte from time to time. (To force some Terms to get allocated at non-aligned addresses.) Could that possibly work?

MichaelRawson · 2023-06-07T16:56:37Z

So in my randomized setup, a discount10 strategy running for 50000Mi would lose around 80 problems (from first-order TPTP) on average.

Interesting - and not as bad as I feared. Thanks for running the experiment so quickly!

I would be hesitant to endorse such a degradation "just to be" standard complying.

Fair enough - it's arguably the major problem with this line of thinking. It might be possible to claw something back with some targeted optimisation if we knew what causes the slowdown - any ideas? I'll look into that.

Do we know what is the alignment important on the compiler side?

I suspect we could break vampire ourselves

The TermList scenario you suggest is actually OK - Allocator does do some alignment, but only to the level of pointers (8 bytes) - so we're OK for TermList, which only needs 4-byte alignment. We are already broken in some sense because the sanitizer reports that we have UB when allocating a std::ifstream here (and we actually reach it, too), but it doesn't seem to actually cause a crash in this case.

Do we know what is the alignment important on the compiler side?

I'm parsing this as "Why is alignment important for the compiler?" - sorry for condescension if I misunderstood.

Alignment is important for compilers - some architectures cannot do unaligned loads/stores, and for some it's inefficient, so the compiler has to work out if some data is aligned, and work around it if it is not. Intel is actually pretty permissive in this respect, it can do both and in most cases there's not much difference in terms of performance. Architectures may have separate aligned/unaligned instructions - e.g. movaps and movups respectively on Intel.

As a result, e.g. Clang will emit aligned instructions by preference if it can prove that something "should" (in a standard-compliant program) be aligned. If it turns out not to be aligned, even Intel chips will fault and the program will crash immediately. Contrived example: https://godbolt.org/z/r36Y5Kqj5 - if Vampire did not put the allocation on a 16-byte boundary, we'd crash here.

MichaelRawson · 2023-07-12T09:15:07Z

An update - I will try to implement a less-complex allocator that is both standards-compliant and doesn't have a performance hit. If I succeed I will close this and open a new PR for that one. If not, we can return to this.

…ly tiny

This reverts commit 9eccf67. Not actually necessary with USE_SYSTEM_ALLOCATION!

MichaelRawson added bug question on hold (don't merge) labels Jun 7, 2023

MichaelRawson force-pushed the michael-remove-allocator branch from 38837bb to 7e25f7f Compare July 22, 2023 16:05

quickbeam123 force-pushed the michael-remove-allocator branch from 8c1a4c5 to 732afe7 Compare August 1, 2023 15:12

MichaelRawson and others added 7 commits August 3, 2023 15:28

gut Allocator

8b5ee43

remove type checks

433833c

disable allocator even more

921d72f

need some kind of small object allocator for perf, but it can be real…

f6b6476

…ly tiny

implementation tricks and documentation, cleanup

08fc075

add FSA5 and FSA7

e947575

fix alignment bug (!) with ALLOC_KNOWN, update STLAllocator

db1e34c

MichaelRawson force-pushed the michael-remove-allocator branch from 9676d32 to db1e34c Compare August 3, 2023 14:29

MichaelRawson added 3 commits August 3, 2023 21:31

make custom allocator player nicer with Valgrind

9eccf67

Revert "make custom allocator player nicer with Valgrind"

0a92726

This reverts commit 9eccf67. Not actually necessary with USE_SYSTEM_ALLOCATION!

restore USE_SYSTEM_ALLOCATION functionality

6518aa9

MichaelRawson mentioned this pull request Aug 6, 2023

Fix Allocator's alignment problems. #469

Merged

MichaelRawson closed this Aug 6, 2023

MichaelRawson deleted the michael-remove-allocator branch August 10, 2023 13:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allocator alignment problems - remove it? #452

Allocator alignment problems - remove it? #452

MichaelRawson commented Jun 7, 2023

quickbeam123 commented Jun 7, 2023

MichaelRawson commented Jun 7, 2023

quickbeam123 commented Jun 7, 2023

MichaelRawson commented Jun 7, 2023

MichaelRawson commented Jul 12, 2023

Allocator alignment problems - remove it? #452

Allocator alignment problems - remove it? #452

Conversation

MichaelRawson commented Jun 7, 2023

quickbeam123 commented Jun 7, 2023

MichaelRawson commented Jun 7, 2023

quickbeam123 commented Jun 7, 2023

MichaelRawson commented Jun 7, 2023

MichaelRawson commented Jul 12, 2023