Skip to content

Commit 5451622

Browse files
committed
fix alignment computation for nested objects (#57722)
The alignment of a nested object (in C layouts) is not affected by the alignment of its parent container when computing a field offset (as if it will be allocated at address 0). This can be strongly counter-intuitive (as it implies it should add padding where it does not seem to provide value to the user), but this is required to match the C standard. It also permits users to write custom allocators which happen to provide alignment in excess of that which codegen may assume is guaranteed, and get the behavioral characteristics they intended to specify (without resorting to computing explicit padding). Addresses #57713 (comment) (Cherry-picked from ec3c02a in v1.11 backports branch)
1 parent eb272a3 commit 5451622

File tree

4 files changed

+47
-18
lines changed

4 files changed

+47
-18
lines changed

doc/src/manual/calling-c-and-fortran-code.md

Lines changed: 15 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -543,15 +543,14 @@ is not valid, since the type layout of `T` is not known statically.
543543

544544
### SIMD Values
545545

546-
Note: This feature is currently implemented on 64-bit x86 and AArch64 platforms only.
547-
548546
If a C/C++ routine has an argument or return value that is a native SIMD type, the corresponding
549547
Julia type is a homogeneous tuple of `VecElement` that naturally maps to the SIMD type. Specifically:
550548

551-
> * The tuple must be the same size as the SIMD type. For example, a tuple representing an `__m128`
552-
> on x86 must have a size of 16 bytes.
553-
> * The element type of the tuple must be an instance of `VecElement{T}` where `T` is a primitive type that
554-
> is 1, 2, 4 or 8 bytes.
549+
> * The tuple must be the same size and elements as the SIMD type. For example, a tuple
550+
> representing an `__m128` on x86 must have a size of 16 bytes and Float32 elements.
551+
> * The element type of the tuple must be an instance of `VecElement{T}` where `T` is a
552+
> primitive type with a power-of-two number of bytes (e.g. 1, 2, 4, 8, 16, etc) such as
553+
> Int8 or Float64.
555554
556555
For instance, consider this C routine that uses AVX intrinsics:
557556

@@ -624,6 +623,10 @@ For translating a C argument list to Julia:
624623

625624
* `T`, where `T` is a Julia leaf type
626625
* argument value will be copied (passed by value)
626+
* `vector T` (or `__attribute__ vector_size`, or a typedef such as `__m128`)
627+
628+
* `NTuple{N, VecElement{T}}`, where `T` is a primitive Julia type of the correct size
629+
and N is the number of elements in the vector (equal to `vector_size / sizeof T`).
627630
* `void*`
628631

629632
* depends on how this parameter is used, first translate this to the intended pointer type, then
@@ -670,13 +673,16 @@ For translating a C return type to Julia:
670673
* `T`, where `T` is one of the primitive types: `char`, `int`, `long`, `short`, `float`, `double`,
671674
`complex`, `enum` or any of their `typedef` equivalents
672675

673-
* `T`, where `T` is an equivalent Julia Bits Type (per the table above)
674-
* if `T` is an `enum`, the argument type should be equivalent to `Cint` or `Cuint`
676+
* same as C argument list
675677
* argument value will be copied (returned by-value)
676678
* `struct T` (including typedef to a struct)
677679

678-
* `T`, where `T` is a Julia Leaf Type
680+
* same as C argument list
679681
* argument value will be copied (returned by-value)
682+
683+
* `vector T`
684+
685+
* same as C argument list
680686
* `void*`
681687

682688
* depends on how this parameter is used, first translate this to the intended pointer type, then

src/cgutils.cpp

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3301,7 +3301,7 @@ static void union_alloca_type(jl_uniontype_t *ut,
33013301
[&](unsigned idx, jl_datatype_t *jt) {
33023302
if (!jl_is_datatype_singleton(jt)) {
33033303
size_t nb1 = jl_datatype_size(jt);
3304-
size_t align1 = jl_datatype_align(jt);
3304+
size_t align1 = julia_alignment((jl_value_t*)jt);
33053305
if (nb1 > nbytes)
33063306
nbytes = nb1;
33073307
if (align1 > align)
@@ -3850,9 +3850,10 @@ static jl_cgval_t emit_new_struct(jl_codectx_t &ctx, jl_value_t *ty, size_t narg
38503850

38513851
// whether we should perform the initialization with the struct as a IR value
38523852
// or instead initialize the stack buffer with stores
3853+
// although we do the former if it is a vector or could be a vector element
38533854
auto tracked = CountTrackedPointers(lt);
38543855
bool init_as_value = false;
3855-
if (lt->isVectorTy() || jl_is_vecelement_type(ty)) { // maybe also check the size ?
3856+
if (lt->isVectorTy() || jl_special_vector_alignment(1, ty) != 0) {
38563857
init_as_value = true;
38573858
}
38583859
else if (tracked.count) {

src/datatype.c

Lines changed: 22 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -291,9 +291,10 @@ static jl_datatype_layout_t *jl_get_layout(uint32_t sz,
291291
}
292292

293293
// Determine if homogeneous tuple with fields of type t will have
294-
// a special alignment beyond normal Julia rules.
294+
// a special alignment and vector-ABI beyond normal rules for aggregates.
295295
// Return special alignment if one exists, 0 if normal alignment rules hold.
296296
// A non-zero result *must* match the LLVM rules for a vector type <nfields x t>.
297+
// Matching the compiler's `__attribute__ vector_size` behavior.
297298
// For sake of Ahead-Of-Time (AOT) compilation, this routine has to work
298299
// without LLVM being available.
299300
unsigned jl_special_vector_alignment(size_t nfields, jl_value_t *t)
@@ -308,8 +309,12 @@ unsigned jl_special_vector_alignment(size_t nfields, jl_value_t *t)
308309
// motivating use case comes up for Julia, we reject pointers.
309310
return 0;
310311
size_t elsz = jl_datatype_size(ty);
311-
if (elsz != 1 && elsz != 2 && elsz != 4 && elsz != 8)
312-
// Only handle power-of-two-sized elements (for now)
312+
if (next_power_of_two(elsz) != elsz)
313+
// Only handle power-of-two-sized elements (for now), since other
314+
// lengths may be packed into very complicated arrangements (llvm pads
315+
// extra bits on most platforms when computing alignment but not when
316+
// computing type size, but adds no extra bytes for each element, so
317+
// their effect on offsets are never what you may naturally expect).
313318
return 0;
314319
size_t size = nfields * elsz;
315320
// Use natural alignment for this vector: this matches LLVM and clang.
@@ -601,9 +606,9 @@ void jl_compute_field_offsets(jl_datatype_t *st)
601606
}
602607
else {
603608
fsz = sizeof(void*);
604-
if (fsz > MAX_ALIGN)
605-
fsz = MAX_ALIGN;
606609
al = fsz;
610+
if (al > MAX_ALIGN)
611+
al = MAX_ALIGN;
607612
desc[i].isptr = 1;
608613
zeroinit = 1;
609614
npointers++;
@@ -648,8 +653,6 @@ void jl_compute_field_offsets(jl_datatype_t *st)
648653
if (al > alignm)
649654
alignm = al;
650655
}
651-
if (alignm > MAX_ALIGN)
652-
alignm = MAX_ALIGN; // We cannot guarantee alignments over 16 bytes because that's what our heap is aligned as
653656
if (LLT_ALIGN(sz, alignm) > sz) {
654657
haspadding = 1;
655658
sz = LLT_ALIGN(sz, alignm);
@@ -825,6 +828,18 @@ JL_DLLEXPORT jl_datatype_t *jl_new_primitivetype(jl_value_t *name, jl_module_t *
825828
jl_emptysvec, jl_emptysvec, jl_emptysvec, 0, 0, 0);
826829
uint32_t nbytes = (nbits + 7) / 8;
827830
uint32_t alignm = next_power_of_two(nbytes);
831+
# if defined(_CPU_X86_) && !defined(_OS_WINDOWS_)
832+
// datalayout strings are often weird: on 64-bit they usually follow fairly simple rules,
833+
// but on x86 32 bit platforms, sometimes 5 to 8 byte types are
834+
// 32-bit aligned even though the MAX_ALIGN (for types 9+ bytes) is 16
835+
// (except for f80 which is align 4 on Mingw, Linux, and BSDs--but align 16 on MSVC and Darwin)
836+
// https://llvm.org/doxygen/ARMTargetMachine_8cpp.html#adb29b487708f0dc2a940345b68649270
837+
// https://llvm.org/doxygen/AArch64TargetMachine_8cpp.html#a003a58caf135efbf7273c5ed84e700d7
838+
// https://llvm.org/doxygen/X86TargetMachine_8cpp.html#aefdbcd6131ef195da070cef7fdaf0532
839+
// 32-bit alignment is weird
840+
if (alignm == 8)
841+
alignm = 4;
842+
# endif
828843
if (alignm > MAX_ALIGN)
829844
alignm = MAX_ALIGN;
830845
// memoize isprimitivetype, since it is much easier than checking

test/core.jl

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5665,6 +5665,13 @@ let ni128 = sizeof(FP128test) ÷ sizeof(Int),
56655665
@test reinterpret(UInt128, arr[2].fp) == expected
56665666
end
56675667

5668+
# make sure VecElement Tuple has the C alignment and ABI for supported types
5669+
primitive type Int24 24 end
5670+
@test Base.datatype_alignment(NTuple{10,VecElement{Int16}}) == 32
5671+
@test Base.datatype_alignment(NTuple{10,VecElement{Int24}}) == 4
5672+
@test Base.datatype_alignment(NTuple{10,VecElement{Int64}}) == 128
5673+
@test Base.datatype_alignment(NTuple{10,VecElement{Int128}}) == 256
5674+
56685675
# issue #21516
56695676
struct T21516
56705677
x::Vector{Float64}

0 commit comments

Comments
 (0)