Implement vec256/512 #3966

TheNumbat · 2025-05-05T00:20:29Z

Adds support for 256-bit and 512-bit vectors throughout the compiler. They generally have exactly the same semantics as 128-bit vectors, only wider. About half of the diff (+5k loc) comes from enabling AVX/AVX2 in simdgen.

256-bit types are allowed with simd_beta. The basic tests for 128-bit have been duplicated for 256, but none of the operations in the other tests are implemented yet. AVX and AVX2 architecture extensions have been added and are enabled by default, since we already enable other Haswell extensions.
512-bit types are allowed with simd_alpha. They do not yet have tests, and using them triggers a fatal error in emit. The AVX512F extension is disabled by default.

Runtime changes:

caml_call_gc and related functions now have versions that save ymm/zmm registers. Emit chooses which one to call based on the live register set. Programs that do not use 256/512 bit vectors should see identical behavior.
When a C call includes vec256 or vec512 stack arguments, C compilers assume the stack is 32- or 64-byte aligned on entry to the external call. We therefore emit a call to caml_c_call_stack_args_32/64, which aligns the C stack before copying the arguments from the OCaml stack. We also do this in runtime4.
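
As a rough illustration of the two selection rules above, here is a minimal sketch, not the code from this PR. The `width` type, the `_avx`/`_avx512` suffixes on the GC entry points, and the use of `caml_c_call_stack_args` as the default stub are all assumptions made for the example; only `caml_call_gc` and `caml_c_call_stack_args_32/64` are named in the description.

```ocaml
(* Hypothetical type for this sketch: the width of a vector register
   or of a vector stack argument. *)
type width = W128 | W256 | W512

let bytes_of_width = function W128 -> 16 | W256 -> 32 | W512 -> 64

(* Widest element of a list, defaulting to W128 when the list is empty. *)
let widest ws =
  List.fold_left
    (fun acc w -> if bytes_of_width w > bytes_of_width acc then w else acc)
    W128 ws

(* Choose a GC entry point from the live vector registers.
   The "_avx" and "_avx512" names are made up for illustration. *)
let gc_entry live =
  match widest live with
  | W128 -> "caml_call_gc"
  | W256 -> "caml_call_gc_avx"
  | W512 -> "caml_call_gc_avx512"

(* Choose a C-call stub from the widths of the stack-passed arguments.
   Using "caml_c_call_stack_args" as the default is an assumption. *)
let c_call_stub stack_args =
  match widest stack_args with
  | W128 -> "caml_c_call_stack_args"
  | W256 -> "caml_c_call_stack_args_32"
  | W512 -> "caml_c_call_stack_args_64"

(* Round a stack pointer down to a power-of-two alignment. *)
let align_down sp align = sp land lnot (align - 1)
```

Programs with no 256/512-bit values always take the `W128` branch in this sketch, which matches the claim that their behavior is unchanged.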

For later PRs:

constants (+tests)
static & reinterpret casts (+tests)
array accessors (+tests)
intrinsics (+tests)
align vec256/512 stack slots on the OCaml stack
use VEX encoding for SSE intrinsics when AVX is enabled (EVEX if AVX512)
AVX512 mask registers

TheNumbat · 2025-05-20T21:23:41Z

backend/x86_binary_emitter.ml

-  buf_int8 buf (((lnot rexr) lsl 7) lor
-                ((lnot rexx) lsl 6) lor
-                ((lnot rexb) lsl 5) lor
+  buf_int8 buf (((rexr lxor 1) lsl 7) lor


This is a bugfix
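
One way to see why the old expression is wrong: `lnot` flips every bit of an OCaml int, not just the low one, so each shifted term also sets all the bits above its target position. The sketch below compares the two forms. It assumes `rexr`, `rexx` and `rexb` are ints holding 0 or 1, that the two lines missing from the `+` side follow the same `lxor 1` pattern as the first, and that the result is truncated to a byte (modeled here with `land 0xFF`). The function names are invented for the example, and the low bits of the real prefix byte are left out.

```ocaml
(* Old form: lnot inverts the whole int, so the shifted terms leak
   into the higher bit positions. *)
let with_lnot ~rexr ~rexx ~rexb =
  (((lnot rexr) lsl 7)
   lor ((lnot rexx) lsl 6)
   lor ((lnot rexb) lsl 5))
  land 0xFF

(* New form: lxor 1 inverts only the low bit of a 0/1 value.
   Extending it to rexx and rexb is an assumption of this sketch. *)
let with_lxor ~rexr ~rexx ~rexb =
  (((rexr lxor 1) lsl 7)
   lor ((rexx lxor 1) lsl 6)
   lor ((rexb lxor 1) lsl 5))
  land 0xFF

let () =
  (* rexr = 1, rexx = 1, rexb = 0: only bit 5 should be set. *)
  Printf.printf "lnot: 0x%02X  lxor: 0x%02X\n"
    (with_lnot ~rexr:1 ~rexx:1 ~rexb:0)
    (with_lxor ~rexr:1 ~rexx:1 ~rexb:0)
```

The two forms agree when all three inputs are 0, which may be why the problem only shows up once extended registers are used.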

TheNumbat added the flambda2, backend, and simd labels May 5, 2025

TheNumbat force-pushed the vec256-512 branch from 0b58fe6 to ee745db Compare May 7, 2025 02:20

TheNumbat added the lambda Lambda language changes label May 7, 2025

TheNumbat force-pushed the vec256-512 branch from 74cb7ab to ae5309b Compare May 20, 2025 17:44

TheNumbat marked this pull request as ready for review May 20, 2025 21:04

squash

c6c0e8e

TheNumbat force-pushed the vec256-512 branch from c73077d to c6c0e8e Compare May 20, 2025 21:45

TheNumbat added 5 commits May 20, 2025 17:57

format

1bfbf6f

revert config change

3af8652

delete runtime stuff again

c70f4ad

new rt5 impl

db6a66f

rt4 (broken)

e68487a
