Implement vec256/512 #3966

Open. Wants to merge 6 commits into base: main.
TheNumbat (Contributor) commented on May 5, 2025

Adds support for 256- and 512-bit vectors throughout the compiler. They generally have the same semantics as 128-bit vectors, only wider. About half of the diff (+5k LOC) comes from enabling AVX/AVX2 in simdgen.

  • 256-bit types are allowed with simd_beta. The basic 128-bit tests have been duplicated for 256-bit, but none of the operations exercised by the other tests are implemented yet. The AVX and AVX2 architecture extensions have been added and are enabled by default, since we already enable other Haswell extensions.
  • 512-bit types are allowed with simd_alpha. They do not have tests yet, and using them triggers a fatal error in emit. The AVX512F extension has also been added, but is disabled by default.
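
The gating described above can be sketched as a small check. This is only an illustration: the type and function names are hypothetical, and it assumes that 128-bit vectors need no special maturity level and that enabling alpha also enables everything beta allows. Neither assumption is stated in this PR.

```ocaml
(* Hypothetical model of the gating described in the list above. *)
type maturity = Stable | Beta | Alpha

type vec_width = V128 | V256 | V512

(* Assumption: maturities are ordered Stable < Beta < Alpha. *)
let rank = function Stable -> 0 | Beta -> 1 | Alpha -> 2

(* V256 needs simd_beta and V512 needs simd_alpha, as the description says.
   Treating V128 as Stable is an assumption made for this sketch. *)
let required = function V128 -> Stable | V256 -> Beta | V512 -> Alpha

(* A width is allowed when the enabled maturity is at least the required one. *)
let allowed ~enabled width = rank enabled >= rank (required width)
```

Under these assumptions, `allowed ~enabled:Beta V256` holds while `allowed ~enabled:Beta V512` does not.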

Runtime changes:

  • caml_call_gc and related functions now have versions that save ymm/zmm registers. Emit chooses which one to call based on the live register set. Programs that do not use 256/512 bit vectors should see identical behavior.
  • When a C call passes vec256 or vec512 arguments on the stack, C compilers assume the stack is 32- or 64-byte aligned on entry to the external function. We therefore emit a call to caml_c_call_stack_args_32/64, which aligns the C stack before copying the arguments from the OCaml stack. The same is done in runtime4.
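
A minimal sketch of the selection logic the two runtime changes describe, assuming the choice depends only on the widest vector involved. The names `caml_call_gc_avx` and `caml_call_gc_avx512` are hypothetical placeholders for the ymm/zmm-saving variants, and `caml_c_call_stack_args_32` / `caml_c_call_stack_args_64` is one reading of the "_32/64" shorthand above. The helper functions are illustrative and are not taken from the diff.

```ocaml
type vec_width = V128 | V256 | V512

(* Widest vector width in a list; V128 when nothing wider is present. *)
let widest widths =
  List.fold_left
    (fun acc w ->
      match acc, w with
      | V512, _ | _, V512 -> V512
      | V256, _ | _, V256 -> V256
      | _ -> V128)
    V128 widths

(* Pick the GC entry point from the live register set.
   The two suffixed names are hypothetical. *)
let gc_entry_point live =
  match widest live with
  | V128 -> "caml_call_gc"
  | V256 -> "caml_call_gc_avx"
  | V512 -> "caml_call_gc_avx512"

(* Pick the C-call trampoline from the widths of vector stack arguments. *)
let c_call_stack_args_entry stack_args =
  match widest stack_args with
  | V128 -> "caml_c_call_stack_args"
  | V256 -> "caml_c_call_stack_args_32"
  | V512 -> "caml_c_call_stack_args_64"

(* Round a stack pointer down to a power-of-two alignment (stack grows down). *)
let align_down sp alignment = sp land lnot (alignment - 1)
```

For example, `align_down 1000 32` yields 992 and `align_down 1000 64` yields 960. A program with no 256/512-bit values always takes the `V128` branch, which matches the claim that such programs see identical behavior.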

For later PRs:

  • constants (+tests)
  • static & reinterpret casts (+tests)
  • array accessors (+tests)
  • intrinsics (+tests)
  • align vec256/512 stack slots on the ocaml stack
  • use vex encoding for sse intrinsics when avx is enabled (evex if avx512)
  • avx512 mask registers

TheNumbat added the flambda2, backend, and simd labels on May 5, 2025
TheNumbat added the lambda label on May 7, 2025
TheNumbat marked this pull request as ready for review on May 20, 2025, 21:04
-  buf_int8 buf (((lnot rexr) lsl 7) lor
-                ((lnot rexx) lsl 6) lor
-                ((lnot rexb) lsl 5) lor
+  buf_int8 buf (((rexr lxor 1) lsl 7) lor
Review comment by the PR author (Contributor):

This is a bugfix
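
To see why replacing `lnot` with `lxor 1` is a fix: on OCaml's native `int`, `lnot` flips every bit, so `lnot 1` is `-2`, not `0`, and shifting it left sets all the higher bits as well. The sketch below compares the two forms. It assumes each flag is 0 or 1, that only the low 8 bits of the result are emitted, and that the remaining two fields change in the same way as the one line shown in the hunk. The function names and the `low5` parameter are hypothetical.

```ocaml
(* Old form: lnot yields -1 or -2, so each shifted field also sets
   every bit above its own position. *)
let vex_byte_buggy ~rexr ~rexx ~rexb ~low5 =
  (((lnot rexr) lsl 7) lor
   ((lnot rexx) lsl 6) lor
   ((lnot rexb) lsl 5) lor
   low5) land 0xFF

(* New form: lxor 1 inverts only the low bit, so each field
   contributes exactly one bit. *)
let vex_byte_fixed ~rexr ~rexx ~rexb ~low5 =
  (((rexr lxor 1) lsl 7) lor
   ((rexx lxor 1) lsl 6) lor
   ((rexb lxor 1) lsl 5) lor
   low5) land 0xFF
```

With all three flags set to 1 and `low5 = 1`, the fixed form gives `0x01`, while the old form gives `0xC1`: the inverted X and B fields leak into the bits above them. When all flags are 0, both forms agree on `0xE0`, which is presumably why the bug only shows up once extended registers are used.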