[SYCL][Docs] Add SYCLBIN feature and format design document #16872

steffenlarsen · 2025-02-03T16:01:28Z

This commit adds a design document detailing the SYCLBIN binary format for representing SYCL device kernel binaries to be loaded dynamically at runtime. Additionally, the design document details how this is to be handled by the SYCL runtime, driver and clang tooling.

As the design of SYCLBIN files relies heavily on the property sets, this PR also adds documentation to the existing property set functionality.

Signed-off-by: Larsen, Steffen <[email protected]>

bader

SYCL design documentation predominantly uses Markdown format. Please, convert the document to Markdown format.

steffenlarsen · 2025-02-04T08:27:26Z

> SYCL design documentation predominantly uses Markdown format. Please, convert the document to Markdown format.

Apologies! Old habits die hard. It should be good now.

AlexeySachkov · 2025-02-04T13:47:39Z

sycl/doc/design/SYCLBINDesign.md

+*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA
+        we may want to embed the SM version.


If we are talking about a specific CUDA property, is it still an IR module? I suppose I don't understand well enough what PTX is and what its place in the toolchain is.

To me, target-specific means "native", i.e. as if PTX were incorrectly treated as an IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?

> If we are talking about a specific CUDA property, is it still IR module?

Because of the forward-compatibility of SM archs, my understanding is that we want PTX to be considered an IR type.

> Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?

If we were to put the SM architecture information at abstract module level, I don't see how an abstract module would ever have more than one IR module and more than one native device code image. Granted, having the exact same properties is somewhat rare, but I would expect it to be the case if the user was to compile for multiple SM versions.

I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this.

I've added a "target" field to the IR module metadata. It contains the value of -fsycl-targets used for the given module, similar to "arch" in native device code images. Unlike "arch", this key may be missing from the metadata if the option wasn't specified, in which case the IR type should be enough to infer the target from.
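
For illustration, a minimal sketch of how a reader of the metadata might treat the optional "target" key described above. The `IRType` values, the `IRModuleMetadata` struct and the `describeTarget` helper are hypothetical names used only for this sketch; they are not taken from the design document or the runtime.

```cpp
#include <optional>
#include <string>

// Hypothetical IR kinds; the real SYCLBIN enumeration may differ.
enum class IRType { SPIRV, PTX, AMDGCN };

// Hypothetical in-memory view of one IR module's metadata.
struct IRModuleMetadata {
  IRType Type;
  // Value of -fsycl-targets used for this module. Absent when the option
  // was not specified at compile time.
  std::optional<std::string> Target;
};

// Returns the recorded target when present; otherwise falls back to a
// coarse label inferred from the IR type alone.
std::string describeTarget(const IRModuleMetadata &MD) {
  if (MD.Target)
    return *MD.Target;
  switch (MD.Type) {
  case IRType::SPIRV:
    return "spirv (inferred from IR type)";
  case IRType::PTX:
    return "ptx (inferred from IR type)";
  case IRType::AMDGCN:
    return "amdgcn (inferred from IR type)";
  }
  return "unknown";
}
```

The point of the fallback is that a missing key is not an error: the consumer only loses precision, not the ability to classify the module.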

AlexeySachkov · 2025-02-04T13:48:50Z

sycl/doc/design/SYCLBINDesign.md

+module AOT compiled for a specific device, identified by the architecture
+string.


I think we need to specify what the architecture string is here.

Is it a target triple? Is it the value passed to -fsycl-targets? Is it a value from the architecture enum in our device architecture extension?

It is not clear how the runtime can use this field without such a specification.

That is actually a good question. I am not sure what the architecture string would be for cases like SASS binaries. For example, let's say we've compiled to PTX through our compiler, then loaded that into a kernel bundle, compiled that kernel bundle to native device code and then serialized that to SYCLBIN. The -fsycl-targets value would not be enough to express the architecture here, I believe.

In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets values.

I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.

In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format.

> In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets values.

I will have to do some research here. I know PTX can be associated with SM architectures, but I don't know if the same applies to the native device code produced from PTX. It may be device-specific and as such more strict than the SM version.

> I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.

Since the compiler will need to know about these architectures too, I am reluctant to try and match enum values between the runtime and library for this purpose.

I've specified that the "arch" value is the string that corresponds to the -fsycl-targets value used for the binary. The runtime should generally be able to convert that to the enums in sycl_ext_oneapi_device_architecture.

> I've specified that the "arch" value is the string that corresponds to the -fsycl-targets value used for the binary. The runtime should generally be able to convert that to the enums in sycl_ext_oneapi_device_architecture.

The approach works for me, but I feel like the clarification is actually missing from the doc.

Good point! I've added the reference to the SYCL extension with a note about the runtime attempting to make the appropriate mapping.
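
As a rough sketch of the string-to-enum mapping discussed in this thread, the runtime could keep a small lookup table. The `Arch` enum below is a local stand-in, not the real enum from sycl_ext_oneapi_device_architecture, and the three entries are only illustrative; `archFromString` is a hypothetical helper name.

```cpp
#include <optional>
#include <string_view>

// Local stand-in for the architecture enum of the device architecture
// extension. Only a few illustrative entries are listed.
enum class Arch { intel_gpu_pvc, nvidia_gpu_sm_80, amd_gpu_gfx90a };

struct ArchEntry {
  std::string_view Name; // "arch" string as stored in the SYCLBIN metadata
  Arch Value;            // corresponding enum value
};

constexpr ArchEntry KnownArchs[] = {
    {"intel_gpu_pvc", Arch::intel_gpu_pvc},
    {"nvidia_gpu_sm_80", Arch::nvidia_gpu_sm_80},
    {"amd_gpu_gfx90a", Arch::amd_gpu_gfx90a},
};

// Attempts the mapping; an unknown string yields std::nullopt so the
// caller can decide how to handle images it cannot classify.
std::optional<Arch> archFromString(std::string_view S) {
  for (const ArchEntry &E : KnownArchs)
    if (E.Name == S)
      return E.Value;
  return std::nullopt;
}
```

Returning an optional rather than failing reflects the "attempting to make the appropriate mapping" wording: a string with no enum counterpart is a possible outcome, not a malformed file.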

AlexeySachkov · 2025-02-04T13:53:50Z

sycl/doc/design/SYCLBINDesign.md

+directly, instead of extracting it from a host binary. This should be done when
+a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is


So, .syclbin files cannot be used if the output is not .syclbin, right?

I'm not sure if I have a use case for that, just wanted to double-check the intent.

A potential use case, though, is the ability to embed a .syclbin into an application as if that device code had originally been compiled as part of the application. I.e. you had your dynamically loadable .syclbin, but at some point decided to embed it and stop shipping it separately. But that will have some implications for the API, I assume: we would then need to design how to use such an embedded SYCLBIN.

gmlueck · 2025-02-05T17:04:28Z

sycl/doc/design/SYCLBINDesign.md

+*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA
+        we may want to embed the SM version.


I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this.

gmlueck · 2025-02-05T17:11:26Z

sycl/doc/design/SYCLBINDesign.md

+module AOT compiled for a specific device, identified by the architecture
+string.


In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets values.

I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.

In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format.

Co-authored-by: Michael Toguchi <[email protected]>

Co-authored-by: Greg Lueck <[email protected]>

steffenlarsen · 2025-03-14T12:32:41Z

@AlexeySachkov @asudarsa @mdtoguchi - I believe I've addressed the open comments. A re-review would be much appreciated. 😄

steffenlarsen · 2025-03-24T08:42:50Z

@intel/dpcpp-doc-reviewers @AlexeySachkov @asudarsa - Friendly ping.

AlexeySachkov

A few final comments here and there, but nothing major, I believe

AlexeySachkov · 2025-03-27T14:08:17Z

sycl/doc/design/SYCLBINDesign.md

+An abstract module metadata entry contains any number of property sets, as
+described in [PropertySets.md](PropertySets.md), excluding:


Not sure if this is a problem, but what if we add a new property there which cannot be applied to an abstract module?

I don't think adding new cases should be a problem. It is mostly to say that these property sets have no meaning for the abstract module metadata block. I don't think it's worth checking whether they are there or not, over just skipping them.

AlexeySachkov · 2025-03-27T14:14:20Z

sycl/doc/design/SYCLBINDesign.md

+module AOT compiled for a specific device, identified by the architecture
+string.


> I've specified that the "arch" value is the string that corresponds to the -fsycl-targets value used for the binary. The runtime should generally be able to convert that to the enums in sycl_ext_oneapi_device_architecture.

The approach works for me, but I feel like the clarification is actually missing from the doc.

AlexeySachkov · 2025-03-27T14:18:06Z

sycl/doc/design/SYCLBINDesign.md

+`-fsycl` pipeline, instead passing the output of the clang-offload-packager
+invocation to clang-linker-wrapper together with the new `--syclbin` flag.
+
+Setting this option will imply `-fsycl` and override `-fsycl-device-only`.


Could you please clarify what you mean by "override -fsycl-device-only"? Doesn't skipping host compilation imply -fsycl-device-only?

I suppose so. To some extent I think the terminology overlaps a little. They are not really mutually exclusive; it is just "-fsycl-device-only with more". I'll change it to just "implies". @mdtoguchi - Is there a precedent for the terminology used here?

I think the intention of the wording here is that use of -fsyclbin will only produce the SYCLBIN file. The expectation when using -fsycl-device-only is to produce just the device binary. If they are used together, -fsycl-device-only is overridden from a driver function perspective.

Co-authored-by: Alexey Sachkov <[email protected]>

mdtoguchi · 2025-03-28T15:40:55Z

sycl/doc/design/SYCLBINDesign.md

+`-fsycl` pipeline, instead passing the output of the clang-offload-packager
+invocation to clang-linker-wrapper together with the new `--syclbin` flag.
+
+Setting this option implies `-fsycl` and `-fsycl-device-only`.


Now that I see this written, what is the expectation of behavior when -fsyclbin -fsycl-device-only is used on the command line? The natural behavior here is for -fsycl-device-only to be seen and taken advantage of, which will create the device only file and stop. -fsyclbin does additional work after the device compilation.

That's a good point. Maybe it would be better to issue a diagnostic if they are used together? Seems like they may not be totally related, despite there being some conceptual overlap.

I think the standard 'unused argument' diagnostic should be sufficient when used together:

    > clang++ --offload-new-driver -fsycl-device-only -fsyclbin ~/a.cpp
    clang++: warning: argument unused during compilation: '-fsyclbin' [-Wunused-command-line-argument]

Sounds good! I've reworded this part to explicitly say that the -fsyclbin will be unused.
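
The agreed behavior can be modelled with a toy decision function. This is not clang driver code; `Output`, `DriverPlan` and `planCompilation` are hypothetical names, and the sketch only encodes what this thread states: -fsycl-device-only stops after device compilation, so a simultaneous -fsyclbin goes unused and is reported as such.

```cpp
#include <string>
#include <vector>

// What the (modelled) driver invocation ends up producing.
enum class Output { HostAndDevice, DeviceOnly, Syclbin };

struct DriverPlan {
  Output Result = Output::HostAndDevice;
  std::vector<std::string> Warnings;
};

// Toy model of the flag interaction discussed above.
DriverPlan planCompilation(bool HasFsyclbin, bool HasFsyclDeviceOnly) {
  DriverPlan Plan;
  if (HasFsyclDeviceOnly) {
    // Device-only compilation stops before any SYCLBIN packaging step.
    Plan.Result = Output::DeviceOnly;
    if (HasFsyclbin)
      Plan.Warnings.push_back(
          "argument unused during compilation: '-fsyclbin'");
    return Plan;
  }
  if (HasFsyclbin)
    Plan.Result = Output::Syclbin;
  return Plan;
}
```

Calling `planCompilation(true, true)` in this model yields a device-only result with one warning, matching the diagnostic quoted above.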

asudarsa · 2025-04-01T02:29:56Z

sycl/doc/design/SYCLBINDesign.md

+
+## clang-linker-wrapper changes
+
+The clang-linker-wrapper is responsible for doing post-processing and linking of


What is meant by post-processing here? Thanks

Module-splitting and metadata analysis/extraction mainly. I've expanded it a bit.

asudarsa · 2025-04-01T02:42:14Z

sycl/doc/design/SYCLBINDesign.md

+SYCLBIN files are linked together is yet to be specified.
+
+
+## clang-linker-wrapper changes


We are in the process of moving most of the SYCL-specific functionality from clang-linker-wrapper into a new tool called clang-sycl-linker, so this documentation will need to be updated accordingly. For the purposes of this PR, we can use clang-linker-wrapper. Just a heads-up.

Thanks for the heads up! Would it make sense to change it now? From a documentation POV, is it as simple as a search-and-replace or is there an important semantic difference between the tools?

maksimsab · 2025-04-02T11:58:15Z

sycl/doc/design/SYCLBINDesign.md

+the SYCL runtime library handles files of this format.
+
+
+## SYCLBIN binary format


Testing and debugging would require some capability for searching and extracting information from files of this format, which means we would need to provide such tools and utilities.
I think that basing this format on ELF would allow us to reuse existing utilities such as the standard GNU tools (readelf, objdump, etc.) and the LLVM utilities (llvm-objdump and the rich LLVM library). The LLVM library's ELF support could be reused in development as well.

A custom binary format requires custom support in many ways, which significantly burdens development and maintenance. I think we should strive to follow the generic Offloading Format from LLVM as much as we can.

What do you think?

I do agree with the general sentiment and I am open to the idea of reusing ELF as the format. However, I don't see how this format fits with ELF, as we would just be fitting bogus values into a lot of the predefined ELF headers and sections. SYCLBIN is not an executable format per se, and to do appropriate linking the linker will have to consider the binary metadata too, which we would have to retrofit into some text section of the ELF file.

For tooling, I could see it, but are there any tools other than llvm-objdump (and readelf and objdump) that we would get "for free" if we used ELF? Even if there are, would users be able to get much out of the metadata without us adding functionality to these tools or writing entirely new tooling? When you mention the "rich LLVM library", could you be more specific?

I've previously tried and failed to fit the SYCLBIN format into the existing ELF format in a way that isn't just what the current design is but separated into best-effort chunks in the ELF format, so please, if you have a suggestion of how to structure the format based off ELF, please do explain your thoughts.

maksimsab · 2025-04-02T12:18:35Z

sycl/doc/design/PropertySets.md

@@ -0,0 +1,296 @@
+# SYCL binary property sets


It is our fault for not properly communicating the status of upstreaming PropertySets.

There has been an attempt to do that: llvm/llvm-project#110771
And there is a comment requesting a text format (JSON or YAML) instead:
llvm/llvm-project#110771 (comment)

Overall, we doubt that we will be able to get PropertySets into llvm-project as is. At the moment, Justin is experimenting with text formats to add this to intel/llvm. Therefore, it is not prudent to make the old binary PropertySet format part of a new ABI.

In this document we could describe properties in an abstract way, as a collection of string keys and int/string values, without specifying the exact binary format.

In the syclbin format we could add some field specifying the actual format like:

    struct Properties {
      std::byte format[8];  // "json", "yaml", "prop" (old binary properties), etc.
      uint32_t size;
      std::byte encoded_data[size];
    };

I don't insist on this concrete structure.
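
To make the suggested layout concrete, here is a minimal decoding sketch. It assumes details the comment does not specify: the 8-byte format tag is NUL-padded, `size` is stored in host byte order, and the payload follows the header directly. `DecodedProperties` and `decodeProperties` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

struct DecodedProperties {
  std::string Format;          // e.g. "json", "yaml" or "prop"
  std::vector<std::byte> Data; // payload, still in its encoded form
};

// Decodes one Properties record from a byte buffer. Returns std::nullopt
// if the buffer is too short for the header or the declared payload.
std::optional<DecodedProperties> decodeProperties(const std::byte *Buf,
                                                  std::size_t Len) {
  constexpr std::size_t TagSize = 8;
  constexpr std::size_t HeaderSize = TagSize + sizeof(std::uint32_t);
  if (Len < HeaderSize)
    return std::nullopt;

  // Assumption: the tag is NUL-padded up to 8 bytes.
  const char *Tag = reinterpret_cast<const char *>(Buf);
  std::size_t TagLen = 0;
  while (TagLen < TagSize && Tag[TagLen] != '\0')
    ++TagLen;

  // Assumption: size is stored in host byte order.
  std::uint32_t Size = 0;
  std::memcpy(&Size, Buf + TagSize, sizeof(Size));
  if (Len - HeaderSize < Size)
    return std::nullopt;

  DecodedProperties Out;
  Out.Format.assign(Tag, TagLen);
  Out.Data.assign(Buf + HeaderSize, Buf + HeaderSize + Size);
  return Out;
}
```

A reader built this way only needs to dispatch on `Format` to pick a JSON, YAML or legacy binary parser for `Data`, which is the flexibility the comment is asking for.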

I don't think switching the format would be a problem. SYCLBIN just embeds the same properties as the compiler generates. This document is not intended to document a constituent of SYCLBIN but rather the current design of the property sets, which were not well documented. If we change to some other properties format, it should be straightforward to just switch over to embedding that instead, as they are simply encoded into the byte tables.

Yes. I mean that a table-based documentation format similar to AMD's (link) wouldn't require additional work in the future.

steffenlarsen · 2025-04-08T10:13:38Z

The switch to ELF-based format was discussed offline and we decided to go with the current design for now to enable the development of the feature and then consider the upstreamability in parallel.

maksimsab

LGTM.

PropertySets could be transformed into a table later.

steffenlarsen requested a review from a team as a code owner February 3, 2025 16:01

steffenlarsen requested review from uditagarwal97, cperkinsintel, mdtoguchi, asudarsa and gmlueck February 3, 2025 16:01

steffenlarsen mentioned this pull request Feb 3, 2025

[SYCL][Offload] Add SYCLBIN format and dump tool #16873

Open

bader reviewed Feb 3, 2025

steffenlarsen added 2 commits February 4, 2025 00:16

Move to Markdown format

60ff95f

Fix tables, links and titles

b54afa8

steffenlarsen added 2 commits February 4, 2025 03:18

Fix xrefs

12a6cad

Use link for clang design

0aad200

AlexeySachkov reviewed Feb 4, 2025

mdtoguchi reviewed Feb 4, 2025

cperkinsintel reviewed Feb 4, 2025

gmlueck reviewed Feb 4, 2025

steffenlarsen added 2 commits February 5, 2025 01:11

Address first set of comments

1277bd9

Remove redundant description

1d2b5c8

gmlueck reviewed Feb 5, 2025

steffenlarsen and others added 4 commits February 6, 2025 07:54

Update sycl/doc/design/SYCLBINDesign.md

c7c1512

Co-authored-by: Greg Lueck <[email protected]>

Add kernel names back and fix array types

edca48e

Switch to headers-based structure and add property set design document

1361d48

gmlueck reviewed Feb 25, 2025

steffenlarsen added 3 commits February 25, 2025 22:50

Address PropertySets.md comments

533e901

Address SYCLBIN design comments

63d0f9a

Removed unfinished line

fbf54ad

Add target and specify arch

3d76c2a

steffenlarsen requested review from AlexeySachkov, mdtoguchi and bader March 12, 2025 11:00

mdtoguchi reviewed Mar 14, 2025

Remove undocumented option

6e9a6b0

mdtoguchi approved these changes Mar 14, 2025

steffenlarsen commented Mar 26, 2025

Update sycl/doc/design/SYCLBINDesign.md

ef7f2a2

AlexeySachkov reviewed Mar 27, 2025

steffenlarsen and others added 2 commits March 27, 2025 15:39

Apply suggestions from code review

0bc59d5

Co-authored-by: Alexey Sachkov <[email protected]>

Address comments

13fad43

steffenlarsen requested a review from AlexeySachkov March 28, 2025 10:56

mdtoguchi reviewed Mar 28, 2025

asudarsa reviewed Apr 1, 2025

steffenlarsen added 3 commits March 31, 2025 23:20

Expand post-processing

1ca7ec0

Specify -fsyclbin being ignored if used with -fsycl-device-only

874c3bf

ignored -> unused

bd71eb5

AlexeySachkov approved these changes Apr 1, 2025

mdtoguchi reviewed Apr 1, 2025

maksimsab reviewed Apr 2, 2025

steffenlarsen requested a review from maksimsab April 8, 2025 10:13

maksimsab approved these changes Apr 9, 2025

gmlueck approved these changes Apr 9, 2025

steffenlarsen merged commit c3601c2 into intel:sycl Apr 9, 2025
4 checks passed
