String endianness is incorrect: big-endian strings cannot be read on little-endian machines #5595

BloCamLimb · 2024-02-26T08:21:23Z

According to the specification:

The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).
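To make the packing rule concrete, here is a minimal sketch in Python. It is not SPIRV-Tools or glslang code: the helper names `pack_string_words` and `serialize_words` are hypothetical and exist only for illustration. It assumes the string is null-terminated and padded with zero bytes up to a whole word.

```python
import struct


def pack_string_words(text):
    """Pack UTF-8 octets four per word, first octet in the lowest-order 8 bits.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-8") + b"\x00"   # append the null terminator
    data += b"\x00" * (-len(data) % 4)      # pad with zeros to a whole word
    # Interpreting each 4-byte group as little-endian puts octet 0 in bits 0..7.
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def serialize_words(words, big_endian):
    """Write 32-bit words to bytes in the requested byte order."""
    fmt = (">" if big_endian else "<") + "%dI" % len(words)
    return struct.pack(fmt, *words)


words = pack_string_words("GLSL.std.450")
little = serialize_words(words, big_endian=False)
big = serialize_words(words, big_endian=True)

print(little)  # b'GLSL.std.450\x00\x00\x00\x00'
print(big)     # b'LSLGdts.054.\x00\x00\x00\x00'
```

The word values are identical in both cases; only the byte order used to serialize each word differs, which is why the string looks scrambled in a big-endian file.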

This means that the bytes of each word must be swapped before reading a big-endian spv file on a little-endian machine, and vice versa. For example, glslang outputs big-endian encoded spv binary files on big-endian machines. But when such a file is disassembled with spirv-dis on a little-endian machine, only words and operands are handled properly via spvFixWord; strings are read in host endianness, which is not correct.

More specifically:
For "GLSL.std.450" in big-endian encoding (or on big-endian machines), the first octet 'G' should be in the lowest-order byte, which is the fourth byte of a word. So in the file (or memory), the bytes from first to last are 'L''S''L''G' 'd''t''s''.' '0''5''4''.' '\0''\0''\0''\0': 16 bytes and 4 words in total.
When reading this big-endian encoded file:
On big-endian machines, reinterpret each consecutive 4 bytes as a uint32_t and use a bit shift to obtain the first octet, like (word >> 0) & 0xFF. This yields the fourth byte, which is 'G', and this is correct.
On little-endian machines, the same bit operation yields the first byte, which is 'L'. This is not correct, and it happens because there is no call to spvFixWord.
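The two cases above can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not the SPIRV-Tools implementation: `decode_string` is a hypothetical helper, and the explicit `'>'` / `'<'` byte-order arguments stand in for "words read in the module's byte order" and "words read as little-endian host words with no swap".

```python
import struct

# The 16 bytes described above: "GLSL.std.450" as stored in a big-endian module.
BIG_ENDIAN_BYTES = b"LSLGdts.054.\x00\x00\x00\x00"


def decode_string(raw, word_order):
    """Decode a packed literal string (hypothetical helper, illustration only).

    Reads 32-bit words using word_order ('<' or '>'), then extracts octets
    from the lowest-order byte upward until the null terminator.
    """
    words = struct.unpack("%s%dI" % (word_order, len(raw) // 4), raw)
    octets = bytearray()
    for word in words:
        for shift in (0, 8, 16, 24):
            octet = (word >> shift) & 0xFF
            if octet == 0:
                return octets.decode("utf-8")
            octets.append(octet)
    return octets.decode("utf-8")


# Words interpreted in the module's byte order (what a word fix-up achieves).
correct = decode_string(BIG_ENDIAN_BYTES, ">")
# Words interpreted as little-endian without any swap.
wrong = decode_string(BIG_ENDIAN_BYTES, "<")

print(correct)  # GLSL.std.450
print(wrong)    # LSLGdts.054.
```

The second result matches the name that appears in the spirv-dis error message in this report.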

I'm making a compiler in Java myself and can selectively output spv binary files in little-endian or big-endian (the default is host endianness). I encountered this issue when running spirv-dis:

; SPIR-V
; Version: 1.5
; Generator: Khronos; 0
; Bound: 25
; Schema: 0
               OpCapability Shader
error: 2: Invalid extended instruction import 'LSLGdts.054.'

Issue #149 and PR #4622 are related, but they do not fix this issue.

My spirv-dis version: SPIRV-Tools v2023.6 v2023.6.rc1-50-gdc667644
Here is my spv binary file for testing purposes. git describes this file as "Khronos SPIR-V binary, big-endian, version 0x010500, generator 00000000". My CPU is little-endian.
test_shader.zip

cassiebeckley assigned dneto0 Feb 28, 2024

cassiebeckley added the component:as/dis label Feb 28, 2024
