Randomized failure of PTX JIT Compilation #528

Open
PolyRocketMatt opened this issue Aug 11, 2024 · 3 comments


PolyRocketMatt commented Aug 11, 2024

Describe the bug

When I try to run my application on the latest TornadoVM build, it occasionally throws an error indicating that PTX JIT compilation failed:

Unable to compile task 300502d1-daec-4e34-b335-8fde2503eb00.mxm - addFloat
The internal error is: [Error During the Task Compilation]

How To Reproduce

My Main.java simply runs the following:

public class Main {

    public static void main(String[] args) {
        new Main().run();
    }

    public void run() {
        FloatArray nativeBuffer = getFrom...();
        FloatArray nativeResultBuffer = getFrom...();
        int size = 1024;

        // Create a task-graph with multiple tasks. Each task points to an existing Java method
        // that can be accelerated on a GPU/FPGA
        TaskGraph taskGraph = new TaskGraph(UUID.randomUUID().toString())
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, nativeBuffer, nativeResultBuffer) // Transfer data from host to device only in the first execution
                .task("mxm", AdditionTask::addFloat, nativeBuffer, nativeResultBuffer, 1.0f, size)             // Each task points to an existing Java method
                .transferToHost(DataTransferMode.EVERY_EXECUTION, nativeResultBuffer);     // Transfer data from device to host

        // Create an immutable task-graph
        ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();

        // Create an execution plan from an immutable task-graph
        try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {

            // Run the execution plan on the default device
            TornadoExecutionResult executionResult = executionPlan.execute();

            if (executionResult.isReady()) {
               ...
            }
        } catch (TornadoExecutionPlanException ex) {
            ex.printStackTrace();
        }
    }
}
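
For completeness, the snippet above assumes the usual TornadoVM API imports (package names taken from the TornadoVM 1.x API; adjust them if your version differs):

import java.util.UUID;

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.TornadoExecutionResult;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.exceptions.TornadoExecutionPlanException;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;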

The AdditionTask class looks like this:

public class AdditionTask implements BufferTask {

    public static void addFloat(@NotNull FloatArray input, @NotNull FloatArray output,
                                float value, int size) {
        for (@Parallel int i = 0; i < size; i++)
            output.set(i, 1.0f);
    }
}

In this case, BufferTask is just an empty interface.

Expected behavior

I expect the code to run without throwing any compilation errors. At the moment it only succeeds in some runs.

Computing system setup (please complete the following information):

  • OS: Windows 10
  • CUDA: cuda_11.7.r11.7/compiler.31294372_0
  • PTX: How do I obtain this?
  • TornadoVM commit id: 8db121e

Additional context

The attached log was generated with the --debug flag from a run of my original program; the minimal reproducible example above should still be a valid proxy for it.

debug.txt


@jjfumero (Member) commented:

Hi @PolyRocketMatt, thanks for the report.

To see the generated PTX code, you can use the --printKernel option of the tornado command.
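
Alternatively, the same thing can be requested programmatically on the execution plan. A quick sketch follows; I am assuming the withPrintKernel() option here, so double-check the exact method name against the API version you are on:

try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
    // Assumption: withPrintKernel() asks TornadoVM to dump the generated kernel
    // (PTX for the NVIDIA backend) to standard output before execution.
    executionPlan.withPrintKernel().execute();
}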

Just having a quick look at the issue, I am not sure the @NotNull annotation is supported. It might be that the IR is not clean (i.e., not ready to be consumed by the PTX code generator) because the annotation introduces extra nodes. Can you check without it?
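
For reference, the check would be the same kernel with the annotations stripped, so that only TornadoVM's own @Parallel annotation remains on the loop:

public class AdditionTask implements BufferTask {

    // Same kernel as above, only without the @NotNull annotations.
    public static void addFloat(FloatArray input, FloatArray output,
                                float value, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            output.set(i, 1.0f);
        }
    }
}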

@jjfumero jjfumero self-assigned this Aug 11, 2024
@jjfumero jjfumero added the PTX label Aug 11, 2024
PolyRocketMatt (Author) commented Aug 11, 2024

Hi, thanks for the fast response.

I have tried compiling and running the code without the @NotNull annotation, but it still only works occasionally. This is the generated PTX code, which targets PTX ISA version 7.6:

.version 7.6
.target sm_86
.address_size 64

.visible .entry 90375540_e9cf_461e_b9a2_d7dc8d46e67e_mxm_addfloat_arrays_floatarray_arrays_floatarray_1_0_1024(.param .u64 .ptr .global .align 8 kernel_context, .param .u64 .ptr .global .align 8 input, .param .u64 .ptr .global .align 8 output, .param .align 8 .u64 value, .param .align 8 .u64 size) {
        .reg .s64 rsd<3>;
        .reg .u64 rud<5>;
        .reg .s32 rsi<6>;
        .reg .pred rpb<2>;
        .reg .u32 rui<5>;

BLOCK_0:
        ld.param.u64    rud0, [kernel_context];
        ld.param.u64    rud1, [output];
        mov.u32 rui0, %nctaid.x;
        mov.u32 rui1, %ntid.x;
        mul.wide.u32    rud2, rui0, rui1;
        cvt.s32.u64     rsi0, rud2;
        mov.u32 rui2, %tid.x;
        mov.u32 rui3, %ctaid.x;
        mad.lo.s32      rsi1, rui3, rui1, rui2;

BLOCK_1:
        mov.s32 rsi2, rsi1;
LOOP_COND_1:
        setp.lt.s32     rpb0, rsi2, 1024;
        @!rpb0 bra      BLOCK_3;

BLOCK_2:
        add.s32 rsi3, rsi2, 6;
        cvt.s64.s32     rsd0, rsi3;
        shl.b64 rsd1, rsd0, 2;
        add.u64 rud3, rud1, rsd1;
        st.global.f32   [rud3], 0F3F800000;
        add.s32 rsi4, rsi0, rsi2;
        mov.s32 rsi2, rsi4;
        bra.uni LOOP_COND_1;

BLOCK_3:
        ret;
}

PolyRocketMatt (Author) commented:

I'm not sure whether this is the solution, but after removing the UUID from the name of the task graph (and consequently from the generated PTX kernel name), everything seems to behave stably. I'll do some further experimentation to see whether this consistently fixes the errors I was getting.
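
Concretely, the only change is the name passed to the TaskGraph constructor; everything else in the plan stays the same (sketch below, with "s0" just as an arbitrary fixed name):

// Fixed, deterministic task-graph name instead of UUID.randomUUID().toString().
TaskGraph taskGraph = new TaskGraph("s0")
        .transferToDevice(DataTransferMode.FIRST_EXECUTION, nativeBuffer, nativeResultBuffer)
        .task("mxm", AdditionTask::addFloat, nativeBuffer, nativeResultBuffer, 1.0f, size)
        .transferToHost(DataTransferMode.EVERY_EXECUTION, nativeResultBuffer);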

@jjfumero jjfumero added bug Something isn't working compiler labels Aug 13, 2024