Example code producing memory access fault for array size 1024*8, works for 1023*8 #510

chopikus · 2024-07-25T23:08:58Z

Describe the bug

Running the example program produces an error for big enough arrays.

Program:

public class App 
{
    public static void parallelInitialization(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            int j = i * 8;
            data.set(i, new Float8(j, j + 1, j + 2, j + 3, j + 4 , j + 5 , j + 6, j + 7));
        }
    }

    public static void computeSquare(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            Float8 item = data.get(i);
            Float8 result = Float8.mult(item, item);
            data.set(i, result);
        }
    }

    public static void main( String[] args ) {
        VectorFloat8 array = new VectorFloat8(1024 * 8);
        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.EVERY_EXECUTION, array)
                .task("t0", App::parallelInitialization, array)
                .task("t1", App::computeSquare, array)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, array);

        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(taskGraph.snapshot());

        // Obtain a device from the list
        TornadoDevice device = TornadoExecutionPlan.getDevice(0, 0);
        executionPlan.withDevice(device);

        // Put in a loop to analyze hotspots with Intel VTune (as a demo)
        for (int i = 0; i < 1000; i++ ) {
            // Execute the application
            executionPlan.execute();
        }
    }
}

Running mvn package and tornado -jar [JARFILE] produces an error:

Memory access fault by GPU node-1 (Agent handle: 0x7fe80076f230) on address 0x7fe614c00000. Reason: Page not present or supervisor privilege.

However, if I change the array size to 1023*8 instead of 1024*8, the error goes away.
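For reference, the arithmetic behind the two sizes can be sketched as below. This is only a back-of-the-envelope calculation, not a diagnosis: `VectorFloat8Sizing` and its methods are hypothetical helpers (not part of TornadoVM), the payload figure ignores any header or padding the runtime may add to the device buffer, and the work-group size of 256 is taken from the "Total Number of Block Threads" line of the `tornado --devices` output in this report. Whether any of these numbers relates to the fault is not established here.

```java
// Hypothetical helper, not part of TornadoVM: rough sizes for the two cases.
final class VectorFloat8Sizing {
    static final int LANES = 8;                      // floats per Float8
    static final int BYTES_PER_FLOAT = Float.BYTES;  // 4 bytes
    static final int WORK_GROUP_SIZE = 256;          // assumption: value reported by tornado --devices

    // Raw payload in bytes for a VectorFloat8 with the given number of Float8 elements.
    static long payloadBytes(int elements) {
        return (long) elements * LANES * BYTES_PER_FLOAT;
    }

    // Remainder when one thread per element is split into work groups of 256.
    static int leftoverThreads(int elements) {
        return elements % WORK_GROUP_SIZE;
    }

    public static void main(String[] args) {
        for (int elements : new int[] {1024 * 8, 1023 * 8}) {
            System.out.println(elements + " elements: "
                    + payloadBytes(elements) + " bytes payload, "
                    + leftoverThreads(elements) + " threads left over after full work groups");
        }
    }
}
```

The failing case is 8192 elements (262144 bytes of payload, an exact multiple of 256 threads), while the working case is 8184 elements (261888 bytes, 248 threads left over).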

How To Reproduce

I put my code into a repository: https://github.com/chopikus/my-tornado-app.

Steps:

git clone https://github.com/chopikus/my-tornado-app.git
cd my-tornado-app
./run.sh

Expected behavior

No errors should be produced.

Computing system setup (please complete the following information):

Fedora 40
ROCm runtime version 1.13
Radeon 680M GPU on Ryzen 7 PRO 6850U
tornado --version: version=1.0.7-dev, branch=master, commit=96b3040; Backends installed: opencl
tornado -version: java version "21.0.4" 2024-07-16 LTS; Java(TM) SE Runtime Environment (build 21.0.4+8-LTS-274); Java HotSpot(TM) 64-Bit Server VM (build 21.0.4+8-LTS-274, mixed mode)

Additional context

tornado --devices:

WARNING: Using incubator modules: jdk.incubator.vector

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 1
  Tornado device=0:0  (DEFAULT)
	OPENCL --  [AMD Accelerated Parallel Processing] -- gfx1035
		Global Memory Size: 4.0 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: [256]
		Max WorkGroup Configuration: [1024, 1024, 1024]
		Device OpenCL C version: OpenCL C 2.0


jjfumero · 2024-07-26T05:49:20Z

Hi @chopikus . Thank you for the detailed report.

I can also reproduce the error on NVIDIA GPUs. The problem is in clBuildProgram, once the code has been generated. However, running on Intel GPUs with OpenCL works, and it also works with the SPIR-V and PTX backends. We will take a look and analyze why NVIDIA and AMD report a clBuildProgram failure.

jjfumero · 2024-07-26T05:51:37Z

For reference, I added this test in this branch: 87e6080

jjfumero · 2024-07-26T13:54:13Z

In general, the most tested platforms for TornadoVM are NVIDIA discrete GPUs and Intel GPUs (ARC and HD Graphics). The best-supported backend is OpenCL. However, we are pushing for SPIR-V, and we hope it will become the default backend in the future. Regarding FPGAs, we have tested on Intel Altera FPGAs and AMD/Xilinx FPGAs. My colleague Thanos can give you more details about the current support.

On Fri, 26 Jul 2024 at 15:21, Igor Chovpan wrote:

Thank you for a quick response! @jjfumero Which platform is the most stable for TornadoVM? Does this example work on FPGAs or on some hosting? I really like this project and want to try it out more.

andrii0lomakin · 2024-07-26T15:00:32Z

Good day.

I hope you excuse me if I add my 5 cents by asking.

Recently, AMD released a new NPU, which will be supported by Xilinx RT and, in turn, will work over OpenCL. If I understand correctly, oneAPI supports OpenCL too, so won't making SPIR-V the default narrow down the usage possibilities of TornadoVM?

jjfumero · 2024-07-26T15:10:09Z

Hi @andrii0lomakin ,
OpenCL >= 2.1 can dispatch SPIR-V binary kernels. In fact, TornadoVM currently dispatches SPIR-V with both the OpenCL runtime and the Level Zero API from oneAPI. We hope vendors will use SPIR-V more in the future. From my view, the way to go is SPIR-V and PTX. However, debugging the compiler gets increasingly complex.

Given the current vendor and accelerator landscape, it is difficult to deprecate our OpenCL C backend. Just a few examples: FPGA vendors support OpenCL 1.0 - 1.2, and Apple supports OpenCL 1.2. Thus, if TornadoVM wants to run on those platforms as well, the OpenCL C backend is still needed. Unless, of course, there are new backends (e.g., for VHDL directly, Apple Metal, etc.).
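The version rule mentioned above (OpenCL >= 2.1 for SPIR-V ingestion) can be illustrated with a small sketch. This is not how TornadoVM decides which backend to use; `OpenClVersionCheck` and its methods are hypothetical names introduced for illustration, and the check only parses a version string. A real runtime would normally query the device for its supported intermediate languages (for example via `CL_DEVICE_IL_VERSION`), since a version number alone does not guarantee SPIR-V support. Note also that `tornado --devices` prints the OpenCL C language version ("OpenCL C 2.0" in this report), which is not the same thing as the platform or device API version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a TornadoVM API: version-string check only.
final class OpenClVersionCheck {
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)");

    // Extracts {major, minor} from strings such as "OpenCL 2.1 AMD-APP" or "OpenCL C 2.0".
    static int[] parse(String versionString) {
        Matcher m = VERSION.matcher(versionString);
        if (!m.find()) {
            throw new IllegalArgumentException("No version number in: " + versionString);
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    // True when the reported version is 2.1 or newer, the minimum for SPIR-V dispatch.
    static boolean meetsSpirvMinimum(String versionString) {
        int[] v = parse(versionString);
        return v[0] > 2 || (v[0] == 2 && v[1] >= 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"OpenCL 1.2", "OpenCL C 2.0", "OpenCL 2.1", "OpenCL 3.0"}) {
            System.out.println(s + " -> " + meetsSpirvMinimum(s));
        }
    }
}
```

Under this rule, platforms limited to OpenCL 1.0 - 1.2 (the FPGA and Apple examples above) fall below the minimum, which is why the OpenCL C backend remains necessary for them.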

chopikus changed the title ~~Example code not working for array size 1024*8, works for 1023*8~~ Example code producing memory access fault for array size 1024*8, works for 1023*8 Jul 25, 2024

jjfumero assigned jjfumero and gigiblender Jul 26, 2024

jjfumero added the bug and OpenCL labels Jul 26, 2024
