Make it work with the GPU #28
ONNX supports many forms of hardware acceleration, including GPUs. For the Java interface on Linux and Windows, there are CUDA/cuDNN and CUDA/TensorRT. So far I have tried only the former, and only on Linux. Apple does not support CUDA, but it has CoreML. A page of documentation on that says "The CoreML EP can be used via the C or C++ APIs currently. Additional support via the Objective-C API is in progress.", so Java may not really be covered yet. However, the speedup is not necessarily impressive. For the TokenClassifierTimerApp that is part of the scala-transformers project, I measured the values below for 10,000 sentences. I suspect that parallelizing on the GPU may not be possible. There are several reports along those lines: "Parallel execution mode is deprecated" and "Parallel execution mode does not support the CUDA Execution Provider".
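For context, enabling the CUDA execution provider through the ONNX Runtime Java API from Scala looks roughly like the sketch below. The model path is a placeholder, and this assumes the GPU-enabled `onnxruntime_gpu` artifact is on the classpath instead of the CPU-only one:

```scala
import ai.onnxruntime.{OrtEnvironment, OrtSession}

val env = OrtEnvironment.getEnvironment()
val options = new OrtSession.SessionOptions()
// Request the CUDA/cuDNN execution provider on GPU device 0.
options.addCUDA(0)
// "model.onnx" is a placeholder path for illustration.
val session = env.createSession("model.onnx", options)
```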
This was done on a fairly low-end GPU and a higher-end CPU, so perhaps it could be faster on better hardware. Parallelism on the CPU was more effective. One of the GPU crash messages has to do with sequential execution:

```
2023-05-24 13:02:41.408722743 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Gemm node. Name:'Gemm_1373' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=KWA-Aorus-Ubuntu ; expr=cublasCreate(&cublas_handle_);
```
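If parallel execution is indeed unsupported on the CUDA provider, one can pin the session to sequential execution explicitly. A minimal sketch using the Java API's `ExecutionMode` enum:

```scala
import ai.onnxruntime.OrtSession.SessionOptions
import ai.onnxruntime.OrtSession.SessionOptions.ExecutionMode

val options = new SessionOptions()
// Force sequential graph execution; parallel execution mode is
// deprecated and not supported by the CUDA execution provider.
options.setExecutionMode(ExecutionMode.SEQUENTIAL)
options.addCUDA(0)
```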
It is working with the GPU under Windows as well. This is on an older computer. I have not verified that each of the runs gets the same answer. This one worked even in parallel on the GPU.
A regression-type test shows that the CPU and GPU versions get the same results, and on the CPU, the serial and parallel versions do as well. On the GPU, even when the parallel version doesn't crash, the results are different. Therefore, the parallel GPU combination is the one that can't be trusted.
Thanks @kwalcock !
Here is an example of the differences. Both runs are on the GPU, so there should be no hardware difference, and the differences change between runs, so the values below can't be replicated. It seems like a threading issue.
Thanks! Yes, it does seem like the differences indicate a bug... When you run the sequential code multiple times, do you get the same outputs on the GPU and on the CPU across different runs?
So far, the three other variations have produced the same, consistent results: serial on either the CPU or GPU, and parallel on the CPU.
Thanks! I suggest we disable parallel runs on the GPU if we can, then.
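One way to do that, sketched below with hypothetical names (`GpuGuard`, `tag`, `useGpu`, and so on are illustrative, not from the project): route sentence batches through Scala's parallel collections only when the session runs on the CPU.

```scala
// All names here are illustrative. scala-parallel-collections is assumed
// to be on the classpath for Scala 2.13.
import scala.collection.parallel.CollectionConverters._

object GpuGuard {
  def tag(sentence: String): Seq[String] = ??? // stand-in for the real tagger

  def tagAll(sentences: Seq[String], useGpu: Boolean): Seq[Seq[String]] =
    if (useGpu) sentences.map(tag)     // serial: parallel + CUDA is unreliable
    else sentences.par.map(tag).seq    // parallel is safe and faster on the CPU
}
```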
Although I have marked the CPU and GPU serial versions both as good, they are not the same. I have not seen differences in the tags, but the actual vectors that they calculate are different, usually in the fifth or later decimal place. This means there are probably situations in which they could produce different answers. When writing a paper, one would want to specify the hardware or provide the output files in order to achieve a repeatable or verifiable process. The GPU parallel version is still inconsistent.
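A regression check along these lines would compare the vectors with a tolerance rather than exact equality. A minimal sketch; the `1e-4f` epsilon is an assumption chosen to match differences at the fifth decimal place, not a value from the project:

```scala
// Element-wise comparison of two vectors within a tolerance. The 1e-4f
// default epsilon is illustrative, matching differences that show up in
// the fifth or later decimal place.
def approxEqual(a: Array[Float], b: Array[Float], eps: Float = 1e-4f): Boolean =
  a.length == b.length &&
    a.indices.forall(i => math.abs(a(i) - b(i)) <= eps)
```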
Yes, this is a known issue in deep learning. There is a journal paper on this, if you want to read more.