When running Parla on the TSQR demo app with cupy 9.3.0 and cudatoolkit 11.2.2, I occasionally see the following error.
It could be something wrong with my env, but I'm logging it here. While it's a rare error for any single Parla instance, it happens fairly often on larger MPI runs.
Unexpected exception in Task handling
Traceback (most recent call last):
  File ".../miniconda3/lib/python3.8/site-packages/parla/task_runtime.py", line 515, in run
    component.initialize_thread()
  File ".../miniconda3/lib/python3.8/site-packages/parla/cuda.py", line 250, in initialize_thread
    cupy.asnumpy(cupy.sqrt(a))
  File ".../miniconda3/lib/python3.8/site-packages/cupy/__init__.py", line 773, in asnumpy
    return a.get(stream=stream, order=order)
  File "cupy/_core/core.pyx", line 1567, in cupy._core.core.ndarray.get
  File "cupy/_core/core.pyx", line 1636, in cupy._core.core.ndarray.get
  File "cupy/_core/core.pyx", line 1644, in cupy._core.core.ndarray.get
  File "cupy/cuda/memory.pyx", line 551, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
  File "cupy_backends/cuda/api/runtime.pyx", line 693, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends/cuda/api/runtime.pyx", line 273, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidValue: invalid argument
Unexpected exception in Task handling
I don't see how 'a' would fail to exist after a sync, but the GPU-to-CPU copy in cupy.asnumpy is what fails.
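In case it helps anyone trying to reproduce this outside Parla, here is a rough stress-harness sketch that runs a warmup like the one in the traceback from several threads and collects whatever exceptions come back. Only the `cupy.asnumpy(cupy.sqrt(a))` call comes from the traceback; the way `a` is created, the thread and iteration counts, and the names `warmup` and `stress` are my own assumptions and not Parla's code. It falls back to NumPy when cupy can't be imported, purely so the harness itself can be smoke-tested on a machine without a GPU.

```python
import threading

try:
    import cupy as xp  # GPU path; needs a working CUDA device
    to_host = xp.asnumpy
except ImportError:
    import numpy as xp  # CPU fallback, only for smoke-testing the harness
    to_host = xp.asarray


def warmup():
    """Hypothetical stand-in for the warmup seen in the traceback."""
    a = xp.ones(16, dtype=xp.float32)  # assumed shape and dtype
    return to_host(xp.sqrt(a))  # device-to-host copy (the failing step)


def stress(n_threads=8, iters=50):
    """Run warmup() concurrently; return the list of exceptions raised."""
    errors = []
    lock = threading.Lock()

    def worker():
        for _ in range(iters):
            try:
                warmup()
            except Exception as exc:  # keep going so failures can be counted
                with lock:
                    errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `stress()` and printing `len(errors)` along with the first exception should show whether the failure reproduces without Parla or MPI in the picture. An empty list does not rule the bug out, since the error is rare per process.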
wlruys changed the title from "Threaded Cupy Warmpu/Initialization Error" to "Threaded Cupy Warmup/Initialization Error" on Aug 28, 2021