Use the latest RAJA and UMPIRE #706

Draft: wants to merge 1 commit into develop

nychiang (Collaborator) commented Feb 1, 2025

Update HiOp to use the latest versions of RAJA and UMPIRE.

Note that `HIOP_USE_RESOLVE` is switched off, since enabling it makes ALL the CUDA-related tests fail; even `testVectors` fails. For example, the function `vectorSetToZero` fails with the error message:

```
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: cudaErrorInvalidDeviceFunction: invalid device function
```

However, when I switch off `HIOP_USE_RESOLVE`, everything works except two unit tests under `NlpSparseRajaEx2`. This is expected, since these two unit tests require ReSolve (see here).

Therefore, this PR also corrects the CMake file: these two examples are now built only if ReSolve is enabled.

@pelesh Have you ever seen this kind of error? Does ReSolve require some specific cuSOLVER packages?

pelesh (Collaborator) commented Feb 1, 2025

> @pelesh Have you ever seen this kind of error? Does ReSolve require some specific cusolver packages?

I haven't seen this error before, and Re::Solve (both the HiOp and standalone versions) is well tested with CUDA 11.7.

Off the top of my head, this error commonly occurs when one tries to run GPU code outside the job queue. It also occurs when the library is not built for a matching compute capability (e.g., built for compute capability 7.0 and run on a Pascal GPU). Does your scheduler allocate a matching device? I'll try to reproduce this on my end.

nychiang (Collaborator, Author) commented Feb 2, 2025

> @pelesh Have you ever seen this kind of error? Does ReSolve require some specific cusolver packages?

I was in an interactive session on a debug node. In the same session, I was able to (1) compile and run HiOp without Re::Solve, (2) then fail to run it with Re::Solve, and (3) then run it without Re::Solve again. Maybe we can set up a quick Teams meeting where I can show you the problem.
