ROCm 5.3 and later support asynchronous memory operations #2197
Conversation
Force-pushed from 7c84c01 to fd8cbca
Force-pushed from 16c471b to 76a20bd
The CI is currently failing. I see in the HIP release notes for 5.6

Be aware that we also do not test HIP 5.6 (alpaka/script/job_generator/versions.py, line 30 in 4d1a221).

One of the errors in the CI is:

And line 105 in BufTest is

IMO async allocation should be activated for HIP 5.6+ to avoid applying a workaround for the older versions.
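As a rough sketch of that suggestion (not the actual code in this PR), the asynchronous allocation could be gated on the HIP version at compile time and fall back to the synchronous API on older releases. The 50'600'000 threshold assumes HIP_VERSION is encoded as major * 10'000'000 + minor * 100'000 + patch, matching the 50'200'000 check used later in this thread:

```cpp
#include <hip/hip_runtime.h>

// Sketch only: use hipMallocAsync on HIP >= 5.6, where the async path is
// assumed to work without workarounds, and fall back to hipMalloc otherwise.
inline hipError_t allocAsyncOrFallback(void** devPtr, size_t size, hipStream_t stream)
{
#if HIP_VERSION >= 50'600'000
    return ::hipMallocAsync(devPtr, size, stream);
#else
    static_cast<void>(stream); // the synchronous fallback ignores the stream
    return ::hipMalloc(devPtr, size);
#endif
}
```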
I know, I'll have a look in my spare time during the ongoing hackathon...
OK, I found the issue:
I'll add protections for those cases in the API.

LUMI is still on HIP 5.2, so I'd rather not enable
Force-pushed from 6993d3b to 39aa327
OK, we're making progress, now only ROCm 5.2 fails...
The hang in ROCm 5.2 can be reproduced with

    hipStream_t queue;
    HIP_CHECK(hipStreamCreate(&queue));

    void* buffer = nullptr;
    HIP_CHECK(hipMallocAsync(&buffer, size, queue));
    HIP_CHECK(hipFreeAsync(buffer, queue));

    HIP_CHECK(hipStreamSynchronize(queue));
    HIP_CHECK(hipStreamDestroy(queue)); // this hangs

Using

The alternative is to raise the minimum version that supports
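For reference, a self-contained version of the reproducer could look like the following; the HIP_CHECK macro and the allocation size are illustrative assumptions, not taken from the PR:

```cpp
#include <hip/hip_runtime.h>

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Minimal error-checking helper for the example (not alpaka's macro).
#define HIP_CHECK(call)                                                        \
    do                                                                         \
    {                                                                          \
        hipError_t err_ = (call);                                              \
        if(err_ != hipSuccess)                                                 \
        {                                                                      \
            std::fprintf(stderr, "HIP error %s at %s:%d\n",                    \
                         hipGetErrorString(err_), __FILE__, __LINE__);         \
            std::exit(EXIT_FAILURE);                                           \
        }                                                                      \
    } while(false)

int main()
{
    constexpr std::size_t size = 1024; // arbitrary allocation size

    hipStream_t queue;
    HIP_CHECK(hipStreamCreate(&queue));

    void* buffer = nullptr;
    HIP_CHECK(hipMallocAsync(&buffer, size, queue));
    HIP_CHECK(hipFreeAsync(buffer, queue));

    HIP_CHECK(hipStreamSynchronize(queue));
    HIP_CHECK(hipStreamDestroy(queue)); // reported to hang on ROCm 5.2

    std::puts("done");
    return 0;
}
```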
fb8b559
The last workaround seems to work with ROCm 5.2.3 on LUMI, and all alpaka tests pass.
include/alpaka/core/ApiHipRt.hpp (outdated)

    }
    # elif HIP_VERSION >= 50'200'000
        // in ROCm 5.2.x, hipFreeAsync makes a subsequent hipStreamDestroy hang
        return ::hipFree(devPtr);
IMO this is not correct. You should wait until all work in the stream is finished before calling hipFree, otherwise a kernel that is using the pointer can still be active.

Another way is to enqueue a host task into the stream which executes hipFree. Then it will be asynchronous too.

The second option, which IMO is nicer because it keeps the asynchronous execution of this function, requires that the workaround be moved to the place where api::freeAsync is called.
> IMO this is not correct. You should wait until all work in the stream is finished before calling hipFree, otherwise a kernel that is using the pointer can still be active.

Actually, hipFree itself is blocking, so the behaviour is correct even if suboptimal.

(I tested, and it seems to be the case even if the stream is created with the hipStreamNonBlocking flag.)
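For completeness, the only change needed to run the reproducer above with a non-blocking stream is the creation call (HIP_CHECK being the same illustrative macro as before):

```cpp
// Create the stream with the non-blocking flag; per the comment above,
// hipFree was observed to block even in this configuration.
hipStream_t queue;
HIP_CHECK(hipStreamCreateWithFlags(&queue, hipStreamNonBlocking));
```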
> Another way is to enqueue a host task into the stream which executes hipFree. Then it will be asynchronous too.

I agree that this would be nicer, but maybe it's not worth the extra complexity :-/
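As an illustration of the host-task idea discussed above (not the code of this PR), a sketch could use hipLaunchHostFunc so that the hipFree only runs after all prior work in the stream has completed, keeping the call itself non-blocking:

```cpp
#include <hip/hip_runtime.h>

// Host callback that frees the buffer; errors cannot be propagated from here.
static void freeOnHost(void* ptr)
{
    static_cast<void>(::hipFree(ptr));
}

// Sketch of a non-blocking freeAsync replacement: the free is executed only
// after all work previously enqueued in `stream` has finished.
inline hipError_t freeAsyncViaHostTask(void* devPtr, hipStream_t stream)
{
    return ::hipLaunchHostFunc(stream, &freeOnHost, devPtr);
}
```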
I asked the AMD people at LUMI about this, and their answer is that

> The async versions of memory allocation were only officially supported in ROCm 5.3. My opinion is that you shouldn't be doing any efforts around this for ROCm 5.2.3 and move to a newer version of ROCm.
I'll update the PR accordingly.
Force-pushed from fb8b559 to 05e5483
Include protections for 0-sized allocations in ROCm 5.3-5.5.
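The exact protections are not shown in this thread; purely as an illustration, a zero-size guard around the async allocation could look like the sketch below. The version bounds and the bump to a 1-byte allocation are assumptions, not necessarily what the PR does:

```cpp
#include <hip/hip_runtime.h>

#include <cstddef>

// Sketch only: on ROCm 5.3-5.5, where zero-sized async allocations are
// assumed to misbehave, request at least one byte so that hipMallocAsync
// returns a valid, freeable pointer.
inline hipError_t mallocAsyncGuarded(void** devPtr, std::size_t size, hipStream_t stream)
{
#if HIP_VERSION >= 50'300'000 && HIP_VERSION < 50'600'000
    if(size == 0)
        size = 1;
#endif
    return ::hipMallocAsync(devPtr, size, stream);
}
```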
Force-pushed from 05e5483 to 4a9a35a
Only the XCode debug jobs are failing; this finally looks good.