Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROCm 5.3 and later support asynchronous memory operations #2197

Merged

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Nov 28, 2023

No description provided.

@fwyzard fwyzard force-pushed the allow_hipMallocAsync_for_ROCm_52 branch from 7c84c01 to fd8cbca Compare November 28, 2023 10:17
psychocoderHPC
psychocoderHPC previously approved these changes Nov 29, 2023
@psychocoderHPC
Copy link
Member

The CI is currently failing. I see in the HIP release notes for 5.6 hipMallocAsync to return success for zero size allocation to match hipMalloc.
Not sure if we catch this in the CI.

@SimeonEhrig
Copy link
Member

The CI is currently failing. I see in the HIP release notes for 5.6 hipMallocAsync to return success for zero size allocation to match hipMalloc. Not sure if we catch this in the CI.

Be aware, that we also does not test HIP 5.6.

HIPCC: ["5.0", "5.1", "5.2", "5.3", "5.4", "5.5"],

@psychocoderHPC
Copy link
Member

psychocoderHPC commented Nov 29, 2023

The CI is currently failing. I see in the HIP release notes for 5.6 hipMallocAsync to return success for zero size allocation to match hipMalloc. Not sure if we catch this in the CI.

One of the error in the CI is:

: /builds/hzdr/crp/alpaka/test/unit/mem/buf/src/BufTest.cpp:105: FAILED:
34: due to unexpected exception with message:
34:   /builds/hzdr/crp/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(305)
34:   'TApi::mallocAsync( &memPtr, static_cast<std::size_t>(width) * sizeof(TElem),
34:   queue.getNativeHandle())' returned error  : 'hipErrorInvalidValue': 'invalid
34:   argument'!
34: 

And line 105 in BufTest is memBufAsyncZeroSizeTest.

IMO async allocation should be activated for HIP for 5.6+ to avoid applying a workaround for the older versions.

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 29, 2023

The CI is currently failing.

I know, I'll have a look in the spare time from the ongoing hackathon...

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2023

OK, I found the issue:

  • hipMallocAsync fails for an allocation of 0 bytes
  • hipFreeAsync fails for a deallocation of a null pointer

I'll add protections for those cases in the API.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2023

LUMI is still on HIP 5.2, so I'd rather not enable hipMallocAsync only for HIP >= 5.6.x.

SimeonEhrig
SimeonEhrig previously approved these changes Dec 14, 2023
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2023

OK, we're making progress, now only ROCm 5.2 fails...

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2023

The hang in ROCm 5.2 can be reproduced with

hipStream_t queue;
HIP_CHECK(hipStreamCreate(&queue));

void* buffer = nullptr;
HIP_CHECK(hipMallocAsync(&buffer, size, queue));
HIP_CHECK(hipFreeAsync(buffer, queue));
HIP_CHECK(hipStreamSynchronize(queue));
HIP_CHECK(hipStreamDestroy(queue));  // this hangs

Using hipFree instead of hipFreeAsync avoids the hang - but I do not know if it's safe to do so.

The alternative is to raise the minimum version that supports hipMallocAsync to ROCm 5.3.x.

@fwyzard fwyzard dismissed stale reviews from SimeonEhrig and bernhardmgruber via fb8b559 December 15, 2023 13:49
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 15, 2023

The last workaround seems to work with ROCm 5.2.3 on LUMI, and all alpaka tests pass.

}
# elif HIP_VERSION >= 50'200'000
// in ROCm 5.2.x, hipFreeAsync makes a subsequent hipStreamDestroy hang
return ::hipFree(devPtr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is not correct. You should wait until all work in the stream is finished and then you can call hipFree, else a kernel that is using the pointer can still be active.
Another way is to enqueue a host task into the stream which is executing hipFree. Then it will be asynchronous too.
The second option which IMO is nicer because it keeps the asynchronous execution of this function requires that the workaround is moved to the place where api::freeAsync is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is not correct. You should wait until all work in the stream is finished and then you can call hipFree, else a kernel that is using the pointer can still be active.

Actually, hipFree itself is blocking, so the behaviour is correct even if suboptimal.
(I tested and it seems the case even if the stream is created with the hipStreamNonBlocking flag)

Another way is to enqueue a host task into the stream which is executing hipFree. Then it will be asynchronous too.

I agree that this would be nicer, but maybe it's not worth the extra complexity :-/

I asked about this to the AMD people at LUMI and their answer is that

The async versions of memory allocation were only officially supported in ROCm 5.3. My opinion is that you shouldn't be doing any efforts around this for ROCm 5.2.3 and move

to a newer version of ROCm.

I'll update the PR accordingly.

@fwyzard fwyzard force-pushed the allow_hipMallocAsync_for_ROCm_52 branch from fb8b559 to 05e5483 Compare December 16, 2023 11:44
Include protections for 0-sized allocations in ROCm 5.3-5.5 .
@fwyzard fwyzard force-pushed the allow_hipMallocAsync_for_ROCm_52 branch from 05e5483 to 4a9a35a Compare December 16, 2023 11:45
@fwyzard fwyzard changed the title ROCm 5.2 and later support asynchronous memory operations ROCm 5.3 and later support asynchronous memory operations Dec 16, 2023
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 19, 2023

Only XCode debug are failing, this finally looks good.

@psychocoderHPC psychocoderHPC merged commit ccb2c35 into alpaka-group:develop Dec 19, 2023
21 of 23 checks passed
@fwyzard fwyzard deleted the allow_hipMallocAsync_for_ROCm_52 branch August 22, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants