Add managed memory #211

Merged (13 commits), Sep 21, 2023

csbnw (Contributor) commented Sep 19, 2023

Description

There are many ways to mix and match different types of CUDA memory. In preparation for this PR, the performance of different strategies was evaluated; the findings are summarized below:

  1. The use of cu::HostMemory and cu::DeviceMemory with explicit memory copies provides the best performance, but is also the most verbose.
  2. Mapped memory (already supported in cudawrappers) allows addressing host memory on the GPU, which allows for simple code but performs rather poorly.
  3. Managed memory (also known as unified memory) performs better than mapped memory and gives the user the option to control data movement by prefetching to either host or device memory. When used like option 1, i.e. with explicit prefetching in place of explicit copies, its performance is only slightly lower than that of option 1.

This PR adds support for option 3.

Specifically, the cu::DeviceMemory constructor can now also allocate managed memory: pass CU_MEMORYTYPE_UNIFIED as the CUmemorytype argument and, optionally, allocation flags. Existing code is unaffected, because the CUmemorytype argument defaults to CU_MEMORYTYPE_DEVICE and flags defaults to 0.
In addition, cu::Stream::memPrefetchAsync is added, which exposes the driver function cuMemPrefetchAsync.
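
A minimal, CUDA-free sketch of the default-argument approach described above follows. It is not the cudawrappers code: MemoryType, DeviceMemorySketch and the recorded allocator names are hypothetical stand-ins, and where this sketch only records a name, a real implementation would call cuMemAlloc for device memory or cuMemAllocManaged for managed memory. It shows why call sites that pass only a size keep getting plain device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CUmemorytype (the real enum has more values).
enum class MemoryType { Host, Device, Unified };

// Hypothetical, CUDA-free model of a DeviceMemory-style constructor.
// Instead of allocating, it records which driver call would be made.
class DeviceMemorySketch {
 public:
  // Defaults mirror the PR description: device memory and flags = 0.
  explicit DeviceMemorySketch(std::size_t size,
                              MemoryType type = MemoryType::Device,
                              unsigned int flags = 0)
      : size_(size), flags_(flags) {
    if (type == MemoryType::Device) {
      allocator_ = "cuMemAlloc";  // real code: plain device allocation
    } else if (type == MemoryType::Unified) {
      allocator_ = "cuMemAllocManaged";  // real code: managed allocation, uses flags
    } else {
      // Any other memory type is rejected by this sketch.
      throw std::invalid_argument("unsupported memory type");
    }
  }

  std::size_t size() const { return size_; }
  unsigned int flags() const { return flags_; }
  const std::string &allocator() const { return allocator_; }

 private:
  std::size_t size_;
  unsigned int flags_;
  std::string allocator_;
};
```

A pre-existing call such as DeviceMemorySketch(1024) still takes the device path, while DeviceMemorySketch(1024, MemoryType::Unified, flags) takes the managed path. When calling the real driver API, note that the CUDA documentation lists CU_MEM_ATTACH_GLOBAL or CU_MEM_ATTACH_HOST as the expected flags for cuMemAllocManaged.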

The new functionality is tested in new sections of the test_vector_add test.

Related issues:

Instructions to review the pull request

  • Check that CHANGELOG.md has been updated if necessary

csbnw self-assigned this Sep 19, 2023

john-romein (Contributor) commented:

A few minor things:

  • In the constructor, the assignment to manager can be moved out of the if-then block, to after it.
  • There should be a checkCudaCall() around the call to cuMemPrefetchAsync().
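
Both review points can be sketched without a GPU. All names below are hypothetical: Result stands in for CUresult, checkCall for the checkCudaCall() helper, and fakePrefetch for cuMemPrefetchAsync(); the allocations use new[] where real code would call cuMemAlloc or cuMemAllocManaged, with a deleter that calls cuMemFree. The sketch shows that an unchecked call silently drops its error code, and that the owning handle only needs to be created once, after the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CUresult: 0 means success.
using Result = int;
constexpr Result kSuccess = 0;

// Hypothetical analogue of checkCudaCall(): turns an error code into an exception.
void checkCall(Result result) {
  if (result != kSuccess) {
    throw std::runtime_error("driver call failed with code " +
                             std::to_string(result));
  }
}

// Hypothetical stand-in for cuMemPrefetchAsync(); reports an error for a
// null pointer or an empty range.
Result fakePrefetch(const void *ptr, std::size_t bytes) {
  return (ptr == nullptr || bytes == 0) ? 1 : kSuccess;
}

// Second review point: wrap the call so that a failure is not ignored.
void prefetch(const void *ptr, std::size_t bytes) {
  checkCall(fakePrefetch(ptr, bytes));
}

// First review point: each branch performs only its own allocation; the
// owning handle ("manager") is created once, after the if/else.
std::shared_ptr<char> allocate(std::size_t bytes, bool managed) {
  char *raw = nullptr;
  if (managed) {
    raw = new char[bytes];  // real code: cuMemAllocManaged
  } else {
    raw = new char[bytes];  // real code: cuMemAlloc
  }
  std::shared_ptr<char> manager(raw, [](char *p) { delete[] p; });
  return manager;
}
```

Hoisting the shared_ptr construction keeps the cleanup logic in one place, so adding a third allocation path later does not require duplicating the deleter.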

csbnw (Contributor, Author) commented Sep 20, 2023

@john-romein,
I applied your suggestions 👍

Review threads on tests/test_vector_add.cpp and include/cudawrappers/cu.hpp: outdated, resolved.

matmanc (Contributor) left a comment:

Looks really nice :)

csbnw merged commit 2a12b83 into main Sep 21, 2023
csbnw deleted the add-managed-memory branch Sep 21, 2023 11:04

3 participants