
Added support for 2D operations and C2R/R2C FFT #305

Merged
merged 23 commits into main from add-2dsupport on Nov 7, 2024
Conversation

@wvbbreu (Collaborator) commented Oct 31, 2024

Description

This pull request adds support for several 2D operations (memset2D, memcpyHtoD2DAsync, and memcpyDtoH2DAsync), introduces the FFT1DRealToComplex and FFT1DComplexToReal classes for FFT transformations, and adds the cu::Device::getOrdinal() method to retrieve the current device ID.
Related tests are included in this pull request.

Related issues:

None

Instructions to review the pull request

  • Check that CHANGELOG.md has been updated if necessary

@wvbbreu wvbbreu requested a review from csbnw October 31, 2024 13:21
@wvbbreu wvbbreu self-assigned this Oct 31, 2024
@wvbbreu wvbbreu added the enhancement New feature or request label Oct 31, 2024
@wvbbreu wvbbreu requested a review from csbnw November 5, 2024 08:03
void zero(size_t size) { memset(static_cast<unsigned char>(0), size); }

const void *parameter() const  // used to construct parameter list for launchKernel()
{
  return &_obj;
}

void *parameter_copy() { return reinterpret_cast<void *>(_obj); }
Contributor
This function doesn't seem to be used, remove it?

Suggested change
void *parameter_copy() { return reinterpret_cast<void *>(_obj); }

@wvbbreu (Collaborator, Author) Nov 5, 2024

This function was intentionally added; I will demonstrate its use through a small example. Some libraries, such as dedisp, (unfortunately) perform a significant amount of manual pointer arithmetic. The dereference operator (cu::DeviceMemory::operator*) would be quite beneficial in such cases. However, it is not allowed here, as the memory in these situations is not managed memory (the CU_POINTER_ATTRIBUTE_IS_MANAGED attribute is not set), resulting in a runtime exception.

Option 1 (Dereference Operator)

unsigned int i = 1;
unsigned int nsamp_padded = 2345;
unsigned int nchan_fft_batch = 5678;

// idata should point to a device address at an element offset into d_data_x_nu
cu::DeviceMemory idata(static_cast<cufftReal*>(d_data_x_nu) + i * nsamp_padded * nchan_fft_batch);
// --> THROWS: "Cannot return memory of type CU_MEMORYTYPE_DEVICE as pointer."

Option 2 (cu::DeviceMemory Constructor with Offset)

Alternatively, the cu::DeviceMemory(DeviceMemory&, size_t offset, size_t size) constructor allows us to wrap the arithmetic within the DeviceMemory object itself. Here, offset is given in bytes, so we need to multiply by the word size (e.g., sizeof(cufftReal)). Even so, I’d prefer to do pointer arithmetic with a cufftReal pointer rather than manually determining the byte offset.

cu::DeviceMemory idata(d_data_x_nu, (i * nsamp_padded * nchan_fft_batch) * sizeof(cufftReal), d_data_x_nu.size());

Option 3 (cu::DeviceMemory::parameter)

As an alternative, we could use cu::DeviceMemory::parameter(). However, it has its limitations. This function returns a const pointer to the _obj pointer stored in cu::DeviceMemory, not the actual underlying device pointer, so it can’t be used directly. Personally, I would rather avoid using this type of extensive casting in production.

const void* const* param = reinterpret_cast<const void* const*>(d_data_x_nu.parameter());

cu::DeviceMemory idata(reinterpret_cast<CUdeviceptr>(
    static_cast<const cufftReal*>(*param) +
    i * nsamp_padded * nchan_fft_batch));

Option 4 (cu::DeviceMemory::parameter_copy)

A final approach is to return the actual device pointer by introducing a parameter_copy() method.

cu::DeviceMemory idata(static_cast<const cufftReal*>(d_data_x_nu.parameter_copy()) + i * nsamp_padded * nchan_fft_batch);

What is your view on this?

Contributor

Option 4 looks much cleaner than option 3, but I really think that the name parameter_copy should be changed. It shouldn't contain an underscore (see also this page), but more importantly, copy is somewhat misleading. I think object or pointer would be better names.

Another thing that comes to mind is that a feature like this may be useful for the other classes in cu::, for instance when combining cudawrappers code with 'plain' CUDA code. So this function could even be added to cu::Wrapper?

@loostrum, @john-romein, what is your view here?

@john-romein (Contributor) Nov 5, 2024

Please stay as close as possible to the CUDA driver API. If you need to do arithmetic on a device memory pointer, simply cast the cu::DeviceMemory object to a CUdeviceptr (a cast operator inherited from cu::Wrapper<CUdeviceptr>) and do the arithmetic on the CUdeviceptr, which is an integer type on which arithmetic is allowed.

@wvbbreu (Collaborator, Author) Nov 5, 2024

I had not considered that approach, but it does work. The only downside is that, for non-char-sized types, it requires two casts. For example:

float* x = reinterpret_cast<float*>(static_cast<CUdeviceptr>(some_dev_object));
// Move x pointer
float* y = x + 3;

@john-romein @csbnw Would we rather opt for this approach than introducing a new method?

Contributor

Although this is likely to work in practice, it is not strictly conforming. It will not work in the (unlikely) case that the size of a host memory pointer is smaller than the size of a device memory pointer. The proper solution would be static_cast<CUdeviceptr>(some_dev_object) + 3 * sizeof(float).
Apart from that, I do not consider performing two casts a problem, as it is very clear what happens this way.

@wvbbreu (Collaborator, Author) Nov 5, 2024

The problem with performing arithmetic on a CUdeviceptr is that in HIP the underlying type is hipDeviceptr_t, which is internally declared as void*. Since arithmetic on a void pointer is not allowed, this results in the following compiler error.

error: arithmetic on a pointer to void
  549 |             static_cast<CUdeviceptr>(some_dev_object) + (3 * sizeof(float))

In contrast, CUDA allows this because the underlying type is unsigned long long. This issue will likely also affect other applications that use pointer arithmetic and have not yet been compiled with HIP.

Currently, the dereference operator is only allowed on managed memory. This commit
relaxes this requirement a bit by also allowing access to non-managed memory. This enables casts like this:

cu::DeviceMemory mem(1024);
float* ptr = static_cast<float*>(mem);

thereby avoiding an intermediate cast to CUdeviceptr.
@wvbbreu (Collaborator, Author) commented Nov 6, 2024

I came up with a solution that relaxes the requirements of cu::DeviceMemory::operator* a bit. Instead of only allowing de-referencing on managed memory, access to non-managed/device memory is also allowed (see example). This change has no impact on the existing cudawrappers API. Access to unallocated memory will still be handled through a sanity check (checkPointerAccess).

Before

cu::DeviceMemory dev_mem(1024);
cu::DeviceMemory idata(reinterpret_cast<CUdeviceptr>(reinterpret_cast<float*>(static_cast<CUdeviceptr>(dev_mem)) + 2));

After

cu::DeviceMemory dev_mem(1024);
cu::DeviceMemory idata(reinterpret_cast<CUdeviceptr>(static_cast<float*>(dev_mem) + 2));

@wvbbreu wvbbreu requested a review from csbnw November 6, 2024 14:33
@csbnw (Contributor) left a comment

Two minor suggestions, other than that I think it's ready to be merged.

@wvbbreu wvbbreu requested a review from csbnw November 7, 2024 10:23
@wvbbreu wvbbreu merged commit fd7f102 into main Nov 7, 2024
6 checks passed
@wvbbreu wvbbreu deleted the add-2dsupport branch November 7, 2024 11:57