Added support for 2D operations and C2R/R2C FFT #305
Conversation
include/cudawrappers/cu.hpp
void zero(size_t size) { memset(static_cast<unsigned char>(0), size); }

const void *parameter()
    const  // used to construct parameter list for launchKernel();
{
  return &_obj;
}

void *parameter_copy() { return reinterpret_cast<void *>(_obj); }
This function doesn't seem to be used, remove it?
void *parameter_copy() { return reinterpret_cast<void *>(_obj); }
This function was intentionally added. I will demonstrate its use through a small example. In some libraries, such as dedisp, a significant amount of manual pointer arithmetic is (unfortunately) performed. Using the dereference operator (e.g., cu::DeviceMemory::operator*) would be quite beneficial in such cases. However, it's not allowed, as memory in these situations isn't allocated with CU_POINTER_ATTRIBUTE_IS_MANAGED, resulting in a runtime exception.
Option 1 (Dereference Operator)
cu::DeviceMemory d_data_x_nu(...);  // d_data_x_nu points to a device address
unsigned int i = 1;
unsigned int nsamp_padded = 2345;
unsigned int nchan_fft_batch = 5678;
cu::DeviceMemory idata(static_cast<cufftReal*>(d_data_x_nu) + i * nsamp_padded * nchan_fft_batch);
// --> THROWS: "Cannot return memory of type CU_MEMORYTYPE_DEVICE as pointer."
Option 2 (cu::DeviceMemory Constructor with Offset)
Alternatively, the cu::DeviceMemory(DeviceMemory&, size_t offset, size_t size) constructor allows us to wrap the arithmetic within the DeviceMemory object itself. Here, offset is given in bytes, so we need to multiply by the word size (e.g., sizeof(cufftReal)). Even so, I'd prefer to do pointer arithmetic with a cufftReal pointer rather than manually determining the byte offset.
cu::DeviceMemory idata(d_data_x_nu, (i * nsamp_padded * nchan_fft_batch) * sizeof(cufftReal), d_data_x_nu.size());
Option 3 (cu::DeviceMemory::parameter())
As an alternative, we could use cu::DeviceMemory::parameter(). However, it has its limitations. This function returns a const pointer to the _obj pointer stored in cu::DeviceMemory, not the actual underlying device pointer, so it can't be used directly. Personally, I would rather avoid using this type of extensive casting in production.
const void* const* param = reinterpret_cast<const void* const*>(d_data_x_nu.parameter());
cu::DeviceMemory idata(reinterpret_cast<CUdeviceptr>(
static_cast<const cufftReal*>(*param) +
i * nsamp_padded * nchan_fft_batch));
Option 4 (cu::DeviceMemory::parameter_copy())
A final approach is to return the actual device pointer by introducing a parameter_copy() method.
cu::DeviceMemory idata(static_cast<const cufftReal*>(d_data_x_nu.parameter_copy()) + i * nsamp_padded * nchan_fft_batch);
What is your view on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 4 looks much cleaner than option 3, but I really think that the name parameter_copy should be changed. It shouldn't contain an underscore (see also this page), but more importantly, copy is somewhat misleading. I think object or pointer would be better names.
Another thing that comes to mind is that a feature like this may be useful for the other classes in cu::, for instance when combining cudawrappers code with 'plain' CUDA code. So this function could even be added to cu::Wrapper?
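To illustrate the idea, a rough sketch of what such an accessor on cu::Wrapper could look like (the name and details are hypothetical and not part of the current diff):

template <typename T>
class Wrapper {
 public:
  // Hypothetical accessor: expose the wrapped handle by value, e.g. a
  // CUdeviceptr for cu::DeviceMemory or a CUstream for cu::Stream.
  T handle() const { return _obj; }

 protected:
  T _obj{};
};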
@loostrum, @john-romein, what is your view here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please stay as close as possible to the CUDA driver API. If you need to do arithmetic on a device memory pointer, simply cast the cu::DeviceMemory object to a CUdeviceptr (a cast operator that is inherited from cu::Wrapper<CUdeviceptr>) and do the arithmetic on the CUdeviceptr (which is an integer type, so arithmetic can be performed on it).
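Applied to the dedisp example above, that would look roughly like this (a sketch; wrapping the offset back into a cu::DeviceMemory assumes the CUdeviceptr constructor used in Option 3):

// Sketch: cast to CUdeviceptr, do the arithmetic in bytes, wrap the result again.
CUdeviceptr ptr = static_cast<CUdeviceptr>(d_data_x_nu);
ptr += i * nsamp_padded * nchan_fft_batch * sizeof(cufftReal);
cu::DeviceMemory idata(ptr);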
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not considered that approach, but it does work. The only downside is that for non-char sized types, it requires two casts. For example:
float* x = reinterpret_cast<float*>(static_cast<CUdeviceptr>(some_dev_object));
// Move x pointer
float* y = x + 3;
@john-romein @csbnw Would we rather opt for this approach than introducing a new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is likely to work in practice, it is not strictly conforming: it will not work in the (unlikely) case that the size of a host memory pointer is smaller than the size of a device memory pointer. The proper solution would be static_cast<CUdeviceptr>(some_dev_object) + 3 * sizeof(float).
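For the x/y example above, the conforming equivalent keeps the offset in bytes on the CUdeviceptr itself (a sketch):

// Byte-based arithmetic on the device pointer, no host-side float* involved.
CUdeviceptr y = static_cast<CUdeviceptr>(some_dev_object) + 3 * sizeof(float);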
Apart from that, performing two casts is something that I do not consider a problem, as it is very clear what happens this way.
The problem with performing arithmetic on a CUdeviceptr is that in HIP the underlying type is hipDeviceptr_t, which is internally declared as void*. Since performing arithmetic on a void pointer is not allowed, this results in the following compiler error:

error: arithmetic on a pointer to void
  549 | static_cast<CUdeviceptr>(some_dev_object) + (3 * sizeof(float))

In contrast, CUDA allows this because the underlying type is unsigned long long. This issue will likely also exist in other applications that use pointer arithmetic and are not yet compiled with HIP.
Currently, it is only allowed to use the dereference operator in case of managed memory. This commit relaxes this requirement a bit by also allowing access to non-managed memory. This enables casts like the following, thereby avoiding an intermediate cast to CUdeviceptr: cu::DeviceMemory mem(1024); float* ptr = static_cast<float*>(mem);
I came up with a solution that relaxes the requirements of the dereference operator.

Before
cu::DeviceMemory dev_mem(1024);
cu::DeviceMemory(reinterpret_cast<CUdeviceptr>(reinterpret_cast<float*>(static_cast<CUdeviceptr>(dev_mem)) + 2));

After
cu::DeviceMemory dev_mem(1024);
cu::DeviceMemory(reinterpret_cast<CUdeviceptr>(static_cast<float*>(dev_mem) + 2));
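For reference, a rough sketch of what such a relaxed conversion operator inside cu::DeviceMemory could look like (hypothetical; the actual change in cu.hpp may differ):

// Hypothetical sketch: return the device pointer as T* for both managed and
// non-managed memory, instead of throwing for non-managed allocations.
template <typename T>
operator T *() {
  return reinterpret_cast<T *>(_obj);
}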
Two minor suggestions; other than that, I think it's ready to be merged.
Description
This pull request includes support for various 2D operations, namely memset2D, memcpyHtoD2DAsync, and memcpyDtoH2DAsync, and introduces the FFT1DRealToComplex and FFT1DComplexToReal classes for FFT transformations. Finally, the cu::Device::getOrdinal() method is introduced to retrieve the current device ID. Related tests are included in this pull request.
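For illustration, a minimal usage sketch of the new getter (assuming it simply returns the ordinal that the cu::Device was constructed with; the exact return type may differ):

#include <cudawrappers/cu.hpp>

int main() {
  cu::init();
  cu::Device device(0);
  // New in this PR: retrieve the device ID of this cu::Device (assumed int).
  const int ordinal = device.getOrdinal();
  return ordinal == 0 ? 0 : 1;
}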
Related issues:
None
Instructions to review the pull request