-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subgroup Level Bluestein Algorithm #145
base: main
Are you sure you want to change the base?
Changes from 3 commits
cef2ad0
8cc05db
5237a37
5292f41
a3aa5d6
5b89d10
b0d1661
5afe834
8a83350
eb3291f
011e780
8c3b40b
20e3c78
ae929d3
708893c
594d224
5f1ab4d
99d8cfb
243c793
3a77953
cfd2ab8
d0e705d
823b84f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,7 @@ | |
#include <vector> | ||
|
||
#include "common/exceptions.hpp" | ||
#include "common/subgroup.hpp" | ||
#include "common/subgroup_ct.hpp" | ||
#include "defines.hpp" | ||
#include "enums.hpp" | ||
#include "specialization_constant.hpp" | ||
|
@@ -148,28 +148,28 @@ class committed_descriptor_impl { | |
std::vector<kernel_data_struct> transpose_kernels; | ||
std::shared_ptr<IdxGlobal> factors_and_scan; | ||
detail::level level; | ||
// The size of DFT transform which will be computed for the given dimension | ||
// The size of DFT transform which will be computed for the given dimension. Will be different from the | ||
// committed_length when the Bluestein / Rader algorithms are used | ||
std::size_t length; | ||
// The committed length for the particular dimension, will be different from length in the case of bluestein and | ||
// radar fft algorithms | ||
// The committed length (as in the user specified length) for the particular dimension | ||
std::size_t committed_length; | ||
Idx used_sg_size; | ||
Idx num_batches_in_l2; | ||
Idx num_factors; | ||
bool is_prime; | ||
detail::fft_algorithm algorithm; | ||
|
||
dimension_struct(std::vector<kernel_data_struct> forward_kernels, std::vector<kernel_data_struct> backward_kernels, | ||
detail::level level, std::size_t length, std::size_t committed_length, Idx used_sg_size, | ||
bool is_prime) | ||
detail::fft_algorithm algorithm) | ||
: forward_kernels(std::move(forward_kernels)), | ||
backward_kernels(std::move(backward_kernels)), | ||
level(level), | ||
length(length), | ||
committed_length(committed_length), | ||
used_sg_size(used_sg_size), | ||
is_prime(is_prime) { | ||
if (is_prime && level != detail::level::SUBGROUP) { | ||
throw unsupported_configuration("Prime sizes that not fit in the subgroup implementation are not supported"); | ||
algorithm(algorithm) { | ||
if (algorithm == detail::fft_algorithm::BLUESTEIN && level != detail::level::SUBGROUP) { | ||
throw unsupported_configuration("Prime sizes that do not fit in the subgroup implementation are not supported"); | ||
} | ||
} | ||
}; | ||
|
@@ -215,9 +215,9 @@ class committed_descriptor_impl { | |
* set of kernels that need to be JIT compiled. | ||
* | ||
* @tparam SubgroupSize size of the subgroup | ||
* @param fft_size The size of the dft transform | ||
* @return implementation to use for the dimension and a vector of tuples of: implementation to use for a kernel, | ||
* vector of kernel ids, factors | ||
* @param fft_size The size for which kernel needs to be prepared | ||
* @return implementation to use for the dimension and a vector of tuples of: implementation to use for a kernel, the | ||
* size of the fft for which the implementation was prepared and the vector of kernel ids, factors | ||
*/ | ||
template <Idx SubgroupSize> | ||
std::tuple<detail::level, std::size_t, kernel_ids_and_metadata_t> prepare_implementation(IdxGlobal fft_size) { | ||
hjabird marked this conversation as resolved.
Show resolved
Hide resolved
t4c1 marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not think the change of parameter type to IdxGlobal makes sense here, as the function needs to cast it back to std::size_t anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its to avoid an additional casts inside prepare implementation, which currently works with IdxGlobal, It returns a size_t because a lot of our calculation pertaining to that field works on size_t, hence returning size_t to avoid casts there There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We use std::size_t in public interface and IdxGlobal in kernels. Somewhere inbetween we need to cast from one type to another. I just think this should happen just once, without any casts back to std::size_t. I am not particularly attached to where that is. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The field we used to cast the fft_size to IdxGlobal inside the prepare_implementation here previously presumably to avoid warnings, and hence seeing that I simply made it accept IdxGlobal There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But that requires additional casts, which make code harder to read. Make it so that we only cast to IdxGlobal, not back. |
||
|
@@ -595,14 +595,23 @@ class committed_descriptor_impl { | |
set_spec_constants_driver<SubgroupSize>(top_level, prepared_vec, direction::FORWARD, dimension_num); | ||
auto backward_kernels = | ||
set_spec_constants_driver<SubgroupSize>(top_level, prepared_vec, direction::BACKWARD, dimension_num); | ||
detail::fft_algorithm algorithm; | ||
if (fft_size == params.lengths[dimension_num]) { | ||
algorithm = detail::fft_algorithm::COOLEY_TUKEY; | ||
} else if (fft_size > params.lengths[dimension_num]) { | ||
algorithm = detail::fft_algorithm::BLUESTEIN; | ||
} else { | ||
throw internal_error("Invalid FFT size encountered while preparing the implementation"); | ||
} | ||
|
||
if (forward_kernels.has_value() && backward_kernels.has_value()) { | ||
return {forward_kernels.value(), | ||
backward_kernels.value(), | ||
top_level, | ||
fft_size, | ||
params.lengths[dimension_num], | ||
SubgroupSize, | ||
fft_size != params.lengths[dimension_num]}; | ||
algorithm}; | ||
} | ||
} | ||
} | ||
|
@@ -949,7 +958,7 @@ class committed_descriptor_impl { | |
const auto input_layout = detail::get_layout(params, compute_direction); | ||
const auto output_layout = detail::get_layout(params, inv(compute_direction)); | ||
|
||
if (dimensions.back().is_prime) { | ||
if (dimensions.back().algorithm == detail::fft_algorithm::BLUESTEIN) { | ||
if (input_layout == detail::layout::UNPACKED || output_layout == detail::layout::UNPACKED) { | ||
throw unsupported_configuration("Unsupported configuration for prime sized DFTs"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other lenght is also committed, making this variable name confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #145 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think committed_length is confusing. How about
ct_length
(standing for Cooley-Tukey)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
committed_length is not the cooley_tukey length. Committed length is the user provided length (i.e the length of the dimension as in the descriptor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the other one could be ct_length. We just need something to make it obvious which is which.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot be ct_length, we should not be tying it with the name of an algorithm, as it isn't always cooley tukey.
I believe
length
, along with its description above should be suffiicientThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which other algorithm can it be?