-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add configurable Pwelch scaling and improve performance #897
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…nel that performs better than CUB when in memory FFT bin powers are {batches, nfft}
/build |
cliffburdick
reviewed
Mar 4, 2025
/build |
cliffburdick
reviewed
Mar 4, 2025
cliffburdick
reviewed
Mar 4, 2025
cliffburdick
approved these changes
Mar 4, 2025
/build |
static_asserts for signal type
/build |
cliffburdick
reviewed
Mar 4, 2025
cliffburdick
approved these changes
Mar 4, 2025
/build |
2 similar comments
/build |
/build |
/build |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR addresses concerns in issue #887:
Configurable scaling is supported. Density and Spectrum modes, similar to Matlab or scipy.signal, as well as fused dB modes frequently needed for spectrum-analyzer-like plotting
Custom reduction kernel is used instead of MatX sum() using CUB DeviceSegmentedReduce (DSR).
Three reasons why the custom reduction kernel is preferred:
In the nominal case where the input signal is a memory backed complex-valued tensor and the output power spectrum is a memory backed real-valued tensor, the intermediate overlapping signal is a 2D tensor {batches, nfft}. This is an optimal memory layout for the FFTs, but a very suboptimal layout for CUB DSR which would the permuted layout {nfft, batches} as CUB DSR uses one block of threads to read all the 'batches' of a single fft bin. To improve CUB performance, we'd either need to have the batched FFTs directly write to a permuted tensor (about 4x slower for the scenario considered) or permute the intermediate tensor after the FFT (extra round trip through global memory).
CUB currently needs begin/end indexes for each data segment, the generation of which takes around 50% longer than the custom reduction (ignoring the permutation issue)
The custom reduction allows for configurable scaling, such as fusing a dB conversion.