[CoopVec] Add pixel shader and multi-layer support to Mul and OuterProduct tests #7437
3c95c62 to d12ed52
…mory and improved input vector/matrix test patterns
041328e to e70a45f
✅ With the latest revision this PR passed the C/C++ code formatter.
For test code, I think it's better to fail out at this point. default: Refers to: tools/clang/unittests/HLSLExec/CoopVec.h:176 in f77d76f.
Probably better to use VERIFY_FAIL to stop test execution at this point? I'm assuming it's not valuable to continue running further. Log::Error will continue execution of the test case, but mark the test as failed. Or was that the intention? Refers to: tools/clang/unittests/HLSLExec/CoopVec.h:139 in f77d76f.
Should this abort the test by using a VERIFY_* macro? The same pattern appears in some of the other helpers as well. Refers to: tools/clang/unittests/HLSLExec/CoopVec.h:82 in f77d76f.
float Elt = 0.0f;

if (IsIntegralDataType(MatrixInterpretation))
  Elt = static_cast<float>(Rnd() & 0x7) - 3.0f;
They're for getting a specific range of random numbers. I added comments.
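To make the range concrete, here is a minimal sketch of the arithmetic in the quoted line. The helper name `smallIntElt` is hypothetical and the random bits are passed in explicitly; it assumes `Rnd()` yields an unsigned integer, which the snippet does not show.

```cpp
#include <cstdint>

// Hypothetical helper (not from the PR): same arithmetic as the quoted line,
// with the random bits passed in explicitly.
constexpr float smallIntElt(uint32_t Bits) {
  // Bits & 0x7 keeps the low three bits (0..7); subtracting 3 recenters
  // the range to -3..4.
  return static_cast<float>(Bits & 0x7) - 3.0f;
}

static_assert(smallIntElt(0) == -3.0f, "lowest value");
static_assert(smallIntElt(7) == 4.0f, "highest value");
static_assert(smallIntElt(8) == -3.0f, "bits above the low three are ignored");
```

So the mask and offset together yield the eight integers from -3 to 4, a small signed range centered near zero.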
T *Vec = getVector<T>(I);
for (size_t J = 0; J < VectorSize; ++J)
  if constexpr (std::is_same_v<T, DirectX::PackedVector::HALF>) {
    float Elt = (static_cast<float>(Rnd() & 0x3) - 1.0f) / 2.0f;
Added comment.
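For reference, a minimal sketch of what the HALF expression produces. The helper name `smallFracElt` is hypothetical, and it assumes `Rnd()` yields an unsigned integer.

```cpp
#include <cstdint>

// Hypothetical helper (not from the PR): same arithmetic as the quoted line.
constexpr float smallFracElt(uint32_t Bits) {
  // Bits & 0x3 keeps the low two bits (0..3); subtracting 1 gives -1..2,
  // and halving gives -0.5, 0.0, 0.5, 1.0.
  return (static_cast<float>(Bits & 0x3) - 1.0f) / 2.0f;
}

static_assert(smallFracElt(0) == -0.5f, "lowest value");
static_assert(smallFracElt(3) == 1.0f, "highest value");
```

All four values are exactly representable in half precision, so converting the input vector to HALF introduces no rounding error of its own.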
if (MatrixLayout == D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_ROW_MAJOR) {
  ConvertInfo.DestInfo.DestStride =
      (static_cast<UINT>(getVectorSize()) * DestEltSize + 15) & ~15;
This is for alignment. Added comments.
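As a sketch of the alignment arithmetic: adding 15 and clearing the low four bits rounds a byte count up to the next multiple of 16. The helper names `alignUp16` and `rowMajorStride` are hypothetical and not taken from the PR.

```cpp
#include <cstdint>

// Rounds Bytes up to the next multiple of 16 (leaves multiples unchanged).
constexpr uint32_t alignUp16(uint32_t Bytes) { return (Bytes + 15u) & ~15u; }

// Hypothetical wrapper: stride in bytes for one row of VectorSize elements,
// each EltSize bytes wide, padded to a 16-byte boundary.
constexpr uint32_t rowMajorStride(uint32_t VectorSize, uint32_t EltSize) {
  return alignUp16(VectorSize * EltSize);
}

static_assert(alignUp16(16) == 16, "already aligned values are unchanged");
static_assert(alignUp16(17) == 32, "unaligned values round up");
static_assert(rowMajorStride(9, 2) == 32, "18 bytes pads to 32");
```

The trick only works because 16 is a power of two; for other alignments a divide-and-multiply form would be needed.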
LGTM
This change adds preliminary pixel shader support by wrapping the existing test code in a function that is called by both compute and pixel shaders. The same test patterns are used for both, mapping threads to input/bias vectors and output buffer offsets. For pixel shaders, which have no built-in thread ID, an atomic counter provides a simple mapping from pixel shader threads to a range of thread IDs.
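The atomic-counter mapping described above can be illustrated with a CPU-side analogy. This is only a sketch: `ThreadIdAllocator` and `outputOffset` are hypothetical names, the shader code itself is not shown in this conversation, and the tightly packed output layout is an assumption.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the shader-side counter; on the GPU this role
// would be played by an atomic add on a buffer element.
struct ThreadIdAllocator {
  std::atomic<uint32_t> Next{0};

  // Returns a unique ID per call; safe to call from concurrent threads.
  uint32_t acquire() { return Next.fetch_add(1, std::memory_order_relaxed); }
};

// Hypothetical mapping from a thread ID to a byte offset in the output
// buffer, assuming one tightly packed output vector per thread.
constexpr size_t outputOffset(uint32_t ThreadId, size_t OutputVecSize,
                              size_t EltSize) {
  return static_cast<size_t>(ThreadId) * OutputVecSize * EltSize;
}
```

Each caller gets a distinct ID, but the order in which IDs are handed out is not tied to pixel position, so a scheme like this only suits tests where every ID in the range must be covered and the assignment itself does not matter.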
Multi-layer support is also added, currently limited to two layers. In a multi-layer config, the first layer always uses a square matrix for ease of implementation.
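A minimal CPU sketch of a two-layer evaluation with a square first layer follows. It omits bias and activation, assumes row-major float matrices, and uses the hypothetical names `matVec` and `twoLayer`; the PR's actual reference implementation may differ.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix-vector product: Out[R] = sum over C of M[R*Cols + C] * In[C].
inline std::vector<float> matVec(const std::vector<float> &M, size_t Rows,
                                 size_t Cols, const std::vector<float> &In) {
  std::vector<float> Out(Rows, 0.0f);
  for (size_t R = 0; R < Rows; ++R)
    for (size_t C = 0; C < Cols; ++C)
      Out[R] += M[R * Cols + C] * In[C];
  return Out;
}

// Two layers: M0 is square (N x N, N = In.size()), M1 is OutSize x N.
inline std::vector<float> twoLayer(const std::vector<float> &M0,
                                   const std::vector<float> &M1,
                                   size_t OutSize,
                                   const std::vector<float> &In) {
  const size_t N = In.size();
  // Square first layer: the hidden vector has the same size as the input.
  std::vector<float> Hidden = matVec(M0, N, N, In);
  return matVec(M1, OutSize, N, Hidden);
}
```

With a square first layer, the hidden vector keeps the input's dimension, so the second layer can reuse the same matrix dimensions as the single-layer case.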
The input patterns are now slightly more interesting: they are generated randomly by a generator seeded with a constant value. The range of values is limited to reduce the error that accumulates between the CoopVec GPU implementation and the CPU reference implementation.
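A sketch of why a constant seed keeps the test reproducible: the same seed always yields the same pattern. This assumes a `std::mt19937` generator and reuses the mask-and-offset arithmetic from the review snippets; `makePattern` is a hypothetical name, and the PR's generator type is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical pattern generator: Count small integers in -3..4 drawn from
// a Mersenne Twister seeded with a fixed value.
inline std::vector<float> makePattern(uint32_t Seed, size_t Count) {
  std::mt19937 Rnd(Seed);
  std::vector<float> Out;
  Out.reserve(Count);
  for (size_t I = 0; I < Count; ++I)
    Out.push_back(static_cast<float>(Rnd() & 0x7) - 3.0f);
  return Out;
}
```

Because `std::mt19937` is fully specified by the C++ standard, two runs with the same seed produce identical inputs on any conforming platform, which keeps failures reproducible.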