Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SM6.9] Allow native vectors longer than 4 #7143

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

pow2clk
Copy link
Member

@pow2clk pow2clk commented Feb 18, 2025

Remove errors in Sema diagnostics for vectors longer than 4 in 6.9.
Test for failures using long vectors in unspported contexts and for correct codegen in
supported contexts. Verify errors persist in pre-6.9 shader models

The type buffer cache expects a max vector size of 4. By just skipping the cache for longer vectors, we don't overrun and store float7 vectors in the double3 slot or retrieve the double3 in place of float7.

Testing is for acceptance, mangling and basic copying that takes place
at the high level to ensure they are being accepted and recognized
correctly. The intent is not to tully test the passing of data as that
requires enabling vector operations to do properly. This test is used to
verify that these same constructs are disallowed in 6.8 and earlier.

A separate test verifies that disallowed contexts produce the
appropriate errors

Fixes #7117

Remove errors in Sema diagnostics for vectors longer than 4 in 6.9.
Test for failures using long vectors in unspported contexts and for correct codegen in
supported contexts. Verify errors persist in pre-6.9 shader models

The type buffer cache expects a max vector size of 4. By just skipping the cache for longer vectors, we don't overrun and store float7 vectors in the double3 slot or retrieve the double3 in place of float7.

Testing is for acceptance, mangling and basic copying that takes place
at the high level to ensure they are being accepted and recognized
correctly. The intent is not to tully test the passing of data as that
requires enabling vector operations to do properly. This test is used to
verify that these same constructs are disallowed in 6.8 and earlier.

A separate test verifies that disallowed contexts produce the
appropriate errors

Fixes microsoft#7117
Disallow long vectors, and arrays or structs containing long vectors in
cbuffers, entry functions, node records, tessellation patchs, or special intrinsic parameters with
user-defined struct parameters.
@pow2clk pow2clk requested a review from a team as a code owner February 18, 2025 16:20
@llvm-beanz llvm-beanz self-assigned this Feb 18, 2025
TypeSourceInfo *defaultTypeArgValue) {
CXXRecordDecl *hlsl::DeclareTemplateTypeWithHandle(
ASTContext &context, StringRef name, uint8_t templateArgCount,
TypeSourceInfo *defaultTypeArgValue, InheritableAttr *Attr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: Why the difference in signature formatting in this file? See 'DeclareRecordTypeWithHandleAndNoMemberFunctions'. Most look look like they match the formatting you swapped to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are referring to whether the return type is on the same line or not? The answer is that right or wrong, we succumb to clang-format as even with its inconsistent foibles, it is preferable to chaos. It's the same argument that can lead to tyranny in other contexts, but for code formatting, it's an acceptable sacrifice with only occasional irritations.

clang-format will move arguments or return types seemingly arbitrarily to make the line fit within 80 characters. I support the 80 characters, but I wish it would let me decide whether moving return or argument is more readable. We literally cannot submit unless clang-format is happy though and I have bigger concerns.

}

if (IsHLSLVecType(qt)) {
if (GetHLSLVecSize(qt) > 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4

Would this be better defined as a constant somewhere?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should reduce the number of places we put the magic value. Can we make this a language option though? Then we can query the Language Options for something like maxSupportedVectorSize()?

@@ -7843,6 +7841,8 @@ def err_hlsl_load_from_mesh_out_arrays: Error<
"output arrays of a mesh shader can not be read from">;
def err_hlsl_out_indices_array_incorrect_access: Error<
"a vector in out indices array must be accessed as a whole">;
def err_hlsl_unsupported_long_vector: Error<
"Vectors of over 4 elements in %0 are not supported">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the element count could be a format argument too so that we can error on vectors too large for the large-vectors proposal too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did forgot about checking for vectors over 1024. The argument this has is meant to indicate the specific environment where such vectors aren't allowed, so I think I'll add a new error for the general prohibition. I don't mind adding the size as a param to this though it will almost certainly always be 4.

@@ -128,6 +128,8 @@ unsigned CaculateInitListArraySizeForHLSL(clang::Sema *sema,
const clang::InitListExpr *InitList,
const clang::QualType EltTy);

bool HasLongVecs(const clang::QualType &qt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IIUC, this is really checking if the type passed in contains a sub-object of long vector type?

Suggested change
bool HasLongVecs(const clang::QualType &qt);
bool ContainsLongVecSubObject(const clang::QualType &);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spend too much time considering this name. With other changes, I see this taking a max length param which makes the "long" in "longvec" subjective, but I'll name it something similar to your suggestion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should take a maximum length parameter, isn't a "long" vector anything over 4?

@@ -1131,6 +1135,9 @@ hlsl::DeclareConstantBufferViewType(clang::ASTContext &context, bool bTBuf) {
typeDeclBuilder.addField(
"h", context.UnsignedIntTy); // Add an 'h' field to hold the handle.

typeDeclBuilder.getRecordDecl()->addAttr(
HLSLCBufferAttr::CreateImplicit(context));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate this cleanup. Thank you for doing this.

const auto *SM =
hlsl::ShaderModel::GetByName(m_sema->getLangOpts().HLSLProfile.c_str());
if (!sintValue.isStrictlyPositive() ||
(sintValue.getLimitedValue() > 4 && !SM->IsSM69Plus())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were not requiring SM 6.9 for long vectors. Am I misreading this or misremembering?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meeting Monday the 10th, I brought up the issue and was told to do what I felt was right. I feel it's right to tie long vectors to native vectors in support matrices because it doesn't take too long a vector before the sequence of extractions and insertions becomes really cumbersome, slowing compile time and ballooning the DXIL size. As we discussed today, I'll add more rationale in the spec.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that seems like a reasonable rationale. Separately maybe we should consider if supporting shorter long vectors in old shader models is reasonable. I was looking at some compute shaders last week that bent over backwards to create 8-element vector types which might benefit from a smaller relaxing of vector size limitations.

const RecordDecl *recordDecl = recordType->getDecl();
if (recordDecl->isInvalidDecl())
return false;
RecordDecl::field_iterator begin = recordDecl->field_begin();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about base classes?

Comment on lines 11913 to 11915
if (qt.isNull()) {
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (qt.isNull()) {
return false;
}
if (qt.isNull())
return false;

const ArrayType *arrayType = qt->getAsArrayTypeUnsafe();
return HasLongVecs(arrayType->getElementType());
} else if (qt->isStructureOrClassType()) {
const RecordType *recordType = qt->getAs<RecordType>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially avoid doing the record decl traversal at all if we take a bit from DefinitionData and cache it as we add members and base classes to the decl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting idea. I didn't like the recursion and throwing in base classes just adds to the overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is requiring a lot more additional code than I am inclined to put in here. In particular, various template parameters are not instantiated until after the checks are done now, so the DefintionData is set way too late. We have some forcing of instantiation, but only if the parameter is a template itself. @llvm-beanz Will you still consider approving the change without this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised by this, DefinitionData tends to get updated as members get added into records, so I would have expected it to be sufficiently easy to update this in the addMember function.

I don't consider this a blocker if it is more complicated than I expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mainly the ConstantBufer usage where the template parameter is already weird and doesn't get addressed in the usual order of things. Other usages might be easier, but that was the one I started with.

Copy link
Member Author

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! Followup commit soon.

TypeSourceInfo *defaultTypeArgValue) {
CXXRecordDecl *hlsl::DeclareTemplateTypeWithHandle(
ASTContext &context, StringRef name, uint8_t templateArgCount,
TypeSourceInfo *defaultTypeArgValue, InheritableAttr *Attr) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are referring to whether the return type is on the same line or not? The answer is that right or wrong, we succumb to clang-format as even with its inconsistent foibles, it is preferable to chaos. It's the same argument that can lead to tyranny in other contexts, but for code formatting, it's an acceptable sacrifice with only occasional irritations.

clang-format will move arguments or return types seemingly arbitrarily to make the line fit within 80 characters. I support the 80 characters, but I wish it would let me decide whether moving return or argument is more readable. We literally cannot submit unless clang-format is happy though and I have bigger concerns.

const auto *SM =
hlsl::ShaderModel::GetByName(m_sema->getLangOpts().HLSLProfile.c_str());
if (!sintValue.isStrictlyPositive() ||
(sintValue.getLimitedValue() > 4 && !SM->IsSM69Plus())) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meeting Monday the 10th, I brought up the issue and was told to do what I felt was right. I feel it's right to tie long vectors to native vectors in support matrices because it doesn't take too long a vector before the sequence of extractions and insertions becomes really cumbersome, slowing compile time and ballooning the DXIL size. As we discussed today, I'll add more rationale in the spec.

@@ -7843,6 +7841,8 @@ def err_hlsl_load_from_mesh_out_arrays: Error<
"output arrays of a mesh shader can not be read from">;
def err_hlsl_out_indices_array_incorrect_access: Error<
"a vector in out indices array must be accessed as a whole">;
def err_hlsl_unsupported_long_vector: Error<
"Vectors of over 4 elements in %0 are not supported">;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did forgot about checking for vectors over 1024. The argument this has is meant to indicate the specific environment where such vectors aren't allowed, so I think I'll add a new error for the general prohibition. I don't mind adding the size as a param to this though it will almost certainly always be 4.

@@ -128,6 +128,8 @@ unsigned CaculateInitListArraySizeForHLSL(clang::Sema *sema,
const clang::InitListExpr *InitList,
const clang::QualType EltTy);

bool HasLongVecs(const clang::QualType &qt);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spend too much time considering this name. With other changes, I see this taking a max length param which makes the "long" in "longvec" subjective, but I'll name it something similar to your suggestion.

const ArrayType *arrayType = qt->getAsArrayTypeUnsafe();
return HasLongVecs(arrayType->getElementType());
} else if (qt->isStructureOrClassType()) {
const RecordType *recordType = qt->getAs<RecordType>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting idea. I didn't like the recursion and throwing in base classes just adds to the overhead.

@@ -0,0 +1,132 @@
// RUN: %dxc -DTYPE=float -DNUM=7 -T ps_6_9 -verify %s
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to self to add tests for vectors longer than 1024 as well as structs with base classes containing long vectors in all the forbidden locations.

pow2clk added 5 commits March 3, 2025 09:40
Expand resource attribute to all resource types by adding reskind and
resclass arguments indicating the specific resource type. Change detection
in HlslTypes to use these attribute arguments. Similarly add vertex
number arguments to output stream attribute and a boolean indicator of
input or output for tessellation patches.

Add geomstream attr to detect those objects

Use attribute to detect tesselation patches
Removes template arg counts and startswith stirngs to identify
tesslations patches and distinguish them from multisampled textures
Add setting for max vec size.

Determine long vector presence using DefinitionData bit?
OR
Rename testing for long vecs function?

Add attribute for geometry streams, produce and test errors for long vectors there.

Add and test errors for > 1024 element vectors.

Add vector size to error messages

good test changes
Go for consistent test filename formatting. most LLVM tests have dashes,
so dashes it is. Remove redundant sm68 test
Expand existing tests to different target and contexts. Add thorough
testing for geometry streams and tessellation patches.

Add toolong vector test. Verify that vectors that are over the maximum
for 6.9 fail.

Add subobjects and template classes to tests. These are unfortunately
disabled because the code to make them work causes other tests to fail.
Copy link
Contributor

github-actions bot commented Mar 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

pow2clk and others added 7 commits March 3, 2025 13:00
Use RequireCompleteType to force specialization of templates encountered
in global and other scopes where finding long vectors is necessary where
possible. This populates the definitiondata which contains the base
class chain needed to detect when a base class has disqualifying long
vectors. It was also needed to detect when dependent types in a template
class result in long vectors.

Work graph node types didn't check their base classes for failures. This
affects base classes with longvectors that have sub classes used for
node objects which should fail for having long vector members.

Respond to feedback about iterating through fields in clunky manner
which got left out of the last reviewer feedback response
I guess it was about time. Should simplify some things later as well as at present and it was too easy to not do. Specifically, I was going to need to add another string check to the template instantiation code to identify longvectors. This is cleaner.

Incidentally convert another feedback texture string check to use attribs.

Incidentally resort the recently-added attribs to not break up the node shader attribs.
Vector types can be cached in a 2D array that has a column for lenghts 1-4. This uses the added contant to indicate the length and for the checks that confirm it isn't exceeded.
By setting the bit when the vector template is instantiated and then propagating it when members, be they standard members or base classes, the bit will be set correctly for any struct or struct-like type. For arrays, the arrays are pealed away in a utility function to get at the elements.

Decided to separate the check for completeness from the check for long vectors. Even though the latter almost always requires the former, they are separate concepts and embedding the first in the second would be unexpected
Output Streams, Tessellation patches, and global variables should be complete when receiving other correctness checks. If they cannot be made complete, they should produce an error. This was omitted for various of these including non-template globals, which was fine, but it meant that redundant errors were produced for templates, but not standard globals likely just because that was what was tested. This removes that distinction and adds testing for all of the above to the existing incomplete-type.hlsl test.
self->Diag(pPatchFnDecl->getLocation(),
diag::err_hlsl_unsupported_long_vector)
<< DXIL::kDefaultMaxVectorLength
<< "patch constant function return type";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, why can't long vectors be used in patch functions? I tried to find a reference in the spec and didn't see anything. Am I missing something? Just trying to connect some dots to help learning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't allow long vectors in any input or output signature elements for entry functions, as this would greatly complicate signature element handling/packing and require a design that makes sense for them.

Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished with code review part. Haven't thoroughly reviewed all the tests yet.

def err_hlsl_unsupported_long_vector: Error<
"Vectors of over %0 elements in %1 are not supported">;
def err_hlsl_vector_too_long: Error<
"Vectors of over %0 elements in are not supported">;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between err_hlsl_unsupported_long_vector and err_hlsl_vector_too_long? This one looks the same, but with a missing parameter in between words - elements in are not supported. Also, I think err_hlsl_vector_too_long is unused?

// expected-note@+4 {{forward declaration of 'S'}} expected-note@+4 {{forward declaration of 'S'}} expected-note@+4 {{forward declaration of 'S'}}
// expected-note@+3 {{forward declaration of 'S'}} expected-note@+3 {{forward declaration of 'S'}} expected-note@+3 {{forward declaration of 'S'}}
// expected-note@+2 {{forward declaration of 'S'}} expected-note@+2 {{forward declaration of 'S'}} expected-note@+2 {{forward declaration of 'S'}}
// expected-note@+1 {{forward declaration of 'S'}} expected-note@+1 {{forward declaration of 'S'}} expected-note@+1 {{forward declaration of 'S'}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we might have much better ways to deal with this sort of duplicated note thing. Here are a few options, depending on which ones actually exist in the DXC version of the clang codebase:

  • Add a marker to the target line, like // #1, then refer to the marker from the place that generated the error and note, like: expected-note@#1.
    • This feature might be too new for the DXC codebase though.
  • Add a number after the expected note to indicate the count of expected identical diagnostic notes: expected-note@+1 24 {{...}}.
    • I'm pretty sure we have this feature in the DXC codebase.
    • You can also specify inexact values, like 0+, 1+, or ranges like 8-24


def HLSLStreamOutput : InheritableAttr {
let Spellings = []; // No spellings!
let Args = [UnsignedArgument<"Vertices">];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused about the "Vertices" argument at first, then I saw it was used to determine the stream type between point, line, and triangle. I guess it's fine, but wish there was a better name, like "PrimVertices" perhaps, to indicate that it indicates the primitive type by the number of vertices.

@@ -7519,8 +7519,8 @@ def err_hlsl_half_load_store: Error<
"LoadHalf and StoreHalf are not supported for min precision mode">;
def err_hlsl_interfaces_cannot_inherit: Error<
"interfaces cannot inherit from other types">;
def err_hlsl_invalid_range_1_4: Error<
"invalid value, valid range is between 1 and 4 inclusive">;
def err_hlsl_invalid_range_1_plus: Error<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, I'd prefer:

Suggested change
def err_hlsl_invalid_range_1_plus: Error<
def err_hlsl_invalid_range_1_to_max: Error<

@@ -7853,6 +7851,10 @@ def err_hlsl_load_from_mesh_out_arrays: Error<
"output arrays of a mesh shader can not be read from">;
def err_hlsl_out_indices_array_incorrect_access: Error<
"a vector in out indices array must be accessed as a whole">;
def err_hlsl_unsupported_long_vector: Error<
"Vectors of over %0 elements in %1 are not supported">;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine how this can ever be used with anything but 4 supplied for %0. Wouldn't it just be simpler to write that value into the diagnostic string? That way you don't have to always supply DXIL::kDefaultMaxVectorLength every time you use this diagnostic.

Comment on lines +802 to +805
if (Attr && (Attr->getResKind() ==
(unsigned)DXIL::ResourceKind::FeedbackTexture2D ||
Attr->getResKind() ==
(unsigned)DXIL::ResourceKind::FeedbackTexture2DArray)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Attr && (Attr->getResKind() ==
(unsigned)DXIL::ResourceKind::FeedbackTexture2D ||
Attr->getResKind() ==
(unsigned)DXIL::ResourceKind::FeedbackTexture2DArray)) {
if (Attr &&
DXIL::IsFeedbackTexture((DXIL::ResourceKind)Attr->getResKind())) {

Comment on lines +945 to +949
// I don't think this is necessary.
CXXRecordDecl *Decl = vectorSpecializationType->getAsCXXRecordDecl();
if (GetHLSLVecSize(vectorSpecializationType) > DXIL::kDefaultMaxVectorLength)
Decl->setHasHLSLLongVector();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is not necessary because Sema::InstantiateClass will set the property. Shouldn't this code just be deleted?

Comment on lines +1150 to +1152
typeDeclBuilder.getRecordDecl()->addAttr(HLSLResourceAttr::CreateImplicit(
context, (unsigned)DXIL::ResourceKind::CBuffer,
(unsigned)DXIL::ResourceClass::CBuffer));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be changing this depending on bTBuf to ResourceClass::SRV and ResourceKind::TBuffer. That has implications for locations where we need to match TextureBuffer type by this attribute as well.

Suggested change
typeDeclBuilder.getRecordDecl()->addAttr(HLSLResourceAttr::CreateImplicit(
context, (unsigned)DXIL::ResourceKind::CBuffer,
(unsigned)DXIL::ResourceClass::CBuffer));
typeDeclBuilder.getRecordDecl()->addAttr(HLSLResourceAttr::CreateImplicit(
context,
(unsigned)(bTBuf ? DXIL::ResourceKind::TBuffer
: DXIL::ResourceKind::CBuffer),
(unsigned)(bTBuf ? DXIL::ResourceClass::SRV
: DXIL::ResourceClass::CBuffer)));

self->Diag(pPatchFnDecl->getLocation(),
diag::err_hlsl_unsupported_long_vector)
<< DXIL::kDefaultMaxVectorLength
<< "patch constant function return type";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't allow long vectors in any input or output signature elements for entry functions, as this would greatly complicate signature element handling/packing and require a design that makes sense for them.

@@ -16268,6 +16383,18 @@ void DiagnoseEntry(Sema &S, FunctionDecl *FD) {
return;
}

// Check general parameter characteristics
// Would be nice to check for resources here as they crash the compiler now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue for this?

@damyanp damyanp assigned pow2clk and unassigned tex3d Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

[SM6.9] Support long vector declaration, casting, and assignment
4 participants