Skip to content

[AIE2P] Fix align requirement for wide vector load/store and split 32 aligned load/store instead of scalarizing them #321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 13, 2025

Conversation

khallouh
Copy link
Collaborator

No description provided.

@andcarminati
Copy link
Collaborator

Nice, I think we need the same for AIE2.

@konstantinschwarz
Copy link
Collaborator

Nice, I think we need the same for AIE2.

AIE2 doesn't have 512-bit load/store instructions, so those will be broken down into 256-bit accesses which only require 32-byte alignment. I think AIE2 is fine

@andcarminati
Copy link
Collaborator

Nice, I think we need the same for AIE2.

Nice, I think we need the same for AIE2.

AIE2 doesn't have 512-bit load/store instructions, so those will be broken down into 256-bit accesses which only require 32-byte alignment. I think AIE2 is fine

The problem there is a bit different. For AIE2 we are considering alignments of 32 bits (probably a misunderstanding of how alignment is expressed during legalization.).

@khallouh khallouh changed the title [AIE2P] Fix legalizer alignment requirement for 512-bit vector load/store [AIE2P] Fix align requirement for wide vector load/store and split 32-byte aligned load store instead of scalarizing them Jan 31, 2025
@khallouh khallouh force-pushed the hamza.fix.alignment branch 2 times, most recently from 1f59bfd to eae60df Compare January 31, 2025 14:26
@khallouh khallouh marked this pull request as ready for review January 31, 2025 14:27
@khallouh
Copy link
Collaborator Author

khallouh commented Jan 31, 2025

This now splits the large loads/stores when the alignment is 256 bits.

@khallouh khallouh force-pushed the hamza.fix.alignment branch from eae60df to 4c82b3f Compare January 31, 2025 14:37
@khallouh khallouh changed the title [AIE2P] Fix align requirement for wide vector load/store and split 32-byte aligned load store instead of scalarizing them [AIE2P] Fix align requirement for wide vector load/store and split 32/16-byte aligned load/store instead of scalarizing them Jan 31, 2025
@khallouh khallouh marked this pull request as draft January 31, 2025 15:33
@khallouh khallouh force-pushed the hamza.fix.alignment branch from 4c82b3f to d9810c1 Compare February 5, 2025 14:53
@khallouh khallouh marked this pull request as ready for review February 5, 2025 14:53
@khallouh khallouh changed the title [AIE2P] Fix align requirement for wide vector load/store and split 32/16-byte aligned load/store instead of scalarizing them [AIE2P] Fix align requirement for wide vector load/store and split 32 aligned load/store instead of scalarizing them Feb 5, 2025
; AIE2P: liveins: $p0
; AIE2P-NEXT: {{ $}}
; AIE2P-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $p0
; AIE2P-NEXT: [[LOAD:%[0-9]+]]:_(<32 x s8>) = G_LOAD [[COPY]](p0) :: (load (<32 x s8>))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't need the alignment anymore, but it seems wrong that it disappears. It's still 32 byte aligned, isn't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it always disappears post legalizer, even if we just use .legal(), not sure why

const uint64_t Alignment = Query.MMODescrs[0].AlignInBits;
const unsigned SplitFactor = MemoryType.getSizeInBits() / Alignment;
return std::make_pair(
0, LLT::fixed_vector(MemoryType.getNumElements() / SplitFactor,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return std::make_pair(0, MemoryType.divide(SplitFactor));

Copy link
Collaborator

@andcarminati andcarminati Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, if some unaligned scalar load appears (let's say s512), divide will give us a valid scalar type, not replacing the type by a vector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point but I don't think fewerElements works with scalars. I can assert that we have a vector here. Otherwise, we will simply scalarize as we already do and currently, it will fail during the process because we can't legalize the resulting merge and unmerge opcodes.

[=](const LegalityQuery &Query) {
const LLT &MemoryType = Query.MMODescrs[0].MemoryTy;
const uint64_t Alignment = Query.MMODescrs[0].AlignInBits;
if (MemoryType.getSizeInBits() >= 512 && Alignment == 256)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return (MemoryType.getSizeInBits() >= 512 && Alignment == 256);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also consider MemoryType.getSizeInBits() % 256 == 0, or at least assert?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assert because we have other types above and it will break for them but we can return false instead

@khallouh khallouh force-pushed the hamza.fix.alignment branch from d9810c1 to a4bb6fe Compare February 13, 2025 11:15
Copy link
Collaborator

@andcarminati andcarminati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@khallouh khallouh merged commit 93aebf1 into aie-public Feb 13, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants