Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDAIEFoldDmaWaits] Fold DMA wait operations across multi columns #986

Merged
merged 11 commits into from
Dec 18, 2024

Conversation

Yu-Zhewen
Copy link
Contributor

@Yu-Zhewen Yu-Zhewen commented Dec 12, 2024

This is an enhancement for #962.

In the previous PR, DMA waits on the same connection (and the same tile) could be folded, exploiting the fact that each DMA channel has a queue size of 4.

In this PR, DMA waits across multiple columns can also be folded, provided their corresponding row, channel, and direction are the same. This optimization leverages the ability to specify colNum in TCTSync, where the range [col, col + colNum) can be addressed.

The numbers in the following table show the instruction size in words.

Test (MxKxN) No Folding Only Fold by Connection Only Fold by Column Fold Both
512x4096x512 1228 1132 1120 1096
512x512x4096 820 772 748 736
4096x512x512 4628 4244 4220 4124

using DmaBdIdKey = std::pair<AMDAIE::TileOp, AMDAIE::ConnectionOp>;
using DmaBdIdPair = std::pair<DmaBdIdKey, uint32_t>;

FailureOr<DmaBdIdPair> retriveDmaBdIdPair(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FailureOr<DmaBdIdPair> retriveDmaBdIdPair(
FailureOr<DmaBdIdPair> retrieveDmaBdIdPair(

const Operation *batchParentOp,
const DenseSet<AMDAIE::ConnectionOp> &connectionOps,
const DenseMap<DmaBdIdKey, DenseSet<uint32_t>> &dmaBdIdsMap,
AMDAIE::NpuHalfDmaCpyNdOp currHalfDmaCpyNdOp, DmaBdIdPair &currBdIdPair) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AMDAIE::NpuHalfDmaCpyNdOp currHalfDmaCpyNdOp, DmaBdIdPair &currBdIdPair) {
AMDAIE::NpuHalfDmaCpyNdOp currHalfDmaCpyNdOp, DmaBdIdPair currBdIdPair) {

Non-expensive pairs are preferably passed by value as it's more simple, avoids reference issues (e.g. dangling, currBdIdPair shouldn't be changed ) and is as fast/faster if the contents are small.

Copy link
Collaborator

@jtuyls jtuyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yu-Zhewen Yu-Zhewen merged commit 2080473 into main Dec 18, 2024
8 checks passed
@Yu-Zhewen Yu-Zhewen deleted the zhewen_batch_sync branch December 18, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants