Small optimizations on iggen buffer handling #317

Open: 3 commits to merge into base: master

Conversation

fknorr
Contributor

@fknorr fknorr commented Dec 4, 2024

perform_task_buffer_accesses updates last-writers twice to gracefully handle overlapping writes, which is an edge case. This PR quickly checks whether overlapping writes are present and sticks to a single update if there are none. By transposing the loop nest from chunk -> bid to bid -> chunk, we can also avoid constructing another unordered_map.
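
To illustrate the idea, here is a minimal sketch of such a fast path, not the code from this PR. It assumes 1D half-open ranges instead of real buffer regions and a per-element last-writer list; `chunk_write`, `last_writer_map`, `has_overlapping_writes` and `update_last_writers` are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for a written buffer region: the half-open range [begin, end).
struct chunk_write {
    std::size_t begin;
    std::size_t end;
    int writer; // id of the instruction performing the write
};

// Per buffer element, the set of instructions that wrote it last.
using last_writer_map = std::vector<std::vector<int>>;

// Cheap pre-check: after sorting by start, any overlap shows up between neighbours.
bool has_overlapping_writes(std::vector<chunk_write> writes) {
    // Empty ranges cannot overlap anything, so drop them first.
    writes.erase(std::remove_if(writes.begin(), writes.end(),
                     [](const chunk_write& w) { return w.begin == w.end; }),
        writes.end());
    std::sort(writes.begin(), writes.end(),
        [](const chunk_write& a, const chunk_write& b) { return a.begin < b.begin; });
    for(std::size_t i = 1; i < writes.size(); ++i) {
        if(writes[i].begin < writes[i - 1].end) return true;
    }
    return false;
}

void update_last_writers(last_writer_map& last, const std::vector<chunk_write>& writes) {
    for(const auto& w : writes) {
        assert(w.begin <= w.end && w.end <= last.size());
    }

    if(!has_overlapping_writes(writes)) {
        // Fast path: every element has at most one new writer, so one pass suffices.
        for(const auto& w : writes) {
            for(std::size_t i = w.begin; i < w.end; ++i) last[i] = {w.writer};
        }
        return;
    }

    // Slow path: clear all written elements first, then collect every conflicting writer.
    for(const auto& w : writes) {
        for(std::size_t i = w.begin; i < w.end; ++i) last[i].clear();
    }
    for(const auto& w : writes) {
        for(std::size_t i = w.begin; i < w.end; ++i) last[i].push_back(w.writer);
    }
}
```

With disjoint writes, each element ends up with exactly one last writer after a single pass. Only when two ranges intersect does the sketch fall back to the clear-then-accumulate double update.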

Results are not looking too impressive in the benchmark report, but I do get a consistent 4% speedup for RSim room_small, which is scheduler bound on gpuc3.

@fknorr fknorr added this to the 0.7.0 milestone Dec 4, 2024
@fknorr fknorr requested review from psalz, PeterTh and GagaLP December 4, 2024 11:20
@fknorr fknorr self-assigned this Dec 4, 2024
@celerity celerity deleted a comment from github-actions bot Dec 4, 2024

github-actions bot commented Dec 4, 2024

Check-perf-impact results: (c8fb992b35322012b54e351345fdf71a)

✔️ No significant performance change in the microbenchmark set. You are good to go!

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.01x
  • graph-nodes : 1.03x
  • grid : 1.01x
  • instruction-graph : 0.97x
  • scheduler : 0.98x
  • system : 0.98x
  • task-graph : 1.02x

@coveralls

coveralls commented Dec 4, 2024

Pull Request Test Coverage Report for Build 12158729604

Details

  • 36 of 36 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.007%) to 94.933%

Totals Coverage Status
Change from base Build 12158562061: 0.007%
Covered Lines: 7135
Relevant Lines: 7253

💛 - Coveralls

Contributor

@GagaLP GagaLP left a comment

Nicely done.
LGTM! 👍

Member

@psalz psalz left a comment

LGTM! I've suggested two comment changes that I've added for my understanding while investigating how to implement replicated writes!

}
}

- // 3. Clear tracking structures for all regions that are being written to. We gracefully handle overlapping writes by treating the set of all conflicting
- // writers as last writers of an allocation.
+ // 3. To gracefully handle overlapping writes, clear tracking structures for all regions that are being written to, so we can treat the set of all
Member

Suggested change
- // 3. To gracefully handle overlapping writes, clear tracking structures for all regions that are being written to, so we can treat the set of all
+ // 3. To gracefully handle overlapping writes if detection is disabled via the error policy, clear tracking structures for all regions that are being written to, so we can treat the set of all

}
buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id);
Member

Suggested change
- buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id);
+ // On the buffer level, there is no special handling of overlapping writes: The last chunk that writes to a given
+ // region becomes its last writer, and subsequent reads on other devices will require a copy.
+ buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id);
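
The "last chunk wins" behaviour described in the suggested comment can be illustrated with a toy model. This is only a sketch under simplifying assumptions (1D element ranges, integer memory ids); `toy_buffer`, `track_write` and `read_requires_copy` are hypothetical names and do not mirror the real signature of `track_original_write`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not Celerity's buffer state: tracks, per element,
// which memory holds the most recently written data (-1 = never written).
class toy_buffer {
  public:
    explicit toy_buffer(std::size_t size) : m_newest(size, -1) {}

    // Later calls simply overwrite earlier ones: the last write wins.
    void track_write(std::size_t begin, std::size_t end, int memory_id) {
        assert(begin <= end && end <= m_newest.size());
        for(std::size_t i = begin; i < end; ++i) m_newest[i] = memory_id;
    }

    // A read needs a copy if any requested element was last written elsewhere.
    bool read_requires_copy(std::size_t begin, std::size_t end, int memory_id) const {
        assert(begin <= end && end <= m_newest.size());
        for(std::size_t i = begin; i < end; ++i) {
            if(m_newest[i] != -1 && m_newest[i] != memory_id) return true;
        }
        return false;
    }

  private:
    std::vector<int> m_newest;
};
```

If memory 0 writes elements [0, 6) and memory 1 then writes [4, 8), the overlap [4, 6) belongs to memory 1 in this model, so a later read of that range on memory 0 requires a copy.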

4 participants