Fix command graph generation bugs around reductions #223

fknorr · 2023-10-25T11:55:19Z

Implementing IDAG reductions uncovered two bugs around reductions in distributed command graph generation:

For an all-to-all reduction, we emit push commands for the partial results from our node, followed by a reduction command. The reduction command logically overwrites the buffer contents, so it must anti-depend on these pushes. This bug does not appear to break reductions in the current runtime, most likely because the final reduction result is only committed to device memory once it is read by the next consumer task.
We elide reduction commands if there is only a single producer chunk. If the result is subsequently read by multiple nodes, we generate push commands on the producer but fail to generate the corresponding await-pushes on the consumer nodes.
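Both fixes can be illustrated with a small, self-contained C++ sketch. It is a hypothetical toy model, not Celerity's `distributed_graph_generator`: `toy_cdag`, `all_to_all_reduction`, `elided_reduction`, `count_kind` and the command layout are invented for illustration. Dependencies are stored as a plain list of command ids, without distinguishing true dependencies from anti-dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model of per-node commands; not Celerity's real classes.
enum class cmd_kind { push, await_push, reduction };

struct command {
    std::size_t id;
    cmd_kind kind;
    int node;                      // node that executes the command
    std::vector<std::size_t> deps; // commands that must finish first (true or anti)
};

class toy_cdag {
  public:
    std::size_t emit(cmd_kind kind, int node, std::vector<std::size_t> deps = {}) {
        const std::size_t id = m_cmds.size();
        m_cmds.push_back(command{id, kind, node, std::move(deps)});
        return id;
    }
    const std::vector<command>& commands() const { return m_cmds; }

  private:
    std::vector<command> m_cmds;
};

// Bug 1: every node holds a partial result and pushes it to all peers. The
// reduction command overwrites the buffer, so it must anti-depend on the
// local pushes that still read the partial result.
void all_to_all_reduction(toy_cdag& cdag, int num_nodes) {
    for (int node = 0; node < num_nodes; ++node) {
        std::vector<std::size_t> deps;
        for (int peer = 0; peer < num_nodes; ++peer) {
            if (peer == node) continue;
            deps.push_back(cdag.emit(cmd_kind::push, node));       // anti-dependency (the fix)
            deps.push_back(cdag.emit(cmd_kind::await_push, node)); // true dependency on the peer's partial
        }
        cdag.emit(cmd_kind::reduction, node, std::move(deps));
    }
}

// Bug 2: with a single producer chunk the reduction command is elided, but each
// push from the producer still needs a matching await-push on the consumer.
void elided_reduction(toy_cdag& cdag, int num_nodes, int producer) {
    for (int consumer = 0; consumer < num_nodes; ++consumer) {
        if (consumer == producer) continue;
        cdag.emit(cmd_kind::push, producer);
        cdag.emit(cmd_kind::await_push, consumer); // the previously missing command
    }
}

// Number of commands of the given kind across all nodes.
std::size_t count_kind(const toy_cdag& cdag, cmd_kind kind) {
    const auto& cmds = cdag.commands();
    return static_cast<std::size_t>(std::count_if(
        cmds.begin(), cmds.end(), [kind](const command& c) { return c.kind == kind; }));
}
```

A test against this toy model might check that every reduction command lists all pushes on its own node among its dependencies, and that the number of pushes matches the number of await-pushes when the reduction is elided.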

I've added unit tests for both cases.

github-actions · 2023-10-25T11:56:01Z

Check-perf-impact results: (b003273516680ef3e6ca0110b3678f5e)

❓ No new benchmark data submitted. ❓
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

psalz

Good stuff!

src/distributed_graph_generator.cc

github-actions

clang-tidy made some suggestions

test/graph_gen_reduction_tests.cc

PeterTh

LGTM

test/graph_gen_reduction_tests.cc


github-actions · 2023-11-02T13:55:00Z

Check-perf-impact results: (dee217934841bf19e612d83adf4e7dfb)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: 4 individual benchmarks affected
🚀 Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / jacobi topology

Relative execution time per category: (mean of relative medians)

command-graph : 1.00x
graph-nodes : 0.99x
grid : 1.02x
scheduler : 1.02x
system : 1.15x ⚠️
task-graph : 1.01x
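The "mean of relative medians" metric can be sketched as follows. This assumes that each benchmark's relative median is the ratio of its median run time with the change to its median run time on the baseline, and that a category's value is the plain arithmetic mean of those ratios; the report does not spell out these details, and all names below are hypothetical. The per-benchmark thresholds (>1.25x slowdown, <0.80x speedup) are taken from the report above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a sample (taken by value so the caller's data stays unsorted).
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    const std::size_t n = xs.size();
    return n % 2 == 1 ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2.0;
}

// Hypothetical container for the run times of one benchmark.
struct benchmark_samples {
    std::vector<double> baseline; // run times on the base commit
    std::vector<double> current;  // run times with the change applied
};

// Relative median: > 1 means slower, < 1 means faster.
double relative_median(const benchmark_samples& b) {
    return median(b.current) / median(b.baseline);
}

// Assumed category value: arithmetic mean of its benchmarks' relative medians.
double category_score(const std::vector<benchmark_samples>& category) {
    assert(!category.empty());
    double sum = 0.0;
    for (const auto& b : category) sum += relative_median(b);
    return sum / static_cast<double>(category.size());
}

// Per-benchmark flags using the thresholds quoted in the report.
bool significant_slowdown(double rel) { return rel > 1.25; }
bool significant_speedup(double rel) { return rel < 0.80; }
```

Because each benchmark is normalized to its own baseline before averaging, a single noisy benchmark can push a category's mean up or down noticeably, which is one way jitter in a few multi-threaded benchmarks can show up as a category-level change.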

github-actions · 2023-11-02T17:19:51Z

Check-perf-impact results: (d21ecac39af892ab1c227e6d0ae10ebf)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology, benchmark independent task pattern with N tasks - 100 / task generation
🚀 Significant speedup (<0.80x) in some microbenchmark results: benchmark stencil pattern with N time steps - 50 / iterations

Relative execution time per category: (mean of relative medians)

command-graph : 1.00x
graph-nodes : 0.96x
grid : 1.01x
scheduler : 1.03x
system : 1.06x
task-graph : 0.99x

fknorr · 2023-11-02T17:27:49Z

I re-ran the benchmarks because there seemed to be significant jitter in the system benchmarks, but it appears that "benchmark independent task pattern with 100 tasks" is indeed slowing down, even though the change should not affect code without reductions.

fknorr · 2023-11-08T09:54:55Z

@PeterTh discovered that results of our multi-threaded benchmarks, especially system benchmarks, are not as stable and reliable as we thought, and our benchmarking setup needs some work.

Barring some extremely obscure effect in the instruction cache, OS scheduling or similar, I'm going to trust the command-graph benchmarks, which measure this change in isolation and show no change in performance.

fknorr added this to the 0.5.0 milestone Oct 25, 2023

fknorr requested review from psalz and PeterTh October 25, 2023 11:55

fknorr self-assigned this Oct 25, 2023

psalz approved these changes Oct 25, 2023


src/distributed_graph_generator.cc Outdated

src/distributed_graph_generator.cc

github-actions bot reviewed Oct 25, 2023


test/graph_gen_reduction_tests.cc Outdated

PeterTh approved these changes Nov 2, 2023


test/graph_gen_reduction_tests.cc Outdated

fknorr force-pushed the fix-cdag-reductions branch from 0b1bf48 to 4c61835 November 2, 2023 13:23

fknorr added 2 commits November 2, 2023 14:54

Fix: Reduction commands must anti-depend on their partial-result pushes

3853c92

Fix: Results from single-node reductions must be await-pushed on other nodes

8183de9

fknorr added a commit that referenced this pull request Nov 2, 2023

Update benchmark results for #223

2cfd1c2

fknorr force-pushed the fix-cdag-reductions branch from 4c61835 to 2cfd1c2 November 2, 2023 13:54

Update benchmark results for #223

2f6687a

fknorr force-pushed the fix-cdag-reductions branch from 2cfd1c2 to 2f6687a November 2, 2023 17:19

fknorr merged commit b2ee29d into master Nov 8, 2023
28 checks passed

psalz deleted the fix-cdag-reductions branch November 8, 2023 09:58
