rfcs: add proposal for collecting and analyzing synthetic data #1999
base: rfcs
Conversation
Force-pushed from f2734b8 to ae021b6
The second modification that will be needed for benchdnn is a new low-fidelity batched benchmarking mode. This mode would only execute each workload once, with the expectation that run-to-run variability is handled by executing multiple similar workloads. The benchmarking is expected to be performed as follows on GPU, although some tweaks may be required to get performance stability
I think this functionality is currently present in benchdnn via the --fix-times-per-prb knob; we could set it to 1 or some other small value.
The problem with the --fix-times-per-prb knob is that it is not set up to keep the device saturated with work. This could cause issues with benchmarking if the target device enters some kind of power-saving mode while oneDNN primitives are being created. To avoid that, benchdnn currently performs a warmup phase for each workload. This warmup should be unnecessary in a batched benchmarking mode, and since the goal here is to execute each primitive exactly once, the existing warmup phase would significantly reduce benchmarking throughput.
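For illustration only, here is a minimal sketch (not benchdnn code) of what such a batched, warmup-free measurement loop could look like with the oneDNN C++ API; the workload list is assumed to be produced elsewhere by the generator, and only the aggregate batch time is measured:

```cpp
#include <chrono>
#include <unordered_map>
#include <utility>
#include <vector>

#include "dnnl.hpp"

// Hypothetical batched measurement: every pre-created primitive is executed
// exactly once, back to back on a single stream, so the device stays
// saturated and no per-workload warmup phase is needed. Only the aggregate
// time for the whole batch is measured here.
double run_batch_once(dnnl::engine &eng,
        const std::vector<std::pair<dnnl::primitive,
                std::unordered_map<int, dnnl::memory>>> &workloads) {
    dnnl::stream strm(eng);
    auto start = std::chrono::steady_clock::now();
    for (const auto &w : workloads)
        w.first.execute(strm, w.second); // enqueue without intermediate waits
    strm.wait(); // single synchronization point for the whole batch
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```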
* Automated Diffing: As one of the relevant metrics is a comparison between oneDNN builds, this tool needs a way to execute multiple oneDNN libraries. As a consequence, we can add a testing API for comparing implementations. When implementations match between oneDNN builds, we can often skip benchmarking. As oneDNN primitive creation and dispatching can be quite complicated, this is beneficial for developers, as they cannot always predict the effect of an implementation change. In addition, if functionality validation is supported, this provides a much faster way to initially validate correctness by skipping unchanged implementations.
You're suggesting that the tool could be used to compare performance (or correctness) differences between two versions of oneDNN? How would you handle changes to the API (breaking or otherwise) between the versions? It seems to me like this feature would add unnecessary complexity.
> How would you handle changes to the API (breaking or otherwise) between the versions?
As things currently stand, this would not be supported. The main intention here is a fast-to-run regression test on a PR, not testing between something like oneDNN releases. The problem, as I see it, is that after an optimization is made, a (most likely small) percentage of workloads actually change, and we need to validate that optimization. If we can collect performance for 200 changed cases, we have effective evidence for how the PR performs in practice. This can be done in under a minute, but only if we can quickly identify what has changed. In addition, by using an automated search, we remove the accidental bias caused by developers restricting the search space to cases they believe are affected, which may not align with reality.
* Option 2: Hash relevant primitive data to generate a UUID

Given that we are looking at generating millions of primitives, and serialized primitive data is likely to be in the tens of kilobytes in size, Option 1
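As an illustration of Option 2 (the excerpt does not prescribe a specific hash; 64-bit FNV-1a is used here purely as an example), the serialized primitive data could be folded into a fixed-width identifier like this:

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: fold arbitrarily large serialized primitive data
// (tens of kilobytes) into a fixed-size 64-bit identifier with FNV-1a.
uint64_t primitive_uuid(const std::vector<uint8_t> &serialized) {
    uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
    for (uint8_t byte : serialized) {
        h ^= byte;
        h *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
    return h;
}
```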
There is an API to query a cache blob ID from primitive descriptors to distinguish different primitives. Would it work here and have you checked how big it is?
I had not considered that API. The biggest issue I see is that primitive descriptors do not contain enough information. For example, consider the recent GEMM fix in commit 25f6896. This only affects the GPU kernel binary generated by the jit:gemm implementation, so a kernel changed by this commit cannot be caught within the scope of the primitive descriptor.
On the other hand, the previous change in commit e8e05b4 would affect the primitive descriptor, so in theory it could be caught by a cache blob ID (with appropriate modifications for this task). I had considered having a mode that filters on primitive descriptor changes, which boils down to a performance/accuracy tradeoff that developers can choose based on the work being performed. I decided not to include it until we know generating primitives is too slow in practice. While this is definitely a potential issue, as we can only generate around ~100 OpenCL kernels per second for the GPU, it shouldn't be an issue with the move to reusable implementations unless searching lots of post-op combinations becomes important.
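For reference, a minimal sketch of checking the size question raised above via the persistent-cache API, assuming a oneDNN version that exposes get_cache_blob_id() on primitive descriptors; problem setup is left to the caller:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

#include "dnnl.hpp"

// Sketch: given an already-created primitive descriptor, report how large
// its cache blob ID is. Relies on the persistent-cache API
// (primitive_desc_base::get_cache_blob_id) in recent oneDNN versions.
void report_cache_blob_id_size(const dnnl::primitive_desc_base &pd) {
    std::vector<uint8_t> id = pd.get_cache_blob_id();
    std::printf("cache blob ID size: %zu bytes\n", id.size());
}
```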
Force-pushed from d4a1b20 to 21cef93
Force-pushed from 21cef93 to 029767c
Document Link
This RFC proposes new tooling to enable a data pipeline that uses synthetic data to analyze and validate oneDNN performance.