update readme to include ET replay introduction.
Summary: Update the readme to reflect ET replay component.

Reviewed By: louisfeng

Differential Revision: D47613998

fbshipit-source-id: fd9ad8413fc0f5906d0c633e9111613dbbbfe791
Wenyin Fu authored and facebook-github-bot committed Jul 20, 2023
1 parent fe3de4a commit 5d0bc8e
2 changes: 2 additions & 0 deletions README.md
@@ -10,6 +10,8 @@ Our initial release of PARAM benchmarks focuses on AI training and comprises:
1. Communication: PyTorch-based collective benchmarks across arbitrary message sizes, effectiveness of compute-communication overlap, and DLRM communication patterns in the fwd/bwd pass
2. Compute: PyTorch-based GEMM, embedding lookup, and linear layer benchmarks
3. DLRM: uses Facebook's DLRM benchmark (https://github.com/facebookresearch/dlrm), tracking its `ext_dist` branch. In short, PARAM fully relies on the DLRM benchmark for end-to-end workload evaluation, with additional extensions as required for scale-out AI training platforms.
4. PyTorch Execution Trace (ET) replay based tests: The recently introduced PyTorch ET capturing capability records a model's runtime information at the operator level. This enables replay-based benchmarks (https://dl.acm.org/doi/abs/10.1145/3579371.3589072) that accurately reproduce the original performance.


In essence, PARAM bridges the gap between stand-alone C++ benchmarks and PyTorch/Tensorflow based application benchmarks. This enables us to gain deep insights into the inner workings of the system architecture as well as identify framework-level overheads by stressing all subcomponents of a system.
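The replay idea behind item 4 (record operator-level events at runtime, then re-execute them to reproduce the original behavior) can be sketched in miniature. This is a toy illustration only; the operator registry, trace format, and `capture`/`replay` functions below are hypothetical and are not PARAM's actual API, which builds on PyTorch's ET capture instead:

```python
# Toy operator registry (hypothetical ops, not PARAM's or PyTorch's API).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def capture(program):
    """Run a program and record each operator call as an ET-like event."""
    trace = []
    for name, args in program:
        result = OPS[name](*args)
        trace.append({"op": name, "args": list(args), "result": result})
    return trace

def replay(trace):
    """Re-execute the recorded operators in order, as a replay-based benchmark would."""
    return [OPS[event["op"]](*event["args"]) for event in trace]

# Replaying the trace reproduces the originally recorded results.
trace = capture([("add", (2, 3)), ("mul", (4, 5))])
assert replay(trace) == [e["result"] for e in trace]
```

In the real workflow, the trace would come from PyTorch's operator-level ET capture rather than from a hand-rolled recorder, and replay timing (not just results) is what the benchmark measures.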

