Commit

find/replace Omniperf to ROCm Compute Profiler
Signed-off-by: Peter Park <[email protected]>
peterjunpark committed Oct 3, 2024
1 parent fb210ab commit f5f0bac
Showing 34 changed files with 291 additions and 291 deletions.
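
The commit does not say how the replacement was carried out. As a minimal sketch, assuming a case-sensitive literal substitution over the ``.rst`` sources (the helper names ``rename_product`` and ``rename_in_tree`` are hypothetical and not part of the repository), the change could be reproduced along these lines. A case-sensitive match is consistent with the hunks in this diff, where the product name ``Omniperf`` changes but the lowercase ``omniperf`` command in shell examples does not.

```python
from pathlib import Path

OLD = "Omniperf"               # product name as it appears in prose
NEW = "ROCm Compute Profiler"  # replacement product name


def rename_product(text: str) -> str:
    """Case-sensitive literal replacement.

    Lowercase ``omniperf`` (the CLI command) is left untouched because
    ``str.replace`` only matches the exact spelling in OLD.
    """
    return text.replace(OLD, NEW)


def rename_in_tree(root: str, pattern: str = "*.rst") -> int:
    """Rewrite matching files under ``root`` in place.

    Returns the number of files whose contents changed.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        original = path.read_text(encoding="utf-8")
        updated = rename_product(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Under this assumption, calling ``rename_in_tree("docs")`` would touch only files that contain the capitalized name, and possessives such as ``Omniperf's`` become ``ROCm Compute Profiler's`` without special handling.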
4 changes: 2 additions & 2 deletions docs/conceptual/command-processor.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Command processor (CP)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC
+:description: ROCm Compute Profiler performance model: Command processor (CP)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC

**********************
Command processor (CP)
6 changes: 3 additions & 3 deletions docs/conceptual/compute-unit.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Compute unit (CU)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, GCN, compute, unit, pipeline, workgroup, wavefront,
+:description: ROCm Compute Profiler performance model: Compute unit (CU)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, GCN, compute, unit, pipeline, workgroup, wavefront,
CDNA

*****************
@@ -19,7 +19,7 @@ CDNA™-based accelerators. All :ref:`wavefronts <desc-wavefront>` of a
The CU consists of several independent execution pipelines and functional units.
The :doc:`/conceptual/pipeline-descriptions` section details the various
execution pipelines -- VALU, SALU, LDS, scheduler, and so forth. The metrics
-presented by Omniperf for these pipelines are described in
+presented by ROCm Compute Profiler for these pipelines are described in
:doc:`pipeline-metrics`. The :doc:`vL1D <vector-l1-cache>` cache and
:doc:`LDS <local-data-share>` are described in their own sections.

6 changes: 3 additions & 3 deletions docs/conceptual/definitions.rst
@@ -1,13 +1,13 @@
.. meta::
-:description: Omniperf terminology and definitions
-:keywords: Omniperf, ROCm, glossary, definitions, terms, profiler, tool,
+:description: ROCm Compute Profiler terminology and definitions
+:keywords: ROCm Compute Profiler, ROCm, glossary, definitions, terms, profiler, tool,
Instinct, accelerator, AMD

***********
Definitions
***********

-The following table briefly defines some terminology used in Omniperf interfaces
+The following table briefly defines some terminology used in ROCm Compute Profiler interfaces
and in this documentation.

.. include:: ./includes/terms.rst
2 changes: 1 addition & 1 deletion docs/conceptual/includes/normalization-units.rst
@@ -34,7 +34,7 @@ include:
that is, the total runtime of the kernel in seconds, as measured by the
:doc:`command processor <command-processor>`.

-By default, Omniperf uses the ``per_wave`` normalization.
+By default, ROCm Compute Profiler uses the ``per_wave`` normalization.

.. tip::

10 changes: 5 additions & 5 deletions docs/conceptual/l2-cache.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: L2 cache (TCC)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, L2, cache, infinity fabric, metrics
+:description: ROCm Compute Profiler performance model: L2 cache (TCC)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, L2, cache, infinity fabric, metrics

**************
L2 cache (TCC)
@@ -21,7 +21,7 @@ across the L2 channels. Requests that miss in the L2 cache are passed out to
:ref:`Infinity Fabric™ <l2-fabric>` to be routed to the appropriate memory
location.

-The L2 cache metrics reported by Omniperf are broken down into four
+The L2 cache metrics reported by ROCm Compute Profiler are broken down into four
categories:

* :ref:`L2 Speed-of-Light <l2-sol>`
@@ -299,7 +299,7 @@ accelerator’s memory, or even in the CPU’s memory. Infinity Fabric
is responsible for routing these memory requests/data to the correct
location and returning any fetched data to the L2 cache. The
:ref:`l2-request-flow` describes the flow of these requests through
-Infinity Fabric in more detail, as described by Omniperf metrics,
+Infinity Fabric in more detail, as described by ROCm Compute Profiler metrics,
while :ref:`l2-request-metrics` give detailed definitions of
individual metrics.

@@ -309,7 +309,7 @@ Request flow
------------

The following is a diagram that illustrates how L2↔Fabric requests are reported
-by Omniperf:
+by ROCm Compute Profiler:

.. figure:: ../data/performance-model/fabric.png
:align: center
4 changes: 2 additions & 2 deletions docs/conceptual/local-data-share.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Local data share (LDS)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, local, data, share, LDS
+:description: ROCm Compute Profiler performance model: Local data share (LDS)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, local, data, share, LDS

**********************
Local data share (LDS)
10 changes: 5 additions & 5 deletions docs/conceptual/performance-model.rst
@@ -1,13 +1,13 @@
.. meta::
-:description: Omniperf performance model
-:keywords: Omniperf, ROCm, performance, model, profiler, tool, Instinct,
+:description: ROCm Compute Profiler performance model
+:keywords: ROCm Compute Profiler, ROCm, performance, model, profiler, tool, Instinct,
accelerator, AMD

*****************
Performance model
*****************

-Omniperf makes available an extensive list of metrics to better understand
+ROCm Compute Profiler makes available an extensive list of metrics to better understand
achieved application performance on AMD Instinct™ MI-series accelerators
including Graphics Core Next™ (GCN) GPUs like the AMD Instinct MI50, CDNA™
accelerators like the MI100, and CDNA2 accelerators such as the MI250X, MI250,
@@ -18,7 +18,7 @@ hardware blocks of AMD Instinct accelerators. This section describes each
hardware block on the accelerator as interacted with by a software developer to
give a deeper understanding of the metrics reported by profiling data. Refer to
:doc:`/tutorial/profiling-by-example` for more practical examples and details on how
-to use Omniperf to optimize your code.
+to use ROCm Compute Profiler to optimize your code.

.. _mixxx-note:

@@ -34,7 +34,7 @@ to use Omniperf to optimize your code.
:prod-page:`MI250 <mi200/mi250>`, and :prod-page:`MI210 <mi200/mi210>`
product pages.

-In this chapter, the AMD Instinct performance model used by Omniperf is divided into a handful of
+In this chapter, the AMD Instinct performance model used by ROCm Compute Profiler is divided into a handful of
key hardware blocks, each detailed in the following sections:

* :doc:`compute-unit`
10 changes: 5 additions & 5 deletions docs/conceptual/pipeline-descriptions.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Shader engine (SE)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, pipeline, VALU, SALU, VMEM, SMEM, LDS, branch,
+:description: ROCm Compute Profiler performance model: Shader engine (SE)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, pipeline, VALU, SALU, VMEM, SMEM, LDS, branch,
scheduler, MFMA, AGPRs

*********************
@@ -101,7 +101,7 @@ coordinate between wavefronts in a workgroup.
Performance model of the local data share (LDS) on AMD Instinct MI-series
accelerators.

-Above is Omniperf's performance model of the LDS on CDNA accelerators (adapted
+Above is ROCm Compute Profiler's performance model of the LDS on CDNA accelerators (adapted
from :mantor-gcn-pdf:`20`). The SIMDs in the :ref:`VALU <desc-valu>` are
connected to the LDS in pairs (see above). Only one SIMD per pair may issue an
LDS instruction at a time, but both pairs may issue concurrently.
@@ -186,7 +186,7 @@ shadow (see the :ref:`MFMA <desc-mfma>` section for more detail).

.. note::

-The IPC model used by Omniperf omits the following two complications for
+The IPC model used by ROCm Compute Profiler omits the following two complications for
clarity. First, CDNA accelerators contain other execution units on the CU
that are unused for compute applications. Second, so-called "internal"
instructions (see :gcn-crash-course:`29`) are not issued to a functional
@@ -237,7 +237,7 @@ various AMD accelerators (including the CDNA line), we recommend the
GPRs required for D: 4
GPR alignment requirement: 8 bytes
-For the purposes of Omniperf, the MFMA unit is typically treated as a separate
+For the purposes of ROCm Compute Profiler, the MFMA unit is typically treated as a separate
pipeline from the :ref:`VALU <desc-valu>`, as other VALU instructions (along
with other execution pipelines such as the :ref:`SALU <desc-salu>`) typically can be
issued during a portion of the total duration of an MFMA operation.
10 changes: 5 additions & 5 deletions docs/conceptual/pipeline-metrics.rst
@@ -1,13 +1,13 @@
.. meta::
-:description: Omniperf performance model: Pipeline metrics
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, pipeline, wavefront, metrics, launch, runtime
+:description: ROCm Compute Profiler performance model: Pipeline metrics
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, pipeline, wavefront, metrics, launch, runtime
VALU, MFMA, instruction mix, FLOPs, arithmetic, operations

****************
Pipeline metrics
****************

-In this section, we describe the metrics available in Omniperf to analyze the
+In this section, we describe the metrics available in ROCm Compute Profiler to analyze the
pipelines discussed in the :doc:`pipeline-descriptions`.

.. _wavefront:
@@ -233,7 +233,7 @@ Instruction mix

The instruction mix panel shows a breakdown of the various types of instructions
executed by the user’s kernel, and which pipelines on the
-:doc:`CU <compute-unit>` they were executed on. In addition, Omniperf reports
+:doc:`CU <compute-unit>` they were executed on. In addition, ROCm Compute Profiler reports
further information about the breakdown of operation types for the
:ref:`VALU <desc-valu>`, vector-memory, and :ref:`MFMA <desc-mfma>`
instructions.
@@ -555,7 +555,7 @@ Compute pipeline
FLOP counting conventions
-------------------------

-Omniperf’s conventions for VALU FLOP counting are as follows:
+ROCm Compute Profiler’s conventions for VALU FLOP counting are as follows:

* Addition or multiplication: 1 operation

4 changes: 2 additions & 2 deletions docs/conceptual/references.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: References
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, HIP, GCN, LLVM, docs, documentation, training
+:description: ROCm Compute Profiler performance model: References
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, HIP, GCN, LLVM, docs, documentation, training

**********
References
8 changes: 4 additions & 4 deletions docs/conceptual/shader-engine.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Shader engine (SE)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, shader, engine, sL1D, L1I, workgroup manager, SPI
+:description: ROCm Compute Profiler performance model: Shader engine (SE)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, shader, engine, sL1D, L1I, workgroup manager, SPI

******************
Shader engine (SE)
@@ -21,7 +21,7 @@ The number of CUs on a SE varies from chip to chip -- see for example
:hip-training-pdf:`20`. In addition, newer accelerators such as the AMD
Instinct™ MI 250X have 8 SEs per accelerator.

-For the purposes of Omniperf, we consider resources that are shared between
+For the purposes of ROCm Compute Profiler, we consider resources that are shared between
multiple CUs on a single SE as part of the SE's metrics.

These include:
@@ -487,7 +487,7 @@ issuing concurrently).

.. note::

-Current versions of the profiling libraries underlying Omniperf attempt to
+Current versions of the profiling libraries underlying ROCm Compute Profiler attempt to
serialize concurrent kernels running on the accelerator, as the performance
counters on the device are global (that is, shared between concurrent
kernels). This means that these scheduler-pipe utilization metrics are
6 changes: 3 additions & 3 deletions docs/conceptual/system-speed-of-light.rst
@@ -1,13 +1,13 @@
.. meta::
-:description: Omniperf performance model: System Speed-of-Light
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, AMD, system, speed of light
+:description: ROCm Compute Profiler performance model: System Speed-of-Light
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD, system, speed of light

*********************
System Speed-of-Light
*********************

System Speed-of-Light summarizes some of the key metrics from various sections
-of Omniperf’s profiling report.
+of ROCm Compute Profiler’s profiling report.

.. warning::

12 changes: 6 additions & 6 deletions docs/conceptual/vector-l1-cache.rst
@@ -1,6 +1,6 @@
.. meta::
-:description: Omniperf performance model: Vector L1 cache (vL1D)
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, AMD, vector, l1, cache, vl1d
+:description: ROCm Compute Profiler performance model: Vector L1 cache (vL1D)
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD, vector, l1, cache, vl1d

**********************
Vector L1 cache (vL1D)
@@ -124,7 +124,7 @@ passes information about the commands (coalescing state, destination SIMD,
etc.) to the :ref:`data processing unit <desc-td>` for use after the requested
data has been retrieved.

-Omniperf reports several metrics to indicate performance bottlenecks in
+ROCm Compute Profiler reports several metrics to indicate performance bottlenecks in
the address processing unit, which are broken down into a few
categories:

@@ -378,7 +378,7 @@ Translation Cache (UTCL1). This cache contains a L1 Translation
Lookaside Buffer (TLB) which stores recently translated addresses to
reduce the cost of subsequent re-translations.

-Omniperf reports the following L1 TLB metrics:
+ROCm Compute Profiler reports the following L1 TLB metrics:

.. list-table::
:header-rows: 1
@@ -656,7 +656,7 @@ latencies of read/write memory operations to the :doc:`L2 cache <l2-cache>`.
:ref:`Cache access metrics <vl1d-cache-stall-metrics>` section when
evaluating the vL1D hit rate.
-.. [#vl1d-activity] Omniperf considers the vL1D to be active when any part of
+.. [#vl1d-activity] ROCm Compute Profiler considers the vL1D to be active when any part of
the vL1D (excluding the :ref:`address processor <desc-ta>` and
:ref:`data return <desc-td>` units) are active, for example, when performing
a translation, waiting for data, accessing the Tag or Cache RAMs, etc.
@@ -685,7 +685,7 @@ from the :ref:`VALU <desc-valu>`. When data is returned from the
:ref:`vL1D cache RAM <desc-tc>`, it is matched to this previously stored request
data, and returned to the appropriate SIMD.

-Omniperf reports the following vL1D data-return path metrics:
+ROCm Compute Profiler reports the following vL1D data-return path metrics:

.. list-table::
:header-rows: 1
20 changes: 10 additions & 10 deletions docs/how-to/analyze/cli.rst
@@ -1,14 +1,14 @@
.. meta::
-:description: Omniperf analysis: CLI analysis
-:keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command line, analyze, filtering, metrics, baseline, comparison
+:description: ROCm Compute Profiler analysis: CLI analysis
+:keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, command line, analyze, filtering, metrics, baseline, comparison

************
CLI analysis
************

-This section provides an overview of Omniperf's CLI analysis features.
+This section provides an overview of ROCm Compute Profiler's CLI analysis features.

-* :ref:`Derived metrics <cli-list-metrics>`: All of Omniperf's built-in metrics.
+* :ref:`Derived metrics <cli-list-metrics>`: All of ROCm Compute Profiler's built-in metrics.

* :ref:`Baseline comparison <analysis-baseline-comparison>`: Compare multiple
runs in a side-by-side manner.
@@ -26,7 +26,7 @@ Run ``omniperf analyze -h`` for more details.
Walkthrough
===========

-1. To begin, generate a high-level analysis report using Omniperf's ``-b`` (or ``--block``) flag.
+1. To begin, generate a high-level analysis report using ROCm Compute Profiler's ``-b`` (or ``--block``) flag.

.. code-block:: shell
@@ -40,7 +40,7 @@ Walkthrough
|_|
Analysis mode = cli
-[analysis] deriving Omniperf metrics...
+[analysis] deriving ROCm Compute Profiler metrics...
--------------------------------------------------------------------------------
0. Top Stats
@@ -146,7 +146,7 @@ Walkthrough
|_|
Analysis mode = cli
-[analysis] deriving Omniperf metrics...
+[analysis] deriving ROCm Compute Profiler metrics...
0 -> Top Stats
1 -> System Info
2 -> System Speed-of-Light
@@ -280,7 +280,7 @@ Walkthrough
4. Optimize the application, iterate, and re-profile to inspect performance
changes.

-5. Redo a comprehensive analysis with Omniperf CLI at any optimization
+5. Redo a comprehensive analysis with ROCm Compute Profiler CLI at any optimization
milestone.

.. _cli-analysis-options:
@@ -322,7 +322,7 @@ Filter kernels
$ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
Analysis mode = cli
-[analysis] deriving Omniperf metrics...
+[analysis] deriving ROCm Compute Profiler metrics...
--------------------------------------------------------------------------------
Detected Kernels (sorted descending by duration)
@@ -349,7 +349,7 @@ Filter kernels
$ omniperf analyze -p workloads/vcopy/MI200/ -k 0
Analysis mode = cli
-[analysis] deriving Omniperf metrics...
+[analysis] deriving ROCm Compute Profiler metrics...
--------------------------------------------------------------------------------
0. Top Stats