find/replace Omniperf to ROCm Compute Profiler

Signed-off-by: Peter Park <[email protected]>
ROCm · Oct 3, 2024 · f5f0bac
1 parent fb210ab
commit f5f0bac

Showing 34 changed files with 291 additions and 291 deletions.
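Every hunk in this commit swaps the capitalized product name "Omniperf" for "ROCm Compute Profiler", including possessives such as "Omniperf's" and "Omniperf’s". Lowercase CLI invocations such as ``omniperf analyze`` are left unchanged, and additions match deletions one for one (291/291). A case-sensitive literal substitution applied line by line reproduces that pattern. The Python sketch below assumes exactly that. The helper names `rename_text` and `rename_docs` and the `docs` root directory are illustrative assumptions, not part of the ROCm Compute Profiler repository or its tooling.

```python
from pathlib import Path

# Hypothetical constants for this sketch: the old and new product names.
OLD = "Omniperf"
NEW = "ROCm Compute Profiler"


def rename_text(text: str) -> str:
    """Case-sensitive literal replacement of the product name.

    "Omniperf", "Omniperf's" and "Omniperf’s" are rewritten; the lowercase
    command name "omniperf" (e.g. ``omniperf analyze``) is not matched.
    """
    return text.replace(OLD, NEW)


def rename_docs(root: str = "docs") -> int:
    """Rewrite every .rst file under `root` in place.

    Returns the number of lines that changed. Because the substitution never
    adds or removes line breaks, each changed line shows up in a diff as one
    deletion plus one addition.
    """
    changed = 0
    for path in Path(root).rglob("*.rst"):
        old_lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        new_lines = [rename_text(line) for line in old_lines]
        changed += sum(a != b for a, b in zip(old_lines, new_lines))
        if new_lines != old_lines:
            path.write_text("".join(new_lines), encoding="utf-8")
    return changed
```

The literal, case-sensitive match keeps the sketch consistent with the hunks that leave shell examples such as ``$ omniperf analyze -p workloads/vcopy/MI200/ -k 0`` untouched. The hunks only change prose and log strings like ``[analysis] deriving Omniperf metrics...``.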
diff --git a/docs/conceptual/command-processor.rst b/docs/conceptual/command-processor.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Command processor (CP)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC
+   :description: ROCm Compute Profiler performance model: Command processor (CP)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC
 
 **********************
 Command processor (CP)

diff --git a/docs/conceptual/compute-unit.rst b/docs/conceptual/compute-unit.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Compute unit (CU)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, GCN, compute, unit, pipeline, workgroup, wavefront,
+   :description: ROCm Compute Profiler performance model: Compute unit (CU)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, GCN, compute, unit, pipeline, workgroup, wavefront,
               CDNA
 
 *****************
@@ -19,7 +19,7 @@ CDNA™-based accelerators. All :ref:`wavefronts <desc-wavefront>` of a
 The CU consists of several independent execution pipelines and functional units.
 The :doc:`/conceptual/pipeline-descriptions` section details the various
 execution pipelines -- VALU, SALU, LDS, scheduler, and so forth. The metrics
-presented by Omniperf for these pipelines are described in
+presented by ROCm Compute Profiler for these pipelines are described in
 :doc:`pipeline-metrics`. The :doc:`vL1D <vector-l1-cache>` cache and
 :doc:`LDS <local-data-share>` are described in their own sections.
 

diff --git a/docs/conceptual/definitions.rst b/docs/conceptual/definitions.rst
@@ -1,13 +1,13 @@
 .. meta::
-   :description: Omniperf terminology and definitions
-   :keywords: Omniperf, ROCm, glossary, definitions, terms, profiler, tool,
+   :description: ROCm Compute Profiler terminology and definitions
+   :keywords: ROCm Compute Profiler, ROCm, glossary, definitions, terms, profiler, tool,
               Instinct, accelerator, AMD
 
 ***********
 Definitions
 ***********
 
-The following table briefly defines some terminology used in Omniperf interfaces
+The following table briefly defines some terminology used in ROCm Compute Profiler interfaces
 and in this documentation.
 
 .. include:: ./includes/terms.rst

diff --git a/docs/conceptual/includes/normalization-units.rst b/docs/conceptual/includes/normalization-units.rst
@@ -34,7 +34,7 @@ include:
        that is, the total runtime of the kernel in seconds, as measured by the
        :doc:`command processor <command-processor>`.
 
-By default, Omniperf uses the ``per_wave`` normalization.
+By default, ROCm Compute Profiler uses the ``per_wave`` normalization.
 
 .. tip::
 

diff --git a/docs/conceptual/l2-cache.rst b/docs/conceptual/l2-cache.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: L2 cache (TCC)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, L2, cache, infinity fabric, metrics
+   :description: ROCm Compute Profiler performance model: L2 cache (TCC)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, L2, cache, infinity fabric, metrics
 
 **************
 L2 cache (TCC)
@@ -21,7 +21,7 @@ across the L2 channels. Requests that miss in the L2 cache are passed out to
 :ref:`Infinity Fabric™ <l2-fabric>` to be routed to the appropriate memory
 location.
 
-The L2 cache metrics reported by Omniperf are broken down into four
+The L2 cache metrics reported by ROCm Compute Profiler are broken down into four
 categories:
 
 *  :ref:`L2 Speed-of-Light <l2-sol>`
@@ -299,7 +299,7 @@ accelerator’s memory, or even in the CPU’s memory. Infinity Fabric
 is responsible for routing these memory requests/data to the correct
 location and returning any fetched data to the L2 cache. The
 :ref:`l2-request-flow` describes the flow of these requests through
-Infinity Fabric in more detail, as described by Omniperf metrics,
+Infinity Fabric in more detail, as described by ROCm Compute Profiler metrics,
 while :ref:`l2-request-metrics` give detailed definitions of
 individual metrics.
 
@@ -309,7 +309,7 @@ Request flow
 ------------
 
 The following is a diagram that illustrates how L2↔Fabric requests are reported
-by Omniperf:
+by ROCm Compute Profiler:
 
 .. figure:: ../data/performance-model/fabric.png
    :align: center

diff --git a/docs/conceptual/local-data-share.rst b/docs/conceptual/local-data-share.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Local data share (LDS)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, local, data, share, LDS
+   :description: ROCm Compute Profiler performance model: Local data share (LDS)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, local, data, share, LDS
 
 **********************
 Local data share (LDS)

diff --git a/docs/conceptual/performance-model.rst b/docs/conceptual/performance-model.rst
@@ -1,13 +1,13 @@
 .. meta::
-   :description: Omniperf performance model
-   :keywords: Omniperf, ROCm, performance, model, profiler, tool, Instinct,
+   :description: ROCm Compute Profiler performance model
+   :keywords: ROCm Compute Profiler, ROCm, performance, model, profiler, tool, Instinct,
               accelerator, AMD
 
 *****************
 Performance model
 *****************
 
-Omniperf makes available an extensive list of metrics to better understand
+ROCm Compute Profiler makes available an extensive list of metrics to better understand
 achieved application performance on AMD Instinct™ MI-series accelerators
 including Graphics Core Next™ (GCN) GPUs like the AMD Instinct MI50, CDNA™
 accelerators like the MI100, and CDNA2 accelerators such as the MI250X, MI250,
@@ -18,7 +18,7 @@ hardware blocks of AMD Instinct accelerators. This section describes each
 hardware block on the accelerator as interacted with by a software developer to
 give a deeper understanding of the metrics reported by profiling data. Refer to
 :doc:`/tutorial/profiling-by-example` for more practical examples and details on how
-to use Omniperf to optimize your code.
+to use ROCm Compute Profiler to optimize your code.
 
 .. _mixxx-note:
 
@@ -34,7 +34,7 @@ to use Omniperf to optimize your code.
    :prod-page:`MI250 <mi200/mi250>`, and :prod-page:`MI210 <mi200/mi210>`
    product pages.
 
-In this chapter, the AMD Instinct performance model used by Omniperf is divided into a handful of
+In this chapter, the AMD Instinct performance model used by ROCm Compute Profiler is divided into a handful of
 key hardware blocks, each detailed in the following sections:
 
 * :doc:`compute-unit`

diff --git a/docs/conceptual/pipeline-descriptions.rst b/docs/conceptual/pipeline-descriptions.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Shader engine (SE)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, pipeline, VALU, SALU, VMEM, SMEM, LDS, branch,
+   :description: ROCm Compute Profiler performance model: Shader engine (SE)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, pipeline, VALU, SALU, VMEM, SMEM, LDS, branch,
               scheduler, MFMA, AGPRs
 
 *********************
@@ -101,7 +101,7 @@ coordinate between wavefronts in a workgroup.
    Performance model of the local data share (LDS) on AMD Instinct MI-series
    accelerators.
 
-Above is Omniperf's performance model of the LDS on CDNA accelerators (adapted
+Above is ROCm Compute Profiler's performance model of the LDS on CDNA accelerators (adapted
 from  :mantor-gcn-pdf:`20`). The SIMDs in the :ref:`VALU <desc-valu>` are
 connected to the LDS in pairs (see above). Only one SIMD per pair may issue an
 LDS instruction at a time, but both pairs may issue concurrently.
@@ -186,7 +186,7 @@ shadow (see the :ref:`MFMA <desc-mfma>` section for more detail).
 
 .. note::
 
-   The IPC model used by Omniperf omits the following two complications for
+   The IPC model used by ROCm Compute Profiler omits the following two complications for
    clarity. First, CDNA accelerators contain other execution units on the CU
    that are unused for compute applications. Second, so-called "internal"
    instructions (see :gcn-crash-course:`29`) are not issued to a functional
@@ -237,7 +237,7 @@ various AMD accelerators (including the CDNA line), we recommend the
             GPRs required for D: 4
             GPR alignment requirement: 8 bytes
 
-For the purposes of Omniperf, the MFMA unit is typically treated as a separate
+For the purposes of ROCm Compute Profiler, the MFMA unit is typically treated as a separate
 pipeline from the :ref:`VALU <desc-valu>`, as other VALU instructions (along
 with other execution pipelines such as the :ref:`SALU <desc-salu>`) typically can be
 issued during a portion of the total duration of an MFMA operation.

diff --git a/docs/conceptual/pipeline-metrics.rst b/docs/conceptual/pipeline-metrics.rst
@@ -1,13 +1,13 @@
 .. meta::
-   :description: Omniperf performance model: Pipeline metrics
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, pipeline, wavefront, metrics, launch, runtime
+   :description: ROCm Compute Profiler performance model: Pipeline metrics
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, pipeline, wavefront, metrics, launch, runtime
               VALU, MFMA, instruction mix, FLOPs, arithmetic, operations
 
 ****************
 Pipeline metrics
 ****************
 
-In this section, we describe the metrics available in Omniperf to analyze the
+In this section, we describe the metrics available in ROCm Compute Profiler to analyze the
 pipelines discussed in the :doc:`pipeline-descriptions`.
 
 .. _wavefront:
@@ -233,7 +233,7 @@ Instruction mix
 
 The instruction mix panel shows a breakdown of the various types of instructions
 executed by the user’s kernel, and which pipelines on the
-:doc:`CU <compute-unit>` they were executed on. In addition, Omniperf reports
+:doc:`CU <compute-unit>` they were executed on. In addition, ROCm Compute Profiler reports
 further information about the breakdown of operation types for the
 :ref:`VALU <desc-valu>`, vector-memory, and :ref:`MFMA <desc-mfma>`
 instructions.
@@ -555,7 +555,7 @@ Compute pipeline
 FLOP counting conventions
 -------------------------
 
-Omniperf’s conventions for VALU FLOP counting are as follows:
+ROCm Compute Profiler’s conventions for VALU FLOP counting are as follows:
 
 * Addition or multiplication: 1 operation
 

diff --git a/docs/conceptual/references.rst b/docs/conceptual/references.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: References
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, HIP, GCN, LLVM, docs, documentation, training
+   :description: ROCm Compute Profiler performance model: References
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, HIP, GCN, LLVM, docs, documentation, training
 
 **********
 References

diff --git a/docs/conceptual/shader-engine.rst b/docs/conceptual/shader-engine.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Shader engine (SE)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, shader, engine, sL1D, L1I, workgroup manager, SPI
+   :description: ROCm Compute Profiler performance model: Shader engine (SE)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, shader, engine, sL1D, L1I, workgroup manager, SPI
 
 ******************
 Shader engine (SE)
@@ -21,7 +21,7 @@ The number of CUs on a SE varies from chip to chip -- see for example
 :hip-training-pdf:`20`. In addition, newer accelerators such as the AMD
 Instinct™ MI 250X have 8 SEs per accelerator.
 
-For the purposes of Omniperf, we consider resources that are shared between
+For the purposes of ROCm Compute Profiler, we consider resources that are shared between
 multiple CUs on a single SE as part of the SE's metrics.
 
 These include:
@@ -487,7 +487,7 @@ issuing concurrently).
 
 .. note::
 
-   Current versions of the profiling libraries underlying Omniperf attempt to
+   Current versions of the profiling libraries underlying ROCm Compute Profiler attempt to
    serialize concurrent kernels running on the accelerator, as the performance
    counters on the device are global (that is, shared between concurrent
    kernels). This means that these scheduler-pipe utilization metrics are

diff --git a/docs/conceptual/system-speed-of-light.rst b/docs/conceptual/system-speed-of-light.rst
@@ -1,13 +1,13 @@
 .. meta::
-   :description: Omniperf performance model: System Speed-of-Light
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, AMD, system, speed of light
+   :description: ROCm Compute Profiler performance model: System Speed-of-Light
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD, system, speed of light
 
 *********************
 System Speed-of-Light
 *********************
 
 System Speed-of-Light summarizes some of the key metrics from various sections
-of Omniperf’s profiling report.
+of ROCm Compute Profiler’s profiling report.
 
 .. warning::
 

diff --git a/docs/conceptual/vector-l1-cache.rst b/docs/conceptual/vector-l1-cache.rst
@@ -1,6 +1,6 @@
 .. meta::
-   :description: Omniperf performance model: Vector L1 cache (vL1D)
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, AMD, vector, l1, cache, vl1d
+   :description: ROCm Compute Profiler performance model: Vector L1 cache (vL1D)
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD, vector, l1, cache, vl1d
 
 **********************
 Vector L1 cache (vL1D)
@@ -124,7 +124,7 @@ passes information about the commands (coalescing state, destination SIMD,
 etc.) to the :ref:`data processing unit <desc-td>` for use after the requested
 data has been retrieved.
 
-Omniperf reports several metrics to indicate performance bottlenecks in
+ROCm Compute Profiler reports several metrics to indicate performance bottlenecks in
 the address processing unit, which are broken down into a few
 categories:
 
@@ -378,7 +378,7 @@ Translation Cache (UTCL1). This cache contains a L1 Translation
 Lookaside Buffer (TLB) which stores recently translated addresses to
 reduce the cost of subsequent re-translations.
 
-Omniperf reports the following L1 TLB metrics:
+ROCm Compute Profiler reports the following L1 TLB metrics:
 
 .. list-table::
    :header-rows: 1
@@ -656,7 +656,7 @@ latencies of read/write memory operations to the :doc:`L2 cache <l2-cache>`.
    :ref:`Cache access metrics <vl1d-cache-stall-metrics>` section when
    evaluating the vL1D hit rate.
 
-.. [#vl1d-activity] Omniperf considers the vL1D to be active when any part of
+.. [#vl1d-activity] ROCm Compute Profiler considers the vL1D to be active when any part of
    the vL1D (excluding the :ref:`address processor <desc-ta>` and
    :ref:`data return <desc-td>` units) are active, for example, when performing
    a translation, waiting for data, accessing the Tag or Cache RAMs, etc.
@@ -685,7 +685,7 @@ from the :ref:`VALU <desc-valu>`. When data is returned from the
 :ref:`vL1D cache RAM <desc-tc>`, it is matched to this previously stored request
 data, and returned to the appropriate SIMD.
 
-Omniperf reports the following vL1D data-return path metrics:
+ROCm Compute Profiler reports the following vL1D data-return path metrics:
 
 .. list-table::
    :header-rows: 1

diff --git a/docs/how-to/analyze/cli.rst b/docs/how-to/analyze/cli.rst
@@ -1,14 +1,14 @@
 .. meta::
-   :description: Omniperf analysis: CLI analysis
-   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command line, analyze, filtering, metrics, baseline, comparison
+   :description: ROCm Compute Profiler analysis: CLI analysis
+   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, command line, analyze, filtering, metrics, baseline, comparison
 
 ************
 CLI analysis
 ************
 
-This section provides an overview of Omniperf's CLI analysis features.
+This section provides an overview of ROCm Compute Profiler's CLI analysis features.
 
-* :ref:`Derived metrics <cli-list-metrics>`: All of Omniperf's built-in metrics.
+* :ref:`Derived metrics <cli-list-metrics>`: All of ROCm Compute Profiler's built-in metrics.
 
 * :ref:`Baseline comparison <analysis-baseline-comparison>`: Compare multiple
   runs in a side-by-side manner.
@@ -26,7 +26,7 @@ Run ``omniperf analyze -h`` for more details.
 Walkthrough
 ===========
 
-1. To begin, generate a high-level analysis report using Omniperf's ``-b`` (or ``--block``) flag. 
+1. To begin, generate a high-level analysis report using ROCm Compute Profiler's ``-b`` (or ``--block``) flag. 
 
    .. code-block:: shell
 
@@ -40,7 +40,7 @@ Walkthrough
                               |_|                  
 
       Analysis mode = cli
-      [analysis] deriving Omniperf metrics...
+      [analysis] deriving ROCm Compute Profiler metrics...
 
       --------------------------------------------------------------------------------
       0. Top Stats
@@ -146,7 +146,7 @@ Walkthrough
                               |_|                  
 
       Analysis mode = cli
-      [analysis] deriving Omniperf metrics...
+      [analysis] deriving ROCm Compute Profiler metrics...
       0 -> Top Stats
       1 -> System Info
       2 -> System Speed-of-Light
@@ -280,7 +280,7 @@ Walkthrough
 4. Optimize the application, iterate, and re-profile to inspect performance
    changes.
 
-5. Redo a comprehensive analysis with Omniperf CLI at any optimization
+5. Redo a comprehensive analysis with ROCm Compute Profiler CLI at any optimization
    milestone.
 
 .. _cli-analysis-options:
@@ -322,7 +322,7 @@ Filter kernels
      $ omniperf analyze -p workloads/vcopy/MI200/ --list-stats
 
      Analysis mode = cli
-     [analysis] deriving Omniperf metrics...
+     [analysis] deriving ROCm Compute Profiler metrics...
 
      --------------------------------------------------------------------------------
      Detected Kernels (sorted descending by duration)
@@ -349,7 +349,7 @@ Filter kernels
      $ omniperf analyze -p workloads/vcopy/MI200/ -k 0
 
      Analysis mode = cli
-     [analysis] deriving Omniperf metrics...
+     [analysis] deriving ROCm Compute Profiler metrics...
 
      --------------------------------------------------------------------------------
      0. Top Stats