Backport of 6.4.2 for cherry-pick list #714

Open: wants to merge 18 commits into base: release/rocm-rel-6.4
6 changes: 3 additions & 3 deletions .github/workflows/formatting.yml
@@ -13,7 +13,7 @@ concurrency:

jobs:
python:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04

steps:
- name: Checkout
@@ -35,7 +35,7 @@ jobs:
uses: isort/isort-action@master

cmake:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04

steps:
- uses: actions/checkout@v4
@@ -58,7 +58,7 @@ jobs:
fi

python-bytecode:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04

steps:
- uses: actions/checkout@v4
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,21 @@

Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

## ROCm Compute Profiler 3.2.0 for ROCm 6.4.2

### Added

* Added FP8 metrics support for MI300
* Added support for additional roofline data types: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on GPU architecture)
* Added a data type selection option for roofline profiling: `--roofline-data-type` / `-R` (default: FP32)
* Changed dependency from rocm-smi to amd-smi

### Changed


### Resolved issues
* Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file

## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

### Added
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
3.1.0
3.2.0
Binary file modified docs/data/profile/sample-roof-plot.jpg
5 changes: 5 additions & 0 deletions docs/how-to/analyze/standalone-gui.rst
@@ -75,6 +75,11 @@ application's profiling data:

#. Memory Chart Analysis
#. Empirical Roofline Analysis

Use the ``--roofline-data-type`` option to specify which data type(s) to plot on the roofline PDFs in the standalone analysis GUI.
Data types can be stacked; for example, ``--roofline-data-type FP32 FP64 I32`` displays one PDF with FP32 and FP64 stacked and one PDF with INT32.
The default roofline data type is FP32.

#. Top Stats (Top Kernel Statistics)
#. System Info
#. System Speed-of-Light
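The standalone GUI text above describes ``FP32 FP64 I32`` producing one PDF with FP32 and FP64 stacked and a second PDF for the integer type. A minimal sketch of that grouping, assuming floating-point and integer types are split into separate PDFs and that file names follow the ``empirRoof_gpu-<id>_<types>.pdf`` pattern seen in the ``mode.rst`` listing; ``group_roofline_pdfs`` is a hypothetical helper, not the tool's own code:

```python
# Hypothetical sketch: split the selected roofline data types into one
# floating-point group and one integer group, one PDF per non-empty group.
# The helper name and file-name pattern are assumptions for illustration.

FLOAT_TYPES = ("FP8", "FP16", "BF16", "FP32", "FP64")
INT_TYPES = ("I8", "I32", "I64")


def group_roofline_pdfs(data_types, gpu_id=0):
    """Return a mapping of PDF file name -> data types plotted in that PDF."""
    groups = [
        [t for t in data_types if t in FLOAT_TYPES],
        [t for t in data_types if t in INT_TYPES],
    ]
    pdfs = {}
    for group in groups:
        if group:  # skip empty groups, e.g. when only FP types are selected
            name = f"empirRoof_gpu-{gpu_id}_{'_'.join(group)}.pdf"
            pdfs[name] = group
    return pdfs


print(group_roofline_pdfs(["FP32", "FP64", "I32"]))
```

With the example selection this yields two entries, ``empirRoof_gpu-0_FP32_FP64.pdf`` and ``empirRoof_gpu-0_I32.pdf``; the default selection (FP32 only) yields a single ``empirRoof_gpu-0_FP32.pdf``.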
12 changes: 6 additions & 6 deletions docs/how-to/profile/mode.rst
@@ -398,6 +398,9 @@ Roofline options
Allows you to specify a device ID to collect performance data from when
running a roofline benchmark on your system.

``--roofline-data-type <datatype>``
Allows you to specify the data types to plot in the roofline PDF output(s). Selecting more than one data type overlays the results onto the same plot. Default data type: FP32.

To distinguish different kernels in your ``.pdf`` roofline plot use
``--kernel-names``. This will give each kernel a unique marker identifiable from
the plot's key.
@@ -431,8 +434,7 @@ successfully.

$ ls workloads/vcopy/MI200/
total 48
-rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_fp32_fp64.pdf
-rw-r--r-- 1 auser agroup 13136 Mar 1 16:05 empirRoof_gpu-0_int8_fp16.pdf
-rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_FP32.pdf
drwxr-xr-x 1 auser agroup 0 Mar 1 16:03 perfmon
-rw-r--r-- 1 auser agroup 1101 Mar 1 16:03 pmc_perf.csv
-rw-r--r-- 1 auser agroup 1715 Mar 1 16:05 roofline.csv
@@ -441,11 +443,9 @@ successfully.

.. note::

ROCm Compute Profiler generates two roofline outputs to organize results and reduce
clutter. One chart plots FP32/FP64 performance while the other plots I8/FP16
performance.
ROCm Compute Profiler currently captures roofline data for all data types; you can reduce clutter in the PDF outputs by filtering on data type(s). Selecting multiple data types overlays the results in the same PDF. To generate a separate PDF for each data type from the same workload run, re-run the profiling command once per data type, as long as the ``roofline.csv`` file still exists in the workload folder.

The following image is a sample ``empirRoof_gpu-0_int8_fp16.pdf`` roofline
The following image is a sample ``empirRoof_gpu-0_FP32.pdf`` roofline
plot.

.. image:: ../../data/profile/sample-roof-plot.jpg
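The ``mode.rst`` note above suggests re-running the profiling command once per data type to get separate PDFs. A minimal sketch that only builds those command lines, assuming the CLI is invoked as ``rocprof-compute`` with the ``profile`` options ``--name``, ``--roof-only`` and ``--roofline-data-type`` listed in ``use.rst``; the workload name ``vcopy`` and the application ``./vcopy`` are illustrative assumptions, and ``per_type_commands`` is a hypothetical helper:

```python
import shlex


def per_type_commands(name, app_cmd, data_types):
    """Build one profiling argv per data type (the commands are not executed)."""
    commands = []
    for dtype in data_types:
        argv = [
            "rocprof-compute", "profile",
            "--name", name,
            "--roof-only",
            "--roofline-data-type", dtype,
            "--", *app_cmd,  # everything after "--" is the profiled application
        ]
        commands.append(argv)
    return commands


for argv in per_type_commands("vcopy", ["./vcopy"], ["FP32", "FP64"]):
    print(shlex.join(argv))
```

On a system with ROCm Compute Profiler installed, each ``argv`` could be passed to ``subprocess.run(argv, check=True)``; keeping the list form avoids shell-quoting issues.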
2 changes: 1 addition & 1 deletion docs/how-to/use.rst
@@ -231,7 +231,7 @@ The following table lists ROCm Compute Profiler's basic operations, their

* - :ref:`Standalone roofline analysis <standalone-roofline>`
- ``profile``
- ``--name``, ``--roof-only``, ``-- <profile_cmd>``
- ``--name``, ``--roof-only``, ``--roofline-data-type <data_type>``, ``-- <profile_cmd>``

* - :ref:`Import a workload to database <grafana-gui-import>`
- ``database``
28 changes: 27 additions & 1 deletion src/argparser.py
@@ -154,7 +154,7 @@ def omniarg_parser(
default=False,
action="store_true",
help=argparse.SUPPRESS,
#help="\t\t\tKokkos trace, traces Kokkos API calls.",
# help="\t\t\tKokkos trace, traces Kokkos API calls.",
)
profile_group.add_argument(
"-k",
@@ -316,6 +316,19 @@ def omniarg_parser(
action="store_true",
help="\t\t\tInclude kernel names in roofline plot.",
)

roofline_group.add_argument(
"-R",
"--roofline-data-type",
required=False,
choices=["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"],
metavar="",
nargs="+",
type=str,
default=["FP32"],
help="\t\t\tChoose datatypes to view roofline PDFs for: (DEFAULT: FP32)\n\t\t\t FP8\n\t\t\t FP16\n\t\t\t BF16\n\t\t\t FP32\n\t\t\t FP64\n\t\t\t I8\n\t\t\t I32\n\t\t\t I64\n\t\t\t ",
)

# roofline_group.add_argument('-w', '--workgroups', required=False, default=-1, type=int, help="\t\t\tNumber of kernel workgroups (DEFAULT: 1024)")
# roofline_group.add_argument('--wsize', required=False, default=-1, type=int, help="\t\t\tWorkgroup size (DEFAULT: 256)")
# roofline_group.add_argument('--dataset', required=False, default = -1, type=int, help="\t\t\tDataset size (DEFAULT: 536M)")
@@ -510,6 +523,19 @@ def omniarg_parser(
const=8050,
help="\t\tActivate a GUI to interate with rocprofiler-compute metrics.\n\t\tOptionally, specify port to launch application (DEFAULT: 8050)",
)

analyze_group.add_argument(
"-R",
"--roofline-data-type",
required=False,
choices=["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"],
metavar="",
nargs="+",
type=str,
default=["FP32"],
help="\t\t\tChoose datatypes to view roofline PDFs for: (DEFAULT: FP32)\n\t\t\t FP8\n\t\t\t FP16\n\t\t\t BF16\n\t\t\t FP32\n\t\t\t FP64\n\t\t\t I8\n\t\t\t I32\n\t\t\t I64\n\t\t\t ",
)

analyze_advanced_group.add_argument(
"--random-port",
action="store_true",
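The ``-R``/``--roofline-data-type`` option added above combines ``nargs="+"``, ``choices`` and a list default. A standalone sketch of how argparse treats that combination; it mirrors the option definition but uses ``metavar="DTYPE"`` instead of ``""`` for readable usage text, and the parser and group names are assumptions for this example:

```python
import argparse

ROOFLINE_TYPES = ["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"]

parser = argparse.ArgumentParser(prog="roofline-demo")
roofline_group = parser.add_argument_group("roofline")
roofline_group.add_argument(
    "-R",
    "--roofline-data-type",
    required=False,
    choices=ROOFLINE_TYPES,  # each value given on the command line is checked
    metavar="DTYPE",
    nargs="+",  # one or more values; the result is always a list
    type=str,
    default=["FP32"],  # used when -R is omitted entirely
    help="Data types to plot in roofline PDFs (default: FP32)",
)

print(parser.parse_args([]).roofline_data_type)
print(parser.parse_args(["-R", "FP32", "FP64"]).roofline_data_type)
```

Omitting ``-R`` gives ``['FP32']``, while ``-R FP32 FP64`` gives ``['FP32', 'FP64']``. An unsupported value such as ``-R FP128`` makes argparse print an "invalid choice" error and exit with status 2.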
3 changes: 3 additions & 0 deletions src/rocprof_compute_analyze/analysis_webui.py
@@ -61,6 +61,8 @@ def __init__(self, args, supported_archs):
# define any elements which will have full width
self.__full_width_elements = {1801}

self.__roofline_data_type = args.roofline_data_type

@demarcate
def build_layout(self, input_filters, arch_configs):
"""
@@ -180,6 +182,7 @@ def generate_from_filter(
"mem_level": "ALL",
"include_kernel_names": False,
"is_standalone": False,
"roofline_data_type": self.__roofline_data_type,
}
)
roof_obj = self.get_socs()[self.arch].roofline_obj
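The ``analysis_webui.py`` change stores the selection as ``self.__roofline_data_type`` and later passes it into the roofline options. Because of the double leading underscore, Python name-mangles that attribute. A minimal sketch of the effect, using a hypothetical stand-in class (``WebUIAnalysisSketch``) and only the option keys visible in the diff:

```python
import argparse


class WebUIAnalysisSketch:
    """Hypothetical stand-in for the web UI analysis class in the diff."""

    def __init__(self, args):
        # Double leading underscores trigger name mangling: the attribute is
        # actually stored as _WebUIAnalysisSketch__roofline_data_type.
        self.__roofline_data_type = args.roofline_data_type

    def roofline_options(self):
        # Inside the class body, self.__roofline_data_type resolves normally.
        return {
            "mem_level": "ALL",
            "include_kernel_names": False,
            "is_standalone": False,
            "roofline_data_type": self.__roofline_data_type,
        }


ui = WebUIAnalysisSketch(argparse.Namespace(roofline_data_type=["FP32", "FP64"]))
print(ui.roofline_options()["roofline_data_type"])  # ['FP32', 'FP64']
print(hasattr(ui, "__roofline_data_type"))  # False: the name was mangled
```

Mangling keeps subclasses from accidentally overriding the attribute, but code outside the class must use the mangled name (or a method like ``roofline_options``) to read it.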
@@ -32,6 +32,12 @@ Panel Config:
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: None # No perf counter
tips:
MFMA FLOPs (F8):
value: None # No HW module
unit: GFLOP
peak: None # No HW module
pop: None # No HW module
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
unit: GFLOPs
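The context line just above the new ``MFMA FLOPs (F8)`` entry gives a peak expression, ``(((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)``. A worked evaluation, assuming ``$max_sclk`` is in MHz (so dividing by 1000 yields GFLOP/s) and reading 64 as lanes per CU per cycle and 2 as an FMA counted as two operations; those interpretations and the inputs (2100 MHz, 304 CUs) are illustrative assumptions, not values from this PR:

```python
def valu_peak_gflops(max_sclk_mhz, cu_per_gpu):
    """Evaluate ((((max_sclk * cu_per_gpu) * 64) * 2) / 1000) from the panel config."""
    # MHz * CUs * 64 * 2 gives millions of ops per second; / 1000 gives GFLOP/s.
    return (((max_sclk_mhz * cu_per_gpu) * 64) * 2) / 1000


print(valu_peak_gflops(2100, 304))  # 81715.2
```

Step by step: 2100 × 304 = 638,400; × 64 = 40,857,600; × 2 = 81,715,200; ÷ 1000 = 81,715.2 GFLOP/s (about 81.7 TFLOP/s) for these hypothetical inputs.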
@@ -241,6 +241,12 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F8:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F16:
avg: None # No HW module
min: None # No HW module
@@ -21,16 +21,22 @@ Panel Config:
metric:
VALU FLOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
VALU IOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F8):
value: None # No perf counter
unit: GFLOP
peak: None # No perf counter
pop: None # No perf counter
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
Unit: None
@@ -39,25 +45,25 @@ Panel Config:
tips:
MFMA FLOPs (F16):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F32):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F64):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
@@ -174,6 +180,12 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: None # No perf counter
min: None # No perf counter
@@ -32,6 +32,12 @@ Panel Config:
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: None # No perf counter
tips:
MFMA FLOPs (F8):
value: None # No HW module
unit: GFLOP
peak: None # No HW module
pop: None # No HW module
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
unit: GFLOPs
@@ -73,13 +73,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
INT-32:
INT32:
avg: None # No perf counter
min: None # No perf counter
max: None # No perf counter
unit: (instr + $normUnit)
tips:
INT-64:
INT64:
avg: None # No perf counter
min: None # No perf counter
max: None # No perf counter
@@ -241,6 +241,12 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F8:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F16:
avg: None # No HW module
min: None # No HW module
@@ -21,16 +21,22 @@ Panel Config:
metric:
VALU FLOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
VALU IOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F8):
value: None # No perf counter
unit: GFLOP
peak: None # No perf counter
pop: None # No perf counter
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
Unit: None
@@ -39,25 +45,25 @@ Panel Config:
tips:
MFMA FLOPs (F16):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F32):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F64):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
@@ -174,6 +180,12 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: None # No perf counter
min: None # No perf counter
@@ -42,6 +42,12 @@ Panel Config:
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
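The last hunk shows two counter-based expressions: a percent-of-peak (``pop``) built from ``SQ_INSTS_VALU_INT32`` and ``SQ_INSTS_VALU_INT64``, and an MFMA BF16 rate built from ``SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512``. A minimal sketch evaluating both over per-dispatch rows, assuming ``AVG`` means the mean across dispatches and that timestamps are in nanoseconds (so ops per ns equals G-ops per second). The helper names and all counter values are hypothetical, chosen only to make the arithmetic easy to follow:

```python
from statistics import mean

# Hypothetical per-dispatch rows; field names come from the panel expressions,
# the values are made up for illustration.
rows = [
    {"SQ_INSTS_VALU_INT32": 10_000, "SQ_INSTS_VALU_INT64": 0,
     "SQ_INSTS_VALU_MFMA_MOPS_BF16": 2_000,
     "Start_Timestamp": 0, "End_Timestamp": 1_000},
    {"SQ_INSTS_VALU_INT32": 3_000, "SQ_INSTS_VALU_INT64": 2_000,
     "SQ_INSTS_VALU_MFMA_MOPS_BF16": 4_000,
     "Start_Timestamp": 5_000, "End_Timestamp": 6_000},
]


def valu_iops_pop(rows, peak_gops):
    """100 * AVG(64 * (INT32 + INT64) / duration) / peak, as in the pop expression."""
    rates = [
        64 * (r["SQ_INSTS_VALU_INT32"] + r["SQ_INSTS_VALU_INT64"])
        / (r["End_Timestamp"] - r["Start_Timestamp"])
        for r in rows
    ]
    return 100 * mean(rates) / peak_gops


def mfma_bf16_gflops(rows):
    """AVG(MOPS_BF16 * 512 / duration), as in the MFMA FLOPs (BF16) value."""
    return mean(
        r["SQ_INSTS_VALU_MFMA_MOPS_BF16"] * 512
        / (r["End_Timestamp"] - r["Start_Timestamp"])
        for r in rows
    )


print(valu_iops_pop(rows, peak_gops=1000.0))  # 48.0
print(mfma_bf16_gflops(rows))  # 1536.0
```

Here the per-dispatch integer rates are 640 and 320 G-ops/s, whose mean of 480 is 48% of the hypothetical 1000 G-ops/s peak. In the real panel, the denominator is the ``$max_sclk``/``$cu_per_gpu`` peak expression rather than a fixed number.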