Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BN FWD INFER OCL to HIP #3106

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from
Open

Conversation

sgundabo
Copy link
Contributor

@sgundabo sgundabo commented Jul 10, 2024

This PR focuses on converting the Batch Norm Forward Inference Spatial and Per Activation kernels from OpenCL to HIP. This conversion is a part of the broader initiative to translate all OpenCL kernels within MIOpen, as the OpenCL backend has been deprecated.

Ensuring correctness:
The PR includes a GTEST that compares the output of the OpenCL kernel with the HIP implementation. The test cases are derived from the existing batch norm forward inference kernel GTEST.

Ensuring GPU Performance parity:
The GTEST also measures the minimum, maximum, mean, median, and standard deviation of the kernel execution time across five runs and records the data in a CSV file. This data is used to create graphs that illustrate the average performance improvement of the HIP implementation over OpenCL. An average performance gain greater than one is considered favorable.
TODO: Collect perf metrics on a wider variety of test cases on a gfx90a, and ensure parity.

Ensuring Host side Performance parity:
As the OpenCL backend support is deprecated in MIOpen, the assumption is that this decision was made while being aware of the compilation overhead of HIP kernels over OpenCL.

CAHEK7
CAHEK7 previously requested changes Jul 10, 2024
Copy link
Contributor

@CAHEK7 CAHEK7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main concern is missed half data type test. Other comments are mostly C++ related and not very critical.

Also clang-format should be applied to the files. And can you take a look into this draft https://gist.github.com/CAHEK7/9ed75ca55acf85f92efd89eb2446a94d and adopt it.

@sgundabo sgundabo changed the title BN FWD INFER SPATIAL HIP BN FWD INFER OCL to HIP Jul 11, 2024
@sgundabo sgundabo requested a review from CAHEK7 July 11, 2024 20:50
Copy link
Contributor

@CAHEK7 CAHEK7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks fine, except a few tiny typos.
Only test and perf data are required to get it merged.
Could you adjust verification algorithm and thresholds and attach perf measurements data?

@atamazov
Copy link
Contributor

@sgundabo PR description is missing some important info.

Please provide:

  • explanation of what is done and why
  • how this have been tested for correctness
  • the same Q about GPU time/performance
    • is it guaranteed that the new kernels have the same or better perf than the old ones? If yes, then how this is guaranteed?
  • the same Q about host-side overhead.
    • we know that HIP compilation time is typically 10x longer than OCL. We must ensure that OCL->HIP transition does not lead to degradation of "initial iteration time".

/cc @CAHEK7 @junliume

@CAHEK7
Copy link
Contributor

CAHEK7 commented Jul 22, 2024

the same Q about host-side overhead.

  • we know that HIP compilation time is typically 10x longer than OCL. We must ensure that OCL->HIP transition does not lead to degradation of "initial iteration time".

We can't do anything with it, except deprecating HIP and get back to OCL.

@junliume junliume requested a review from CAHEK7 July 24, 2024 00:51
@junliume junliume dismissed CAHEK7’s stale review July 24, 2024 00:52

Re-request review

@junliume
Copy link
Contributor

@sgundabo could you move this PR to ready for review?

@CAHEK7
Copy link
Contributor

CAHEK7 commented Jul 24, 2024

@junliume we are waiting for the perf metrics

@sgundabo
Copy link
Contributor Author

Raw Perf Data
BNFwdInferRawPerfData.zip

HW info: gfx90a

FP32 Perf
BNFwdInferRawPerfData_FP32

FP16 Perf
BNFwdInferRawPerfData_FP16

@CAHEK7 CAHEK7 marked this pull request as ready for review September 12, 2024 15:27
@CAHEK7
Copy link
Contributor

CAHEK7 commented Sep 12, 2024

It can be safely merged since it does not affect production code. I just don't want to lose this PR.

@junliume
Copy link
Contributor

Restarted CI for sanity checks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants