rocm6.4 IFU CP 09122024 #1596

Draft. Wants to merge 60 commits into base: main

dnikolaev-amd

rocm6.4_internal_testing

rraminen and others added 30 commits September 13, 2024 10:31
* changes to build Centos stream 9 images

* Added scripts for centos and centos stream images

* Added an extra line

* Add ninja installation

* Optimized code

* Fixes

* Add comment

* Optimized code

* Added AMDGPU mapping for ROCm 5.2 and invalid-url for rocm_baseurl

Co-authored-by: Jithun Nair <[email protected]>
- Rocblas API support is requested
- SWDEV-383635 & sub task - SWDEV-390218
* Add hip_basic tensorpipe support to PyTorch

* Enabling hip_basic for Tensorpipe for pyTorch

* removing upstream tensorpipe module

* Adding ROCm specific tensorpipe submodule

* tensorpipe submodule updated

* Update the hip invalid device string

* Added ignore for tensorpipe git submodule

* Moved include of tensorpipe_cuda.h to hipify

* Updates based on review comments

* Defining the variable __HIP_PLATFORM_AMD__

* Enabling the UTs

Co-authored-by: Ronak Malik <[email protected]>
- Fortran package installation moved after gcc
- Update libtinfo search code in cmake1
- Install libstdc++.so
Reversed the condition as required
- Add missing common_utils.sh
- Update the install vision part
- Move to amdgpu rhel 9.3 builds
- Update to pick python from conda path
- Add a missing package
- Add ROCM_PATH and magma
- Updated repo radeon path
This also fixes a problem in gesvd driver when UV is not needed.
- build_environment is hard-coded to the upstream value from when the
  branch was created, since the build_environment value in the dev/QA
  environments can vary
* Fix the parsing of /etc/os-release

The old code parses OS_DISTRO as 'PRETTY_Ubuntu' on Ubuntu and thus
never links to libtinfo correctly.

* Configurable CMAKE_PREFIX_PATH in CI script.
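
The `/etc/os-release` fix above concerns telling the `NAME` key apart from `PRETTY_NAME`. A minimal sketch of exact-key parsing follows; it is not the code from this commit (the commit itself touches the build scripts), and the `parse_os_release` helper and the sample file contents are hypothetical, chosen only to illustrate the idea.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


# Illustrative sample only; real files carry more keys.
SAMPLE = '''PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
ID=ubuntu
'''

info = parse_os_release(SAMPLE)
# Looking up the exact key avoids confusing NAME with PRETTY_NAME.
distro = info["NAME"]
```

Keying on the full field name means a substring match on `NAME` can never pick up the `PRETTY_NAME` line by accident.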
- This is done per QA request; it needs to be reverted and does
  not need to be cherry-picked into later releases.
* Moved NAVI check to the test file

* Revised NAVI check as a function
* Running triton kernel on ROCM only has one GB/s metric reported

* Update test_kernel_benchmark.py
* Initial implementation of PyTorch ut parsing script

* Extracted path variables

* Use nested dict to save results

* Fixes typo

* Cleanup

* Fixes several issues

* Minor name change

* Update run_pytorch_unit_tests.py

* Added file banners

* Supported running from API

* Added more help info

* Consistent naming

* Format help text

---------

Co-authored-by: Jithun Nair <[email protected]>
alugorey and others added 28 commits September 13, 2024 14:22
…pired (#1399)

* Skip certificate check only for CentOS7 since certificate expired

* Naming
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS is set in builder repo
- Remove the PYTORCH_EXTRA_INSTALL_REQUIREMENTS step from this file
- Causing regression - SWDEV-463083
* Fix SWDEV-459623. The Rank of logsumexp Tensor must be 3.

This tensor was considered for internal use only but apparently exposed to UTs.

* Fix for mGPU.

The stream should be selected after picking the current device according
to input tensor.
* Add formal FP8 check in common_cuda.py

* Enable inductor/test_valid_cast

* Support for test_eager_fallback

* allow fnuz types on amax test

* Finalize passing tests vs failing

* Fix fnuz constants in _to_fp8_saturated
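
The fnuz fix above is about using the right saturation limits per FP8 format. The sketch below is not the `_to_fp8_saturated` implementation from this PR; it only illustrates the clamp step in plain Python, with a hypothetical `saturate` helper and a hand-written table of the largest finite value of each format. It does not model rounding to FP8.

```python
# Largest finite value of each FP8 format. The fnuz variant of e4m3 tops out
# at 240 rather than 448, so reusing the non-fnuz limit would not saturate.
FP8_MAX = {
    "float8_e4m3fn": 448.0,
    "float8_e5m2": 57344.0,
    "float8_e4m3fnuz": 240.0,
    "float8_e5m2fnuz": 57344.0,
}


def saturate(value: float, fp8_dtype: str) -> float:
    """Clamp value into the finite range of the given FP8 format (hypothetical helper)."""
    limit = FP8_MAX[fp8_dtype]
    return max(-limit, min(limit, value))


clamped_fnuz = saturate(300.0, "float8_e4m3fnuz")  # above the fnuz limit
clamped_fn = saturate(300.0, "float8_e4m3fn")      # within the non-fnuz limit
```

A value such as 300.0 is in range for `float8_e4m3fn` but must be clamped for `float8_e4m3fnuz`, which is why the constants have to be chosen per dtype.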
* Enable batchnorm NHWC for MIOpen

* cleanup

* test to compare NHWC MIOpen batchnorm with CPU

* fix 'use_miopen' condition for nhwc miopen

* fix includes

* use native nhwc batchnorm to verify miopen

* remove extra spaces

* remove empty lines

* set PYTORCH_MIOPEN_SUGGEST_NHWC=1 for all test_nn.py test
…1433)

* Print consolidated log file for pytorch uts

* Update run_entire_tests subprocess call as well

* lint

* Add ERROR string
* Initial commit to port intra_node_comm to ROCm

(cherry picked from commit 48d1c33)

* gpt-fast running now with intra-node comm

(cherry picked from commit 618c54e)

---------

Co-authored-by: Prachi Gupta <[email protected]>
* Check that >1 GPUs are visible when running TEST_CONFIG=distributed

* Add EXECUTION_TIME to file-level and aggregate statistics
Fixes inductor.test_torchinductor_dynamic_shapes::TestInductorDynamicCUDA::test_item_unbacked_stride_nobreak_cuda
* Fail earlier for distributed-on-1-GPU scenario
* print cmd in consolidated log with prettier formatting
* python->python3
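
The early check for the distributed-on-1-GPU scenario mentioned above can be sketched as follows. This is an assumption-laden illustration, not the code from these commits: the `check_distributed_gpu_count` function and its error message are hypothetical, and the GPU count is passed in as a plain integer so the sketch runs without a GPU.

```python
def check_distributed_gpu_count(test_config: str, visible_gpus: int) -> None:
    """Raise before any test runs if a distributed config sees fewer than 2 GPUs.

    Hypothetical helper: the real scripts may obtain the count and report the
    failure differently.
    """
    if test_config == "distributed" and visible_gpus < 2:
        raise RuntimeError(
            f"TEST_CONFIG=distributed needs more than 1 visible GPU, "
            f"found {visible_gpus}"
        )


# Non-distributed configs are unaffected by the GPU count.
check_distributed_gpu_count("default", 1)
# A distributed config with at least 2 GPUs passes.
check_distributed_gpu_count("distributed", 2)
```

In a real run the count could come from `torch.cuda.device_count()`; failing at this point avoids spending time on a test run that cannot succeed.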

Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264

---------

Co-authored-by: blorange-amd <[email protected]>
… installation (#1557)

This PR pins sympy==1.12.1 in the .ci/docker/requirements-ci.txt file.
It also skips the pytorch-nightly installation in docker images.

The pytorch-nightly installation is needed to prefetch the mobilenet_v2 and
v3 models for some tests.
The prefetch step came from

85bd6bc
Models are downloaded on first use to the folder /root/.cache/torch/hub.
However, the pytorch-nightly installation also overrides the
.ci/docker/requirements-ci.txt settings and upgrades some Python
packages (sympy from 1.12.0 to 1.13.0), which causes several
'dynamic_shapes' tests to fail.
Skipping the model prefetch affects these tests, which still run without
errors (but **internet access is required**):

- python test/mobile/model_test/gen_test_model.py mobilenet_v2
- python test/quantization/eager/test_numeric_suite_eager.py -k test_mobilenet_v3

Issue ROCm/frameworks-internal#8772

If this causes issues, the models can be prefetched after building
PyTorch and before testing.

(cherry picked from commit b92b34d)

New tests introduced for testing NHWC and NCHW batchnorm on MIOpen:

- test_batchnorm_nhwc_miopen_cuda_float32
- test_batchnorm_nchw_miopen_cuda_float32

These tests verify weight and bias gradients, running_mean and
running_var.
Other dtypes can be added later.

How to run:
`MIOPEN_ENABLE_LOGGING_CMD=1 python -u test/test_nn.py -v -k test_batchnorm_nhwc_miopen_cuda_float32`

There is a difference in running_var for NHWC batchnorm fp32
between MIOpen and native:
```
MIOPEN_ENABLE_LOGGING_CMD=1 python -u test/test_nn.py -v -k test_batchnorm_nhwc_miopen_cuda_float32
...
self.assertEqual(mod.running_var, ref_mod.running_var)
AssertionError: Tensor-likes are not close!
Mismatched elements: 8 / 8 (100.0%)
Greatest absolute difference: 0.05455732345581055 at index (5,) (up to 1e-05 allowed)
Greatest relative difference: 0.030772637575864792 at index (5,) (up to 1.3e-06 allowed)
```
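
To put the failure above in perspective, the sketch below redoes the tolerance arithmetic in plain Python. It assumes the closeness rule documented for `torch.testing.assert_close`, `|actual - expected| <= atol + rtol * |expected|`, and assumes the reported relative difference is the absolute difference divided by `|expected|`. The `within_tolerance` helper is hypothetical; the four numbers are copied from the log.

```python
# Values copied from the assertion message above (both reported at index 5).
ABS_DIFF = 0.05455732345581055
REL_DIFF = 0.030772637575864792
ATOL = 1e-05
RTOL = 1.3e-06

# Under the stated assumption, |expected| can be recovered from the two diffs.
expected_magnitude = ABS_DIFF / REL_DIFF


def within_tolerance(abs_diff: float, expected_abs: float,
                     rtol: float, atol: float) -> bool:
    """Closeness rule: |actual - expected| <= atol + rtol * |expected|."""
    return abs_diff <= atol + rtol * expected_abs


# Largest difference the check would have accepted for this element.
allowed = ATOL + RTOL * expected_magnitude
ok = within_tolerance(ABS_DIFF, expected_magnitude, RTOL, ATOL)

print(f"allowed={allowed:.3e}, observed={ABS_DIFF:.3e}, within tolerance: {ok}")
```

The observed difference exceeds the allowed bound by several orders of magnitude, so this does not look like ordinary floating-point noise at the default fp32 tolerances.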