rocm6.4 IFU CP 09122024 #1596

Closed
wants to merge 60 commits into from
60 commits
98cc4e1
[SOW MS3] Centos stream9 PyTorch image support (#1090)
rraminen Aug 30, 2022
05d6126
Updated to latest conda for CentOS stream 9
pruthvistony Aug 31, 2022
dd31176
Temporarily skip test_conv3d_64bit_indexing
pruthvistony Mar 23, 2023
e96aba2
Enable tensorpipe with hip_basic backend (#1135)
pruthvistony Nov 13, 2022
8fffb23
Updates to build on Jammy
pruthvistony Jul 28, 2023
cbd0b44
Fix lstsq related regressions (part of SWDEV-392820)
pruthvistony Sep 12, 2023
9923275
[UB22.04] Updates to support latest scipy
pruthvistony Aug 18, 2023
b0697b9
Build required version of libpng for CentOS7
pruthvistony Aug 18, 2023
4fd6e13
Update tensorpipe submodule to support ROCm 6.0
pruthvistony Aug 5, 2023
f2de668
Set ROCM_PATH in env for centOS docker container
pruthvistony Aug 13, 2023
e83f564
Updated condition for libstc++ for Jammy
pruthvistony Sep 5, 2023
b56588b
Skip ddp apply_optim_in_bwd tests for gloo (#1302)
jataylo Oct 27, 2023
e59bfe3
Changes to support docker v23
pruthvistony Sep 25, 2023
281e2bf
[CS9] Updates to CentOS stream 9 build (#1326)
pruthvistony Nov 28, 2023
eea29cd
Update to hipify mapping
pruthvistony Dec 6, 2023
e5067c2
Correcting usage of USE_ROCM
pruthvistony Dec 6, 2023
2be2a79
Enable gesvda for ROCM >= 6.1 (#1339)
xinyazhang Dec 20, 2023
8f6c7af
Increase lifespan of test-times files
pruthvistony Dec 31, 2023
f1f2b4e
Fixes CI build script (#1350)
xinyazhang Jan 26, 2024
d98149c
[NO CP] Temporary dumping of test exec log to stderr
pruthvistony Jan 30, 2024
8a4d1e2
Add skipIfRocmArch decorator for Navi skips (#1356)
jataylo Feb 26, 2024
5b77292
Converted NAVI check as a function (#1364)
BLOrange-AMD Mar 7, 2024
4c93554
Triton build conditionalized on ROCM_VERSION
pruthvistony Mar 8, 2024
7da900e
Remove ROCmloops specific test
pruthvistony Mar 12, 2024
4aba300
Bad import in test_torchinductor and skip torchvision related UT (#1374)
pragupta Mar 20, 2024
5580969
skip test_inductor_freezing failing UTs (#1375)
pragupta Mar 20, 2024
183802e
Skip test_mm_triton_kernel_benchmark (#1376)
pragupta Mar 21, 2024
90c132a
temporarily ignore certificate check for Miniconda
yanyao-wang Apr 12, 2024
0d89328
Implementation of PyTorch ut parsing script - QA helper function (#1386)
BLOrange-AMD Apr 18, 2024
f47dca8
[HIP] Returned error string update
pruthvistony May 13, 2024
e27ff6e
PR #1255 to rocm6.2 release
ramcherukuri May 15, 2024
0ae8e99
Reformat test_float8_basics for current rocm support (#1415)
alugorey May 16, 2024
ed694e4
Enable e5m2 x e4m3 test in test_float8_scale (#1419)
alugorey May 16, 2024
6876373
[release/2.1] Skip certificate check for CentOS7 since certificate ex…
jithunnair-amd Apr 24, 2024
7dade27
skip vmapvjpvjp_linalg_householder_product_cuda_float32 (#1420)
alugorey May 21, 2024
6e12b31
Include the ROCm version in triton version
pruthvistony May 21, 2024
d4d80ee
Change Torch extra install requirement
pruthvistony May 22, 2024
e6ff669
Remove the installation of rocm-llvm-dev package
pruthvistony May 22, 2024
da0e1b4
Fix SWDEV-459623 (#1428)
xinyazhang Jun 2, 2024
4b8aea1
Enable fp8 inductor unit tests (#1421)
alugorey Jun 2, 2024
4c94122
Enable NHWC batchnorm for miopen (#1400)
dnikolaev-amd Jun 4, 2024
4c85c6c
[HIP] Few more updates to the returned error string
pruthvistony Jun 11, 2024
d10d2fa
skipIfRocm needs msg parameter
jithunnair-amd Jun 13, 2024
93c7b7f
[NO CP] Updated changes to skip few UTs
pruthvistony Jun 13, 2024
bf3a2cd
Print consolidated log file for pytorch unit test automation scripts …
jithunnair-amd Jun 18, 2024
a5641fa
[ROCm] Intra-node all reduce initial implementation (#1435)
jataylo Jun 18, 2024
7f7d24b
Sync updates from hipify_torch. (#1168)
wenchenvincent Feb 14, 2023
d3201b0
fix install_centos() function
dnikolaev-amd Jun 20, 2024
a82ac7b
[SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491)
jithunnair-amd Jul 22, 2024
2ec0172
Added functions imports (#1521)
BLOrange-AMD Aug 5, 2024
8c1fa06
PyTorch unit test helper scripts enhancements (#1517)
jithunnair-amd Aug 6, 2024
e3ebe30
[Navi] [Inductor] Unskip Navi inductor UTs (#1514)
jataylo Aug 19, 2024
5b9a211
[rocm6.3_internal_testing] pin sympy==1.12.1 and skip pytorch-nightly…
dnikolaev-amd Aug 22, 2024
5783557
Add test_batchnorm_nhwc_miopen_cuda_float32 (#1561)
dnikolaev-amd Aug 26, 2024
115944d
Imported skipIfRocm in certain test suites (#1577)
BLOrange-AMD Sep 9, 2024
ac86642
[SWDEV-473498] Pin sympy for >=python3.9 (#1576)
jataylo Sep 9, 2024
9833f2d
Several issues fix of QA helper script (#1564)
BLOrange-AMD Sep 9, 2024
524bef2
rocm6.4 related_commits
dnikolaev-amd Sep 16, 2024
ccdc413
rocm6.4 test-times
dnikolaev-amd Sep 16, 2024
ae08f9f
fix sympy version in requirements-ci.txt
dnikolaev-amd Sep 17, 2024
1 change: 1 addition & 0 deletions .additional_ci_files/test-class-times.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions .additional_ci_files/test-times.json

Large diffs are not rendered by default.

178 changes: 178 additions & 0 deletions .automation_scripts/parse_xml_results.py
@@ -0,0 +1,178 @@
""" The Python PyTorch testing script.
##
# Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
"""

import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any, Dict, Tuple

# Backends list
BACKENDS_LIST = [
    "dist-gloo",
    "dist-nccl"
]

TARGET_WORKFLOW = "--rerun-disabled-tests"

def get_job_id(report: Path) -> int:
    # [Job id in artifacts]
    # Retrieve the job id from the report path. In our GHA workflows, we append
    # the job id to the end of the report name, so `report` looks like:
    #     unzipped-test-reports-foo_5596745227/test/test-reports/foo/TEST-foo.xml
    # and we want to get `5596745227` out of it.
    try:
        return int(report.parts[0].rpartition("_")[2])
    except ValueError:
        return -1

def is_rerun_disabled_tests(root: ET.ElementTree) -> bool:
    """
    Check if the test report is coming from rerun_disabled_tests workflow
    """
    skipped = root.find(".//*skipped")
    # Need to check against None here, if not skipped doesn't work as expected
    if skipped is None:
        return False

    message = skipped.attrib.get("message", "")
    return TARGET_WORKFLOW in message or "num_red" in message

def parse_xml_report(
    tag: str,
    report: Path,
    workflow_id: int,
    workflow_run_attempt: int,
    work_flow_name: str
) -> Dict[Tuple[str, ...], Dict[str, Any]]:
    """Convert a test report xml file into a JSON-serializable dict of test cases."""
    print(f"Parsing {tag}s for test report: {report}")

    job_id = get_job_id(report)
    print(f"Found job id: {job_id}")

    test_cases: Dict[Tuple[str, ...], Dict[str, Any]] = {}

    root = ET.parse(report)
    # TODO: unlike unittest, pytest-flakefinder used by rerun disabled tests for test_ops
    # includes skipped messages multiple times (50 times by default). This slows down
    # this script too much (O(n)) because it tries to gather all the stats. This should
    # be fixed later in the way we use pytest-flakefinder. A zipped test report from rerun
    # disabled tests is only a few MB, but it balloons to anywhere from a dozen to a few
    # hundred MB of XML once extracted
    if is_rerun_disabled_tests(root):
        return test_cases

    for test_case in root.iter(tag):
        case = process_xml_element(test_case)
        if tag == 'testcase':
            case["workflow_id"] = workflow_id
            case["workflow_run_attempt"] = workflow_run_attempt
            case["job_id"] = job_id
            case["work_flow_name"] = work_flow_name

            # [invoking file]
            # The name of the file that the test is located in is not necessarily
            # the same as the name of the file that invoked the test.
            # For example, `test_jit.py` calls into multiple other test files (e.g.
            # jit/test_dce.py). For sharding/test selection purposes, we want to
            # record the file that invoked the test.
            #
            # To do this, we leverage an implementation detail of how we write out
            # tests (https://bit.ly/3ajEV1M), which is that reports are created
            # under a folder with the same name as the invoking file.
            case_name = report.parent.name
            # Append the first backend name found in the report path, if any.
            for backend in BACKENDS_LIST:
                if backend in report.parts:
                    case_name = case_name + "_" + backend
                    break
            case["invoking_file"] = case_name
            test_cases[(case["invoking_file"], case["classname"], case["name"], case["work_flow_name"])] = case
        elif tag == 'testsuite':
            case["work_flow_name"] = work_flow_name
            case["invoking_xml"] = report.name
            case["running_time_xml"] = case["time"]
            case_name = report.parent.name
            # Append the first backend name found in the report path, if any.
            for backend in BACKENDS_LIST:
                if backend in report.parts:
                    case_name = case_name + "_" + backend
                    break
            case["invoking_file"] = case_name

            test_cases[(case["invoking_file"], case["invoking_xml"], case["work_flow_name"])] = case

    return test_cases

def process_xml_element(element: ET.Element) -> Dict[str, Any]:
    """Convert a test suite element into a JSON-serializable dict."""
    ret: Dict[str, Any] = {}

    # Convert attributes directly into dict elements.
    # e.g.
    #     <testcase name="test_foo" classname="test_bar"></testcase>
    # becomes:
    #     {"name": "test_foo", "classname": "test_bar"}
    ret.update(element.attrib)

    # The XML format encodes all values as strings. Convert to ints/floats if
    # possible to make aggregation possible in Rockset.
    for k, v in ret.items():
        try:
            ret[k] = int(v)
        except ValueError:
            # Only fall back to float when the value is not an int, so that
            # integer values are not overwritten with floats.
            try:
                ret[k] = float(v)
            except ValueError:
                pass

    # Convert inner and outer text into special dict elements.
    # e.g.
    #     <testcase>my_inner_text</testcase> my_tail
    # becomes:
    #     {"text": "my_inner_text", "tail": " my_tail"}
    if element.text and element.text.strip():
        ret["text"] = element.text
    if element.tail and element.tail.strip():
        ret["tail"] = element.tail

    # Convert child elements recursively, placing them at a key:
    # e.g.
    #     <testcase>
    #       <foo>hello</foo>
    #       <foo>world</foo>
    #       <bar>another</bar>
    #     </testcase>
    # becomes
    #    {
    #       "foo": [{"text": "hello"}, {"text": "world"}],
    #       "bar": {"text": "another"}
    #    }
    for child in element:
        if child.tag not in ret:
            ret[child.tag] = process_xml_element(child)
        else:
            # If there are multiple tags with the same name, they should be
            # coalesced into a list.
            if not isinstance(ret[child.tag], list):
                ret[child.tag] = [ret[child.tag]]
            ret[child.tag].append(process_xml_element(child))
    return ret
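
For reviewers who want to check the element-to-dict conversion without a full test report, here is a minimal standalone sketch of the rules described in the comments of `process_xml_element`. The names `to_number`, `element_to_dict`, and `SAMPLE` are hypothetical and not part of this PR. The sketch assumes the intended behavior is that integer-looking attribute values stay ints and only non-integer numeric values become floats.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def to_number(value: str) -> Any:
    """Return an int if the string parses as one, else a float, else the string."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value


def element_to_dict(element: ET.Element) -> Dict[str, Any]:
    """Hypothetical restatement of the attribute, text, and child rules."""
    # Attributes become dict entries, with numeric strings converted.
    ret: Dict[str, Any] = {k: to_number(v) for k, v in element.attrib.items()}

    # Non-blank inner and trailing text are stored under special keys.
    if element.text and element.text.strip():
        ret["text"] = element.text
    if element.tail and element.tail.strip():
        ret["tail"] = element.tail

    # Children are converted recursively; repeated tags are coalesced into a list.
    for child in element:
        converted = element_to_dict(child)
        if child.tag not in ret:
            ret[child.tag] = converted
        else:
            if not isinstance(ret[child.tag], list):
                ret[child.tag] = [ret[child.tag]]
            ret[child.tag].append(converted)
    return ret


# Made-up sample input, shaped like the example in the code comments.
SAMPLE = (
    '<testcase name="test_foo" classname="TestBar" time="0.25" line="12">'
    "<foo>hello</foo><foo>world</foo><bar>another</bar>"
    "</testcase>"
)

result = element_to_dict(ET.fromstring(SAMPLE))
print(result)
```

In this sample, `time` should come out as a float and `line` as an int, while the two `foo` children collapse into a list and the single `bar` child stays a plain dict.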