[tests] Added smoke tests for conv solvers (ConvAsmImplicitGemmV4R1DynamicFwd_1x1 and more). Some fixes. #1911

Open; wants to merge 114 commits into base: develop

atamazov commented Dec 23, 2022

This is the last PR in a series that aims to cover all convolution solvers with smoke tests.

@junliume Proposed reviewers: @averinevg (mostly to share knowledge), @shurale-nkn. https://github.com/ROCmSoftwarePlatform/MIOpen/labels/testing https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_high

This PR adds smoke tests for conv solvers:

  • ConvAsmImplicitGemmV4R1DynamicFwd_1x1
  • ConvMPBidirectWinograd_xdlops (with tuning)
  • ConvMPBidirectWinograd<M,N>
  • ConvOclBwdWrW2 (with tuning)
  • ConvOclBwdWrW2NonTunable
  • ConvOclDirectFwd (with tuning)
  • ConvOclDirectFwd1x1 (with tuning)
  • GemmBwd1x1_stride1
  • GemmBwd1x1_stride2
  • GemmBwdRest
  • GemmFwd1x1_0_1
  • GemmFwd1x1_0_1_int8
  • GemmFwd1x1_0_2
  • GemmFwdRest
  • GemmWrw1x1_stride1
  • GemmWrwUniversal
  • ConvHipImplicitGemmBwdXdlops
    • ⚠️ WORKAROUND_ISSUE_2173 introduced.

By-products:

TODO

…h results in omission of all custom tests
…d() check from ConvAsm1x1UV2 is it does not support FP16
…t potential performance degradation after tests.

atamazov commented May 1, 2023

@averinevg @shurale-nkn @johnny-keker @junliume Solver tests updated for MI300. Ready for review & CI testing.

atamazov marked this pull request as ready for review May 1, 2023 21:22

junliume commented May 3, 2023

@atamazov another round of CI has just started.

atamazov commented May 3, 2023

I think fresh develop needs to be merged once more.

atamazov commented May 3, 2023

@junliume @johnny-keker Please start another CI round, thanks!

atamazov commented May 3, 2023

@junliume @johnny-keker Can you please send me failing logs? Please also let me know which docker image is currently used on CI. Thanks!

junliume commented May 3, 2023

> @junliume @johnny-keker Can you please send me failing logs? Please also let me know which docker image is currently used on CI. Thanks!

Hi @atamazov, it failed in the smoke_solver_ConvHipImplicitGemmBwdXdlops test at the stage "Fp32 Hip Debug gfx90a". Here is the error message:

[2023-05-03T19:26:30.939Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-1911/build/bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 128 64 7 7 --weights 64 64 3 3 --batch_size 128 --input_channels 64 --output_channels 64 --spatial_dim_elements 7 7 --filter_dims 3 3 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NHWC --fil_layout NHWC --out_layout NHWC --deterministic 0 --tensor_vect 0 --vector_length 1 --output_type int32 --int8_vectorize 0 
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [GetWorkSpaceSize] 0
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [FindConvBwdDataAlgorithm] requestAlgoCount = 1, workspace = 0
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.16479 ms
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [TryLoad] Find-db regenerating.
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [GetPerfDbPathFile] Found exact perf database file
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmBwdXdlops
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Warning [FindSolutionImpl] Perf Db: load skipped: ConvHipImplicitGemmBwdXdlops, enforce: SEARCH_DB_UPDATE(4)
[2023-05-03T19:26:30.939Z] MIOpen(HIP): Info [FindSolutionImpl] Starting search: ConvHipImplicitGemmBwdXdlops, enforce: SEARCH_DB_UPDATE(4)
[2023-05-03T19:26:30.939Z] UndefinedBehaviorSanitizer:DEADLYSIGNAL
[2023-05-03T19:26:30.939Z] ==189537==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x7fda3951ae1e bp 0x7ffe35f2cce0 sp 0x7ffe35f2c990 T189537)
[2023-05-03T19:26:30.939Z] ==189537==The signal is caused by a READ memory access.
[2023-05-03T19:26:30.939Z] ==189537==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
[2023-05-03T19:26:30.939Z]     #0 0x7fda3951ae1e  (/home/jenkins/workspace/MLLibs_MIOpen_PR-1911/build/lib/libMIOpen.so.1+0x1cfc1e1e)

atamazov commented May 4, 2023

@junliume Thanks! Which docker image is currently used on CI?

…onvHipImplicitGemmBwdXdlops for MI200 && FP32
@atamazov

@junliume wrt #1911 (comment):

Merged from the recent develop as well. Can you please start CI again? Thanks!

atamazov commented Jul 1, 2023

atamazov commented Jul 4, 2023

@junliume This PR is now unblocked because the fix for #2239 is here.

@@ -237,6 +238,9 @@ option( WORKAROUND_ISSUE_1187 "" ${WORKAROUND_ISSUE_1187_DEFAULT})
set_var_to_condition(WORKAROUND_ISSUE_1148_DEFAULT (MIOPEN_TEST_GFX103X OR MIOPEN_TEST_GFX110X) AND MIOPEN_TEST_FLOAT)
option( WORKAROUND_ISSUE_1148 "" ${WORKAROUND_ISSUE_1148_DEFAULT})

set_var_to_condition(WORKAROUND_ISSUE_2173_DEFAULT (MIOPEN_TEST_GFX90A OR MIOPEN_TEST_GFX908 OR MIOPEN_TEST_GFX94X) AND MIOPEN_TEST_FLOAT)
Contributor Author

TODO: Enable W/A for Debug builds only
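
One possible way to address this TODO is sketched below. It is only an illustration, not the change made in this PR. It assumes a single-config generator, where CMAKE_BUILD_TYPE is set at configure time (with multi-config generators the variable is normally empty, so the check would never hold). IS_DEBUG_BUILD is a hypothetical helper variable, and the fallback definition of set_var_to_condition is an assumption about how that macro behaves, included only so the fragment stands on its own.

```cmake
# Assumed fallback definition, used only if the project has not already
# defined the macro: set <var> to TRUE if the condition holds, else FALSE.
if(NOT COMMAND set_var_to_condition)
    macro(set_var_to_condition var)
        if(${ARGN})
            set(${var} TRUE)
        else()
            set(${var} FALSE)
        endif()
    endmacro()
endif()

# Hypothetical helper: TRUE only for Debug builds (single-config generators).
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    set(IS_DEBUG_BUILD TRUE)
else()
    set(IS_DEBUG_BUILD FALSE)
endif()

# Same condition as in the diff above, additionally restricted to Debug builds.
set_var_to_condition(WORKAROUND_ISSUE_2173_DEFAULT
    (MIOPEN_TEST_GFX90A OR MIOPEN_TEST_GFX908 OR MIOPEN_TEST_GFX94X)
    AND MIOPEN_TEST_FLOAT
    AND IS_DEBUG_BUILD)
option( WORKAROUND_ISSUE_2173 "" ${WORKAROUND_ISSUE_2173_DEFAULT})
```

Computing the build-type check in a separate variable keeps the string comparison out of the macro arguments, so the workaround condition stays a plain boolean expression like the neighbouring WORKAROUND_ISSUE_* defaults.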

@junliume

@atamazov could we revisit this PR? Update it and I can kick off a whole set of CI on it.

@atamazov

@junliume Sure, as soon as I have time. Quite a lot of work is required.

4 participants