Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with multiple targets and remove smfmac tests from unsupported test targets #1372

Merged
merged 17 commits into from
Jul 4, 2024

Conversation

junliume
Copy link
Contributor

@junliume junliume commented Jul 3, 2024

@junliume
Copy link
Contributor Author

junliume commented Jul 3, 2024

@illsilin @jakpiase please help to check if this check if complete, build multiple targets and tests multiple targets for CK is indeed tricky

@junliume junliume added the bug Something isn't working label Jul 3, 2024
carlushuang
carlushuang previously approved these changes Jul 3, 2024
Copy link
Contributor

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@junliume junliume changed the title Remove smfmac tests from unsupported test targets Fix issue with multiple targets and remove smfmac tests from unsupported test targets Jul 3, 2024
reg_c.template AsType<float4_t>()(Number<0>{}) = __builtin_amdgcn_smfmac_f32_16x16x32_f16(
reg_a, reg_b, reg_c.template AsType<float4_t>()[Number<0>{}], reg_idx, 0, 0);
#else
ignore = reg_a;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are consolidated from #1358

@@ -2,11 +2,11 @@ add_gtest_executable(test_grouped_convnd_bwd_data test_grouped_convnd_bwd_data_x
if(result EQUAL 0)
target_link_libraries(test_grouped_convnd_bwd_data PRIVATE utility device_grouped_conv2d_bwd_data_instance device_grouped_conv3d_bwd_data_instance)
endif()
add_gtest_executable(test_grouped_convnd_bwd_data_interface test_grouped_convnd_bwd_data_interface_xdl.cpp)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: these are fixing duplicate test name issue

@jakpiase jakpiase self-requested a review July 3, 2024 07:57
Copy link
Contributor

@jakpiase jakpiase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I remember smfmac instructions won't work on earlier ROCm versions than 6.2. Is there any way to stop using this test on cmake level based on that condition?

@junliume
Copy link
Contributor Author

junliume commented Jul 3, 2024

From what I remember smfmac instructions won't work on earlier ROCm versions than 6.2. Is there any way to stop using this test on cmake level based on that condition?

maybe if(CK_hip_VERSION VERSION_GREATER_EQUAL 6.2) ?

@junliume
Copy link
Contributor Author

junliume commented Jul 3, 2024

@jakpiase @aosewski please feel free to edit this branch in this PR :) I will be offline for the next a few hours.

All the remaining problems are easily reproducible by:

CXX=/opt/rocm/bin/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS="gfx1100;gfx90a;gfx942" ..

and then

make -j$(nproc) check

@junliume junliume added the urgency_blocker blocking feature deliverables label Jul 3, 2024
Comment on lines 3 to 4
#if(!defined(__HIP_DEVICE_COMPILE__) || defined(__gfx908__) || defined(__gfx90a__) || \
defined(__gfx94__))
Copy link

@atamazov atamazov Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume The expected effect is to compile the code for the host and for MI100/200/300, right? Looks good (at least for clang 19, see https://clang.llvm.org/docs/AMDGPUSupport.html) except __gfx94__ which looks suspicious. If you are in doubt too, then you can try this:

#if(!defined(__HIP_DEVICE_COMPILE__) || (defined(__gfx9__) && !defined(__gfx906__) && !defined(__gfx900__))

The above should enable compilation for all gfx9 parts except Vega10/20.

BTW it's a shame that compiler doesn't have the __HIP_ARCH_* feature macro that indicates the presence of xDLOPs (MAI) on target, see https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_porting_guide.html#table-of-architecture-properties. Perhaps it's worth requesting it.

Copy link

@atamazov atamazov Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume Maybe it is worth introducing intermediate macros that would enable cleaner programming, e.g. put something like this into a header (target_features.hpp?):

#define TARGET_HOST   1
#define TARGET_GFX9   0
#define TARGET_GFX900 0
#define TARGET_GFX906 0
#define TARGET_GFX908 0
#define TARGET_GFX90A 0
#define TARGET_GFX941 0
#define TARGET_GFX942 0
...
#if defined(__HIP_DEVICE_COMPILE__)
  #undef TARGET_HOST
  #define TARGET_HOST 0
  #if defined(__gfx9__)
    #undef TARGET_GFX9
    #define TARGET_GFX9 1
  #endif
  ...
  #if defined(__gfx906__)
    #undef TARGET_GFX906
    #define TARGET_GFX906 1
  #endif
  #if defined(__gfx908__)
    #undef TARGET_GFX908
    #define TARGET_GFX908 1
  #endif
  ...
  #if defined(__gfx940__)
    #undef TARGET_GFX940
    #define TARGET_GFX940 1
  #endif
  #if defined(__gfx941__)
    #undef TARGET_GFX941
    #define TARGET_GFX941 1
  #endif
  ...
#endif

// Now define features:
#define TARGET_GFX94X (TARGET_GFX941 || TARGET_GFX942 || /* enumerate all gfx94s */ )
#define TARGET_GFX_XDLOPS (TARGET_GFX908 || TARGET_GFX90A || TARGET_GFX94X /* add others in the future*/ )

Then you can use this like:

#include <target_features.hpp>

#if TARGET_HOST || (TARGET_XDLOPS && TARGET_GFX9)
  // Enabled for host and for GFX9 XDLOPS devices...

which seems much cleaner to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov found these lines so maybe we should use existing ones there
https://github.com/ROCm/composable_kernel/blob/fix_1371/include/ck/ck.hpp#L55-L74

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume At first glance, yes. And you may even find some suitable "feature" macros there, like CK_USE_AMD_MFMA_BF16_1K_OP or so (but I am still unable to find things like TARGET_GFX_XDLOPS there)))

However, these just define, which enforces to use defined() everywhere. What I am proposing is always defining to either 0 or 1, which allows for cleaner constructs. Just to mention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume At first glance, yes. And you may even find some suitable "feature" macros there, like CK_USE_AMD_MFMA_BF16_1K_OP or so (but I am still unable to find things like TARGET_GFX_XDLOPS there)))

However, these just define, which enforces to use defined() everywhere. What I am proposing is always defining to either 0 or 1, which allows for cleaner constructs. Just to mention.

Good point, I will keep a note about this refactoring and assign CK developer on it.

Copy link
Contributor

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@junliume junliume merged commit 9590738 into develop Jul 4, 2024
26 checks passed
@illsilin illsilin deleted the fix_1371 branch August 7, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CI - Pass urgency_blocker blocking feature deliverables
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants