-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with multiple targets and remove smfmac tests from unsupported test targets #1372
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
reg_c.template AsType<float4_t>()(Number<0>{}) = __builtin_amdgcn_smfmac_f32_16x16x32_f16( | ||
reg_a, reg_b, reg_c.template AsType<float4_t>()[Number<0>{}], reg_idx, 0, 0); | ||
#else | ||
ignore = reg_a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are consolidated from #1358
@@ -2,11 +2,11 @@ add_gtest_executable(test_grouped_convnd_bwd_data test_grouped_convnd_bwd_data_x | |||
if(result EQUAL 0) | |||
target_link_libraries(test_grouped_convnd_bwd_data PRIVATE utility device_grouped_conv2d_bwd_data_instance device_grouped_conv3d_bwd_data_instance) | |||
endif() | |||
add_gtest_executable(test_grouped_convnd_bwd_data_interface test_grouped_convnd_bwd_data_interface_xdl.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reviewers: these are fixing duplicate test name issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I remember smfmac instructions won't work on earlier ROCm versions than 6.2. Is there any way to stop using this test on cmake level based on that condition?
maybe |
@jakpiase @aosewski please feel free to edit this branch in this PR :) I will be offline for the next a few hours. All the remaining problems are easily reproducible by:
and then
|
#if(!defined(__HIP_DEVICE_COMPILE__) || defined(__gfx908__) || defined(__gfx90a__) || \ | ||
defined(__gfx94__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junliume The expected effect is to compile the code for the host and for MI100/200/300, right? Looks good (at least for clang 19, see https://clang.llvm.org/docs/AMDGPUSupport.html) except __gfx94__
which looks suspicious. If you are in doubt too, then you can try this:
#if(!defined(__HIP_DEVICE_COMPILE__) || (defined(__gfx9__) && !defined(__gfx906__) && !defined(__gfx900__))
The above should enable compilation for all gfx9 parts except Vega10/20.
BTW it's a shame that compiler doesn't have the __HIP_ARCH_*
feature macro that indicates the presence of xDLOPs (MAI) on target, see https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_porting_guide.html#table-of-architecture-properties. Perhaps it's worth requesting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junliume Maybe it is worth introducing intermediate macros that would enable cleaner programming, e.g. put something like this into a header (target_features.hpp?):
#define TARGET_HOST 1
#define TARGET_GFX9 0
#define TARGET_GFX900 0
#define TARGET_GFX906 0
#define TARGET_GFX908 0
#define TARGET_GFX90A 0
#define TARGET_GFX941 0
#define TARGET_GFX942 0
...
#if defined(__HIP_DEVICE_COMPILE__)
#undef TARGET_HOST
#define TARGET_HOST 0
#if defined(__gfx9__)
#undef TARGET_GFX9
#define TARGET_GFX9 1
#endif
...
#if defined(__gfx906__)
#undef TARGET_GFX906
#define TARGET_GFX906 1
#endif
#if defined(__gfx908__)
#undef TARGET_GFX908
#define TARGET_GFX908 1
#endif
...
#if defined(__gfx940__)
#undef TARGET_GFX940
#define TARGET_GFX940 1
#endif
#if defined(__gfx941__)
#undef TARGET_GFX941
#define TARGET_GFX941 1
#endif
...
#endif
// Now define features:
#define TARGET_GFX94X (TARGET_GFX941 || TARGET_GFX942 || /* enumerate all gfx94s */ )
#define TARGET_GFX_XDLOPS (TARGET_GFX908 || TARGET_GFX90A || TARGET_GFX94X /* add others in the future*/ )
Then you can use this like:
#include <target_features.hpp>
#if TARGET_HOST || (TARGET_XDLOPS && TARGET_GFX9)
// Enabled for host and for GFX9 XDLOPS devices...
which seems much cleaner to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atamazov found these lines so maybe we should use existing ones there
https://github.com/ROCm/composable_kernel/blob/fix_1371/include/ck/ck.hpp#L55-L74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junliume At first glance, yes. And you may even find some suitable "feature" macros there, like CK_USE_AMD_MFMA_BF16_1K_OP
or so (but I am still unable to find things like TARGET_GFX_XDLOPS there)))
However, these just define, which enforces to use defined()
everywhere. What I am proposing is always defining to either 0 or 1, which allows for cleaner constructs. Just to mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junliume At first glance, yes. And you may even find some suitable "feature" macros there, like
CK_USE_AMD_MFMA_BF16_1K_OP
or so (but I am still unable to find things like TARGET_GFX_XDLOPS there)))However, these just define, which enforces to use
defined()
everywhere. What I am proposing is always defining to either 0 or 1, which allows for cleaner constructs. Just to mention.
Good point, I will keep a note about this refactoring and assign CK developer on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#1371