
help needed for improving conv with given shape #2088

Closed
shawnxhong opened this issue Sep 9, 2024 · 4 comments
Labels: platform:cpu-x64, question

Comments


shawnxhong commented Sep 9, 2024

Hi dear team,

Is there any other way to accelerate this conv (ic=16, oc=16, height=208, width=32, stride=1, kernel=3) on a single core?

ONEDNN_VERBOSE=1 numactl -C 1 -m 0 ./benchdnn --mode=P --conv --dt=bf16:bf16:bf16 --stag=any --wtag=any --dtag=any --dir=FWD_B --attr-post-ops=eltwise_relu:0:1 --alg=AUTO mb1_ic16oc16_ih208oh208kh3sh1dh0ph1_iw32ow32kw3sw1dw0pw1

The result is:

onednn_verbose,v1,primitive,exec,cpu,convolution,brg_conv_fwd:avx10_1_512_amx,forward_training,src:bf16:a:blocked:acdb::f0 wei:bf16:a:blocked:AcdB16a2b::f0 bia:bf16:a:blocked:a::f0 dst:bf16:a:blocked:acdb::f0,attr-post-ops:eltwise_relu:0:1,alg:convolution_direct,mb1_ic16oc16_ih208oh208kh3sh1dh0ph1_iw32ow32kw3sw1dw0pw1,0.0310059

The branch is rls-v3.6 and I built with CC=icx and CXX=icpx.
Fusing conv + relu, upgrading to the latest version, and using the Intel C++ compiler are the only useful methods I can think of.
I tried other memory tags, but they did not help.
Please kindly suggest any other methods that can accelerate this conv.
Thanks a lot for your great help in advance.
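As a back-of-the-envelope check (my own arithmetic, not from the thread; it assumes the benchdnn `--mode=P` time above is reported in milliseconds), the achieved throughput for this shape can be estimated from the MAC count and the 0.0310059 ms time:

```python
# Rough arithmetic intensity estimate for the conv in question.
# Shape from the benchdnn problem descriptor; time assumed to be in ms.
mb, ic, oc = 1, 16, 16
oh, ow = 208, 32   # stride 1, pad 1 keeps the spatial size
kh, kw = 3, 3

macs = mb * oc * oh * ow * ic * kh * kw   # multiply-accumulates
flops = 2 * macs
time_s = 0.0310059e-3

print(f"MACs: {macs:,}")
print(f"Effective GFLOP/s: {flops / time_s / 1e9:.0f}")
```

With only ~15M MACs, the problem is tiny for a single AMX core, so per-call overhead and memory layout can matter as much as the compute kernel itself.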

shu1chen added the platform:cpu-x64 label Sep 9, 2024
shawnxhong (Author) commented

Dear team,
As the kernel is 3x3, would it help to modify the input (208, 32) so that Winograd convolution can be used? What input sizes does Winograd need?
Or would it help to use the ukernel API to improve the blocking?

asirvaiy commented

Hi @shawnxhong,
Thanks for posting your query.
Winograd can work with small kernels like 3x3. The speedup depends on how much the number of scalar multiplications is reduced, and that depends on the input shape.

Winograd is supported on GPU (fp16 and fp32) and AArch64 CPUs. Which platform are you targeting here?
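To give a feel for the potential, the classic multiplication-count estimate for Winograd F(m x m, 3 x 3) can be sketched as below (this ignores the input/output transform overhead, which is why real speedups are lower than these ratios):

```python
# Winograd F(m x m, r x r) produces an m x m output tile using
# (m + r - 1)^2 multiplies, versus r^2 * m^2 for direct convolution.
def winograd_speedup(m: int, r: int = 3) -> float:
    direct = r * r * m * m        # multiplies per tile, direct conv
    wino = (m + r - 1) ** 2       # multiplies per tile, Winograd
    return direct / wino

print(winograd_speedup(2))  # F(2x2,3x3): 36/16 = 2.25
print(winograd_speedup(4))  # F(4x4,3x3): 144/36 = 4.0
```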


shawnxhong commented Sep 11, 2024


Thanks for your answer. I am targeting x86 with AMX. So does that mean Winograd only works on GPU and AArch64?
What other methods are feasible to improve this conv of this shape on x86 with AMX? Is using the ukernel API helpful?

asirvaiy commented


Hi,
The ukernel API is useful if you don't have a BRGEMM/brgconv implementation, since it can be used to implement BRGEMM yourself.
However, oneDNN already has BRGEMM and brgconv implemented, so there is no need for it here.

Going for int8 precision is one way to utilize AMX and get better conv performance.
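As a sketch of the int8 suggestion, the original benchdnn command could be rerun with int8 data types. This is a hedged example, not from the thread: the u8:s8:u8 combination and its support on a given build are assumptions to verify, and quantization accuracy must of course be validated separately.

```shell
# Same conv shape as above, but with int8 data types (u8 src, s8 weights,
# u8 dst) so AMX int8 tiles can be used. Verify this datatype combination
# is supported by your oneDNN build before comparing the numbers.
ONEDNN_VERBOSE=1 numactl -C 1 -m 0 ./benchdnn --mode=P --conv \
  --dt=u8:s8:u8 --stag=any --wtag=any --dtag=any --dir=FWD_B \
  --attr-post-ops=eltwise_relu:0:1 --alg=AUTO \
  mb1_ic16oc16_ih208oh208kh3sh1dh0ph1_iw32ow32kw3sw1dw0pw1
```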

4 participants