v3.4.2
This is a patch release containing the following changes to v3.4.1:
- Fixed performance regression in deconvolution on processors with Intel AVX-512 instruction set (307b35b, f46fffb)
- Improved performance of batched matmul with binary post-op on processors with Intel AVX-512 instruction set (d39e1b7)
- Fixed performance regression in softmax with destination memory format set to `any` on processors with Intel AVX-512 instruction set (756d3cf)
- Fixed incorrect results in int8 deconvolution with source zero points on processors with Intel AMX instruction set (d5ddbc8)
- Fixed performance regression in convolution on processors with Intel AVX2 instruction set (2968c89)
- Improved f8_e4m3 matmul performance on Intel Data Center GPU Max Series (068f850, 668abae, c3972ef, ad94382)
- Fixed sporadic accuracy issues in bf16 depthwise convolution backpropagation on processors with Intel AVX-512 instruction set (0184044)
- Fixed primitive creation issue for fp16 pooling backpropagation on Intel GPUs (e4737d9)
- Fixed failure for subgraphs with int8 matmul operation with experimental Graph Compiler on processors with Intel AMX instruction set (5ebde2e)
- Fixed assert in experimental Graph Compiler on Windows (f53fbd1, fd903ae)
- Fixed incorrect results for subgraphs with shuffle operation with experimental Graph Compiler (aef5023)
- Improved performance of subgraphs involving int8 matmul with experimental Graph Compiler on processors with Intel AMX support (0ca5bc5)
- Fixed page fault in fp16 matmul primitive on Intel Data Center GPU Max Series (5587f08)
- Fixed incorrect results in fp32 deconvolution with Arm Compute Library on AArch64 processors (b7694a0)
- Fixed performance regression in deconvolution on processors with Intel AVX2 instruction set (6f452e2)