v3.0.1

@vpirogov released this 24 Feb 03:20
· 30 commits to rls-v3.0 since this release

This is a patch release containing the following changes to v3.0:

  • Fixed potential correctness issue in convolution weight gradient with 1x1 filter and strides (e589966)
  • Improved convolution, deconvolution, inner product, and matmul primitives performance with scales on Intel CPUs (38319f1, 18de927, b6170d1, 85171b0)
  • Reverted MEMFD allocator in Xbyak to avoid failures in high-load scenarios (eaaa41b)
  • Fixed out-of-bounds array access in bfloat16 convolution weight gradient on Intel CPUs (a17a64c)
  • Improved compatibility with future versions of Intel GPU driver (eb7a0a0)
  • Fixed segfault in fp16 and bfloat16 convolution backward propagation on systems with Intel AMX support (293561b)
  • Fixed build issue with GCC 13 (1d7971c)
  • Fixed correctness issue in int8 RNN primitive with Vanilla GRU flavor on Intel CPUs (f4a149c, fbf8dca)
  • Added check for unsupported arguments in binary primitive implementation for AArch64-based processors (5bb9070)
  • Fixed correctness issue in int8 convolution with zero-points on Intel Data Center GPU Max Series (96e868c)
  • Fixed runtime error in convolution primitive with small number of channels on Xe-based graphics (068893e)
  • Removed use of OpenCL C variable length arrays in reduction primitive implementation for Intel GPUs (41e8612)
  • Fixed correctness issue in matmul and inner product primitives on Intel Data Center GPU Max Series (a1e6bc5, dbb7c28)
  • Fixed segfault in fp16 and bfloat16 convolution backward propagation on future Intel Xeon processors (code name Sierra Forest) (399b7c5)
  • Fixed runtime error in Graph API for partitions with quantized matmul and add operations (f881da5, 699ba75, b8d21a5, 9421fb2)
  • Fixed convolution performance regression on Xe-based graphics (1869bf2)
  • Improved convolution performance with OHWI and OIHW weight formats on Intel Data Center GPU Max Series (2d0b31e, 5bd5d52)
  • Fixed include files handling in build system affecting CMake projects relying on oneDNN (c616453)
  • Added tbb::finalize to tests and examples to address intermittent test crashes with TBB runtime (891a415, c79e543, 8312c3a, 1a32b95, bd0389d, f05013d, ab7938f, 31c9e7b, f3261e4, d58ac41, f8c67b9, 258849b, b20a8c7)
  • Fixed segfault in fp16 convolution primitive on future Intel Xeon processors (code name Granite Rapids) (a574fff)
  • Fixed correctness issue in fp16 convolution primitive on future Intel Xeon processors (code name Sierra Forest) (f165ed8)
  • Fixed correctness issue in int8 convolution primitive on Intel CPUs (ca15922, 27845b8)
  • Fixed correctness issue in int8 convolution primitive on Intel Data Center GPU Max Series (8bb651c)
  • Fixed correctness issue in resampling primitive with post-ops on Intel CPUs (aa52a51)
  • Addressed excessive memory consumption in 3D convolution on Intel CPUs (3d6412a, 097acb5, fd69663)
  • Fixed segfault in convolution with sum and relu post-ops on Intel CPUs (63ad769, 1b13037, 0a8116b, 9972cb8)
  • Addressed convolution performance regression with small number of channels on Intel GPUs (d3af877)
  • Worked around MSVS 2019 bug resulting in build failures on Windows (4024775)
  • Updated code base formatting to clang-format 11 (23576f9, 0b1bf84)