From 9021941263e48c9f5515929679feb4a190522096 Mon Sep 17 00:00:00 2001
From: Anton <100830759+antonwolfy@users.noreply.github.com>
Date: Thu, 23 May 2024 11:28:16 +0200
Subject: [PATCH 01/49] Updated CHANGELOG.md for 0.15.0 (#1846)

* Updated CHANGELOG.md for 0.15.0

* Update CHANGELOG.md

Co-authored-by: Natalia Polina

* Update CHANGELOG.md

Co-authored-by: Natalia Polina

* Update CHANGELOG.md

Co-authored-by: Natalia Polina

---------

Co-authored-by: Natalia Polina
---
 CHANGELOG.md | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d8c5261cc1b..9e2bc27d4e1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,75 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.14.0] - MM/DD/2024
+## [0.15.0] - 05/DD/2024
+
+This release completes the implementation of the `dpnp.linalg` module and array creation routines, and adds cumulative reductions and histogram functions.
+
+### Added
+
+* Implemented `dpnp.frombuffer`, `dpnp.fromfile` and `dpnp.fromstring` functions [#1727](https://github.com/IntelPython/dpnp/pull/1727)
+* Implemented `dpnp.fromfunction`, `dpnp.fromiter` and `dpnp.loadtxt` functions [#1728](https://github.com/IntelPython/dpnp/pull/1728)
+* Added implementation of `dpnp.linalg.pinv` function [#1704](https://github.com/IntelPython/dpnp/pull/1704)
+* Added implementation of `dpnp.linalg.eigvalsh` function [#1714](https://github.com/IntelPython/dpnp/pull/1714)
+* Added implementation of `dpnp.linalg.tensorinv` function [#1752](https://github.com/IntelPython/dpnp/pull/1752)
+* Added implementation of `dpnp.linalg.tensorsolve` function [#1753](https://github.com/IntelPython/dpnp/pull/1753)
+* Added implementation of `dpnp.linalg.lstsq` function [#1792](https://github.com/IntelPython/dpnp/pull/1792)
+* Added implementation of `dpnp.einsum` and `dpnp.einsum_path` functions [#1779](https://github.com/IntelPython/dpnp/pull/1779)
+* Added implementation of `dpnp.histogram` function [#1785](https://github.com/IntelPython/dpnp/pull/1785)
+* Added implementation of `dpnp.histogram_bin_edges` function [#1823](https://github.com/IntelPython/dpnp/pull/1823)
+* Extended pre-commit hooks with `pylint` configuration [#1718](https://github.com/IntelPython/dpnp/pull/1718)
+* Extended pre-commit hooks with `codespell` configuration [#1798](https://github.com/IntelPython/dpnp/pull/1798)
+* Added a Security policy page [#1730](https://github.com/IntelPython/dpnp/pull/1730)
+* Implemented `nin` and `nout` properties for `dpnp` elementwise functions [#1712](https://github.com/IntelPython/dpnp/pull/1712)
+* Implemented `outer` method for `dpnp` elementwise functions [#1813](https://github.com/IntelPython/dpnp/pull/1813)
+
+### Changed
+
+* Added support for more data types and dimensions of input arrays, and for all keyword arguments, in `dpnp.cross` function [#1715](https://github.com/IntelPython/dpnp/pull/1715)
+* Added support for more data types and dimensions of the input array, and for all keyword arguments, in `dpnp.linalg.matrix_rank` function [#1717](https://github.com/IntelPython/dpnp/pull/1717)
+* Added support for more data types and dimensions of input arrays in `dpnp.inner` function [#1726](https://github.com/IntelPython/dpnp/pull/1726)
+* Added support for more data types and dimensions of input arrays in `dpnp.linalg.multi_dot` function [#1729](https://github.com/IntelPython/dpnp/pull/1729)
+* Added support for more data types and dimensions of input arrays in `dpnp.kron` function [#1732](https://github.com/IntelPython/dpnp/pull/1732)
+* Added support for more data types and dimensions of input arrays in `dpnp.linalg.matrix_power` function [#1748](https://github.com/IntelPython/dpnp/pull/1748)
+* Added support for more data types and dimensions of the input array, and for all keyword arguments, in `dpnp.norm` function [#1746](https://github.com/IntelPython/dpnp/pull/1746)
+* Added support for more data types and dimensions of the input array in `dpnp.cond` function [#1773](https://github.com/IntelPython/dpnp/pull/1773)
+* Extended `dpnp.matmul` function to support `axes` keyword argument [#1705](https://github.com/IntelPython/dpnp/pull/1705)
+* Extended `dpnp.searchsorted` function to support `side` and `sorter` keyword arguments [#1751](https://github.com/IntelPython/dpnp/pull/1751)
+* Extended `dpnp.where` function to support scalars as the `x` and `y` arguments [#1760](https://github.com/IntelPython/dpnp/pull/1760)
+* Extended `dpnp.ndarray.transpose` method to support `axes` keyword as a list [#1770](https://github.com/IntelPython/dpnp/pull/1770)
+* Extended `dpnp.nancumsum` function to support `axis`, `dtype` and `out` keyword arguments [#1781](https://github.com/IntelPython/dpnp/pull/1781)
+* Extended `dpnp.nancumprod` function to support `axis`, `dtype` and `out` keyword arguments [#1812](https://github.com/IntelPython/dpnp/pull/1812)
+* Extended `dpnp.put` function to support more data types and dimensions of input arrays [#1838](https://github.com/IntelPython/dpnp/pull/1838)
+* Extended `dpnp.trace` function to support `axis1`, `axis2`, `dtype` and `out` keyword arguments [#1842](https://github.com/IntelPython/dpnp/pull/1842)
+* Corrected `dpnp.ndarray.real` and `dpnp.ndarray.imag` methods to return a view of the array [#1719](https://github.com/IntelPython/dpnp/pull/1719)
+* Corrected `dpnp.nonzero` function to raise `TypeError` exception for an input array of unexpected type [#1764](https://github.com/IntelPython/dpnp/pull/1764)
+* Corrected `dpnp.diagonal` function to return a view of the array [#1817](https://github.com/IntelPython/dpnp/pull/1817)
+* Removed `dpnp.find_common_type` function, as it has been deprecated since NumPy 1.25.0 [#1742](https://github.com/IntelPython/dpnp/pull/1742)
+* Removed use of `dpctl` queue manager API [#1735](https://github.com/IntelPython/dpnp/pull/1735)
+* Leveraged `dpctl.tensor` implementation for `dpnp.cumsum` function [#1772](https://github.com/IntelPython/dpnp/pull/1772)
+* Leveraged `dpctl.tensor` implementation for `dpnp.cumprod` function [#1811](https://github.com/IntelPython/dpnp/pull/1811)
+* Leveraged `dpctl.tensor` implementation for `dpnp.cumlogsumexp` function [#1816](https://github.com/IntelPython/dpnp/pull/1816)
+* Leveraged `dpctl.tensor` support of `out` keyword argument in reduction and `dpnp.where` functions [#1808](https://github.com/IntelPython/dpnp/pull/1808)
+* Aligned with `dpctl` interface changes per Python Array API 2023.12 specification [#1774](https://github.com/IntelPython/dpnp/pull/1774)
+* Reworked `dpnp.linalg.eig` and `dpnp.linalg.eigvals` implementations to fall back on NumPy calculation due to a lack of required functionality in OneMKL LAPACK [#1780](https://github.com/IntelPython/dpnp/pull/1780)
+* Updated `dpnp` to use pybind11 2.12.0 [#1783](https://github.com/IntelPython/dpctl/pull/1783)
+* Improved `dpnp.matmul` implementation to use a column-major `gemm` layout for F-contiguous input arrays [#1793](https://github.com/IntelPython/dpnp/pull/1793)
+* Improved performance of `dpnp.matmul` function by calling `dpnp.kron` and `dpnp.dot` in special cases [#1815](https://github.com/IntelPython/dpnp/pull/1815)
+* Improved performance of `dpnp.diag` function by using `dpnp.diagonal`, which returns a view of the array [#1822](https://github.com/IntelPython/dpnp/pull/1822)
+* Removed limitations from `diag_indices`, `diag_indices_from`, `fill_diagonal`, `tril_indices`, `tril_indices_from`, `triu_indices`, `triu_indices_from` functions
+and added implementation of `dpnp.mask_indices` function [#1814](https://github.com/IntelPython/dpnp/pull/1814)
+
+### Fixed
+
+* Changed `dpnp.linalg.solve` to use a pair of `getrf` and `getrs` calls from the OneMKL library instead of a single `gesv` call to mitigate an unexpected `RuntimeError` exception [#1763](https://github.com/IntelPython/dpnp/pull/1763)
+* Resolved a hang in the batch implementation of `dpnp.linalg.solve` when computing on a CPU device [#1778](https://github.com/IntelPython/dpnp/pull/1778)
+* Resolved an unexpected `TypeError` exception raised from `dpnp.random.vonmises` when used with a scalar `kappa` argument [#1799](https://github.com/IntelPython/dpnp/pull/1799)
+* Changed `dpnp.flatten` to comply with the compute-follows-data approach [#1825](https://github.com/IntelPython/dpnp/pull/1825)
+* Resolved a hang in the batch implementation of `dpnp.linalg.eigh` when computing on a CPU device [#1832](https://github.com/IntelPython/dpnp/pull/1832)
+* Resolved an unexpected `ValueError` exception raised from `dpnp.linalg.pinv` due to a shape issue in `dpnp.matmul` [#1843](https://github.com/IntelPython/dpnp/pull/1843)
+
+
+## [0.14.0] - 02/16/2024
 
 This release will require DPC++ `2024.1.0`, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older.

From 6c41c4f250a495a8428094575d42c7e8cf774c64 Mon Sep 17 00:00:00 2001
From: vlad-perevezentsev
Date: Thu, 23 May 2024 14:23:51 +0200
Subject: [PATCH 02/49] Implement `dpnp.digitize()` (#1847)

* Implement dpnp.digitize
* Update cupy tests for digitize func
* Update skipped_tests files
* Add tests in test_sycl_queue and test_usm_type
* Return pylint disable
* Handle empty bins
* Small update cupy tests
* Add dpnp tests for dpnp.digitize
* Increase code coverage
* Apply remarks
* Move tests from test_statistic to test_histogram
* Add test with different dtypes
---
 dpnp/dpnp_iface_histograms.py | 95 ++++++++++++++++++-
 tests/skipped_tests.tbl | 55 -----------
 tests/skipped_tests_gpu.tbl | 55 -----------
 tests/test_histogram.py | 93 ++++++++++++++++++
 tests/test_sycl_queue.py | 1 +
 tests/test_usm_type.py | 1 +
 .../cupy/statistics_tests/test_histogram.py | 15 ++-
 7 files changed, 196 insertions(+), 119 deletions(-)

diff --git a/dpnp/dpnp_iface_histograms.py b/dpnp/dpnp_iface_histograms.py
index 919c3f64b99..1a1b4daf740 100644
--- a/dpnp/dpnp_iface_histograms.py
+++ b/dpnp/dpnp_iface_histograms.py
@@ -46,6 +46,7 @@
 import dpnp
 
 __all__ = [
+    "digitize",
     "histogram",
     "histogram_bin_edges",
 ]
@@ -208,6 +209,98 @@ def _search_sorted_inclusive(a, v):
     )
 
 
+def digitize(x, bins, right=False):
+    """
+    Return the indices of the bins to which each value in input array belongs.
+
+    For full documentation refer to :obj:`numpy.digitize`.
+
+    Parameters
+    ----------
+    x : {dpnp.ndarray, usm_ndarray}
+        Input array to be binned.
+    bins : {dpnp.ndarray, usm_ndarray}
+        Array of bins. It has to be 1-dimensional and monotonically
+        increasing or decreasing.
+    right : bool, optional
+        Indicates whether the intervals include the right or the left bin edge.
+        Default: ``False``.
+
+    Returns
+    -------
+    indices : dpnp.ndarray
+        Array of indices with the same shape as `x`.
+
+    Notes
+    -----
+    This will not raise an exception when the input array is
+    not monotonic.
+
+    See Also
+    --------
+    :obj:`dpnp.bincount` : Count number of occurrences of each value in array
+        of non-negative integers.
+    :obj:`dpnp.histogram` : Compute the histogram of a data set.
+    :obj:`dpnp.unique` : Find the unique elements of an array.
+    :obj:`dpnp.searchsorted` : Find indices where elements should be inserted
+        to maintain order.
+
+    Examples
+    --------
+    >>> import dpnp as np
+    >>> x = np.array([0.2, 6.4, 3.0, 1.6])
+    >>> bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
+    >>> inds = np.digitize(x, bins)
+    >>> inds
+    array([1, 4, 3, 2])
+    >>> for n in range(x.size):
+    ...     print(bins[inds[n]-1], "<=", x[n], "<", bins[inds[n]])
+    ...
+    0. <= 0.2 < 1.
+    4. <= 6.4 < 10.
+    2.5 <= 3. < 4.
+    1. <= 1.6 < 2.5
+
+    >>> x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
+    >>> bins = np.array([0, 5, 10, 15, 20])
+    >>> np.digitize(x, bins, right=True)
+    array([1, 2, 3, 4, 4])
+    >>> np.digitize(x, bins, right=False)
+    array([1, 3, 3, 4, 5])
+
+    """
+
+    dpnp.check_supported_arrays_type(x, bins)
+
+    if dpnp.issubdtype(x.dtype, dpnp.complexfloating):
+        raise TypeError("x may not be complex")
+
+    if bins.ndim > 1:
+        raise ValueError("object too deep for desired array")
+    if bins.ndim < 1:
+        raise ValueError("object of too small depth for desired array")
+
+    # This is backwards because the arguments below are swapped
+    side = "left" if right else "right"
+
+    # Check if bins are monotonically increasing.
+    # If bins is empty, the array is considered to be increasing.
+    # If all bins are NaN, the array is considered to be decreasing.
+    if bins.size == 0:
+        bins_increasing = True
+    else:
+        bins_increasing = bins[0] <= bins[-1] or (
+            not dpnp.isnan(bins[0]) and dpnp.isnan(bins[-1])
+        )
+
+    if bins_increasing:
+        # Use dpnp.searchsorted directly if bins are increasing
+        return dpnp.searchsorted(bins, x, side=side)
+
+    # Reverse bins and adjust indices if bins are decreasing
+    return bins.size - dpnp.searchsorted(bins[::-1], x, side=side)
+
+
 def histogram(a, bins=10, range=None, density=None, weights=None):
     """
     Compute the histogram of a data set.
@@ -335,8 +428,8 @@ def histogram(a, bins=10, range=None, density=None, weights=None):
         n = dpnp.diff(cum_n)
 
     if density:
-        db = dpnp.diff(bin_edges).astype(dpnp.default_float_type())
         # pylint: disable=possibly-used-before-assignment
+        db = dpnp.diff(bin_edges).astype(dpnp.default_float_type())
         return n / db / n.sum(), bin_edges
 
     return n, bin_edges
diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl
index a9cb3d09560..7fa1510e8a5 100644
--- a/tests/skipped_tests.tbl
+++ b/tests/skipped_tests.tbl
@@ -613,61 +613,6 @@ tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_
 tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_rowvar
 tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_y
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeInvalid::test_digitize_complex
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeInvalid::test_digitize_nd_bins
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_all_nan_bins
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_decreasing -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_decreasing_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_searchsorted_inf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_searchsorted_minf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_all_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_decreasing -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_decreasing_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_searchsorted_inf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_searchsorted_minf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_0_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_10_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, 
right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_11_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_12_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_13_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_14_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_15_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_16_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_17_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_18_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_19_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_1_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_20_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize 
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_21_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_22_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_23_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_24_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_25_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_26_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_27_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_28_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_29_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_2_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_30_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=()}::test_digitize 
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_31_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_32_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_33_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_34_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_35_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_3_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_4_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_5_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_6_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_7_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_8_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_9_{bins=[1.5, 
2.5, 4.0, 6.0], increasing=False, right=False, shape=()}::test_digitize - tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[linear] tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[lower] tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[higher] diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index fa8d00145d1..8791400846b 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -619,61 +619,6 @@ tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_ tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_rowvar tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_y -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeInvalid::test_digitize_complex -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeInvalid::test_digitize_nd_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_all_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_decreasing -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_decreasing_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_digitize_nan_bins_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_searchsorted_inf 
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_0_{right=True}::test_searchsorted_minf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_all_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_decreasing -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_decreasing_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_digitize_nan_bins_repeated -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_searchsorted_inf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitizeNanInf_param_1_{right=False}::test_searchsorted_minf -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_0_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_10_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_11_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_12_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_13_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, 
shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_14_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_15_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_16_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_17_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_18_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_19_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_1_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_20_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_21_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_22_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_23_{bins=[-1.0, 1.0, 2.5, 4.0, 20.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize 
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_24_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_25_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_26_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_27_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_28_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_29_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_2_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_30_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_31_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_32_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_33_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=()}::test_digitize 
-tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_34_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_35_{bins=[0.0, 1.0, 1.0, 4.0, 4.0, 10.0], increasing=False, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_3_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_4_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_5_{bins=[1.5, 2.5, 4.0, 6.0], increasing=True, right=False, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_6_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=()}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_7_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=(10,)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_8_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=True, shape=(6, 3, 3)}::test_digitize -tests/third_party/cupy/statistics_tests/test_histogram.py::TestDigitize_param_9_{bins=[1.5, 2.5, 4.0, 6.0], increasing=False, right=False, shape=()}::test_digitize - tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[linear] tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[lower] tests/third_party/cupy/statistics_tests/test_order.py::TestOrder::test_percentile_defaults[higher] diff --git a/tests/test_histogram.py b/tests/test_histogram.py index a70f2db8044..7601d67c54a 100644 --- a/tests/test_histogram.py +++ b/tests/test_histogram.py @@ 
-20,6 +20,99 @@ ) +class TestDigitize: + @pytest.mark.parametrize( + "dtype", get_all_dtypes(no_bool=True, no_complex=True) + ) + @pytest.mark.parametrize("right", [True, False]) + @pytest.mark.parametrize( + "x, bins", + [ + # Negative values + ( + numpy.array([-5, -3, -1, 0, 1, 3, 5]), + numpy.array([-4, -2, 0, 2, 4]), + ), + # Non-uniform bins + ( + numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9]), + numpy.array([1, 4, 6, 7]), + ), + # Infinity values + ( + numpy.array([-numpy.inf, -1, 0, 1, numpy.inf]), + numpy.array([-2, -1, 0, 1, 2]), + ), + # Repeated elements + (numpy.array([1, 2, 2, 3, 3, 3, 4, 5]), numpy.array([1, 2, 3, 4])), + ], + ) + def test_digitize(self, x, bins, dtype, right): + x = x.astype(dtype) + bins = bins.astype(dtype) + x_dp = dpnp.array(x) + bins_dp = dpnp.array(bins) + + result = dpnp.digitize(x_dp, bins_dp, right=right) + expected = numpy.digitize(x, bins, right=right) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize( + "dtype_x", get_all_dtypes(no_bool=True, no_complex=True) + ) + @pytest.mark.parametrize( + "dtype_bins", get_all_dtypes(no_bool=True, no_complex=True) + ) + @pytest.mark.parametrize("right", [True, False]) + def test_digitize_diff_types(self, dtype_x, dtype_bins, right): + x = numpy.array([1, 2, 3, 4, 5], dtype=dtype_x) + bins = numpy.array([1, 3, 5], dtype=dtype_bins) + x_dp = dpnp.array(x) + bins_dp = dpnp.array(bins) + + result = dpnp.digitize(x_dp, bins_dp, right=right) + expected = numpy.digitize(x, bins, right=right) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize( + "dtype", get_all_dtypes(no_bool=True, no_complex=True) + ) + @pytest.mark.parametrize( + "x, bins", + [ + # Empty array + (numpy.array([]), numpy.array([1, 2, 3])), + # Empty bins + (numpy.array([1, 2, 3]), numpy.array([])), + ], + ) + def test_digitize_empty(self, x, bins, dtype): + x = x.astype(dtype) + bins = bins.astype(dtype) + x_dp = dpnp.array(x) + bins_dp = dpnp.array(bins) + + result = dpnp.digitize(x_dp, 
bins_dp) + expected = numpy.digitize(x, bins) + assert_dtype_allclose(result, expected) + + def test_digitize_error(self): + x_dp = dpnp.array([1, 2, 3], dtype="float32") + bins_dp = dpnp.array([1, 2, 3], dtype="float32") + + # unsupported type + x_np = dpnp.asnumpy(x_dp) + bins_np = dpnp.asnumpy(bins_dp) + with pytest.raises(TypeError): + dpnp.digitize(x_np, bins_dp) + dpnp.digitize(x_dp, bins_np) + + # bins ndim < 1 + bins_scalar = dpnp.array(1) + with pytest.raises(ValueError): + dpnp.digitize(x_dp, bins_scalar) + + class TestHistogram: @pytest.mark.usefixtures("suppress_complex_warning") @pytest.mark.parametrize( diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 9286131a65b..fae4dd52221 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -606,6 +606,7 @@ def test_reduce_hypot(device): pytest.param("arctan2", [[-1, +1, +1, -1]], [[-1, -1, +1, +1]]), pytest.param("copysign", [0.0, 1.0, 2.0], [-1.0, 0.0, 1.0]), pytest.param("cross", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), + pytest.param("digitize", [0.2, 6.4, 3.0], [0.0, 1.0, 2.5, 4.0]), pytest.param( "divide", [0.0, 1.0, 2.0, 3.0, 4.0], [4.0, 4.0, 4.0, 4.0, 4.0] ), diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index a2b38b82e8d..eab59cf001b 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -614,6 +614,7 @@ def test_1in_1out(func, data, usm_type): pytest.param("arctan2", [[-1, +1, +1, -1]], [[-1, -1, +1, +1]]), pytest.param("copysign", [0.0, 1.0, 2.0], [-1.0, 0.0, 1.0]), pytest.param("cross", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), + pytest.param("digitize", [0.2, 6.4, 3.0], [0.0, 1.0, 2.5, 4.0]), # dpnp.dot has 3 different implementations based on input arrays dtype # checking all of them pytest.param("dot", [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]), diff --git a/tests/third_party/cupy/statistics_tests/test_histogram.py b/tests/third_party/cupy/statistics_tests/test_histogram.py index bb1dd8e07ce..18fd4a0aa55 100644 --- 
a/tests/third_party/cupy/statistics_tests/test_histogram.py +++ b/tests/third_party/cupy/statistics_tests/test_histogram.py @@ -345,7 +345,6 @@ def test_bincount_too_small_minlength(self, dtype): # in this comment to restore the support. -@pytest.mark.skip("digitize() is not implemented yet") @testing.parameterize( *testing.product( { @@ -367,6 +366,8 @@ class TestDigitize: @testing.for_all_dtypes(no_bool=True, no_complex=True) @testing.numpy_cupy_array_equal() def test_digitize(self, xp, dtype): + if self.shape == () and not self.increasing: + pytest.skip("dpctl issue #1689") x = testing.shaped_arange(self.shape, xp, dtype) bins = self.bins if not self.increasing: @@ -376,7 +377,6 @@ def test_digitize(self, xp, dtype): return (y,) -@pytest.mark.skip("digitize() is not implemented yet") @testing.parameterize({"right": True}, {"right": False}) class TestDigitizeNanInf(unittest.TestCase): @testing.numpy_cupy_array_equal() @@ -432,7 +432,7 @@ def test_digitize_all_nan_bins(self, xp): @testing.numpy_cupy_array_equal() def test_searchsorted_inf(self, xp): - x = testing.shaped_arange((14,), xp, xp.float64) + x = testing.shaped_arange((14,), xp, cupy.default_float_type()) x[5] = float("inf") bins = xp.array([0, 1, 2, 4, 10]) y = xp.digitize(x, bins, right=self.right) @@ -440,25 +440,24 @@ def test_searchsorted_inf(self, xp): @testing.numpy_cupy_array_equal() def test_searchsorted_minf(self, xp): - x = testing.shaped_arange((14,), xp, xp.float64) + x = testing.shaped_arange((14,), xp, cupy.default_float_type()) x[5] = float("-inf") bins = xp.array([0, 1, 2, 4, 10]) y = xp.digitize(x, bins, right=self.right) return (y,) -@pytest.mark.skip("digitize() is not implemented yet") class TestDigitizeInvalid(unittest.TestCase): def test_digitize_complex(self): for xp in (numpy, cupy): - x = testing.shaped_arange((14,), xp, complex) - bins = xp.array([1.0, 3.0, 5.0, 8.0, 12.0], complex) + x = testing.shaped_arange((14,), xp, xp.complex64) + bins = xp.array([1.0, 3.0, 5.0, 8.0, 
12.0], xp.complex64) with pytest.raises(TypeError): xp.digitize(x, bins) def test_digitize_nd_bins(self): for xp in (numpy, cupy): - x = testing.shaped_arange((14,), xp, xp.float64) + x = testing.shaped_arange((14,), xp, cupy.default_float_type()) bins = xp.array([[1], [2]]) with pytest.raises(ValueError): xp.digitize(x, bins) From 41bd6586fee2b9a68b3ba1cdb7551fb4423fef3c Mon Sep 17 00:00:00 2001 From: vlad-perevezentsev Date: Thu, 23 May 2024 16:07:44 +0200 Subject: [PATCH 03/49] Update CHANGELOG.md (#1848) --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9e2bc27d4e1..1fc3b1d5078 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -20,6 +20,7 @@ This release completes implementation of `dpnp.linalg` module and array creation * Added implementation of `dpnp.einsum` and `dpnp.einsum_path` functions [#1779](https://github.com/IntelPython/dpnp/pull/1779) * Added implementation of `dpnp.histogram` function [#1785](https://github.com/IntelPython/dpnp/pull/1785) * Added implementation of `dpnp.histogram_bin_edges` function [#1823](https://github.com/IntelPython/dpnp/pull/1823) +* Added implementation of `dpnp.digitize` function [#1847](https://github.com/IntelPython/dpnp/pull/1847) * Extended pre-commit hooks with `pylint` configuration [#1718](https://github.com/IntelPython/dpnp/pull/1718) * Extended pre-commit hooks with `codespell` configuration [#1798](https://github.com/IntelPython/dpnp/pull/1798) * Added a Security policy page [#1730](https://github.com/IntelPython/dpnp/pull/1730) From 71bdbe1fb187d4fd544f865f29b05e153c901628 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 23 May 2024 18:13:36 +0200 Subject: [PATCH 04/49] Start 0.16 development (#1850) * Added stub for 0.16 release cycle * Set CMake version to 0.16 --- CHANGELOG.md | 9 +++++++++ CMakeLists.txt | 2 ++ 2 files changed, 11 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1fc3b1d5078..1d02f4eb8f3 
100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,15 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.16.0] - MM/DD/2024 + +### Added + +### Changed + +### Fixed + + ## [0.15.0] - 05/DD/2024 This release completes implementation of `dpnp.linalg` module and array creation routine, adds cumulative reductions and histogram functions. diff --git a/CMakeLists.txt b/CMakeLists.txt index 6a3c7d8c99e..9d061b8020c 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -1,6 +1,8 @@ cmake_minimum_required(VERSION 3.21...3.27 FATAL_ERROR) project(dpnp + VERSION 0.16 + LANGUAGES CXX DESCRIPTION "NumPy-like API accelerated by SYCL." ) From 7afd98d9f72ecdec95e9d2a466668a4a7dcd3792 Mon Sep 17 00:00:00 2001 From: vlad-perevezentsev Date: Thu, 23 May 2024 20:35:22 +0200 Subject: [PATCH 05/49] Remove skip in test_digitize (#1851) --- tests/third_party/cupy/statistics_tests/test_histogram.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/tests/third_party/cupy/statistics_tests/test_histogram.py b/tests/third_party/cupy/statistics_tests/test_histogram.py index 18fd4a0aa55..521bd4062fb 100644 --- a/tests/third_party/cupy/statistics_tests/test_histogram.py +++ b/tests/third_party/cupy/statistics_tests/test_histogram.py @@ -366,8 +366,6 @@ class TestDigitize: @testing.for_all_dtypes(no_bool=True, no_complex=True) @testing.numpy_cupy_array_equal() def test_digitize(self, xp, dtype): - if self.shape == () and not self.increasing: - pytest.skip("dpctl issue #1689") x = testing.shaped_arange(self.shape, xp, dtype) bins = self.bins if not self.increasing: From d819a087992a8003dd6ef207e7e7cc7f4e841e60 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sat, 25 May 2024 18:41:16 +0200 Subject: [PATCH 06/49] Bump github/codeql-action from
3.25.5 to 3.25.6 (#1857) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.25.5 to 3.25.6. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/b7cec7526559c32f1616476ff32d17ba4c59b2d6...9fdb3e49720b44c48891d036bb502feb25684276) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/openssf-scorecard.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml index ce05a4b2acf..726b817e2ff 100644 --- a/.github/workflows/openssf-scorecard.yml +++ b/.github/workflows/openssf-scorecard.yml @@ -68,6 +68,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard. 
- name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@b7cec7526559c32f1616476ff32d17ba4c59b2d6 # v3.25.5 + uses: github/codeql-action/upload-sarif@9fdb3e49720b44c48891d036bb502feb25684276 # v3.25.6 with: sarif_file: results.sarif From cb48f8de8610c72c68998d09e0d55976b8a66f32 Mon Sep 17 00:00:00 2001 From: vtavana <120411540+vtavana@users.noreply.github.com> Date: Mon, 27 May 2024 09:02:23 -0500 Subject: [PATCH 07/49] implement gemv (#1834) --- dpnp/backend/extensions/blas/CMakeLists.txt | 1 + dpnp/backend/extensions/blas/blas_py.cpp | 17 +- dpnp/backend/extensions/blas/gemv.cpp | 295 ++++++++++++++++++ dpnp/backend/extensions/blas/gemv.hpp | 62 ++++ dpnp/backend/extensions/blas/types_matrix.hpp | 25 ++ dpnp/dpnp_iface_linearalgebra.py | 10 +- dpnp/dpnp_utils/dpnp_utils_linearalgebra.py | 48 ++- tests/test_mathematical.py | 92 +++++- tests/test_product.py | 10 +- 9 files changed, 521 insertions(+), 39 deletions(-) create mode 100644 dpnp/backend/extensions/blas/gemv.cpp create mode 100644 dpnp/backend/extensions/blas/gemv.hpp diff --git a/dpnp/backend/extensions/blas/CMakeLists.txt b/dpnp/backend/extensions/blas/CMakeLists.txt index debd412da9f..8ef4e7d79e1 100644 --- a/dpnp/backend/extensions/blas/CMakeLists.txt +++ b/dpnp/backend/extensions/blas/CMakeLists.txt @@ -29,6 +29,7 @@ set(_module_src ${CMAKE_CURRENT_SOURCE_DIR}/blas_py.cpp ${CMAKE_CURRENT_SOURCE_DIR}/gemm.cpp ${CMAKE_CURRENT_SOURCE_DIR}/gemm_batch.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/gemv.cpp ) pybind11_add_module(${python_module_name} MODULE ${_module_src}) diff --git a/dpnp/backend/extensions/blas/blas_py.cpp b/dpnp/backend/extensions/blas/blas_py.cpp index fee0c3bf6ca..3fdfebe7c30 100644 --- a/dpnp/backend/extensions/blas/blas_py.cpp +++ b/dpnp/backend/extensions/blas/blas_py.cpp @@ -35,17 +35,19 @@ #include "dotc.hpp" #include "dotu.hpp" #include "gemm.hpp" +#include "gemv.hpp" namespace blas_ext = dpnp::backend::ext::blas; namespace py = pybind11; namespace dot_ext = 
blas_ext::dot; using dot_ext::dot_impl_fn_ptr_t; -// populate dispatch tables -void init_dispatch_tables(void) +// populate dispatch vectors and tables +void init_dispatch_vectors_tables(void) { blas_ext::init_gemm_batch_dispatch_table(); blas_ext::init_gemm_dispatch_table(); + blas_ext::init_gemv_dispatch_vector(); } static dot_impl_fn_ptr_t dot_dispatch_vector[dpctl_td_ns::num_types]; @@ -54,7 +56,7 @@ static dot_impl_fn_ptr_t dotu_dispatch_vector[dpctl_td_ns::num_types]; PYBIND11_MODULE(_blas_impl, m) { - init_dispatch_tables(); + init_dispatch_vectors_tables(); using arrayT = dpctl::tensor::usm_ndarray; using event_vecT = std::vector; @@ -129,4 +131,13 @@ PYBIND11_MODULE(_blas_impl, m) py::arg("sycl_queue"), py::arg("matrixA"), py::arg("matrixB"), py::arg("resultC"), py::arg("depends") = py::list()); } + + { + m.def("_gemv", &blas_ext::gemv, + "Call `gemv` from OneMKL BLAS library to return " + "the matrix-vector product using a general matrix.", + py::arg("sycl_queue"), py::arg("matrixA"), py::arg("vectorX"), + py::arg("vectorY"), py::arg("transpose"), + py::arg("depends") = py::list()); + } } diff --git a/dpnp/backend/extensions/blas/gemv.cpp b/dpnp/backend/extensions/blas/gemv.cpp new file mode 100644 index 00000000000..c325299aa03 --- /dev/null +++ b/dpnp/backend/extensions/blas/gemv.cpp @@ -0,0 +1,295 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include + +// dpctl tensor headers +#include "utils/memory_overlap.hpp" +#include "utils/output_validation.hpp" +#include "utils/type_utils.hpp" + +#include "gemv.hpp" +#include "types_matrix.hpp" + +#include "dpnp_utils.hpp" + +namespace dpnp +{ +namespace backend +{ +namespace ext +{ +namespace blas +{ +namespace mkl_blas = oneapi::mkl::blas; +namespace py = pybind11; +namespace type_utils = dpctl::tensor::type_utils; + +typedef sycl::event (*gemv_impl_fn_ptr_t)(sycl::queue &, + oneapi::mkl::transpose, + const std::int64_t, + const std::int64_t, + char *, + const std::int64_t, + char *, + const std::int64_t, + char *, + const std::int64_t, + bool, + const std::vector &); + +static gemv_impl_fn_ptr_t gemv_dispatch_vector[dpctl_td_ns::num_types]; + +template +static sycl::event gemv_impl(sycl::queue &exec_q, + oneapi::mkl::transpose transA, + const std::int64_t m, + const std::int64_t n, + char *matrixA, + const std::int64_t lda, + char *vectorX, + const std::int64_t incx, + char *vectorY, + const std::int64_t incy, + bool is_row_major, + const std::vector &depends) +{ + 
type_utils::validate_type_for_device(exec_q); + + T *a = reinterpret_cast(matrixA); + T *x = reinterpret_cast(vectorX); + T *y = reinterpret_cast(vectorY); + + std::stringstream error_msg; + bool is_exception_caught = false; + + sycl::event gemv_event; + try { + auto gemv_func = + [&](sycl::queue &q, oneapi::mkl::transpose transA, std::int64_t m, + std::int64_t n, T alpha, const T *a, std::int64_t lda, + const T *x, std::int64_t incx, T beta, T *y, std::int64_t incy, + const std::vector &deps) -> sycl::event { + if (is_row_major) { + return mkl_blas::row_major::gemv(q, transA, m, n, alpha, a, lda, + x, incx, beta, y, incy, deps); + } + else { + return mkl_blas::column_major::gemv(q, transA, m, n, alpha, a, + lda, x, incx, beta, y, incy, + deps); + } + }; + gemv_event = gemv_func( + exec_q, + transA, // Defines the transpose operation for matrix A: + // 'N' indicates no transpose, 'T' for transpose, + // or 'C' for a conjugate transpose. + m, // Number of rows in matrix A. + n, // Number of columns in matrix A. + T(1), // Scaling factor for the matrix-vector product. + a, // Pointer to the input matrix A. + lda, // Leading dimension of matrix A, which is the + // stride between successive rows (for row major + // layout). + x, // Pointer to the input vector x. + incx, // The stride of vector x. + T(0), // Scaling factor for vector y. + y, // Pointer to output vector y, where the result is stored. + incy, // The stride of vector y. 
+ depends); + } catch (oneapi::mkl::exception const &e) { + error_msg + << "Unexpected MKL exception caught during gemv() call:\nreason: " + << e.what(); + is_exception_caught = true; + } catch (sycl::exception const &e) { + error_msg << "Unexpected SYCL exception caught during gemv() call:\n" + << e.what(); + is_exception_caught = true; + } + + if (is_exception_caught) // an unexpected error occurs + { + throw std::runtime_error(error_msg.str()); + } + + return gemv_event; +} + +std::pair + gemv(sycl::queue &exec_q, + dpctl::tensor::usm_ndarray matrixA, + dpctl::tensor::usm_ndarray vectorX, + dpctl::tensor::usm_ndarray vectorY, + bool transpose, + const std::vector &depends) +{ + const int matrixA_nd = matrixA.get_ndim(); + const int vectorX_nd = vectorX.get_ndim(); + const int vectorY_nd = vectorY.get_ndim(); + + if ((matrixA_nd != 2) || (vectorX_nd != 1) || (vectorY_nd != 1)) { + throw py::value_error("The arrays have incorrect dimensions."); + } + + auto const &overlap = dpctl::tensor::overlap::MemoryOverlap(); + if (overlap(matrixA, vectorY)) { + throw py::value_error("Input matrix and output vector are overlapping " + "segments of memory"); + } + if (overlap(vectorX, vectorY)) { + throw py::value_error("Input vector and output vector are overlapping " + "segments of memory"); + } + + if (!dpctl::utils::queues_are_compatible( + exec_q, + {matrixA.get_queue(), vectorX.get_queue(), vectorY.get_queue()})) + { + throw py::value_error( + "USM allocations are not compatible with the execution queue."); + } + + bool is_matrixA_f_contig = matrixA.is_f_contiguous(); + bool is_matrixA_c_contig = matrixA.is_c_contiguous(); + + if (!is_matrixA_f_contig and !is_matrixA_c_contig) { + throw py::value_error( + "Input matrix is not c-contiguous nor f-contiguous."); + } + + bool is_row_major = true; + if (is_matrixA_f_contig) { + is_row_major = false; + } + + const py::ssize_t *a_shape = matrixA.get_shape_raw(); + const py::ssize_t *x_shape = vectorX.get_shape_raw(); + const 
py::ssize_t *y_shape = vectorY.get_shape_raw(); + const std::int64_t m = a_shape[0]; + const std::int64_t n = a_shape[1]; + const std::int64_t lda = is_row_major ? n : m; + + oneapi::mkl::transpose transA; + size_t src_nelems; + if (transpose) { + transA = oneapi::mkl::transpose::T; + src_nelems = n; + if (m != x_shape[0]) { + throw py::value_error("The number of rows in A must be equal to " + "the number of elements in X."); + } + if (n != y_shape[0]) { + throw py::value_error("The number of columns in A must be equal to " + "the number of elements in Y."); + } + } + else { + transA = oneapi::mkl::transpose::N; + src_nelems = m; + if (n != x_shape[0]) { + throw py::value_error("The number of columns in A must be equal to " + "the number of elements in X."); + } + if (m != y_shape[0]) { + throw py::value_error("The number of rows in A must be equal to " + "the number of elements in Y."); + } + } + dpctl::tensor::validation::CheckWritable::throw_if_not_writable(vectorY); + dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(vectorY, + src_nelems); + + int matrixA_typenum = matrixA.get_typenum(); + int vectorX_typenum = vectorX.get_typenum(); + int vectorY_typenum = vectorY.get_typenum(); + + if (matrixA_typenum != vectorX_typenum || + matrixA_typenum != vectorY_typenum) { + throw py::value_error("Given arrays must be of the same type."); + } + + auto array_types = dpctl_td_ns::usm_ndarray_types(); + int type_id = array_types.typenum_to_lookup_id(matrixA_typenum); + + gemv_impl_fn_ptr_t gemv_fn = gemv_dispatch_vector[type_id]; + if (gemv_fn == nullptr) { + throw py::value_error( + "Types of input arrays and result array are mismatched."); + } + + char *a_typeless_ptr = matrixA.get_data(); + char *x_typeless_ptr = vectorX.get_data(); + char *y_typeless_ptr = vectorY.get_data(); + + std::vector x_stride = vectorX.get_strides_vector(); + std::vector y_stride = vectorY.get_strides_vector(); + const int x_elemsize = vectorX.get_elemsize(); + const int y_elemsize = 
vectorY.get_elemsize(); + const std::int64_t incx = x_stride[0]; + const std::int64_t incy = y_stride[0]; + if (incx < 0) { + x_typeless_ptr -= (x_shape[0] - 1) * std::abs(incx) * x_elemsize; + } + if (incy < 0) { + y_typeless_ptr -= (y_shape[0] - 1) * std::abs(incy) * y_elemsize; + } + + sycl::event gemv_ev = + gemv_fn(exec_q, transA, m, n, a_typeless_ptr, lda, x_typeless_ptr, incx, + y_typeless_ptr, incy, is_row_major, depends); + + sycl::event args_ev = dpctl::utils::keep_args_alive( + exec_q, {matrixA, vectorX, vectorY}, {gemv_ev}); + + return std::make_pair(args_ev, gemv_ev); +} + +template +struct GemvContigFactory +{ + fnT get() + { + if constexpr (types::GemvTypePairSupportFactory::is_defined) { + return gemv_impl; + } + else { + return nullptr; + } + } +}; + +void init_gemv_dispatch_vector(void) +{ + dpctl_td_ns::DispatchVectorBuilder + contig; + contig.populate_dispatch_vector(gemv_dispatch_vector); +} +} // namespace blas +} // namespace ext +} // namespace backend +} // namespace dpnp diff --git a/dpnp/backend/extensions/blas/gemv.hpp b/dpnp/backend/extensions/blas/gemv.hpp new file mode 100644 index 00000000000..703f9c4cc0a --- /dev/null +++ b/dpnp/backend/extensions/blas/gemv.hpp @@ -0,0 +1,62 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include +#include + +#include + +namespace dpnp +{ +namespace backend +{ +namespace ext +{ +namespace blas +{ +extern std::pair + gemv(sycl::queue &exec_q, + dpctl::tensor::usm_ndarray matrixA, + dpctl::tensor::usm_ndarray vectorX, + dpctl::tensor::usm_ndarray vectorY, + bool transpose, + const std::vector &depends); + +extern std::pair + gemv_batch(sycl::queue &exec_q, + dpctl::tensor::usm_ndarray matrixA, + dpctl::tensor::usm_ndarray vectorX, + dpctl::tensor::usm_ndarray vectorY, + bool transpose, + const std::vector &depends); + +extern void init_gemv_dispatch_vector(void); +extern void init_gemv_batch_dispatch_vector(void); +} // namespace blas +} // namespace ext +} // namespace backend +} // namespace dpnp diff --git a/dpnp/backend/extensions/blas/types_matrix.hpp b/dpnp/backend/extensions/blas/types_matrix.hpp index 2a62a0bd917..a33fa42b971 100644 --- a/dpnp/backend/extensions/blas/types_matrix.hpp +++ b/dpnp/backend/extensions/blas/types_matrix.hpp @@ -165,6 +165,31 @@ struct GemmBatchTypePairSupportFactory // fall-through dpctl_td_ns::NotDefinedEntry>::is_defined; }; + 
+/** + * @brief A factory to define pairs of supported types for which + * MKL BLAS library provides support in oneapi::mkl::blas::gemv + * function. + * + * @tparam T Type of input and output arrays. + */ +template +struct GemvTypePairSupportFactory +{ + static constexpr bool is_defined = std::disjunction< + dpctl_td_ns::TypePairDefinedEntry, + dpctl_td_ns::TypePairDefinedEntry, + dpctl_td_ns::TypePairDefinedEntry, + T, + std::complex>, + dpctl_td_ns::TypePairDefinedEntry, + T, + std::complex>, + // fall-through + dpctl_td_ns::NotDefinedEntry>::is_defined; +}; } // namespace types } // namespace blas } // namespace ext diff --git a/dpnp/dpnp_iface_linearalgebra.py b/dpnp/dpnp_iface_linearalgebra.py index 4bf0d2ab524..033929443a5 100644 --- a/dpnp/dpnp_iface_linearalgebra.py +++ b/dpnp/dpnp_iface_linearalgebra.py @@ -140,19 +140,21 @@ def dot(a, b, out=None): # functions from BLAS here instead of dpnp.multiply return dpnp.multiply(a, b, out=out) - if a.ndim == 0 or b.ndim == 0: + a_ndim = a.ndim + b_ndim = b.ndim + if a_ndim == 0 or b_ndim == 0: # TODO: investigate usage of axpy (axpy_batch) or scal # functions from BLAS here instead of dpnp.multiply return dpnp.multiply(a, b, out=out) - if a.ndim == 1 and b.ndim == 1: + if a_ndim == 1 and b_ndim == 1: return dpnp_dot(a, b, out=out) - if a.ndim == 2 and b.ndim == 2: + if a_ndim == 2 and b_ndim == 2: # NumPy does not allow casting even if it is safe return dpnp.matmul(a, b, out=out, casting="no") - if a.ndim == 1 or b.ndim == 1: + if a_ndim == 1 or b_ndim == 1: # NumPy does not allow casting even if it is safe return dpnp.matmul(a, b, out=out, casting="no") diff --git a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py index 616b47483a0..43f6cc1f3fe 100644 --- a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py +++ b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py @@ -726,8 +726,16 @@ def _gemm_batch_matmul(exec_q, x1, x2, res, dev_tasks_list): chunk = 2048 * 2048 batch_size = 
res.shape[0] for i in range(0, batch_size, chunk): - x1_usm = dpnp.get_usm_ndarray(x1[i : i + chunk, ...]) - x2_usm = dpnp.get_usm_ndarray(x2[i : i + chunk, ...]) + if x1.shape[0] == 1: + # x1 is repeatedly multiplied with each matrix in x2 + x1_usm = dpnp.get_usm_ndarray(x1) + x2_usm = dpnp.get_usm_ndarray(x2[i : i + chunk, ...]) + elif x2.shape[0] == 1: + x1_usm = dpnp.get_usm_ndarray(x1[i : i + chunk, ...]) + x2_usm = dpnp.get_usm_ndarray(x2) + else: + x1_usm = dpnp.get_usm_ndarray(x1[i : i + chunk, ...]) + x2_usm = dpnp.get_usm_ndarray(x2[i : i + chunk, ...]) res_usm = dpnp.get_usm_ndarray(res[i : i + chunk, ...]) ht_blas_ev, _, row_major = bi._gemm_batch( exec_q, @@ -2090,6 +2098,7 @@ def dpnp_matmul( ) call_flag = None + transpose = False x1_shape = x1.shape x2_shape = x2.shape x1_is_2D, x1_is_1D, x1_base_is_1D = _define_dim_flags(x1, pos=0) @@ -2110,19 +2119,16 @@ def dpnp_matmul( call_flag = "gemm_batch" res_shape = result_shape elif x1_is_1D and x2_is_2D: - # TODO: implement gemv to use it here with transpose - call_flag = "gemm" - x1 = dpnp.reshape(x1, (1, x1.size)) + transpose = True + call_flag = "gemv" + x1 = dpnp.reshape(x1, x1.size) x2 = dpnp.reshape(x2, x2_shape[-2:]) - x1_shape = x1.shape - res_shape = (x1_shape[-2], x2_shape[-1]) + res_shape = (x2_shape[-1],) elif x1_is_2D and x2_is_1D: - # TODO: implement gemv to use it here without transpose - call_flag = "gemm" + call_flag = "gemv" x1 = dpnp.reshape(x1, x1_shape[-2:]) - x2 = dpnp.reshape(x2, (x2.size, 1)) - x2_shape = x2.shape - res_shape = (x1_shape[-2], x2_shape[-1]) + x2 = dpnp.reshape(x2, x2.size) + res_shape = (x1_shape[-2],) elif x1_is_2D and x2_is_2D: call_flag = "gemm" x1 = dpnp.reshape(x1, x1_shape[-2:]) @@ -2189,7 +2195,23 @@ def dpnp_matmul( dtype=compute_dtype, ) - if call_flag == "gemm": + if call_flag == "gemv": + if transpose: + a_usm = dpnp.get_usm_ndarray(x2) + x_usm = dpnp.get_usm_ndarray(x1) + else: + a_usm = dpnp.get_usm_ndarray(x1) + x_usm = dpnp.get_usm_ndarray(x2) + 
ht_blas_ev, _ = bi._gemv( + exec_q, + a_usm, + x_usm, + dpnp.get_usm_ndarray(res), + transpose, + dep_events_list, + ) + host_tasks_list.append(ht_blas_ev) + elif call_flag == "gemm": res = _gemm_matmul( exec_q, x1, diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index 6dc5cb01688..69b590b386c 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -2594,6 +2594,70 @@ def test_matmul_strided3(self, stride, transpose): assert result is out assert_dtype_allclose(result, expected) + @pytest.mark.parametrize("shape", [(8, 10)], ids=["2D"]) + @pytest.mark.parametrize("incx", [-2, 2], ids=["-2", "2"]) + @pytest.mark.parametrize("incy", [-2, 2], ids=["-2", "2"]) + @pytest.mark.parametrize("transpose", [False, True], ids=["False", "True"]) + def test_matmul_strided_mat_vec(self, shape, incx, incy, transpose): + if transpose: + s1 = shape[-2] + s2 = shape[-1] + else: + s1 = shape[-1] + s2 = shape[-2] + a = numpy.random.rand(*shape) + B = numpy.random.rand(2 * s1) + a_dp = dpnp.asarray(a) + if transpose: + a = numpy.moveaxis(a, (-2, -1), (-1, -2)) + a_dp = dpnp.moveaxis(a_dp, (-2, -1), (-1, -2)) + B_dp = dpnp.asarray(B) + b = B[::incx] + b_dp = B_dp[::incx] + + result = dpnp.matmul(a_dp, b_dp) + expected = numpy.matmul(a, b) + assert_dtype_allclose(result, expected) + + out_shape = shape[:-2] + (2 * s2,) + OUT = dpnp.empty(out_shape, dtype=result.dtype) + out = OUT[..., ::incy] + result = dpnp.matmul(a_dp, b_dp, out=out) + assert result is out + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize("shape", [(8, 10)], ids=["2D"]) + @pytest.mark.parametrize("incx", [-2, 2], ids=["-2", "2"]) + @pytest.mark.parametrize("incy", [-2, 2], ids=["-2", "2"]) + @pytest.mark.parametrize("transpose", [False, True], ids=["False", "True"]) + def test_matmul_strided_vec_mat(self, shape, incx, incy, transpose): + if transpose: + s1 = shape[-2] + s2 = shape[-1] + else: + s1 = shape[-1] + s2 = shape[-2] + a = numpy.random.rand(*shape) + 
B = numpy.random.rand(2 * s2) + a_dp = dpnp.asarray(a) + if transpose: + a = numpy.moveaxis(a, (-2, -1), (-1, -2)) + a_dp = dpnp.moveaxis(a_dp, (-2, -1), (-1, -2)) + B_dp = dpnp.asarray(B) + b = B[::incx] + b_dp = B_dp[::incx] + + result = dpnp.matmul(b_dp, a_dp) + expected = numpy.matmul(b, a) + assert_dtype_allclose(result, expected) + + out_shape = shape[:-2] + (2 * s1,) + OUT = dpnp.empty(out_shape, dtype=result.dtype) + out = OUT[..., ::incy] + result = dpnp.matmul(b_dp, a_dp, out=out) + assert result is out + assert_dtype_allclose(result, expected) + @pytest.mark.parametrize( "dtype", get_all_dtypes(no_none=True, no_bool=True) ) @@ -2631,26 +2695,24 @@ def test_matmul_out_0D(self, out_shape): @testing.slow @pytest.mark.parametrize( - "shape", + "shape_pair", [ - ((4096, 4096, 4, 4)), - ((2048, 2048, 8, 8)), + ((4096, 4096, 2, 2), (4096, 4096, 2, 2)), + ((2, 2), (4096, 4096, 2, 2)), + ((4096, 4096, 2, 2), (2, 2)), ], ) - def test_matmul_large(self, shape): - size = numpy.prod(shape, dtype=int) - a = numpy.array(numpy.random.uniform(-5, 5, size)).reshape(shape) + def test_matmul_large(self, shape_pair): + shape1, shape2 = shape_pair + size1 = numpy.prod(shape1, dtype=int) + size2 = numpy.prod(shape2, dtype=int) + a = numpy.array(numpy.random.uniform(-5, 5, size1)).reshape(shape1) + b = numpy.array(numpy.random.uniform(-5, 5, size2)).reshape(shape2) a_dp = dpnp.asarray(a) + b_dp = dpnp.asarray(b) - result = dpnp.matmul(a_dp, a_dp) - expected = numpy.matmul(a, a) - assert_dtype_allclose(result, expected, factor=24) - - # make the 2-d base f-contiguous - a = a.transpose(0, 1, 3, 2) - a_dp = a_dp.transpose(0, 1, 3, 2) - result = dpnp.matmul(a_dp, a_dp) - expected = numpy.matmul(a, a) + result = dpnp.matmul(a_dp, b_dp) + expected = numpy.matmul(a, b) assert_dtype_allclose(result, expected, factor=24) diff --git a/tests/test_product.py b/tests/test_product.py index ae233b7d3ab..ded938bda7f 100644 --- a/tests/test_product.py +++ b/tests/test_product.py @@ -244,9 
+244,9 @@ def test_dot_scalar(self, dtype): ((10,), (10,)), ((4, 3), (3, 2)), ((4, 3), (3,)), + ((4,), (4, 2)), ((5, 4, 3), (3,)), ((4,), (5, 4, 3)), - ((4,), (4, 2)), ((5, 3, 4), (6, 4, 2)), ], ids=[ @@ -256,9 +256,9 @@ def test_dot_scalar(self, dtype): "1d_1d", "2d_2d", "2d_1d", + "1d_2d", "3d_1d", "1d_3d", - "1d_2d", "3d_3d", ], ) @@ -404,8 +404,9 @@ def test_dot_out_scalar(self, dtype): ((10,), (10,), ()), ((4, 3), (3, 2), (4, 2)), ((4, 3), (3,), (4,)), - ((5, 4, 3), (3,), (5, 4)), ((4,), (4, 2), (2,)), + ((5, 4, 3), (3,), (5, 4)), + ((4,), (5, 4, 3), (5, 3)), ((5, 3, 4), (6, 4, 2), (5, 3, 6, 2)), ], ids=[ @@ -415,8 +416,9 @@ def test_dot_out_scalar(self, dtype): "1d_1d", "2d_2d", "2d_1d", - "3d_1d", "1d_2d", + "3d_1d", + "1d_3d", "3d_3d", ], ) From 410cb1ba46fcfb9f53dbd5c8f444316224da4aab Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Tue, 28 May 2024 18:49:02 +0200 Subject: [PATCH 08/49] Use mamba in GitHub actions (#1858) * Use mamba in GitHub actions * Use rattler-build to build conda package * Resolve spaces issue * Use conda-build command * Use conda activate inside retry step * Explicitly removing defaults channel * Update actions for Generate coverage and Build Sphinx * Disable download speed check in mamba * Corrected setting the env variable --- .github/workflows/build-sphinx.yml | 28 +++++--- .github/workflows/conda-package.yml | 90 +++++++++++++++++------- .github/workflows/generate_coverage.yaml | 25 +++++-- 3 files changed, 102 insertions(+), 41 deletions(-) diff --git a/.github/workflows/build-sphinx.yml b/.github/workflows/build-sphinx.yml index 7d5cd8afc3f..5d0372fb48d 100644 --- a/.github/workflows/build-sphinx.yml +++ b/.github/workflows/build-sphinx.yml @@ -99,31 +99,43 @@ jobs: - name: Setup miniconda uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + 
channels: conda-forge python-version: ${{ env.python-ver }} - miniconda-version: 'latest' activate-environment: 'docs' - channels: intel, conda-forge + + # Here is an issue in conda gh-12356 causing adding defaults to the list of channels + # upon running `conda config --append channels conda-forge`, while mamba requires to have only conda-forge channel + - name: Remove defaults channel + run: | + conda config --remove channels defaults + conda config --show + + # Sometimes `mamba install ...` fails due to slow download speed rate, so disable the check in mamba + - name: Disable speed limit check in mamba + run: echo "MAMBA_NO_LOW_SPEED_LIMIT=1" >> $GITHUB_ENV - name: Install sphinx dependencies run: | - conda install sphinx sphinx_rtd_theme + mamba install sphinx sphinx_rtd_theme pip install sphinxcontrib-googleanalytics==0.4 \ pyenchant sphinxcontrib-spelling - name: Install dpnp dependencies run: | - conda install numpy"<1.24" dpctl">=0.17.0dev0" mkl-devel-dpcpp onedpl-devel tbb-devel dpcpp_linux-64 \ + mamba install numpy"<1.24" dpctl">=0.17.0dev0" mkl-devel-dpcpp onedpl-devel tbb-devel dpcpp_linux-64 \ cmake cython pytest ninja scikit-build ${{ env.CHANNELS }} - name: Install cuPy dependencies - run: conda install cupy cudatoolkit=10.0 + run: mamba install cupy cudatoolkit=10.0 - name: Conda info - run: conda info + run: mamba info - name: Conda list - run: conda list + run: mamba list - name: Build library run: python scripts/build_locally.py diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml index 55ec174227c..3b24acf7774 100644 --- a/.github/workflows/conda-package.yml +++ b/.github/workflows/conda-package.yml @@ -12,7 +12,7 @@ env: PACKAGE_NAME: dpnp MODULE_NAME: dpnp CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' - CONDA_BUILD_VERSION: '24.1.2' + CONDA_BUILD_VERSION: '24.5.0' CONDA_INDEX_VERSION: '0.4.0' TEST_ENV_NAME: 'test' TEST_SCOPE: >- @@ -96,19 +96,32 @@ jobs: - name: Setup miniconda 
uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + channels: conda-forge python-version: ${{ matrix.python }} - miniconda-version: 'latest' activate-environment: 'build' + # Here is an issue in conda gh-12356 causing adding defaults to the list of channels + # upon running `conda config --append channels conda-forge`, while mamba requires to have only conda-forge channel + - name: Remove defaults channel + run: | + conda config --remove channels defaults + conda config --show + + # Sometimes `mamba install ...` fails due to slow download speed rate, so disable the check in mamba + - name: Disable speed limit check in mamba + run: echo "MAMBA_NO_LOW_SPEED_LIMIT=1" >> $GITHUB_ENV + - name: Store conda paths as envs shell: bash -l {0} run: | - echo "CONDA_BLD=$CONDA_PREFIX/conda-bld/${{ runner.os == 'Linux' && 'linux' || 'win' }}-64/" | tr "\\" '/' >> $GITHUB_ENV + echo "CONDA_BLD=$CONDA_PREFIX/conda-bld/${{ runner.os == 'Linux' && 'linux' || 'win' }}-64/" | tr "\\\\" '/' >> $GITHUB_ENV echo "WHEELS_OUTPUT_FOLDER=$GITHUB_WORKSPACE${{ runner.os == 'Linux' && '/' || '\\' }}" >> $GITHUB_ENV - name: Install conda-build - run: conda install conda-build=${{ env.CONDA_BUILD_VERSION}} + run: mamba install conda-build=${{ env.CONDA_BUILD_VERSION}} - name: Cache conda packages uses: actions/cache@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9 # v4.0.2 @@ -123,7 +136,7 @@ jobs: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}- - name: Build conda package - run: conda build --no-test --python ${{ matrix.python }} --numpy 1.23 ${{ env.CHANNELS }} conda-recipe + run: conda build --no-test --python ${{ matrix.python }} --numpy 1.24 ${{ env.CHANNELS }} conda-recipe - name: Upload artifact uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3 @@ -178,13 +191,18 @@ jobs: - name: Setup miniconda uses: 
conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + channels: conda-forge python-version: ${{ matrix.python }} - miniconda-version: 'latest' activate-environment: ${{ env.TEST_ENV_NAME }} + - name: Remove defaults channel + run: conda config --remove channels defaults + - name: Install conda-index - run: conda install conda-index=${{ env.CONDA_INDEX_VERSION }} + run: mamba install conda-index=${{ env.CONDA_INDEX_VERSION }} - name: Create conda channel run: | @@ -192,7 +210,7 @@ jobs: - name: Test conda channel run: | - conda search ${{ env.PACKAGE_NAME }} -c ${{ env.channel-path }} --override-channels --info --json > ${{ env.ver-json-path }} + mamba search ${{ env.PACKAGE_NAME }} -c ${{ env.channel-path }} --override-channels --info --json > ${{ env.ver-json-path }} cat ${{ env.ver-json-path }} - name: Collect dependencies @@ -202,7 +220,7 @@ jobs: echo PACKAGE_VERSION=${PACKAGE_VERSION} echo "PACKAGE_VERSION=$PACKAGE_VERSION" >> $GITHUB_ENV - conda install ${{ env.PACKAGE_NAME }}=${PACKAGE_VERSION} python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} --only-deps --dry-run > lockfile + mamba install ${{ env.PACKAGE_NAME }}=${PACKAGE_VERSION} python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} --only-deps --dry-run > lockfile cat lockfile env: TEST_CHANNELS: '-c ${{ env.channel-path }} ${{ env.CHANNELS }}' @@ -220,12 +238,13 @@ jobs: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}- - name: Install dpnp - run: conda install ${{ env.PACKAGE_NAME }}=${{ env.PACKAGE_VERSION }} pytest python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} + run: mamba install ${{ env.PACKAGE_NAME }}=${{ env.PACKAGE_VERSION }} pytest python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} env: TEST_CHANNELS: '-c ${{ env.channel-path }} ${{ env.CHANNELS }}' + MAMBA_NO_LOW_SPEED_LIMIT: 1 - name: List installed packages - run: conda list + run: mamba list - 
name: Smoke test run: | @@ -302,11 +321,16 @@ jobs: - name: Setup miniconda uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + channels: conda-forge python-version: ${{ matrix.python }} - miniconda-version: 'latest' activate-environment: ${{ env.TEST_ENV_NAME }} + - name: Remove defaults channel + run: conda config --remove channels defaults + - name: Store conda paths as envs run: | @echo on @@ -314,7 +338,7 @@ jobs: (echo CONDA_LIB_BIN_PATH=%CONDA_PREFIX%\Library\bin\) >> %GITHUB_ENV% - name: Install conda-index - run: conda install conda-index=${{ env.CONDA_INDEX_VERSION}} + run: mamba install conda-index=${{ env.CONDA_INDEX_VERSION }} - name: Create conda channel run: | @@ -324,7 +348,7 @@ jobs: - name: Test conda channel run: | @echo on - conda search ${{ env.PACKAGE_NAME }} -c ${{ env.channel-path }} --override-channels --info --json > ${{ env.ver-json-path }} + mamba search ${{ env.PACKAGE_NAME }} -c ${{ env.channel-path }} --override-channels --info --json > ${{ env.ver-json-path }} - name: Dump version.json run: more ${{ env.ver-json-path }} @@ -339,7 +363,7 @@ jobs: echo PACKAGE_VERSION: %PACKAGE_VERSION% (echo PACKAGE_VERSION=%PACKAGE_VERSION%) >> %GITHUB_ENV% - conda install ${{ env.PACKAGE_NAME }}=%PACKAGE_VERSION% python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} --only-deps --dry-run > lockfile + mamba install ${{ env.PACKAGE_NAME }}=%PACKAGE_VERSION% python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} --only-deps --dry-run > lockfile env: TEST_CHANNELS: '-c ${{ env.channel-path }} ${{ env.CHANNELS }}' @@ -361,12 +385,13 @@ jobs: - name: Install dpnp run: | @echo on - conda install ${{ env.PACKAGE_NAME }}=${{ env.PACKAGE_VERSION }} pytest python=${{ matrix.python }} ${{ env.TEST_CHANNELS }} + mamba install ${{ env.PACKAGE_NAME }}=${{ env.PACKAGE_VERSION }} pytest python=${{ matrix.python }} ${{ 
env.TEST_CHANNELS }} env: TEST_CHANNELS: '-c ${{ env.channel-path }} ${{ env.CHANNELS }}' + MAMBA_NO_LOW_SPEED_LIMIT: 1 - name: List installed packages - run: conda list + run: mamba list - name: Activate OCL CPU RT shell: pwsh @@ -398,7 +423,7 @@ jobs: max_attempts: 5 retry_on: any command: >- - conda activate ${{ env.TEST_ENV_NAME }} + mamba activate ${{ env.TEST_ENV_NAME }} & cd ${{ env.tests-path }} & python -m pytest -q -ra --disable-warnings -vv ${{ env.TEST_SCOPE }} @@ -438,13 +463,18 @@ jobs: - name: Setup miniconda uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + channels: conda-forge python-version: ${{ matrix.python }} - miniconda-version: 'latest' activate-environment: 'upload' + - name: Remove defaults channel + run: conda config --remove channels defaults + - name: Install anaconda-client - run: conda install anaconda-client + run: mamba install anaconda-client - name: Package version run: echo "PACKAGE_VERSION=$(basename ${{ env.PACKAGE_NAME }}-*.tar.bz2 | sed 's/^${{ env.PACKAGE_NAME }}-\([^-]*\).*/\1/')" >> $GITHUB_ENV @@ -469,13 +499,19 @@ jobs: steps: - uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - run-post: false - channel-priority: "disabled" + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true channels: conda-forge + run-post: false python-version: '3.11' + activate-environment: 'cleanup' + + - name: Remove defaults channel + run: conda config --remove channels defaults - name: Install anaconda-client - run: conda install anaconda-client + run: mamba install anaconda-client - name: Checkout repo uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6 diff --git a/.github/workflows/generate_coverage.yaml b/.github/workflows/generate_coverage.yaml index bd966a79095..d0f65e9729b 100644 --- 
a/.github/workflows/generate_coverage.yaml +++ b/.github/workflows/generate_coverage.yaml @@ -59,27 +59,40 @@ jobs: - name: Setup miniconda uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3.0.4 with: - auto-update-conda: true + miniforge-variant: Mambaforge + miniforge-version: latest + use-mamba: true + channels: conda-forge python-version: ${{ env.python-ver }} - miniconda-version: 'latest' activate-environment: 'coverage' + # Here is an issue in conda gh-12356 causing adding defaults to the list of channels + # upon running `conda config --append channels conda-forge`, while mamba requires to have only conda-forge channel + - name: Remove defaults channel + run: | + conda config --remove channels defaults + conda config --show + + # Sometimes `mamba install ...` fails due to slow download speed rate, so disable the check in mamba + - name: Disable speed limit check in mamba + run: echo "MAMBA_NO_LOW_SPEED_LIMIT=1" >> $GITHUB_ENV + - name: Install dpnp dependencies if: env.INSTALL_ONE_API == 'yes' run: | - conda install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ + mamba install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ dpctl">=0.17.0dev0" onedpl-devel ${{ env.CHANNELS }} - name: Install dpnp dependencies if: env.INSTALL_ONE_API != 'yes' run: | - conda install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ + mamba install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ dpctl">=0.17.0dev0" dpcpp_linux-64 mkl-devel-dpcpp tbb-devel onedpl-devel ${{ env.CHANNELS }} - name: Conda info run: | - conda info - conda list + mamba info + mamba list - name: Build dpnp with coverage id: build_coverage From 59be03de5d96cb49e7334d55da50a5f5517a2df2 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Wed, 29 May 2024 15:25:17 +0200 Subject: [PATCH 09/49] Enable 
pre-commit pylint check in fft module (#1860) --- .pre-commit-config.yaml | 2 +- dpnp/fft/__init__.py | 23 ++++++++++++ dpnp/fft/dpnp_iface_fft.py | 75 +++++++++++++++++++++++++++++--------- 3 files changed, 82 insertions(+), 18 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index b996291c155..e5e77c67768 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -100,4 +100,4 @@ repos: "--disable=redefined-builtin", "--disable=unused-wildcard-import" ] - files: '^dpnp/(dpnp_iface.*|linalg)' + files: '^dpnp/(dpnp_iface.*|fft|linalg)' diff --git a/dpnp/fft/__init__.py b/dpnp/fft/__init__.py index 1b743518d79..811e9b23ad0 100644 --- a/dpnp/fft/__init__.py +++ b/dpnp/fft/__init__.py @@ -24,6 +24,29 @@ # THE POSSIBILITY OF SUCH DAMAGE. # ***************************************************************************** +""" +``dpnp.fft`` +=========================== +Discrete Fourier Transform. + +Fourier analysis is fundamentally a method for expressing a function as a sum +of periodic components, and for recovering the function from those components. +When both the function and its Fourier transform are replaced with discretized +counterparts, it is called the discrete Fourier transform (DFT). The DFT has +become a mainstay of numerical computing in part because of a very fast +algorithm for computing it, called the Fast Fourier Transform (FFT), which was +known to Gauss (1805) and was brought to light in its current form by Cooley +and Tukey. + +Because the discrete Fourier transform separates its input into components +that contribute at discrete frequencies, it has a great number of applications +in digital signal processing, e.g., for filtering, and in this context the +discretized input to the transform is customarily referred to as a *signal*, +which exists in the *time domain*. The output is called a *spectrum* or +*transform* and exists in the *frequency domain*. 
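The DFT described in the docstring above can be sketched, for illustration only, as the textbook O(n²) sum — this is not dpnp's implementation (which relies on a fast FFT backend), just the definition the module computes:

```python
import cmath
import math

def dft(x):
    """Textbook O(n^2) DFT: X_k = sum_m x_m * exp(-2j*pi*k*m/n)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

# A pure cosine at frequency 1 over 8 samples concentrates all of its
# energy into bins 1 and 7 (= n - 1), each with magnitude n / 2 = 4.
signal = [math.cos(2.0 * math.pi * m / 8.0) for m in range(8)]
spectrum = dft(signal)
```

The name `dft` is a hypothetical helper introduced here for illustration; `dpnp.fft.fft` follows the `numpy.fft` API instead.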
+ +""" + from dpnp.fft.dpnp_iface_fft import * from dpnp.fft.dpnp_iface_fft import __all__ as __all__fft diff --git a/dpnp/fft/dpnp_iface_fft.py b/dpnp/fft/dpnp_iface_fft.py index f8609293701..c8064e0122e 100644 --- a/dpnp/fft/dpnp_iface_fft.py +++ b/dpnp/fft/dpnp_iface_fft.py @@ -39,14 +39,23 @@ """ +# pylint: disable=invalid-name from enum import Enum import numpy import dpnp -from dpnp.dpnp_utils import * -from dpnp.fft.dpnp_algo_fft import * + +# pylint: disable=no-name-in-module +from dpnp.dpnp_utils import ( + call_origin, + checker_throw_axis_error, +) +from dpnp.fft.dpnp_algo_fft import ( + dpnp_fft, + dpnp_rfft, +) __all__ = [ "fft", @@ -70,12 +79,16 @@ ] +# TODO: remove pylint disable, once new implementation is ready +# pylint: disable=missing-class-docstring class Norm(Enum): backward = 0 forward = 1 ortho = 2 +# TODO: remove pylint disable, once new implementation is ready +# pylint: disable=missing-function-docstring def get_validated_norm(norm): if norm is None or norm == "backward": return Norm.backward @@ -98,8 +111,10 @@ def fft(x, n=None, axis=-1, norm=None): Parameter `axis` is supported with its default value. Only `dpnp.float64`, `dpnp.float32`, `dpnp.int64`, `dpnp.int32`, `dpnp.complex128`, `dpnp.complex64` data types are supported. - The `dpnp.bool` data type is not supported and will raise a `TypeError` exception. + The `dpnp.bool` data type is not supported and will raise a `TypeError` + exception. Otherwise the function will be executed sequentially on CPU. 
+ """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) @@ -205,12 +220,12 @@ def fftn(x, s=None, axes=None, norm=None): x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) if x_desc: if s is None: - boundaries = tuple([x_desc.shape[i] for i in range(x_desc.ndim)]) + boundaries = tuple(x_desc.shape[i] for i in range(x_desc.ndim)) else: boundaries = s if axes is None: - axes_param = tuple([i for i in range(x_desc.ndim)]) + axes_param = list(range(x_desc.ndim)) else: axes_param = axes @@ -256,6 +271,8 @@ def fftshift(x, axes=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: norm_ = Norm.backward @@ -267,6 +284,9 @@ def fftshift(x, axes=None): if x_desc.size < 1: pass # let fallback to handle exception else: + input_boundarie = x_desc.shape[axis_param] + output_boundarie = input_boundarie + return dpnp_fft( x_desc, input_boundarie, @@ -281,7 +301,8 @@ def fftshift(x, axes=None): def hfft(x, n=None, axis=-1, norm=None): """ - Compute the one-dimensional discrete Fourier Transform of a signal that has Hermitian symmetry. + Compute the one-dimensional discrete Fourier Transform of a signal that has + Hermitian symmetry. For full documentation refer to :obj:`numpy.fft.hfft`. @@ -296,6 +317,8 @@ def hfft(x, n=None, axis=-1, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: norm_ = get_validated_norm(norm) @@ -342,7 +365,8 @@ def ifft(x, n=None, axis=-1, norm=None): Parameter `axis` is supported with its default value. Only `dpnp.float64`, `dpnp.float32`, `dpnp.int64`, `dpnp.int32`,, `dpnp.complex128`, `dpnp.complex64` data types are supported. - The `dpnp.bool` data type is not supported and will raise a `TypeError` exception. 
+ The `dpnp.bool` data type is not supported and will raise a `TypeError` + exception. Otherwise the function will be executed sequentially on CPU. """ @@ -430,6 +454,8 @@ def ifftshift(x, axes=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: norm_ = Norm.backward @@ -478,14 +504,16 @@ def ifftn(x, s=None, axes=None, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: if s is None: - boundaries = tuple([x_desc.shape[i] for i in range(x_desc.ndim)]) + boundaries = tuple(x_desc.shape[i] for i in range(x_desc.ndim)) else: boundaries = s if axes is None: - axes_param = tuple([i for i in range(x_desc.ndim)]) + axes_param = list(range(x_desc.ndim)) else: axes_param = axes @@ -522,7 +550,8 @@ def ifftn(x, s=None, axes=None, norm=None): def ihfft(x, n=None, axis=-1, norm=None): """ - Compute inverse one-dimensional discrete Fourier Transform of a signal that has Hermitian symmetry. + Compute inverse one-dimensional discrete Fourier Transform of a signal that + has Hermitian symmetry. For full documentation refer to :obj:`numpy.fft.ihfft`. @@ -537,6 +566,8 @@ def ihfft(x, n=None, axis=-1, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: norm_ = get_validated_norm(norm) @@ -575,7 +606,8 @@ def ihfft(x, n=None, axis=-1, norm=None): def irfft(x, n=None, axis=-1, norm=None): """ - Compute the one-dimensional inverse discrete Fourier Transform for real input. + Compute the one-dimensional inverse discrete Fourier Transform for real + input. For full documentation refer to :obj:`numpy.fft.irfft`. 
@@ -590,6 +622,8 @@ def irfft(x, n=None, axis=-1, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: norm_ = get_validated_norm(norm) @@ -622,7 +656,8 @@ def irfft(x, n=None, axis=-1, norm=None): True, norm_.value, ).get_pyobj() - # TODO tmp = utils.create_output_array(result_shape, result_c_type, out) + # TODO: + # tmp = utils.create_output_array(result_shape, result_c_type, out) # tmp = dparray(result.shape, dtype=dpnp.float64) # for it in range(tmp.size): # tmp[it] = result[it].real @@ -678,14 +713,16 @@ def irfftn(x, s=None, axes=None, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: if s is None: - boundaries = tuple([x_desc.shape[i] for i in range(x_desc.ndim)]) + boundaries = tuple(x_desc.shape[i] for i in range(x_desc.ndim)) else: boundaries = s if axes is None: - axes_param = tuple([i for i in range(x_desc.ndim)]) + axes_param = list(range(x_desc.ndim)) else: axes_param = axes @@ -732,8 +769,10 @@ def rfft(x, n=None, axis=-1, norm=None): Parameter `norm` is unsupported. Only `dpnp.float64`, `dpnp.float32`, `dpnp.int64`, `dpnp.int32`, `dpnp.complex128` data types are supported. - The `dpnp.bool` data type is not supported and will raise a `TypeError` exception. + The `dpnp.bool` data type is not supported and will raise a `TypeError` + exception. Otherwise the function will be executed sequentially on CPU. 
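The real-input transforms above (`rfft`, `irfft`, `ihfft`) exploit the Hermitian symmetry of a real signal's spectrum. A quick pure-Python check of that property — for illustration only, not dpnp's implementation; `dft` is a hypothetical helper:

```python
import cmath

def dft(x):
    """Textbook O(n^2) DFT, used here only to demonstrate the symmetry."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

x = [0.5, 1.0, -2.0, 3.0, 0.0, -1.5]   # an arbitrary real signal
X = dft(x)
n = len(x)

# Real input => Hermitian-symmetric spectrum: X[k] == conj(X[n - k]),
# which is why rfft only needs to return the first n // 2 + 1 bins.
symmetric = all(abs(X[k] - X[n - k].conjugate()) < 1e-9 for k in range(1, n))
```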
+ """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) @@ -844,14 +883,16 @@ def rfftn(x, s=None, axes=None, norm=None): """ x_desc = dpnp.get_dpnp_descriptor(x, copy_when_nondefault_queue=False) + # TODO: enable implementation + # pylint: disable=condition-evals-to-constant if x_desc and 0: if s is None: - boundaries = tuple([x_desc.shape[i] for i in range(x_desc.ndim)]) + boundaries = tuple(x_desc.shape[i] for i in range(x_desc.ndim)) else: boundaries = s if axes is None: - axes_param = tuple([i for i in range(x_desc.ndim)]) + axes_param = list(range(x_desc.ndim)) else: axes_param = axes From 841664c9fa0f46df227020bb6192e58c51ac404d Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Fri, 31 May 2024 11:33:59 +0200 Subject: [PATCH 10/49] Implement `dpnp.gradient` function (#1859) * Implement dpnp.gradient function * Resolve pre-commit issues * Update dpnp/dpnp_iface_mathematical.py Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> * Update dpnp/dpnp_iface_mathematical.py Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --------- Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --- dpnp/dpnp_algo/dpnp_algo_mathematical.pxi | 31 -- dpnp/dpnp_iface_mathematical.py | 372 ++++++++++++++-- tests/skipped_tests_gpu_no_fp64.tbl | 4 - tests/test_mathematical.py | 421 +++++++++++++++--- tests/test_sycl_queue.py | 7 +- tests/test_usm_type.py | 4 + .../cupy/math_tests/test_sumprod.py | 13 +- 7 files changed, 734 insertions(+), 118 deletions(-) diff --git a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi index f2111e4e671..2b8d63c6d2d 100644 --- a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi +++ b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi @@ -39,7 +39,6 @@ __all__ += [ "dpnp_ediff1d", "dpnp_fabs", "dpnp_fmod", - "dpnp_gradient", "dpnp_fmax", "dpnp_fmin", "dpnp_modf", @@ -123,36 +122,6 @@ cpdef utils.dpnp_descriptor 
dpnp_fmod(utils.dpnp_descriptor x1_obj, return call_fptr_2in_1out_strides(DPNP_FN_FMOD_EXT, x1_obj, x2_obj, dtype, out, where) -cpdef utils.dpnp_descriptor dpnp_gradient(utils.dpnp_descriptor y1, int dx=1): - - cdef size_t size = y1.size - - y1_obj = y1.get_array() - - # create result array with type given by FPTR data - cdef shape_type_c result_shape = utils._object_to_tuple(size) - cdef utils.dpnp_descriptor result = utils_py.create_output_descriptor_py(result_shape, - dpnp.default_float_type(y1_obj.sycl_queue), - None, - device=y1_obj.sycl_device, - usm_type=y1_obj.usm_type, - sycl_queue=y1_obj.sycl_queue) - - cdef double cur = (y1.get_pyobj()[1] - y1.get_pyobj()[0]) / dx - - result.get_pyobj().flat[0] = cur - - cur = (y1.get_pyobj()[-1] - y1.get_pyobj()[-2]) / dx - - result.get_pyobj().flat[size - 1] = cur - - for i in range(1, size - 1): - cur = (y1.get_pyobj()[i + 1] - y1.get_pyobj()[i - 1]) / (2 * dx) - result.get_pyobj().flat[i] = cur - - return result - - cpdef utils.dpnp_descriptor dpnp_fmax(utils.dpnp_descriptor x1_obj, utils.dpnp_descriptor x2_obj, object dtype=None, diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index 716696cacbd..b0d0c7b6123 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -46,6 +46,7 @@ import dpctl.tensor as dpt import dpctl.tensor._tensor_elementwise_impl as ti import dpctl.tensor._type_utils as dtu +import dpctl.utils as dpu import numpy from dpctl.tensor._type_utils import _acceptance_fn_divide from numpy.core.numeric import ( @@ -63,7 +64,6 @@ dpnp_fmax, dpnp_fmin, dpnp_fmod, - dpnp_gradient, dpnp_modf, dpnp_trapz, ) @@ -168,6 +168,169 @@ def _get_reduction_res_dt(a, dtype, _out): return dtu._to_device_supported_dtype(dtype, a.sycl_device) +def _gradient_build_dx(f, axes, *varargs): + """Build an array with distance per each dimension.""" + + len_axes = len(axes) + n = len(varargs) + if n == 0: + # no spacing argument - use 1 in all axes + dx = [1.0] * len_axes 
+ elif n == 1 and numpy.ndim(varargs[0]) == 0: + dpnp.check_supported_arrays_type( + varargs[0], scalar_type=True, all_scalars=True + ) + + # single scalar for all axes + dx = varargs * len_axes + elif n == len_axes: + # scalar or 1d array for each axis + dx = list(varargs) + for i, distances in enumerate(dx): + dpnp.check_supported_arrays_type( + distances, scalar_type=True, all_scalars=True + ) + + if numpy.ndim(distances) == 0: + continue + if distances.ndim != 1: + raise ValueError("distances must be either scalars or 1d") + + if len(distances) != f.shape[axes[i]]: + raise ValueError( + "when 1d, distances must match " + "the length of the corresponding dimension" + ) + + if dpnp.issubdtype(distances.dtype, dpnp.integer): + # Convert integer types to default float type to avoid modular + # arithmetic in dpnp.diff(distances). + distances = distances.astype(dpnp.default_float_type()) + diffx = dpnp.diff(distances) + + # if distances are constant reduce to the scalar case + # since it brings a consistent speedup + if (diffx == diffx[0]).all(): + diffx = diffx[0] + dx[i] = diffx + else: + raise TypeError("invalid number of arguments") + return dx + + +def _gradient_num_diff_2nd_order_interior( + f, ax_dx, out, slices, axis, uniform_spacing +): + """Numerical differentiation: 2nd order interior.""" + + slice1, slice2, slice3, slice4 = slices + ndim = f.ndim + + slice1[axis] = slice(1, -1) + slice2[axis] = slice(None, -2) + slice3[axis] = slice(1, -1) + slice4[axis] = slice(2, None) + + if uniform_spacing: + out[tuple(slice1)] = (f[tuple(slice4)] - f[tuple(slice2)]) / ( + 2.0 * ax_dx + ) + else: + dx1 = ax_dx[0:-1] + dx2 = ax_dx[1:] + a = -(dx2) / (dx1 * (dx1 + dx2)) + b = (dx2 - dx1) / (dx1 * dx2) + c = dx1 / (dx2 * (dx1 + dx2)) + + # fix the shape for broadcasting + shape = [1] * ndim + shape[axis] = -1 + # TODO: use shape.setter once dpctl#1699 is resolved + # a.shape = b.shape = c.shape = shape + a = a.reshape(shape) + b = b.reshape(shape) + c = c.reshape(shape) 
+ + # 1D equivalent -- out[1:-1] = a * f[:-2] + b * f[1:-1] + c * f[2:] + t1 = a * f[tuple(slice2)] + t2 = b * f[tuple(slice3)] + t3 = c * f[tuple(slice4)] + t4 = t1 + t2 + t3 + + out[tuple(slice1)] = t4 + out[tuple(slice1)] = ( + a * f[tuple(slice2)] + b * f[tuple(slice3)] + c * f[tuple(slice4)] + ) + + +def _gradient_num_diff_edges( + f, ax_dx, out, slices, axis, uniform_spacing, edge_order +): + """Numerical differentiation: 1st and 2nd order edges.""" + + slice1, slice2, slice3, slice4 = slices + + # Numerical differentiation: 1st order edges + if edge_order == 1: + slice1[axis] = 0 + slice2[axis] = 1 + slice3[axis] = 0 + dx_0 = ax_dx if uniform_spacing else ax_dx[0] + + # 1D equivalent -- out[0] = (f[1] - f[0]) / (x[1] - x[0]) + out[tuple(slice1)] = (f[tuple(slice2)] - f[tuple(slice3)]) / dx_0 + + slice1[axis] = -1 + slice2[axis] = -1 + slice3[axis] = -2 + dx_n = ax_dx if uniform_spacing else ax_dx[-1] + + # 1D equivalent -- out[-1] = (f[-1] - f[-2]) / (x[-1] - x[-2]) + out[tuple(slice1)] = (f[tuple(slice2)] - f[tuple(slice3)]) / dx_n + + # Numerical differentiation: 2nd order edges + else: + slice1[axis] = 0 + slice2[axis] = 0 + slice3[axis] = 1 + slice4[axis] = 2 + if uniform_spacing: + a = -1.5 / ax_dx + b = 2.0 / ax_dx + c = -0.5 / ax_dx + else: + dx1 = ax_dx[0] + dx2 = ax_dx[1] + a = -(2.0 * dx1 + dx2) / (dx1 * (dx1 + dx2)) + b = (dx1 + dx2) / (dx1 * dx2) + c = -dx1 / (dx2 * (dx1 + dx2)) + + # 1D equivalent -- out[0] = a * f[0] + b * f[1] + c * f[2] + out[tuple(slice1)] = ( + a * f[tuple(slice2)] + b * f[tuple(slice3)] + c * f[tuple(slice4)] + ) + + slice1[axis] = -1 + slice2[axis] = -3 + slice3[axis] = -2 + slice4[axis] = -1 + if uniform_spacing: + a = 0.5 / ax_dx + b = -2.0 / ax_dx + c = 1.5 / ax_dx + else: + dx1 = ax_dx[-2] + dx2 = ax_dx[-1] + a = (dx2) / (dx1 * (dx1 + dx2)) + b = -(dx2 + dx1) / (dx1 * dx2) + c = (2.0 * dx2 + dx1) / (dx2 * (dx1 + dx2)) + + # 1D equivalent -- out[-1] = a * f[-3] + b * f[-2] + c * f[-1] + out[tuple(slice1)] = ( + a * 
f[tuple(slice2)] + b * f[tuple(slice3)] + c * f[tuple(slice4)] + ) + + _ABS_DOCSTRING = """ Calculates the absolute value for each element `x_i` of input array `x`. @@ -1682,51 +1845,206 @@ def fmod(x1, x2, /, out=None, *, where=True, dtype=None, subok=True, **kwargs): ) -def gradient(x1, *varargs, **kwargs): +def gradient(f, *varargs, axis=None, edge_order=1): """ - Return the gradient of an array. + Return the gradient of an N-dimensional array. + + The gradient is computed using second order accurate central differences + in the interior points and either first or second order accurate one-sides + (forward or backwards) differences at the boundaries. + The returned gradient hence has the same shape as the input array. For full documentation refer to :obj:`numpy.gradient`. - Limitations - ----------- - Parameter `y1` is supported as :class:`dpnp.ndarray`. - Argument `varargs[0]` is supported as `int`. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by supported DPNP :ref:`Data types`. + Parameters + ---------- + f : {dpnp.ndarray, usm_ndarray} + An N-dimensional array containing samples of a scalar function. + varargs : {scalar, list of scalars, list of arrays}, optional + Spacing between `f` values. Default unitary spacing for all dimensions. + Spacing can be specified using: + + 1. Single scalar to specify a sample distance for all dimensions. + 2. N scalars to specify a constant sample distance for each dimension. + i.e. `dx`, `dy`, `dz`, ... + 3. N arrays to specify the coordinates of the values along each + dimension of `f`. The length of the array must match the size of + the corresponding dimension + 4. Any combination of N scalars/arrays with the meaning of 2. and 3. + + If `axis` is given, the number of `varargs` must equal the number of + axes. + Default: ``1``. 
+ axis : {None, int, tuple of ints}, optional + Gradient is calculated only along the given axis or axes. + The default is to calculate the gradient for all the axes of the input + array. `axis` may be negative, in which case it counts from the last to + the first axis. + Default: ``None``. + edge_order : {1, 2}, optional + Gradient is calculated using N-th order accurate differences + at the boundaries. + Default: ``1``. + + Returns + ------- + gradient : {dpnp.ndarray, list of ndarray} + A list of :class:`dpnp.ndarray` (or a single :class:`dpnp.ndarray` if + there is only one dimension) corresponding to the derivatives of `f` + with respect to each dimension. + Each derivative has the same shape as `f`. See Also -------- :obj:`dpnp.diff` : Calculate the n-th discrete difference along the given axis. + :obj:`dpnp.ediff1d` : Calculate the differences between consecutive + elements of an array. Examples -------- >>> import dpnp as np - >>> y = np.array([1, 2, 4, 7, 11, 16], dtype=float) - >>> result = np.gradient(y) - >>> [x for x in result] - [1.0, 1.5, 2.5, 3.5, 4.5, 5.0] - >>> result = np.gradient(y, 2) - >>> [x for x in result] - [0.5, 0.75, 1.25, 1.75, 2.25, 2.5] + >>> f = np.array([1, 2, 4, 7, 11, 16], dtype=float) + >>> np.gradient(f) + array([1. , 1.5, 2.5, 3.5, 4.5, 5. ]) + >>> np.gradient(f, 2) + array([0.5 , 0.75, 1.25, 1.75, 2.25, 2.5 ]) + + Spacing can be also specified with an array that represents the coordinates + of the values `f` along the dimensions. + For instance a uniform spacing: + + >>> x = np.arange(f.size) + >>> np.gradient(f, x) + array([1. , 1.5, 2.5, 3.5, 4.5, 5. ]) + + Or a non uniform one: + + >>> x = np.array([0., 1., 1.5, 3.5, 4., 6.], dtype=float) + >>> np.gradient(f, x) + array([1. , 3. , 3.5, 6.7, 6.9, 2.5]) + + For two dimensional arrays, the return will be two arrays ordered by + axis. 
In this example the first array stands for the gradient in + rows and the second one in columns direction: + + >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=float)) + (array([[ 2., 2., -1.], + [ 2., 2., -1.]]), + array([[1. , 2.5, 4. ], + [1. , 1. , 1. ]])) + + In this example the spacing is also specified: + uniform for axis=0 and non uniform for axis=1 + + >>> dx = 2. + >>> y = np.array([1., 1.5, 3.5]) + >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=float), dx, y) + (array([[ 1. , 1. , -0.5], + [ 1. , 1. , -0.5]]), + array([[2. , 2. , 2. ], + [2. , 1.7, 0.5]])) + + It is possible to specify how boundaries are treated using `edge_order` + + >>> x = np.array([0, 1, 2, 3, 4]) + >>> f = x**2 + >>> np.gradient(f, edge_order=1) + array([1., 2., 4., 6., 7.]) + >>> np.gradient(f, edge_order=2) + array([0., 2., 4., 6., 8.]) + + The `axis` keyword can be used to specify a subset of axes of which the + gradient is calculated + + >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=float), axis=0) + array([[ 2., 2., -1.], + [ 2., 2., -1.]]) """ - x1_desc = dpnp.get_dpnp_descriptor(x1, copy_when_nondefault_queue=False) - if x1_desc and not kwargs: - if len(varargs) > 1: - pass - elif len(varargs) == 1 and not isinstance(varargs[0], int): - pass + dpnp.check_supported_arrays_type(f) + ndim = f.ndim # number of dimensions + + if axis is None: + axes = tuple(range(ndim)) + else: + axes = normalize_axis_tuple(axis, ndim) + + dx = _gradient_build_dx(f, axes, *varargs) + if edge_order > 2: + raise ValueError("'edge_order' greater than 2 not supported") + + # Use central differences on interior and one-sided differences on the + # endpoints. This preserves second order-accuracy over the full domain. 
+ outvals = [] + + # create slice objects --- initially all are [:, :, ..., :] + slice1 = [slice(None)] * ndim + slice2 = [slice(None)] * ndim + slice3 = [slice(None)] * ndim + slice4 = [slice(None)] * ndim + + otype = f.dtype + if dpnp.issubdtype(otype, dpnp.inexact): + pass + else: + # All other types convert to floating point. + # First check if f is a dpnp integer type; if so, convert f to default + # float type to avoid modular arithmetic when computing changes in f. + if dpnp.issubdtype(otype, dpnp.integer): + f = f.astype(dpnp.default_float_type()) + otype = dpnp.default_float_type() + + for axis_, ax_dx in zip(axes, dx): + if f.shape[axis_] < edge_order + 1: + raise ValueError( + "Shape of array too small to calculate a numerical gradient, " + "at least (edge_order + 1) elements are required." + ) + + # result allocation + if dpnp.isscalar(ax_dx): + usm_type = f.usm_type else: - if len(varargs) == 0: - return dpnp_gradient(x1_desc).get_pyobj() + usm_type = dpu.get_coerced_usm_type([f.usm_type, ax_dx.usm_type]) + out = dpnp.empty_like(f, dtype=otype, usm_type=usm_type) + + # spacing for the current axis + uniform_spacing = numpy.ndim(ax_dx) == 0 + + # Numerical differentiation: 2nd order interior + _gradient_num_diff_2nd_order_interior( + f, + ax_dx, + out, + (slice1, slice2, slice3, slice4), + axis_, + uniform_spacing, + ) + + # Numerical differentiation: 1st and 2nd order edges + _gradient_num_diff_edges( + f, + ax_dx, + out, + (slice1, slice2, slice3, slice4), + axis_, + uniform_spacing, + edge_order, + ) + + outvals.append(out) - return dpnp_gradient(x1_desc, varargs[0]).get_pyobj() + # reset the slice object in this dimension to ":" + slice1[axis_] = slice(None) + slice2[axis_] = slice(None) + slice3[axis_] = slice(None) + slice4[axis_] = slice(None) - return call_origin(numpy.gradient, x1, *varargs, **kwargs) + if len(axes) == 1: + return outvals[0] + return tuple(outvals) _IMAG_DOCSTRING = """ diff --git a/tests/skipped_tests_gpu_no_fp64.tbl 
b/tests/skipped_tests_gpu_no_fp64.tbl index 7a999c99617..c209c876df6 100644 --- a/tests/skipped_tests_gpu_no_fp64.tbl +++ b/tests/skipped_tests_gpu_no_fp64.tbl @@ -1,7 +1,3 @@ -tests/test_mathematical.py::TestGradient::test_gradient_y1_dx[3.5-array0] -tests/test_mathematical.py::TestGradient::test_gradient_y1_dx[3.5-array1] -tests/test_mathematical.py::TestGradient::test_gradient_y1_dx[3.5-array2] - tests/test_strides.py::test_strides_1arg[(10,)-int32-fabs] tests/test_strides.py::test_strides_1arg[(10,)-int64-fabs] tests/test_strides.py::test_strides_1arg[(10,)-None-fabs] diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index 69b590b386c..4a86cdc081e 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -9,6 +9,7 @@ assert_array_equal, assert_equal, assert_raises, + assert_raises_regex, ) import dpnp @@ -23,7 +24,6 @@ get_float_dtypes, get_integer_dtypes, has_support_aspect64, - is_cpu_device, ) from .test_umath import ( _get_numpy_arrays_1in_1out, @@ -73,6 +73,35 @@ def test_angle_complex(self, dtype, deg): assert_dtype_allclose(result, expected) +@pytest.mark.usefixtures("allow_fall_back_on_numpy") +class TestConvolve: + def test_object(self): + d = [1.0] * 100 + k = [1.0] * 3 + assert_array_almost_equal(dpnp.convolve(d, k)[2:-2], dpnp.full(98, 3)) + + def test_no_overwrite(self): + d = dpnp.ones(100) + k = dpnp.ones(3) + dpnp.convolve(d, k) + assert_array_equal(d, dpnp.ones(100)) + assert_array_equal(k, dpnp.ones(3)) + + def test_mode(self): + d = dpnp.ones(100) + k = dpnp.ones(3) + default_mode = dpnp.convolve(d, k, mode="full") + full_mode = dpnp.convolve(d, k, mode="f") + assert_array_equal(full_mode, default_mode) + # integer mode + with assert_raises(ValueError): + dpnp.convolve(d, k, mode=-1) + assert_array_equal(dpnp.convolve(d, k, mode=2), full_mode) + # illegal arguments + with assert_raises(TypeError): + dpnp.convolve(d, k, mode=None) + + class TestClip: @pytest.mark.parametrize( "dtype", 
get_all_dtypes(no_bool=True, no_none=True, no_complex=True) @@ -582,33 +611,347 @@ def test_prepend_append_axis_error(self, xp): assert_raises(numpy.AxisError, xp.diff, a, axis=3, append=0) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") -class TestConvolve: - def test_object(self): - d = [1.0] * 100 - k = [1.0] * 3 - assert_array_almost_equal(dpnp.convolve(d, k)[2:-2], dpnp.full(98, 3)) +class TestGradient: + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_basic(self, dt): + x = numpy.array([[1, 1], [3, 4]], dtype=dt) + ix = dpnp.array(x) - def test_no_overwrite(self): - d = dpnp.ones(100) - k = dpnp.ones(3) - dpnp.convolve(d, k) - assert_array_equal(d, dpnp.ones(100)) - assert_array_equal(k, dpnp.ones(3)) + expected = numpy.gradient(x) + result = dpnp.gradient(ix) + assert_array_equal(result, expected) - def test_mode(self): - d = dpnp.ones(100) - k = dpnp.ones(3) - default_mode = dpnp.convolve(d, k, mode="full") - full_mode = dpnp.convolve(d, k, mode="f") - assert_array_equal(full_mode, default_mode) - # integer mode - with assert_raises(ValueError): - dpnp.convolve(d, k, mode=-1) - assert_array_equal(dpnp.convolve(d, k, mode=2), full_mode) - # illegal arguments - with assert_raises(TypeError): - dpnp.convolve(d, k, mode=None) + @pytest.mark.parametrize( + "args", + [3.0, numpy.array(3.0), numpy.cumsum(numpy.ones(5))], + ids=["scalar", "array", "cumsum"], + ) + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_args_1d(self, args, dt): + x = numpy.arange(5, dtype=dt) + ix = dpnp.array(x) + + if numpy.isscalar(args): + iargs = args + else: + iargs = dpnp.array(args) + + expected = numpy.gradient(x, args) + result = dpnp.gradient(ix, iargs) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize( + "args", [1.5, numpy.array(1.5)], ids=["scalar", "array"] + ) + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_args_2d(self, args, dt): + 
x = numpy.arange(25, dtype=dt).reshape(5, 5) + ix = dpnp.array(x) + + if numpy.isscalar(args): + iargs = args + else: + iargs = dpnp.array(args) + + expected = numpy.gradient(x, args) + result = dpnp.gradient(ix, iargs) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_args_2d_uneven(self, dt): + x = numpy.arange(25, dtype=dt).reshape(5, 5) + ix = dpnp.array(x) + + dx = numpy.array([1.0, 2.0, 5.0, 9.0, 11.0]) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx, dx) + result = dpnp.gradient(ix, idx, idx) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_args_2d_mix_with_scalar(self, dt): + x = numpy.arange(25, dtype=dt).reshape(5, 5) + ix = dpnp.array(x) + + dx = numpy.cumsum(numpy.ones(5)) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx, 2) + result = dpnp.gradient(ix, idx, 2) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True, no_bool=True)) + def test_axis_args_2d(self, dt): + x = numpy.arange(25, dtype=dt).reshape(5, 5) + ix = dpnp.array(x) + + dx = numpy.cumsum(numpy.ones(5)) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx, axis=1) + result = dpnp.gradient(ix, idx, axis=1) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_args_2d_error(self, xp): + x = xp.arange(25).reshape(5, 5) + dx = xp.cumsum(xp.ones(5)) + assert_raises_regex( + ValueError, + ".*scalars or 1d", + xp.gradient, + x, + xp.stack([dx] * 2, axis=-1), + 1, + ) + + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_badargs(self, xp): + x = xp.arange(25).reshape(5, 5) + dx = xp.cumsum(xp.ones(5)) + + # wrong sizes + assert_raises(ValueError, xp.gradient, x, x, xp.ones(2)) + 
assert_raises(ValueError, xp.gradient, x, 1, xp.ones(2)) + assert_raises(ValueError, xp.gradient, x, xp.ones(2), xp.ones(2)) + # wrong number of arguments + assert_raises(TypeError, xp.gradient, x, x) + assert_raises(TypeError, xp.gradient, x, dx, axis=(0, 1)) + assert_raises(TypeError, xp.gradient, x, dx, dx, dx) + assert_raises(TypeError, xp.gradient, x, 1, 1, 1) + assert_raises(TypeError, xp.gradient, x, dx, dx, axis=1) + assert_raises(TypeError, xp.gradient, x, 1, 1, axis=1) + + @pytest.mark.parametrize( + "x", + [ + numpy.linspace(0, 1, 10), + numpy.sort(numpy.random.RandomState(0).random(10)), + ], + ids=["linspace", "random_sorted"], + ) + @pytest.mark.parametrize("dt", get_float_dtypes()) + # testing that the relative numerical error is close to numpy + def test_second_order_accurate(self, x, dt): + x = x.astype(dt) + dx = x[1] - x[0] + y = 2 * x**3 + 4 * x**2 + 2 * x + + iy = dpnp.array(y) + idx = dpnp.array(dx) + + expected = numpy.gradient(y, dx, edge_order=2) + result = dpnp.gradient(iy, idx, edge_order=2) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize("edge_order", [1, 2]) + @pytest.mark.parametrize("axis", [0, 1, (0, 1)]) + @pytest.mark.parametrize("dt", get_float_dtypes()) + def test_spacing_axis_scalar(self, edge_order, axis, dt): + x = numpy.array([0, 2.0, 3.0, 4.0, 5.0, 5.0], dtype=dt) + x = numpy.tile(x, (6, 1)) + x.reshape(-1, 1) + ix = dpnp.array(x) + + expected = numpy.gradient(x, 1.0, axis=axis, edge_order=edge_order) + result = dpnp.gradient(ix, 1.0, axis=axis, edge_order=edge_order) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("edge_order", [1, 2]) + @pytest.mark.parametrize("axis", [(0, 1), None]) + @pytest.mark.parametrize("dt", get_float_dtypes()) + @pytest.mark.parametrize( + "dx", + [numpy.arange(6.0), numpy.array([0.0, 0.5, 1.0, 3.0, 5.0, 7.0])], + ids=["even", "uneven"], + ) + def test_spacing_axis_two_args(self, edge_order, axis, dt, dx): + x = 
numpy.array([0, 2.0, 3.0, 4.0, 5.0, 5.0], dtype=dt) + x = numpy.tile(x, (6, 1)) + x.reshape(-1, 1) + + ix = dpnp.array(x) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx, dx, axis=axis, edge_order=edge_order) + result = dpnp.gradient(ix, idx, idx, axis=axis, edge_order=edge_order) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("edge_order", [1, 2]) + @pytest.mark.parametrize("axis", [0, 1]) + @pytest.mark.parametrize("dt", get_float_dtypes()) + @pytest.mark.parametrize( + "dx", + [numpy.arange(6.0), numpy.array([0.0, 0.5, 1.0, 3.0, 5.0, 7.0])], + ids=["even", "uneven"], + ) + def test_spacing_axis_args(self, edge_order, axis, dt, dx): + x = numpy.array([0, 2.0, 3.0, 4.0, 5.0, 5.0], dtype=dt) + x = numpy.tile(x, (6, 1)) + x.reshape(-1, 1) + + ix = dpnp.array(x) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx, axis=axis, edge_order=edge_order) + result = dpnp.gradient(ix, idx, axis=axis, edge_order=edge_order) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("edge_order", [1, 2]) + @pytest.mark.parametrize("dt", get_float_dtypes()) + def test_spacing_mix_args(self, edge_order, dt): + x = numpy.array([0, 2.0, 3.0, 4.0, 5.0, 5.0], dtype=dt) + x = numpy.tile(x, (6, 1)) + x.reshape(-1, 1) + x_uneven = numpy.array([0.0, 0.5, 1.0, 3.0, 5.0, 7.0]) + x_even = numpy.arange(6.0) + + ix = dpnp.array(x) + ix_uneven = dpnp.array(x_uneven) + ix_even = dpnp.array(x_even) + + expected = numpy.gradient( + x, x_even, x_uneven, axis=(0, 1), edge_order=edge_order + ) + result = dpnp.gradient( + ix, ix_even, ix_uneven, axis=(0, 1), edge_order=edge_order + ) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + expected = numpy.gradient( + x, x_uneven, x_even, axis=(1, 0), edge_order=edge_order + ) + result = dpnp.gradient( + ix, ix_uneven, ix_even, axis=(1, 0), edge_order=edge_order + ) + for gr, igr in zip(expected, result): + 
assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("axis", [0, 1, -1, (1, 0), None]) + def test_specific_axes(self, axis): + x = numpy.array([[1, 1], [3, 4]]) + ix = dpnp.array(x) + + expected = numpy.gradient(x, axis=axis) + result = dpnp.gradient(ix, axis=axis) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + def test_axis_scalar_args(self): + x = numpy.array([[1, 1], [3, 4]]) + ix = dpnp.array(x) + + expected = numpy.gradient(x, 2, 3, axis=(1, 0)) + result = dpnp.gradient(ix, 2, 3, axis=(1, 0)) + for gr, igr in zip(expected, result): + assert_dtype_allclose(igr, gr) + + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_wrong_number_of_args(self, xp): + x = xp.array([[1, 1], [3, 4]]) + assert_raises(TypeError, xp.gradient, x, 1, 2, axis=1) + + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_wrong_axis(self, xp): + x = xp.array([[1, 1], [3, 4]]) + assert_raises(numpy.AxisError, xp.gradient, x, axis=3) + + @pytest.mark.parametrize( + "size, edge_order", + [ + pytest.param(2, 1), + pytest.param(3, 2), + ], + ) + def test_min_size_with_edge_order(self, size, edge_order): + x = numpy.arange(size) + ix = dpnp.array(x) + + expected = numpy.gradient(x, edge_order=edge_order) + result = dpnp.gradient(ix, edge_order=edge_order) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize( + "size, edge_order", + [ + pytest.param(0, 1), + pytest.param(0, 2), + pytest.param(1, 1), + pytest.param(1, 2), + pytest.param(2, 2), + ], + ) + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_wrong_size_with_edge_order(self, size, edge_order, xp): + assert_raises( + ValueError, xp.gradient, xp.arange(size), edge_order=edge_order + ) + + @pytest.mark.parametrize( + "dt", [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64] + ) + def test_f_decreasing_unsigned_int(self, dt): + x = numpy.array([5, 4, 3, 2, 1], dtype=dt) + ix = dpnp.array(x) + + expected = numpy.gradient(x) + result = dpnp.gradient(ix) + 
assert_array_equal(result, expected) + + @pytest.mark.parametrize( + "dt", [numpy.int8, numpy.int16, numpy.int32, numpy.int64] + ) + def test_f_signed_int_big_jump(self, dt): + maxint = numpy.iinfo(dt).max + x = numpy.array([-1, maxint], dtype=dt) + dx = numpy.array([1, 3]) + + ix = dpnp.array(x) + idx = dpnp.array(dx) + + expected = numpy.gradient(x, dx) + result = dpnp.gradient(ix, idx) + assert_array_equal(result, expected) + + @pytest.mark.parametrize( + "dt", [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64] + ) + def test_x_decreasing_unsigned(self, dt): + x = numpy.array([3, 2, 1], dtype=dt) + f = numpy.array([0, 2, 4]) + + dp_x = dpnp.array(x) + dp_f = dpnp.array(f) + + expected = numpy.gradient(f, x) + result = dpnp.gradient(dp_f, dp_x) + assert_array_equal(result, expected) + + @pytest.mark.parametrize( + "dt", [numpy.int8, numpy.int16, numpy.int32, numpy.int64] + ) + def test_x_signed_int_big_jump(self, dt): + minint = numpy.iinfo(dt).min + maxint = numpy.iinfo(dt).max + x = numpy.array([-1, maxint], dtype=dt) + f = numpy.array([minint // 2, 0]) + + dp_x = dpnp.array(x) + dp_f = dpnp.array(f) + + expected = numpy.gradient(f, x) + result = dpnp.gradient(dp_f, dp_x) + assert_array_equal(result, expected) + + def test_return_type(self): + x = dpnp.array([[1, 2], [2, 3]]) + res = dpnp.gradient(x) + assert type(res) is tuple @pytest.mark.parametrize("dtype1", get_all_dtypes()) @@ -1384,32 +1727,6 @@ def test_trapz_with_dx_params(self, y_array, dx): assert_array_equal(expected, result) -class TestGradient: - @pytest.mark.parametrize( - "array", [[2, 3, 6, 8, 4, 9], [3.0, 4.0, 7.5, 9.0], [2, 6, 8, 10]] - ) - def test_gradient_y1(self, array): - np_y = numpy.array(array) - dpnp_y = dpnp.array(array) - - result = dpnp.gradient(dpnp_y) - expected = numpy.gradient(np_y) - assert_array_equal(expected, result) - - @pytest.mark.usefixtures("allow_fall_back_on_numpy") - @pytest.mark.parametrize( - "array", [[2, 3, 6, 8, 4, 9], [3.0, 4.0, 7.5, 9.0], [2, 6, 8, 10]] 
- ) - @pytest.mark.parametrize("dx", [2, 3.5]) - def test_gradient_y1_dx(self, array, dx): - np_y = numpy.array(array) - dpnp_y = dpnp.array(array) - - result = dpnp.gradient(dpnp_y, dx) - expected = numpy.gradient(np_y, dx) - assert_array_equal(expected, result) - - class TestRoundingFuncs: @pytest.fixture( params=[ diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index fae4dd52221..e66c1a55b87 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -625,6 +625,11 @@ def test_reduce_hypot(device): [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.0, 2.0, 2.0, 2.0], ), + pytest.param( + "gradient", + [1.0, 2.0, 4.0, 7.0, 11.0, 16.0], + [0.0, 1.0, 1.5, 3.5, 4.0, 6.0], + ), pytest.param( "histogram_bin_edges", [0, 0, 0, 1, 2, 3, 3, 4, 5], @@ -691,7 +696,7 @@ def test_2in_1out(func, data1, data2, device): x2 = dpnp.array(data2, device=device) result = getattr(dpnp, func)(x1, x2) - assert_allclose(result, expected) + assert_dtype_allclose(result, expected) assert_sycl_queue_equal(result.sycl_queue, x1.sycl_queue) assert_sycl_queue_equal(result.sycl_queue, x2.sycl_queue) diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index eab59cf001b..f42b6a769bc 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -539,6 +539,7 @@ def test_norm(usm_type, ord, axis): pytest.param("exp2", [0.0, 1.0, 2.0]), pytest.param("expm1", [1.0e-10, 1.0, 2.0, 4.0, 7.0]), pytest.param("floor", [-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0]), + pytest.param("gradient", [1, 2, 4, 7, 11, 16]), pytest.param("histogram_bin_edges", [0, 0, 0, 1, 2, 3, 3, 4, 5]), pytest.param( "imag", [complex(1.0, 2.0), complex(3.0, 4.0), complex(5.0, 6.0)] @@ -622,6 +623,9 @@ def test_1in_1out(func, data, usm_type): pytest.param("dot", [3 + 2j, 4 + 1j, 5], [1, 2 + 3j, 3]), pytest.param("fmax", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), pytest.param("fmin", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), + pytest.param( + "gradient", [1, 2, 4, 7, 11, 16], [0.0, 1.0, 1.5, 3.5, 4.0, 6.0] 
+ ), pytest.param( "hypot", [[1.0, 2.0, 3.0, 4.0]], [[-1.0, -2.0, -4.0, -5.0]] ), diff --git a/tests/third_party/cupy/math_tests/test_sumprod.py b/tests/third_party/cupy/math_tests/test_sumprod.py index 18a74a76330..f36086755e9 100644 --- a/tests/third_party/cupy/math_tests/test_sumprod.py +++ b/tests/third_party/cupy/math_tests/test_sumprod.py @@ -717,9 +717,15 @@ def test_diff_invalid_axis(self): ), ) ) -@pytest.mark.skip("gradient() is not implemented yet") class TestGradient: def _gradient(self, xp, dtype, shape, spacing, axis, edge_order): + if ( + not has_support_aspect64() + and shape == (10, 20, 30) + and spacing == "arrays" + ): + pytest.skip("too big values") + x = testing.shaped_random(shape, xp, dtype=dtype) if axis is None: normalized_axes = tuple(range(x.ndim)) @@ -755,7 +761,9 @@ def test_gradient_floating(self, xp, dtype): # https://github.com/numpy/numpy/issues/15207 @testing.with_requires("numpy>=1.18.1") @testing.for_int_dtypes(no_bool=True) - @testing.numpy_cupy_allclose(atol=1e-6, rtol=1e-5) + @testing.numpy_cupy_allclose( + atol=1e-6, rtol=1e-5, type_check=has_support_aspect64() + ) def test_gradient_int(self, xp, dtype): return self._gradient( xp, dtype, self.shape, self.spacing, self.axis, self.edge_order @@ -773,7 +781,6 @@ def test_gradient_float16(self, xp): ) -@pytest.mark.skip("gradient() is not implemented yet") class TestGradientErrors: def test_gradient_invalid_spacings1(self): # more spacings than axes From 48d6191e1a03dfdbf66bfba63028201074e290a8 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Fri, 31 May 2024 15:12:36 +0200 Subject: [PATCH 11/49] Bump conda-build version to 24.5.1 (#1862) --- .github/workflows/conda-package.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml index 3b24acf7774..dde7e3d435a 100644 --- a/.github/workflows/conda-package.yml +++ b/.github/workflows/conda-package.yml @@ 
-12,7 +12,7 @@ env: PACKAGE_NAME: dpnp MODULE_NAME: dpnp CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' - CONDA_BUILD_VERSION: '24.5.0' + CONDA_BUILD_VERSION: '24.5.1' CONDA_INDEX_VERSION: '0.4.0' TEST_ENV_NAME: 'test' TEST_SCOPE: >- From 807dc14807d1e08b2643d15f8bb5500908b565aa Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 3 Jun 2024 12:38:17 +0200 Subject: [PATCH 12/49] Bump github/codeql-action from 3.25.6 to 3.25.7 (#1865) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.25.6 to 3.25.7. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/9fdb3e49720b44c48891d036bb502feb25684276...f079b8493333aace61c81488f8bd40919487bd9f) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/openssf-scorecard.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml index 726b817e2ff..25f351b9c1e 100644 --- a/.github/workflows/openssf-scorecard.yml +++ b/.github/workflows/openssf-scorecard.yml @@ -68,6 +68,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard. 
- name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@9fdb3e49720b44c48891d036bb502feb25684276 # v3.25.6 + uses: github/codeql-action/upload-sarif@f079b8493333aace61c81488f8bd40919487bd9f # v3.25.7 with: sarif_file: results.sarif From 006ccf95392e86f8034955cae66e56e39880cad0 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Tue, 4 Jun 2024 17:10:44 +0200 Subject: [PATCH 13/49] Implement `dpnp.sort_complex` function (#1864) * Implement dpnp.sort_complex * Improve test coverage --- doc/reference/sorting.rst | 1 - dpnp/dpnp_iface_sorting.py | 40 +++++- tests/skipped_tests.tbl | 44 ------- tests/skipped_tests_gpu.tbl | 44 ------- tests/test_sort.py | 25 ++++ tests/test_sycl_queue.py | 1 + tests/test_usm_type.py | 1 + .../cupy/sorting_tests/test_sort.py | 123 ++++++++++-------- 8 files changed, 135 insertions(+), 144 deletions(-) diff --git a/doc/reference/sorting.rst b/doc/reference/sorting.rst index 170e0d33662..d0a966c6731 100644 --- a/doc/reference/sorting.rst +++ b/doc/reference/sorting.rst @@ -13,7 +13,6 @@ Sorting dpnp.sort dpnp.lexsort dpnp.argsort - dpnp.msort dpnp.sort_complex dpnp.partition dpnp.argpartition diff --git a/dpnp/dpnp_iface_sorting.py b/dpnp/dpnp_iface_sorting.py index 3dff8adf485..a6fed26d406 100644 --- a/dpnp/dpnp_iface_sorting.py +++ b/dpnp/dpnp_iface_sorting.py @@ -51,9 +51,10 @@ from .dpnp_array import dpnp_array from .dpnp_utils import ( call_origin, + map_dtype_to_device, ) -__all__ = ["argsort", "partition", "sort"] +__all__ = ["argsort", "partition", "sort", "sort_complex"] def argsort(a, axis=-1, kind=None, order=None): @@ -263,3 +264,40 @@ def sort(a, axis=-1, kind=None, order=None): return dpnp_array._create_from_usm_ndarray( dpt.sort(dpnp.get_usm_ndarray(a), axis=axis) ) + + +def sort_complex(a): + """ + Sort a complex array using the real part first, then the imaginary part. + + For full documentation refer to :obj:`numpy.sort_complex`. 
+ + Parameters + ---------- + a : {dpnp.ndarray, usm_ndarray} + Input array. + + Returns + ------- + out : dpnp.ndarray of complex dtype + Always returns a sorted complex array. + + Examples + -------- + >>> import dpnp as np + >>> a = np.array([5, 3, 6, 2, 1]) + >>> np.sort_complex(a) + array([1.+0.j, 2.+0.j, 3.+0.j, 5.+0.j, 6.+0.j]) + + >>> a = np.array([1 + 2j, 2 - 1j, 3 - 2j, 3 - 3j, 3 + 5j]) + >>> np.sort_complex(a) + array([1.+2.j, 2.-1.j, 3.-3.j, 3.-2.j, 3.+5.j]) + + """ + + b = dpnp.sort(a) + if not dpnp.issubsctype(b.dtype, dpnp.complexfloating): + if b.dtype.char in "bhBH": + return b.astype(dpnp.complex64) + return b.astype(map_dtype_to_device(dpnp.complex128, b.sycl_device)) + return b diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl index 7fa1510e8a5..5e012b3a496 100644 --- a/tests/skipped_tests.tbl +++ b/tests/skipped_tests.tbl @@ -564,50 +564,6 @@ tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_bo tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_goodness_of_fit tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_goodness_of_fit_2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_axis2 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_negative_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_non_contiguous -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_none_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_sequence_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_zero_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_axis2 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_negative_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_non_contiguous -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_none_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_sequence_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_zero_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_F_order -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_lexsort_dtype -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_lexsort_three_or_more_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan3 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_view -tests/third_party/cupy/sorting_tests/test_sort.py::TestMsort::test_msort_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestMsort::test_msort_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_1dim 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_nan -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_ndim -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_zero_dim - tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_diag_exception tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_rowvar diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index 8791400846b..e14b954abe6 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -570,50 +570,6 @@ tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_bo tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_goodness_of_fit tests/third_party/cupy/random_tests/test_sample.py::TestRandomIntegers2::test_goodness_of_fit_2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_invalid_negative_kth 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_negative_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_non_contiguous -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_none_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_sequence_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_0_{external=False}::test_argpartition_zero_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_axis1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_axis2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_invalid_negative_kth 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_negative_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_negative_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_non_contiguous -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_none_axis -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_sequence_kth -tests/third_party/cupy/sorting_tests/test_sort.py::TestArgpartition_param_1_{external=True}::test_argpartition_zero_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_F_order -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_lexsort_dtype -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_lexsort_three_or_more_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan1 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan2 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_nan3 -tests/third_party/cupy/sorting_tests/test_sort.py::TestLexsort::test_view -tests/third_party/cupy/sorting_tests/test_sort.py::TestMsort::test_msort_multi_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestMsort::test_msort_one_dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_1dim -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_nan -tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_ndim 
-tests/third_party/cupy/sorting_tests/test_sort.py::TestSort_complex::test_sort_complex_zero_dim - tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_diag_exception tests/third_party/cupy/statistics_tests/test_correlation.py::TestCorrcoef::test_corrcoef_rowvar diff --git a/tests/test_sort.py b/tests/test_sort.py index e9e8afb4454..289f7c9716b 100644 --- a/tests/test_sort.py +++ b/tests/test_sort.py @@ -340,6 +340,31 @@ def test_sort_notimplemented(self): dpnp.sort(dp_array, order=["age"]) +class TestSortComplex: + @pytest.mark.parametrize( + "dtype", get_all_dtypes(no_complex=True) + [numpy.int8, numpy.int16] + ) + def test_real(self, dtype): + # sort_complex() type casting for real input types + a = numpy.array([5, 3, 6, 2, 1], dtype=dtype) + ia = dpnp.array(a) + + result = dpnp.sort_complex(ia) + expected = numpy.sort_complex(a) + assert_dtype_allclose(result, expected) + + @pytest.mark.parametrize("dtype", get_complex_dtypes()) + def test_complex(self, dtype): + # sort_complex() handling of complex input + a = numpy.array([2 + 3j, 1 - 2j, 1 - 3j, 2 + 1j], dtype=dtype) + ia = dpnp.array(a) + + result = dpnp.sort_complex(ia) + expected = numpy.sort_complex(a) + assert_equal(result, expected) + assert result.dtype == expected.dtype + + @pytest.mark.parametrize("kth", [0, 1], ids=["0", "1"]) @pytest.mark.parametrize("dtype", get_all_dtypes(no_none=True)) @pytest.mark.parametrize( diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index e66c1a55b87..8332f26949b 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -463,6 +463,7 @@ def test_meshgrid(device_x, device_y): ), pytest.param("sinh", [-5.0, -3.5, 0.0, 3.5, 5.0]), pytest.param("sort", [2.0, 1.0, 7.0, 4.0]), + pytest.param("sort_complex", [1 + 2j, 2 - 1j, 3 - 2j, 3 - 3j, 3 + 5j]), pytest.param("sqrt", [1.0, 3.0, 9.0]), pytest.param("square", [1.0, 3.0, 
9.0]), pytest.param("std", [1.0, 2.0, 4.0, 7.0]), diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index f42b6a769bc..f66017ea6e2 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -581,6 +581,7 @@ def test_norm(usm_type, ord, axis): ), pytest.param("sinh", [-5.0, -3.5, 0.0, 3.5, 5.0]), pytest.param("sort", [2.0, 1.0, 7.0, 4.0]), + pytest.param("sort_complex", [1 + 2j, 2 - 1j, 3 - 2j, 3 - 3j, 3 + 5j]), pytest.param("sqrt", [1.0, 3.0, 9.0]), pytest.param("square", [1.0, 3.0, 9.0]), pytest.param("std", [1.0, 2.0, 4.0, 7.0]), diff --git a/tests/third_party/cupy/sorting_tests/test_sort.py b/tests/third_party/cupy/sorting_tests/test_sort.py index 8715702db51..154e0b4f599 100644 --- a/tests/third_party/cupy/sorting_tests/test_sort.py +++ b/tests/third_party/cupy/sorting_tests/test_sort.py @@ -4,6 +4,7 @@ import pytest import dpnp as cupy +from tests.helper import has_support_aspect64 from tests.third_party.cupy import testing @@ -210,6 +211,7 @@ def test_large(self, xp): return xp.sort(a, axis=-1) +@pytest.mark.skip("lexsort() is not implemented yet") class TestLexsort(unittest.TestCase): # Test ranks @@ -221,12 +223,12 @@ def test_lexsort_zero_dim(self): with pytest.raises(numpy.AxisError): return xp.lexsort(a) - @testing.numpy_cupy_array_equal + @testing.numpy_cupy_array_equal() def test_lexsort_one_dim(self, xp): a = testing.shaped_random((2,), xp) return xp.lexsort(a) - @testing.numpy_cupy_array_equal + @testing.numpy_cupy_array_equal() def test_lexsort_two_dim(self, xp): a = xp.array( [[9, 4, 0, 4, 0, 2, 1], [1, 5, 1, 4, 3, 4, 4]] @@ -411,11 +413,10 @@ def test_nan2(self, xp, dtype): return self.argsort(a) +@pytest.mark.skip("msort() is deprecated") class TestMsort(unittest.TestCase): # Test base cases - # TODO(niboshi): Fix xfail - @pytest.mark.xfail(reason="Explicit error types required") def test_msort_zero_dim(self): for xp in (numpy, cupy): a = testing.shaped_random((), xp) @@ -443,19 +444,19 @@ def test_sort_complex_zero_dim(self): 
xp.sort_complex(a) @testing.for_all_dtypes() - @testing.numpy_cupy_array_equal() + @testing.numpy_cupy_array_equal(type_check=has_support_aspect64()) def test_sort_complex_1dim(self, xp, dtype): a = testing.shaped_random((100,), xp, dtype) return a, xp.sort_complex(a) @testing.for_all_dtypes() - @testing.numpy_cupy_array_equal() + @testing.numpy_cupy_array_equal(type_check=has_support_aspect64()) def test_sort_complex_ndim(self, xp, dtype): a = testing.shaped_random((2, 5, 3), xp, dtype) return a, xp.sort_complex(a) @testing.for_dtypes("efdFD") - @testing.numpy_cupy_array_equal() + @testing.numpy_cupy_array_equal(type_check=has_support_aspect64()) def test_sort_complex_nan(self, xp, dtype): a = testing.shaped_random((2, 3, 5), xp, dtype) a[0, 2, 1] = a[1, 0, 3] = xp.nan @@ -618,6 +619,7 @@ def test_partition_invalid_negative_axis2(self): } ) ) +@pytest.mark.skip("not fully supported yet") class TestArgpartition(unittest.TestCase): def argpartition(self, a, kth, axis=-1): if self.external: @@ -641,9 +643,9 @@ def test_argpartition_one_dim(self, xp, dtype): a = testing.shaped_random((10,), xp, dtype, 100) kth = 2 idx = self.argpartition(a, kth) - self.assertTrue((a[idx[:kth]] < a[idx[kth]]).all()) - self.assertTrue((a[idx[kth]] < a[idx[kth + 1 :]]).all()) - return idx[kth] + assert (a[idx[:kth]] <= a[idx[kth]]).all() + assert (a[idx[kth]] <= a[idx[kth + 1 :]]).all() + return a[idx[kth]] # TODO(leofang): test all dtypes -- this workaround needs to be kept, # likely due to #3287? Need investigation. 
@@ -655,18 +657,39 @@ def test_argpartition_multi_dim(self, xp, dtype): idx = self.argpartition(a, kth) rows = [[[0]], [[1]], [[2]]] cols = [[[0], [1], [2]]] - self.assertTrue( - ( - a[rows, cols, idx[:, :, :kth]] - < a[rows, cols, idx[:, :, kth : kth + 1]] - ).all() - ) - self.assertTrue( - ( - a[rows, cols, idx[:, :, kth : kth + 1]] - < a[rows, cols, idx[:, :, kth + 1 :]] - ).all() - ) + assert ( + a[rows, cols, idx[:, :, :kth]] + < a[rows, cols, idx[:, :, kth : kth + 1]] + ).all() + assert ( + a[rows, cols, idx[:, :, kth : kth + 1]] + < a[rows, cols, idx[:, :, kth + 1 :]] + ).all() + return idx[:, :, kth : kth + 1] + + @testing.for_all_dtypes(no_bool=True) + @testing.numpy_cupy_array_equal() + def test_argpartition_multi_dim_kernel(self, xp, dtype): + # Use a larger scale for shaped_random to avoid duplicated numbers, + # which may make different indices at kth between NumPy and CuPy. Skip + # if int8 and uint8 not to overflow. + if dtype in (xp.int8, xp.uint8): + pytest.skip() + a = testing.shaped_random((3, 3, 256), xp, dtype, 10000) + kth = 20 + idx = self.argpartition(a, kth, axis=-1) + + rows = [[[0]], [[1]], [[2]]] + cols = [[[0], [1], [2]]] + + assert ( + a[rows, cols, idx[:, :, :kth]] + <= a[rows, cols, idx[:, :, kth : kth + 1]] + ).all() + assert ( + a[rows, cols, idx[:, :, kth : kth + 1]] + <= a[rows, cols, idx[:, :, kth + 1 :]] + ).all() return idx[:, :, kth : kth + 1] # Test non-contiguous array @@ -676,8 +699,8 @@ def test_argpartition_non_contiguous(self, xp): a = testing.shaped_random((10,), xp, "i", 100)[::2] kth = 2 idx = self.argpartition(a, kth) - self.assertTrue((a[idx[:kth]] < a[idx[kth]]).all()) - self.assertTrue((a[idx[kth]] < a[idx[kth + 1 :]]).all()) + assert (a[idx[:kth]] < a[idx[kth]]).all() + assert (a[idx[kth]] < a[idx[kth + 1 :]]).all() return idx[kth] # Test kth @@ -688,8 +711,8 @@ def test_argpartition_sequence_kth(self, xp): kth = (2, 4) idx = self.argpartition(a, kth) for _kth in kth: - self.assertTrue((a[idx[:_kth]] < 
a[idx[_kth]]).all()) - self.assertTrue((a[idx[_kth]] < a[idx[_kth + 1 :]]).all()) + assert (a[idx[:_kth]] < a[idx[_kth]]).all() + assert (a[idx[_kth]] < a[idx[_kth + 1 :]]).all() return (idx[2], idx[4]) @testing.numpy_cupy_equal() @@ -697,8 +720,8 @@ def test_argpartition_negative_kth(self, xp): a = testing.shaped_random((10,), xp, scale=100) kth = -3 idx = self.argpartition(a, kth) - self.assertTrue((a[idx[:kth]] < a[idx[kth]]).all()) - self.assertTrue((a[idx[kth]] < a[idx[kth + 1 :]]).all()) + assert (a[idx[:kth]] < a[idx[kth]]).all() + assert (a[idx[kth]] < a[idx[kth + 1 :]]).all() return idx[kth] def test_argpartition_invalid_kth(self): @@ -725,18 +748,14 @@ def test_argpartition_axis(self, xp): idx = self.argpartition(a, kth, axis=axis) rows = [[[0], [1], [2]]] cols = [[[0, 1, 2]]] - self.assertTrue( - ( - a[idx[:kth, :, :], rows, cols] - < a[idx[kth : kth + 1, :, :], rows, cols] - ).all() - ) - self.assertTrue( - ( - a[idx[kth : kth + 1, :, :], rows, cols] - < a[idx[kth + 1 :, :, :], rows, cols] - ).all() - ) + assert ( + a[idx[:kth, :, :], rows, cols] + < a[idx[kth : kth + 1, :, :], rows, cols] + ).all() + assert ( + a[idx[kth : kth + 1, :, :], rows, cols] + < a[idx[kth + 1 :, :, :], rows, cols] + ).all() return idx[kth : kth + 1, :, :] @testing.numpy_cupy_array_equal() @@ -747,18 +766,14 @@ def test_argpartition_negative_axis(self, xp): idx = self.argpartition(a, kth, axis=axis) rows = [[[0]], [[1]], [[2]]] cols = [[[0], [1], [2]]] - self.assertTrue( - ( - a[rows, cols, idx[:, :, :kth]] - < a[rows, cols, idx[:, :, kth : kth + 1]] - ).all() - ) - self.assertTrue( - ( - a[rows, cols, idx[:, :, kth : kth + 1]] - < a[rows, cols, idx[:, :, kth + 1 :]] - ).all() - ) + assert ( + a[rows, cols, idx[:, :, :kth]] + < a[rows, cols, idx[:, :, kth : kth + 1]] + ).all() + assert ( + a[rows, cols, idx[:, :, kth : kth + 1]] + < a[rows, cols, idx[:, :, kth + 1 :]] + ).all() return idx[:, :, kth : kth + 1] @testing.numpy_cupy_equal() @@ -768,8 +783,8 @@ def 
test_argpartition_none_axis(self, xp): axis = None idx = self.argpartition(a, kth, axis=axis) a1 = a.flatten() - self.assertTrue((a1[idx[:kth]] < a1[idx[kth]]).all()) - self.assertTrue((a1[idx[kth]] < a1[idx[kth + 1 :]]).all()) + assert (a1[idx[:kth]] < a1[idx[kth]]).all() + assert (a1[idx[kth]] < a1[idx[kth + 1 :]]).all() return idx[kth] def test_argpartition_invalid_axis1(self): From 062bbd7a81d8ebe104a8b00344866d20122cf076 Mon Sep 17 00:00:00 2001 From: vtavana <120411540+vtavana@users.noreply.github.com> Date: Fri, 7 Jun 2024 08:49:52 -0500 Subject: [PATCH 14/49] minor updates for related to BLAS routines (#1869) --- dpnp/backend/extensions/blas/blas_py.cpp | 14 ++++----- dpnp/backend/extensions/blas/gemm_batch.cpp | 16 +--------- dpnp/dpnp_iface_linearalgebra.py | 33 ++++++++++++--------- dpnp/dpnp_utils/dpnp_utils_linearalgebra.py | 7 ++--- tests/test_product.py | 25 ++++++++++------ 5 files changed, 45 insertions(+), 50 deletions(-) diff --git a/dpnp/backend/extensions/blas/blas_py.cpp b/dpnp/backend/extensions/blas/blas_py.cpp index 3fdfebe7c30..b5d83375f23 100644 --- a/dpnp/backend/extensions/blas/blas_py.cpp +++ b/dpnp/backend/extensions/blas/blas_py.cpp @@ -73,7 +73,7 @@ PYBIND11_MODULE(_blas_impl, m) }; m.def("_dot", dot_pyapi, - "Call `dot` from OneMKL BLAS library to return " + "Call `dot` from OneMKL BLAS library to compute " "the dot product of two real-valued vectors.", py::arg("sycl_queue"), py::arg("vectorA"), py::arg("vectorB"), py::arg("result"), py::arg("depends") = py::list()); @@ -91,7 +91,7 @@ PYBIND11_MODULE(_blas_impl, m) }; m.def("_dotc", dotc_pyapi, - "Call `dotc` from OneMKL BLAS library to return " + "Call `dotc` from OneMKL BLAS library to compute " "the dot product of two complex vectors, " "conjugating the first vector.", py::arg("sycl_queue"), py::arg("vectorA"), py::arg("vectorB"), @@ -110,7 +110,7 @@ PYBIND11_MODULE(_blas_impl, m) }; m.def("_dotu", dotu_pyapi, - "Call `dotu` from OneMKL BLAS library to return " + "Call `dotu` 
from OneMKL BLAS library to compute " "the dot product of two complex vectors.", py::arg("sycl_queue"), py::arg("vectorA"), py::arg("vectorB"), py::arg("result"), py::arg("depends") = py::list()); @@ -118,7 +118,7 @@ PYBIND11_MODULE(_blas_impl, m) { m.def("_gemm", &blas_ext::gemm, - "Call `gemm` from OneMKL BLAS library to return " + "Call `gemm` from OneMKL BLAS library to compute " "the matrix-matrix product with 2-D matrices.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("matrixB"), py::arg("resultC"), py::arg("depends") = py::list()); @@ -126,7 +126,7 @@ PYBIND11_MODULE(_blas_impl, m) { m.def("_gemm_batch", &blas_ext::gemm_batch, - "Call `gemm_batch` from OneMKL BLAS library to return " + "Call `gemm_batch` from OneMKL BLAS library to compute " "the matrix-matrix product for a batch of 2-D matrices.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("matrixB"), py::arg("resultC"), py::arg("depends") = py::list()); @@ -134,8 +134,8 @@ PYBIND11_MODULE(_blas_impl, m) { m.def("_gemv", &blas_ext::gemv, - "Call `gemv` from OneMKL BLAS library to return " - "the matrix-vector product using a general matrix.", + "Call `gemv` from OneMKL BLAS library to compute " + "the matrix-vector product with a general matrix.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("vectorX"), py::arg("vectorY"), py::arg("transpose"), py::arg("depends") = py::list()); diff --git a/dpnp/backend/extensions/blas/gemm_batch.cpp b/dpnp/backend/extensions/blas/gemm_batch.cpp index 0d8ad1a6743..689ef77b786 100644 --- a/dpnp/backend/extensions/blas/gemm_batch.cpp +++ b/dpnp/backend/extensions/blas/gemm_batch.cpp @@ -257,21 +257,7 @@ std::tuple throw py::value_error("The number of columns in B must be equal to " "the number of columns in result array."); } - - std::int64_t first_dim; - if (a_shape[0] == b_shape[0]) { - first_dim = a_shape[0]; - } - else if (a_shape[0] == 1 || b_shape[0] == 1) { - first_dim = std::max(a_shape[0], b_shape[0]); - } - else { - throw py::value_error("Array 
shapes do not match."); - } - if (first_dim != c_shape[0]) { - throw py::value_error("Array shapes do not match."); - } - std::int64_t src_nelems = first_dim * m * n; + std::int64_t src_nelems = batch_size * m * n; dpctl::tensor::validation::CheckWritable::throw_if_not_writable(resultC); dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(resultC, src_nelems); diff --git a/dpnp/dpnp_iface_linearalgebra.py b/dpnp/dpnp_iface_linearalgebra.py index 033929443a5..1af952388a6 100644 --- a/dpnp/dpnp_iface_linearalgebra.py +++ b/dpnp/dpnp_iface_linearalgebra.py @@ -136,32 +136,29 @@ def dot(a, b, out=None): raise ValueError("Only C-contiguous array is acceptable.") if dpnp.isscalar(a) or dpnp.isscalar(b): - # TODO: investigate usage of axpy (axpy_batch) or scal - # functions from BLAS here instead of dpnp.multiply + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b, out=out) a_ndim = a.ndim b_ndim = b.ndim if a_ndim == 0 or b_ndim == 0: - # TODO: investigate usage of axpy (axpy_batch) or scal - # functions from BLAS here instead of dpnp.multiply + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b, out=out) if a_ndim == 1 and b_ndim == 1: return dpnp_dot(a, b, out=out) + # NumPy does not allow casting even if it is safe + # casting="no" is used in the following if a_ndim == 2 and b_ndim == 2: - # NumPy does not allow casting even if it is safe return dpnp.matmul(a, b, out=out, casting="no") if a_ndim == 1 or b_ndim == 1: - # NumPy does not allow casting even if it is safe return dpnp.matmul(a, b, out=out, casting="no") # TODO: investigate usage of matmul for some possible # use cases instead of dpnp.tensordot result = dpnp.tensordot(a, b, axes=(-1, -2)) - # NumPy does not allow casting even if it is safe return dpnp.get_result_array(result, out, casting="no") @@ -619,9 +616,11 @@ def inner(a, b): dpnp.check_supported_arrays_type(a, b, scalar_type=True) if dpnp.isscalar(a) or dpnp.isscalar(b): + # TODO: use specific scalar-vector 
kernel return dpnp.multiply(a, b) if a.ndim == 0 or b.ndim == 0: + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b) if a.shape[-1] != b.shape[-1]: @@ -696,11 +695,13 @@ def kron(a, b): dpnp.check_supported_arrays_type(a, b, scalar_type=True) if dpnp.isscalar(a) or dpnp.isscalar(b): + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b) a_ndim = a.ndim b_ndim = b.ndim if a_ndim == 0 or b_ndim == 0: + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b) return dpnp_kron(a, b, a_ndim, b_ndim) @@ -999,6 +1000,7 @@ def tensordot(a, b, axes=2): raise ValueError( "One of the inputs is scalar, axes should be zero." ) + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b) try: @@ -1028,6 +1030,7 @@ def tensordot(a, b, axes=2): axes_b = normalize_axis_tuple(axes_b, b_ndim, "axis_b") if a.ndim == 0 or b.ndim == 0: + # TODO: use specific scalar-vector kernel return dpnp.multiply(a, b) a_shape = a.shape @@ -1112,14 +1115,16 @@ def vdot(a, b): dpnp.check_supported_arrays_type(a, b, scalar_type=True) - if dpnp.isscalar(a) or dpnp.isscalar(b): - if dpnp.isscalar(b) and a.size != 1: - raise ValueError("The first array should be of size one.") - if dpnp.isscalar(a) and b.size != 1: + if dpnp.isscalar(a): + if b.size != 1: raise ValueError("The second array should be of size one.") - a_conj = numpy.conj(a) if dpnp.isscalar(a) else dpnp.conj(a) - # TODO: investigate usage of axpy (axpy_batch) or scal - # functions from BLAS here instead of dpnp.multiply + a_conj = numpy.conj(a) + return dpnp.multiply(a_conj, b) + + if dpnp.isscalar(b): + if a.size != 1: + raise ValueError("The first array should be of size one.") + a_conj = dpnp.conj(a) return dpnp.multiply(a_conj, b) if a.ndim == 1 and b.ndim == 1: diff --git a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py index 43f6cc1f3fe..0b9686771c3 100644 --- a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py +++ 
b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py @@ -108,7 +108,7 @@ def _chr(label): return chr(label) -def _compute_res_dtype(*arrays, dtype, casting, sycl_queue): +def _compute_res_dtype(*arrays, sycl_queue, dtype=None, casting="no"): """ Determines the output array data type and an intermediate data type used in performing calculations related to a specific math function. @@ -1748,10 +1748,7 @@ def dpnp_dot(a, b, /, out=None, *, conjugate=False): res_usm_type, exec_q = get_usm_allocations([a, b]) # Determine the appropriate data types - # casting is irrelevant here since dtype is `None` - dot_dtype, res_dtype = _compute_res_dtype( - a, b, dtype=None, casting="no", sycl_queue=exec_q - ) + dot_dtype, res_dtype = _compute_res_dtype(a, b, sycl_queue=exec_q) result = _create_result_array( a, b, out, (), dot_dtype, res_usm_type, exec_q diff --git a/tests/test_product.py b/tests/test_product.py index ded938bda7f..d9463a1546c 100644 --- a/tests/test_product.py +++ b/tests/test_product.py @@ -1,6 +1,7 @@ import dpctl import numpy import pytest +from numpy.testing import assert_raises import dpnp @@ -8,6 +9,11 @@ def _assert_selective_dtype_allclose(result, expected, dtype): + # For numpy.dot, numpy.vdot, numpy.kron, numpy.inner, and numpy.tensordot, + # when inputs are a scalar (which has the default dtype of the platform) and + # an array, the scalar dtype precision determines the output dtype + # precision. In dpnp, we rely on dpnp.multiply for the scalar-array product + # and the array (not the scalar) determines the output dtype precision of dpnp.multiply if dtype in [numpy.int32, numpy.float32, numpy.complex64]: assert_dtype_allclose(result, expected, check_only_type_kind=True) else: @@ -467,21 +473,22 @@ def test_dot_sycl_queue_error(self): with pytest.raises(ValueError): dpnp.dot(a, b) - # NumPy does not raise an error for the following test.
- # it just does not update the out keyword if it as not properly defined - @pytest.mark.parametrize("ia", [1, dpnp.ones((), dtype=dpnp.int32)]) + @pytest.mark.parametrize("ia", [1, dpnp.ones((), dtype=dpnp.float32)]) def test_dot_out_error_scalar(self, ia): - ib = dpnp.ones(10, dtype=dpnp.int32) + a = ia if dpnp.isscalar(ia) else ia.asnumpy() + ib = dpnp.ones(10, dtype=dpnp.float32) + b = ib.asnumpy() # output data type is incorrect - dp_out = dpnp.empty((10,), dtype=dpnp.int64) - with pytest.raises(ValueError): - dpnp.dot(ia, ib, out=dp_out) + dp_out = dpnp.empty((10,), dtype=dpnp.complex64) + out = numpy.empty((10,), dtype=numpy.complex64) + assert_raises(ValueError, dpnp.dot, ia, ib, out=dp_out) + assert_raises(ValueError, numpy.dot, a, b, out=out) # output shape is incorrect dp_out = dpnp.empty((2,), dtype=dpnp.int32) - with pytest.raises(ValueError): - dpnp.dot(ia, ib, out=dp_out) + assert_raises(ValueError, dpnp.dot, ia, ib, out=dp_out) + assert_raises(ValueError, numpy.dot, a, b, out=out) @pytest.mark.parametrize( "shape_pair", From 0d326ea273bd064aa303481e59b39339d49b5b3b Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 10 Jun 2024 11:23:35 +0200 Subject: [PATCH 15/49] Bump github/codeql-action from 3.25.7 to 3.25.8 (#1876) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.25.7 to 3.25.8. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/f079b8493333aace61c81488f8bd40919487bd9f...2e230e8fe0ad3a14a340ad0815ddb96d599d2aff) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/openssf-scorecard.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml index 25f351b9c1e..5d0d13d45fb 100644 --- a/.github/workflows/openssf-scorecard.yml +++ b/.github/workflows/openssf-scorecard.yml @@ -68,6 +68,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard. - name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@f079b8493333aace61c81488f8bd40919487bd9f # v3.25.7 + uses: github/codeql-action/upload-sarif@2e230e8fe0ad3a14a340ad0815ddb96d599d2aff # v3.25.8 with: sarif_file: results.sarif From 97512e541dcb8b09383492a8dc67f841a4352e8f Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Mon, 10 Jun 2024 13:14:44 +0200 Subject: [PATCH 16/49] Limit rerun of the tests in GitHub action by 2 attempts (#1875) * Limit rerun of the tests in GH action by 2 attempts * Set retry limit to 2 --- .github/workflows/conda-package.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml index dde7e3d435a..83c657a77c5 100644 --- a/.github/workflows/conda-package.yml +++ b/.github/workflows/conda-package.yml @@ -14,6 +14,7 @@ env: CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' CONDA_BUILD_VERSION: '24.5.1' CONDA_INDEX_VERSION: '0.4.0' + RUN_TESTS_MAX_ATTEMPTS: 2 TEST_ENV_NAME: 'test' TEST_SCOPE: >- test_absolute.py @@ -264,7 +265,7 @@ jobs: with: shell: bash timeout_minutes: 10 - max_attempts: 5 + max_attempts: ${{ env.RUN_TESTS_MAX_ATTEMPTS }} retry_on: any command: | . 
$CONDA/etc/profile.d/conda.sh @@ -420,7 +421,7 @@ jobs: with: shell: cmd timeout_minutes: 15 - max_attempts: 5 + max_attempts: ${{ env.RUN_TESTS_MAX_ATTEMPTS }} retry_on: any command: >- mamba activate ${{ env.TEST_ENV_NAME }} From 114dff6df72b43b123d55d293c5db312b0102c0a Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Mon, 10 Jun 2024 14:47:28 +0200 Subject: [PATCH 17/49] Bump dpctl version for meta.yaml and dependencies of GH actions (#1874) --- .github/workflows/build-sphinx.yml | 2 +- .github/workflows/generate_coverage.yaml | 4 ++-- conda-recipe/meta.yaml | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/build-sphinx.yml b/.github/workflows/build-sphinx.yml index 5d0372fb48d..02d4be09541 100644 --- a/.github/workflows/build-sphinx.yml +++ b/.github/workflows/build-sphinx.yml @@ -125,7 +125,7 @@ jobs: - name: Install dpnp dependencies run: | - mamba install numpy"<1.24" dpctl">=0.17.0dev0" mkl-devel-dpcpp onedpl-devel tbb-devel dpcpp_linux-64 \ + mamba install numpy"<1.24" dpctl">=0.18.0dev0" mkl-devel-dpcpp onedpl-devel tbb-devel dpcpp_linux-64 \ cmake cython pytest ninja scikit-build ${{ env.CHANNELS }} - name: Install cuPy dependencies diff --git a/.github/workflows/generate_coverage.yaml b/.github/workflows/generate_coverage.yaml index d0f65e9729b..90586747841 100644 --- a/.github/workflows/generate_coverage.yaml +++ b/.github/workflows/generate_coverage.yaml @@ -81,13 +81,13 @@ jobs: if: env.INSTALL_ONE_API == 'yes' run: | mamba install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ - dpctl">=0.17.0dev0" onedpl-devel ${{ env.CHANNELS }} + dpctl">=0.18.0dev0" onedpl-devel ${{ env.CHANNELS }} - name: Install dpnp dependencies if: env.INSTALL_ONE_API != 'yes' run: | mamba install cython llvm cmake">=3.21" scikit-build ninja pytest pytest-cov coverage[toml] \ - dpctl">=0.17.0dev0" dpcpp_linux-64 mkl-devel-dpcpp tbb-devel onedpl-devel ${{ env.CHANNELS
}} + dpctl">=0.18.0dev0" dpcpp_linux-64 mkl-devel-dpcpp tbb-devel onedpl-devel ${{ env.CHANNELS }} - name: Conda info run: | diff --git a/conda-recipe/meta.yaml b/conda-recipe/meta.yaml index 80375eb26f0..c10cd061345 100644 --- a/conda-recipe/meta.yaml +++ b/conda-recipe/meta.yaml @@ -2,7 +2,7 @@ {% set excluded_compiler_version1 = "2024.0.1" %} {% set excluded_compiler_version2 = "2024.0.2" %} {% set excluded_compiler_version3 = "2024.0.3" %} -{% set required_dpctl_version = "0.16.0" %} +{% set required_dpctl_version = "0.17.0" %} package: name: dpnp From 896209a1715dba870867ee9d27ff585113c50d42 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Mon, 10 Jun 2024 16:22:16 +0200 Subject: [PATCH 18/49] Use dpctl from dedicated channel in coverage GH action (#1873) --- .github/workflows/generate_coverage.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/generate_coverage.yaml b/.github/workflows/generate_coverage.yaml index 90586747841..22ec13da23a 100644 --- a/.github/workflows/generate_coverage.yaml +++ b/.github/workflows/generate_coverage.yaml @@ -21,7 +21,7 @@ jobs: env: python-ver: '3.10' - CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' + CHANNELS: '-c dppy/label/coverage -c intel -c conda-forge --override-channels' # Install the latest oneAPI compiler to work around an issue INSTALL_ONE_API: 'yes' From d553611d9670fb1ee50e6b293ffea2853d0f5655 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Wed, 12 Jun 2024 21:26:04 +0200 Subject: [PATCH 19/49] Preparation to reuse future common dpctl f/w in functions from `vm` extension (#1868) * Preparation to reuse common dpctl f/w for VM functions * PoC to decouple abs implementation to separate source file * Reuse typedef for function poiter from dpctl.tensor * Define populating vectors by a separate macro * Move implementation of utility functions from headers to source to 
resolve link issues * Separated implementation of acos function * Separated implementation of acosh function * Use function to simplify strides from dpctl tensor headers * PoC to decouple add implementation to separate source file * Separated implementation of asin function * Separated implementation of asinh function * Separated implementation of atan, atan2, atanh functions * Resolve issue with calling MKL function for undefined types * Separated implementation of cbrt, ceil, conj, cos and cosh functions * Separated implementation of div, exp, exp2, expm1, floor and hypot functions * Separated implementation of ln, log1p, log2 and log10 functions * Separated implementation of mul, pow, rint, sin and sinh functions * Separated implementation of sqr, sqrt, sub, tan, tanh and trunc functions * Removed unused header with types matrix * Remove unused functions * Use passing by reference in unary and binary funcs --- .../elementwise_functions.hpp | 824 +++++++++++++ .../elementwise_functions_type_utils.cpp | 87 ++ .../elementwise_functions_type_utils.hpp | 47 + .../simplify_iteration_space.cpp | 205 ++++ .../simplify_iteration_space.hpp | 61 + dpnp/backend/extensions/vm/CMakeLists.txt | 44 +- dpnp/backend/extensions/vm/abs.cpp | 138 +++ dpnp/backend/extensions/vm/abs.hpp | 54 +- dpnp/backend/extensions/vm/acos.cpp | 138 +++ dpnp/backend/extensions/vm/acos.hpp | 54 +- dpnp/backend/extensions/vm/acosh.cpp | 138 +++ dpnp/backend/extensions/vm/acosh.hpp | 54 +- dpnp/backend/extensions/vm/add.cpp | 171 +++ dpnp/backend/extensions/vm/add.hpp | 57 +- dpnp/backend/extensions/vm/asin.cpp | 138 +++ dpnp/backend/extensions/vm/asin.hpp | 54 +- dpnp/backend/extensions/vm/asinh.cpp | 138 +++ dpnp/backend/extensions/vm/asinh.hpp | 54 +- dpnp/backend/extensions/vm/atan.cpp | 138 +++ dpnp/backend/extensions/vm/atan.hpp | 54 +- dpnp/backend/extensions/vm/atan2.cpp | 160 +++ dpnp/backend/extensions/vm/atan2.hpp | 57 +- dpnp/backend/extensions/vm/atanh.cpp | 138 +++ 
dpnp/backend/extensions/vm/atanh.hpp | 54 +- dpnp/backend/extensions/vm/cbrt.cpp | 136 +++ dpnp/backend/extensions/vm/cbrt.hpp | 54 +- dpnp/backend/extensions/vm/ceil.cpp | 136 +++ dpnp/backend/extensions/vm/ceil.hpp | 54 +- dpnp/backend/extensions/vm/common.hpp | 374 ++---- dpnp/backend/extensions/vm/conj.cpp | 136 +++ dpnp/backend/extensions/vm/conj.hpp | 54 +- dpnp/backend/extensions/vm/cos.cpp | 138 +++ dpnp/backend/extensions/vm/cos.hpp | 54 +- dpnp/backend/extensions/vm/cosh.cpp | 138 +++ dpnp/backend/extensions/vm/cosh.hpp | 54 +- dpnp/backend/extensions/vm/div.cpp | 171 +++ dpnp/backend/extensions/vm/div.hpp | 57 +- dpnp/backend/extensions/vm/exp.cpp | 138 +++ dpnp/backend/extensions/vm/exp.hpp | 54 +- dpnp/backend/extensions/vm/exp2.cpp | 136 +++ dpnp/backend/extensions/vm/exp2.hpp | 54 +- dpnp/backend/extensions/vm/expm1.cpp | 136 +++ dpnp/backend/extensions/vm/expm1.hpp | 54 +- dpnp/backend/extensions/vm/floor.cpp | 136 +++ dpnp/backend/extensions/vm/floor.hpp | 54 +- dpnp/backend/extensions/vm/hypot.cpp | 160 +++ dpnp/backend/extensions/vm/hypot.hpp | 57 +- dpnp/backend/extensions/vm/ln.cpp | 138 +++ dpnp/backend/extensions/vm/ln.hpp | 53 +- dpnp/backend/extensions/vm/log10.cpp | 138 +++ dpnp/backend/extensions/vm/log10.hpp | 54 +- dpnp/backend/extensions/vm/log1p.cpp | 136 +++ dpnp/backend/extensions/vm/log1p.hpp | 54 +- dpnp/backend/extensions/vm/log2.cpp | 136 +++ dpnp/backend/extensions/vm/log2.hpp | 54 +- dpnp/backend/extensions/vm/mul.cpp | 171 +++ dpnp/backend/extensions/vm/mul.hpp | 57 +- dpnp/backend/extensions/vm/pow.cpp | 171 +++ dpnp/backend/extensions/vm/pow.hpp | 57 +- dpnp/backend/extensions/vm/rint.cpp | 136 +++ .../extensions/vm/{round.hpp => rint.hpp} | 54 +- dpnp/backend/extensions/vm/sin.cpp | 138 +++ dpnp/backend/extensions/vm/sin.hpp | 54 +- dpnp/backend/extensions/vm/sinh.cpp | 138 +++ dpnp/backend/extensions/vm/sinh.hpp | 54 +- dpnp/backend/extensions/vm/sqr.cpp | 136 +++ dpnp/backend/extensions/vm/sqr.hpp | 54 +- 
dpnp/backend/extensions/vm/sqrt.cpp | 139 +++ dpnp/backend/extensions/vm/sqrt.hpp | 54 +- dpnp/backend/extensions/vm/sub.cpp | 171 +++ dpnp/backend/extensions/vm/sub.hpp | 57 +- dpnp/backend/extensions/vm/tan.cpp | 138 +++ dpnp/backend/extensions/vm/tan.hpp | 54 +- dpnp/backend/extensions/vm/tanh.cpp | 138 +++ dpnp/backend/extensions/vm/tanh.hpp | 54 +- dpnp/backend/extensions/vm/trunc.cpp | 136 +++ dpnp/backend/extensions/vm/trunc.hpp | 54 +- dpnp/backend/extensions/vm/types_matrix.hpp | 659 ---------- dpnp/backend/extensions/vm/vm_py.cpp | 1081 +---------------- 79 files changed, 6629 insertions(+), 3681 deletions(-) create mode 100644 dpnp/backend/extensions/elementwise_functions/elementwise_functions.hpp create mode 100644 dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.cpp create mode 100644 dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.hpp create mode 100644 dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.cpp create mode 100644 dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.hpp create mode 100644 dpnp/backend/extensions/vm/abs.cpp create mode 100644 dpnp/backend/extensions/vm/acos.cpp create mode 100644 dpnp/backend/extensions/vm/acosh.cpp create mode 100644 dpnp/backend/extensions/vm/add.cpp create mode 100644 dpnp/backend/extensions/vm/asin.cpp create mode 100644 dpnp/backend/extensions/vm/asinh.cpp create mode 100644 dpnp/backend/extensions/vm/atan.cpp create mode 100644 dpnp/backend/extensions/vm/atan2.cpp create mode 100644 dpnp/backend/extensions/vm/atanh.cpp create mode 100644 dpnp/backend/extensions/vm/cbrt.cpp create mode 100644 dpnp/backend/extensions/vm/ceil.cpp create mode 100644 dpnp/backend/extensions/vm/conj.cpp create mode 100644 dpnp/backend/extensions/vm/cos.cpp create mode 100644 dpnp/backend/extensions/vm/cosh.cpp create mode 100644 dpnp/backend/extensions/vm/div.cpp create mode 100644 dpnp/backend/extensions/vm/exp.cpp create 
mode 100644 dpnp/backend/extensions/vm/exp2.cpp create mode 100644 dpnp/backend/extensions/vm/expm1.cpp create mode 100644 dpnp/backend/extensions/vm/floor.cpp create mode 100644 dpnp/backend/extensions/vm/hypot.cpp create mode 100644 dpnp/backend/extensions/vm/ln.cpp create mode 100644 dpnp/backend/extensions/vm/log10.cpp create mode 100644 dpnp/backend/extensions/vm/log1p.cpp create mode 100644 dpnp/backend/extensions/vm/log2.cpp create mode 100644 dpnp/backend/extensions/vm/mul.cpp create mode 100644 dpnp/backend/extensions/vm/pow.cpp create mode 100644 dpnp/backend/extensions/vm/rint.cpp rename dpnp/backend/extensions/vm/{round.hpp => rint.hpp} (53%) create mode 100644 dpnp/backend/extensions/vm/sin.cpp create mode 100644 dpnp/backend/extensions/vm/sinh.cpp create mode 100644 dpnp/backend/extensions/vm/sqr.cpp create mode 100644 dpnp/backend/extensions/vm/sqrt.cpp create mode 100644 dpnp/backend/extensions/vm/sub.cpp create mode 100644 dpnp/backend/extensions/vm/tan.cpp create mode 100644 dpnp/backend/extensions/vm/tanh.cpp create mode 100644 dpnp/backend/extensions/vm/trunc.cpp delete mode 100644 dpnp/backend/extensions/vm/types_matrix.hpp diff --git a/dpnp/backend/extensions/elementwise_functions/elementwise_functions.hpp b/dpnp/backend/extensions/elementwise_functions/elementwise_functions.hpp new file mode 100644 index 00000000000..01013d10f5d --- /dev/null +++ b/dpnp/backend/extensions/elementwise_functions/elementwise_functions.hpp @@ -0,0 +1,824 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. 
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include + +#include "dpctl4pybind11.hpp" +#include +#include +#include + +#include "elementwise_functions_type_utils.hpp" +#include "simplify_iteration_space.hpp" + +// dpctl tensor headers +#include "kernels/alignment.hpp" +// #include "kernels/dpctl_tensor_types.hpp" +#include "utils/memory_overlap.hpp" +#include "utils/offset_utils.hpp" +#include "utils/output_validation.hpp" +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; +namespace td_ns = dpctl::tensor::type_dispatch; + +static_assert(std::is_same_v); + +namespace dpnp::extensions::py_internal +{ + +using dpctl::tensor::kernels::alignment_utils::is_aligned; +using dpctl::tensor::kernels::alignment_utils::required_alignment; + +/*! 
@brief Template implementing Python API for unary elementwise functions */ +template +std::pair + py_unary_ufunc(const dpctl::tensor::usm_ndarray &src, + const dpctl::tensor::usm_ndarray &dst, + sycl::queue &q, + const std::vector &depends, + // + const output_typesT &output_type_vec, + const contig_dispatchT &contig_dispatch_vector, + const strided_dispatchT &strided_dispatch_vector) +{ + int src_typenum = src.get_typenum(); + int dst_typenum = dst.get_typenum(); + + const auto &array_types = td_ns::usm_ndarray_types(); + int src_typeid = array_types.typenum_to_lookup_id(src_typenum); + int dst_typeid = array_types.typenum_to_lookup_id(dst_typenum); + + int func_output_typeid = output_type_vec[src_typeid]; + + // check that types are supported + if (dst_typeid != func_output_typeid) { + throw py::value_error( + "Destination array has unexpected elemental data type."); + } + + // check that queues are compatible + if (!dpctl::utils::queues_are_compatible(q, {src, dst})) { + throw py::value_error( + "Execution queue is not compatible with allocation queues"); + } + + dpctl::tensor::validation::CheckWritable::throw_if_not_writable(dst); + + // check that dimensions are the same + int src_nd = src.get_ndim(); + if (src_nd != dst.get_ndim()) { + throw py::value_error("Array dimensions are not the same."); + } + + // check that shapes are the same + const py::ssize_t *src_shape = src.get_shape_raw(); + const py::ssize_t *dst_shape = dst.get_shape_raw(); + bool shapes_equal(true); + size_t src_nelems(1); + + for (int i = 0; i < src_nd; ++i) { + src_nelems *= static_cast(src_shape[i]); + shapes_equal = shapes_equal && (src_shape[i] == dst_shape[i]); + } + if (!shapes_equal) { + throw py::value_error("Array shapes are not the same."); + } + + // if nelems is zero, return + if (src_nelems == 0) { + return std::make_pair(sycl::event(), sycl::event()); + } + + dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(dst, src_nelems); + + // check memory overlap + auto const 
&overlap = dpctl::tensor::overlap::MemoryOverlap(); + auto const &same_logical_tensors = + dpctl::tensor::overlap::SameLogicalTensors(); + if (overlap(src, dst) && !same_logical_tensors(src, dst)) { + throw py::value_error("Arrays index overlapping segments of memory"); + } + + const char *src_data = src.get_data(); + char *dst_data = dst.get_data(); + + // handle contiguous inputs + bool is_src_c_contig = src.is_c_contiguous(); + bool is_src_f_contig = src.is_f_contiguous(); + + bool is_dst_c_contig = dst.is_c_contiguous(); + bool is_dst_f_contig = dst.is_f_contiguous(); + + bool both_c_contig = (is_src_c_contig && is_dst_c_contig); + bool both_f_contig = (is_src_f_contig && is_dst_f_contig); + + if (both_c_contig || both_f_contig) { + auto contig_fn = contig_dispatch_vector[src_typeid]; + + if (contig_fn == nullptr) { + throw std::runtime_error( + "Contiguous implementation is missing for src_typeid=" + + std::to_string(src_typeid)); + } + + auto comp_ev = contig_fn(q, src_nelems, src_data, dst_data, depends); + sycl::event ht_ev = + dpctl::utils::keep_args_alive(q, {src, dst}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + + // simplify iteration space + // if 1d with strides 1 - input is contig + // dispatch to strided + + auto const &src_strides = src.get_strides_vector(); + auto const &dst_strides = dst.get_strides_vector(); + + using shT = std::vector; + shT simplified_shape; + shT simplified_src_strides; + shT simplified_dst_strides; + py::ssize_t src_offset(0); + py::ssize_t dst_offset(0); + + int nd = src_nd; + const py::ssize_t *shape = src_shape; + + simplify_iteration_space(nd, shape, src_strides, dst_strides, + // output + simplified_shape, simplified_src_strides, + simplified_dst_strides, src_offset, dst_offset); + + if (nd == 1 && simplified_src_strides[0] == 1 && + simplified_dst_strides[0] == 1) { + // Special case of contiguous data + auto contig_fn = contig_dispatch_vector[src_typeid]; + + if (contig_fn == nullptr) { + throw 
std::runtime_error( + "Contiguous implementation is missing for src_typeid=" + + std::to_string(src_typeid)); + } + + int src_elem_size = src.get_elemsize(); + int dst_elem_size = dst.get_elemsize(); + auto comp_ev = + contig_fn(q, src_nelems, src_data + src_elem_size * src_offset, + dst_data + dst_elem_size * dst_offset, depends); + + sycl::event ht_ev = + dpctl::utils::keep_args_alive(q, {src, dst}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + + // Strided implementation + auto strided_fn = strided_dispatch_vector[src_typeid]; + + if (strided_fn == nullptr) { + throw std::runtime_error( + "Strided implementation is missing for src_typeid=" + + std::to_string(src_typeid)); + } + + using dpctl::tensor::offset_utils::device_allocate_and_pack; + + std::vector host_tasks{}; + host_tasks.reserve(2); + + const auto &ptr_size_event_triple_ = device_allocate_and_pack( + q, host_tasks, simplified_shape, simplified_src_strides, + simplified_dst_strides); + py::ssize_t *shape_strides = std::get<0>(ptr_size_event_triple_); + const sycl::event &copy_shape_ev = std::get<2>(ptr_size_event_triple_); + + if (shape_strides == nullptr) { + throw std::runtime_error("Device memory allocation failed"); + } + + sycl::event strided_fn_ev = + strided_fn(q, src_nelems, nd, shape_strides, src_data, src_offset, + dst_data, dst_offset, depends, {copy_shape_ev}); + + // async free of shape_strides temporary + auto ctx = q.get_context(); + sycl::event tmp_cleanup_ev = q.submit([&](sycl::handler &cgh) { + cgh.depends_on(strided_fn_ev); + cgh.host_task( + [ctx, shape_strides]() { sycl::free(shape_strides, ctx); }); + }); + host_tasks.push_back(tmp_cleanup_ev); + + return std::make_pair( + dpctl::utils::keep_args_alive(q, {src, dst}, host_tasks), + strided_fn_ev); +} + +/*!
@brief Template implementing Python API for querying of type support by + * unary elementwise functions */ +template +py::object py_unary_ufunc_result_type(const py::dtype &input_dtype, + const output_typesT &output_types) +{ + int tn = input_dtype.num(); // NumPy type numbers are the same as in dpctl + int src_typeid = -1; + + auto array_types = td_ns::usm_ndarray_types(); + + try { + src_typeid = array_types.typenum_to_lookup_id(tn); + } catch (const std::exception &e) { + throw py::value_error(e.what()); + } + + using type_utils::_result_typeid; + int dst_typeid = _result_typeid(src_typeid, output_types); + + if (dst_typeid < 0) { + auto res = py::none(); + return py::cast(res); + } + else { + using type_utils::_dtype_from_typenum; + + auto dst_typenum_t = static_cast(dst_typeid); + auto dt = _dtype_from_typenum(dst_typenum_t); + + return py::cast(dt); + } +} + +// ======================== Binary functions =========================== + +namespace +{ +template +bool isEqual(Container const &c, std::initializer_list const &l) +{ + return std::equal(std::begin(c), std::end(c), std::begin(l), std::end(l)); +} +} // namespace + +/*! 
@brief Template implementing Python API for binary elementwise + * functions */ +template +std::pair py_binary_ufunc( + const dpctl::tensor::usm_ndarray &src1, + const dpctl::tensor::usm_ndarray &src2, + const dpctl::tensor::usm_ndarray &dst, // dst = op(src1, src2), elementwise + sycl::queue &exec_q, + const std::vector depends, + // + const output_typesT &output_type_table, + const contig_dispatchT &contig_dispatch_table, + const strided_dispatchT &strided_dispatch_table, + const contig_matrix_row_dispatchT + &contig_matrix_row_broadcast_dispatch_table, + const contig_row_matrix_dispatchT + &contig_row_matrix_broadcast_dispatch_table) +{ + // check type_nums + int src1_typenum = src1.get_typenum(); + int src2_typenum = src2.get_typenum(); + int dst_typenum = dst.get_typenum(); + + auto array_types = td_ns::usm_ndarray_types(); + int src1_typeid = array_types.typenum_to_lookup_id(src1_typenum); + int src2_typeid = array_types.typenum_to_lookup_id(src2_typenum); + int dst_typeid = array_types.typenum_to_lookup_id(dst_typenum); + + int output_typeid = output_type_table[src1_typeid][src2_typeid]; + + if (output_typeid != dst_typeid) { + throw py::value_error( + "Destination array has unexpected elemental data type."); + } + + // check that queues are compatible + if (!dpctl::utils::queues_are_compatible(exec_q, {src1, src2, dst})) { + throw py::value_error( + "Execution queue is not compatible with allocation queues"); + } + + dpctl::tensor::validation::CheckWritable::throw_if_not_writable(dst); + + // check shapes, broadcasting is assumed done by caller + // check that dimensions are the same + int dst_nd = dst.get_ndim(); + if (dst_nd != src1.get_ndim() || dst_nd != src2.get_ndim()) { + throw py::value_error("Array dimensions are not the same."); + } + + // check that shapes are the same + const py::ssize_t *src1_shape = src1.get_shape_raw(); + const py::ssize_t *src2_shape = src2.get_shape_raw(); + const py::ssize_t *dst_shape = dst.get_shape_raw(); + bool 
shapes_equal(true); + size_t src_nelems(1); + + for (int i = 0; i < dst_nd; ++i) { + src_nelems *= static_cast(src1_shape[i]); + shapes_equal = shapes_equal && (src1_shape[i] == dst_shape[i] && + src2_shape[i] == dst_shape[i]); + } + if (!shapes_equal) { + throw py::value_error("Array shapes are not the same."); + } + + // if nelems is zero, return + if (src_nelems == 0) { + return std::make_pair(sycl::event(), sycl::event()); + } + + dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(dst, src_nelems); + + auto const &overlap = dpctl::tensor::overlap::MemoryOverlap(); + auto const &same_logical_tensors = + dpctl::tensor::overlap::SameLogicalTensors(); + if ((overlap(src1, dst) && !same_logical_tensors(src1, dst)) || + (overlap(src2, dst) && !same_logical_tensors(src2, dst))) + { + throw py::value_error("Arrays index overlapping segments of memory"); + } + // check memory overlap + const char *src1_data = src1.get_data(); + const char *src2_data = src2.get_data(); + char *dst_data = dst.get_data(); + + // handle contiguous inputs + bool is_src1_c_contig = src1.is_c_contiguous(); + bool is_src1_f_contig = src1.is_f_contiguous(); + + bool is_src2_c_contig = src2.is_c_contiguous(); + bool is_src2_f_contig = src2.is_f_contiguous(); + + bool is_dst_c_contig = dst.is_c_contiguous(); + bool is_dst_f_contig = dst.is_f_contiguous(); + + bool all_c_contig = + (is_src1_c_contig && is_src2_c_contig && is_dst_c_contig); + bool all_f_contig = + (is_src1_f_contig && is_src2_f_contig && is_dst_f_contig); + + // dispatch for contiguous inputs + if (all_c_contig || all_f_contig) { + auto contig_fn = contig_dispatch_table[src1_typeid][src2_typeid]; + + if (contig_fn != nullptr) { + auto comp_ev = contig_fn(exec_q, src_nelems, src1_data, 0, + src2_data, 0, dst_data, 0, depends); + sycl::event ht_ev = dpctl::utils::keep_args_alive( + exec_q, {src1, src2, dst}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + } + + // simplify strides + auto const &src1_strides = 
src1.get_strides_vector(); + auto const &src2_strides = src2.get_strides_vector(); + auto const &dst_strides = dst.get_strides_vector(); + + using shT = std::vector; + shT simplified_shape; + shT simplified_src1_strides; + shT simplified_src2_strides; + shT simplified_dst_strides; + py::ssize_t src1_offset(0); + py::ssize_t src2_offset(0); + py::ssize_t dst_offset(0); + + int nd = dst_nd; + const py::ssize_t *shape = src1_shape; + + simplify_iteration_space_3( + nd, shape, src1_strides, src2_strides, dst_strides, + // outputs + simplified_shape, simplified_src1_strides, simplified_src2_strides, + simplified_dst_strides, src1_offset, src2_offset, dst_offset); + + std::vector host_tasks{}; + if (nd < 3) { + static constexpr auto unit_stride = + std::initializer_list{1}; + + if ((nd == 1) && isEqual(simplified_src1_strides, unit_stride) && + isEqual(simplified_src2_strides, unit_stride) && + isEqual(simplified_dst_strides, unit_stride)) + { + auto contig_fn = contig_dispatch_table[src1_typeid][src2_typeid]; + + if (contig_fn != nullptr) { + auto comp_ev = contig_fn(exec_q, src_nelems, src1_data, + src1_offset, src2_data, src2_offset, + dst_data, dst_offset, depends); + sycl::event ht_ev = dpctl::utils::keep_args_alive( + exec_q, {src1, src2, dst}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + } + if (nd == 2) { + static constexpr auto zero_one_strides = + std::initializer_list{0, 1}; + static constexpr auto one_zero_strides = + std::initializer_list{1, 0}; + constexpr py::ssize_t one{1}; + // special case of C-contiguous matrix and a row + if (isEqual(simplified_src2_strides, zero_one_strides) && + isEqual(simplified_src1_strides, {simplified_shape[1], one}) && + isEqual(simplified_dst_strides, {simplified_shape[1], one})) + { + auto matrix_row_broadcast_fn = + contig_matrix_row_broadcast_dispatch_table[src1_typeid] + [src2_typeid]; + if (matrix_row_broadcast_fn != nullptr) { + int src1_itemsize = src1.get_elemsize(); + int src2_itemsize = 
src2.get_elemsize(); + int dst_itemsize = dst.get_elemsize(); + + if (is_aligned( + src1_data + src1_offset * src1_itemsize) && + is_aligned( + src2_data + src2_offset * src2_itemsize) && + is_aligned( + dst_data + dst_offset * dst_itemsize)) + { + size_t n0 = simplified_shape[0]; + size_t n1 = simplified_shape[1]; + sycl::event comp_ev = matrix_row_broadcast_fn( + exec_q, host_tasks, n0, n1, src1_data, src1_offset, + src2_data, src2_offset, dst_data, dst_offset, + depends); + + return std::make_pair( + dpctl::utils::keep_args_alive( + exec_q, {src1, src2, dst}, host_tasks), + comp_ev); + } + } + } + if (isEqual(simplified_src1_strides, one_zero_strides) && + isEqual(simplified_src2_strides, {one, simplified_shape[0]}) && + isEqual(simplified_dst_strides, {one, simplified_shape[0]})) + { + auto row_matrix_broadcast_fn = + contig_row_matrix_broadcast_dispatch_table[src1_typeid] + [src2_typeid]; + if (row_matrix_broadcast_fn != nullptr) { + + int src1_itemsize = src1.get_elemsize(); + int src2_itemsize = src2.get_elemsize(); + int dst_itemsize = dst.get_elemsize(); + + if (is_aligned( + src1_data + src1_offset * src1_itemsize) && + is_aligned( + src2_data + src2_offset * src2_itemsize) && + is_aligned( + dst_data + dst_offset * dst_itemsize)) + { + size_t n0 = simplified_shape[1]; + size_t n1 = simplified_shape[0]; + sycl::event comp_ev = row_matrix_broadcast_fn( + exec_q, host_tasks, n0, n1, src1_data, src1_offset, + src2_data, src2_offset, dst_data, dst_offset, + depends); + + return std::make_pair( + dpctl::utils::keep_args_alive( + exec_q, {src1, src2, dst}, host_tasks), + comp_ev); + } + } + } + } + } + + // dispatch to strided code + auto strided_fn = strided_dispatch_table[src1_typeid][src2_typeid]; + + if (strided_fn == nullptr) { + throw std::runtime_error( + "Strided implementation is missing for src1_typeid=" + + std::to_string(src1_typeid) + + " and src2_typeid=" + std::to_string(src2_typeid)); + } + + using 
dpctl::tensor::offset_utils::device_allocate_and_pack; + const auto &ptr_sz_event_triple_ = device_allocate_and_pack( + exec_q, host_tasks, simplified_shape, simplified_src1_strides, + simplified_src2_strides, simplified_dst_strides); + + py::ssize_t *shape_strides = std::get<0>(ptr_sz_event_triple_); + const sycl::event &copy_shape_ev = std::get<2>(ptr_sz_event_triple_); + + if (shape_strides == nullptr) { + throw std::runtime_error("Unable to allocate device memory"); + } + + sycl::event strided_fn_ev = strided_fn( + exec_q, src_nelems, nd, shape_strides, src1_data, src1_offset, + src2_data, src2_offset, dst_data, dst_offset, depends, {copy_shape_ev}); + + // async free of shape_strides temporary + auto ctx = exec_q.get_context(); + + sycl::event tmp_cleanup_ev = exec_q.submit([&](sycl::handler &cgh) { + cgh.depends_on(strided_fn_ev); + cgh.host_task( + [ctx, shape_strides]() { sycl::free(shape_strides, ctx); }); + }); + + host_tasks.push_back(tmp_cleanup_ev); + + return std::make_pair( + dpctl::utils::keep_args_alive(exec_q, {src1, src2, dst}, host_tasks), + strided_fn_ev); +} + +/*!
@brief Type querying for binary elementwise functions */ +template +py::object py_binary_ufunc_result_type(const py::dtype &input1_dtype, + const py::dtype &input2_dtype, + const output_typesT &output_types_table) +{ + int tn1 = input1_dtype.num(); // NumPy type numbers are the same as in dpctl + int tn2 = input2_dtype.num(); // NumPy type numbers are the same as in dpctl + int src1_typeid = -1; + int src2_typeid = -1; + + auto array_types = td_ns::usm_ndarray_types(); + + try { + src1_typeid = array_types.typenum_to_lookup_id(tn1); + src2_typeid = array_types.typenum_to_lookup_id(tn2); + } catch (const std::exception &e) { + throw py::value_error(e.what()); + } + + if (src1_typeid < 0 || src1_typeid >= td_ns::num_types || src2_typeid < 0 || + src2_typeid >= td_ns::num_types) + { + throw std::runtime_error("binary output type lookup failed"); + } + int dst_typeid = output_types_table[src1_typeid][src2_typeid]; + + if (dst_typeid < 0) { + auto res = py::none(); + return py::cast(res); + } + else { + using type_utils::_dtype_from_typenum; + + auto dst_typenum_t = static_cast(dst_typeid); + auto dt = _dtype_from_typenum(dst_typenum_t); + + return py::cast(dt); + } +} + +// ==================== Inplace binary functions ======================= + +template +std::pair + py_binary_inplace_ufunc(const dpctl::tensor::usm_ndarray &lhs, + const dpctl::tensor::usm_ndarray &rhs, + sycl::queue &exec_q, + const std::vector depends, + // + const output_typesT &output_type_table, + const contig_dispatchT &contig_dispatch_table, + const strided_dispatchT &strided_dispatch_table, + const contig_row_matrix_dispatchT + &contig_row_matrix_broadcast_dispatch_table) +{ + dpctl::tensor::validation::CheckWritable::throw_if_not_writable(lhs); + + // check type_nums + int rhs_typenum = rhs.get_typenum(); + int lhs_typenum = lhs.get_typenum(); + + auto array_types = td_ns::usm_ndarray_types(); + int rhs_typeid = array_types.typenum_to_lookup_id(rhs_typenum); + int lhs_typeid = 
array_types.typenum_to_lookup_id(lhs_typenum); + + int output_typeid = output_type_table[rhs_typeid][lhs_typeid]; + + if (output_typeid != lhs_typeid) { + throw py::value_error( + "Left-hand side array has unexpected elemental data type."); + } + + // check that queues are compatible + if (!dpctl::utils::queues_are_compatible(exec_q, {rhs, lhs})) { + throw py::value_error( + "Execution queue is not compatible with allocation queues"); + } + + // check shapes, broadcasting is assumed done by caller + // check that dimensions are the same + int lhs_nd = lhs.get_ndim(); + if (lhs_nd != rhs.get_ndim()) { + throw py::value_error("Array dimensions are not the same."); + } + + // check that shapes are the same + const py::ssize_t *rhs_shape = rhs.get_shape_raw(); + const py::ssize_t *lhs_shape = lhs.get_shape_raw(); + bool shapes_equal(true); + size_t rhs_nelems(1); + + for (int i = 0; i < lhs_nd; ++i) { + rhs_nelems *= static_cast(rhs_shape[i]); + shapes_equal = shapes_equal && (rhs_shape[i] == lhs_shape[i]); + } + if (!shapes_equal) { + throw py::value_error("Array shapes are not the same."); + } + + // if nelems is zero, return + if (rhs_nelems == 0) { + return std::make_pair(sycl::event(), sycl::event()); + } + + dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(lhs, rhs_nelems); + + // check memory overlap + auto const &same_logical_tensors = + dpctl::tensor::overlap::SameLogicalTensors(); + auto const &overlap = dpctl::tensor::overlap::MemoryOverlap(); + if (overlap(rhs, lhs) && !same_logical_tensors(rhs, lhs)) { + throw py::value_error("Arrays index overlapping segments of memory"); + } + // check memory overlap + const char *rhs_data = rhs.get_data(); + char *lhs_data = lhs.get_data(); + + // handle contiguous inputs + bool is_rhs_c_contig = rhs.is_c_contiguous(); + bool is_rhs_f_contig = rhs.is_f_contiguous(); + + bool is_lhs_c_contig = lhs.is_c_contiguous(); + bool is_lhs_f_contig = lhs.is_f_contiguous(); + + bool both_c_contig = (is_rhs_c_contig && 
is_lhs_c_contig); + bool both_f_contig = (is_rhs_f_contig && is_lhs_f_contig); + + // dispatch for contiguous inputs + if (both_c_contig || both_f_contig) { + auto contig_fn = contig_dispatch_table[rhs_typeid][lhs_typeid]; + + if (contig_fn != nullptr) { + auto comp_ev = contig_fn(exec_q, rhs_nelems, rhs_data, 0, lhs_data, + 0, depends); + sycl::event ht_ev = + dpctl::utils::keep_args_alive(exec_q, {rhs, lhs}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + } + + // simplify strides + auto const &rhs_strides = rhs.get_strides_vector(); + auto const &lhs_strides = lhs.get_strides_vector(); + + using shT = std::vector; + shT simplified_shape; + shT simplified_rhs_strides; + shT simplified_lhs_strides; + py::ssize_t rhs_offset(0); + py::ssize_t lhs_offset(0); + + int nd = lhs_nd; + const py::ssize_t *shape = rhs_shape; + + simplify_iteration_space(nd, shape, rhs_strides, lhs_strides, + // outputs + simplified_shape, simplified_rhs_strides, + simplified_lhs_strides, rhs_offset, lhs_offset); + + std::vector host_tasks{}; + if (nd < 3) { + static constexpr auto unit_stride = + std::initializer_list{1}; + + if ((nd == 1) && isEqual(simplified_rhs_strides, unit_stride) && + isEqual(simplified_lhs_strides, unit_stride)) + { + auto contig_fn = contig_dispatch_table[rhs_typeid][lhs_typeid]; + + if (contig_fn != nullptr) { + auto comp_ev = + contig_fn(exec_q, rhs_nelems, rhs_data, rhs_offset, + lhs_data, lhs_offset, depends); + sycl::event ht_ev = dpctl::utils::keep_args_alive( + exec_q, {rhs, lhs}, {comp_ev}); + + return std::make_pair(ht_ev, comp_ev); + } + } + if (nd == 2) { + static constexpr auto one_zero_strides = + std::initializer_list{1, 0}; + constexpr py::ssize_t one{1}; + // special case of C-contiguous matrix and a row + if (isEqual(simplified_rhs_strides, one_zero_strides) && + isEqual(simplified_lhs_strides, {one, simplified_shape[0]})) + { + auto row_matrix_broadcast_fn = + contig_row_matrix_broadcast_dispatch_table[rhs_typeid] + [lhs_typeid]; + if 
(row_matrix_broadcast_fn != nullptr) { + size_t n0 = simplified_shape[1]; + size_t n1 = simplified_shape[0]; + sycl::event comp_ev = row_matrix_broadcast_fn( + exec_q, host_tasks, n0, n1, rhs_data, rhs_offset, + lhs_data, lhs_offset, depends); + + return std::make_pair(dpctl::utils::keep_args_alive( + exec_q, {lhs, rhs}, host_tasks), + comp_ev); + } + } + } + + // dispatch to strided code + auto strided_fn = strided_dispatch_table[rhs_typeid][lhs_typeid]; + + if (strided_fn == nullptr) { + throw std::runtime_error( + "Strided implementation is missing for rhs_typeid=" + + std::to_string(rhs_typeid) + + " and lhs_typeid=" + std::to_string(lhs_typeid)); + } + + using dpctl::tensor::offset_utils::device_allocate_and_pack; + const auto &ptr_sz_event_triple_ = device_allocate_and_pack<py::ssize_t>( + exec_q, host_tasks, simplified_shape, simplified_rhs_strides, + simplified_lhs_strides); + + py::ssize_t *shape_strides = std::get<0>(ptr_sz_event_triple_); + const sycl::event &copy_shape_ev = std::get<2>(ptr_sz_event_triple_); + + if (shape_strides == nullptr) { + throw std::runtime_error("Unable to allocate device memory"); + } + + sycl::event strided_fn_ev = + strided_fn(exec_q, rhs_nelems, nd, shape_strides, rhs_data, rhs_offset, + lhs_data, lhs_offset, depends, {copy_shape_ev}); + + // async free of shape_strides temporary + auto ctx = exec_q.get_context(); + + sycl::event tmp_cleanup_ev = exec_q.submit([&](sycl::handler &cgh) { + cgh.depends_on(strided_fn_ev); + cgh.host_task( + [ctx, shape_strides]() { sycl::free(shape_strides, ctx); }); + }); + + host_tasks.push_back(tmp_cleanup_ev); + + return std::make_pair( + dpctl::utils::keep_args_alive(exec_q, {rhs, lhs}, host_tasks), + strided_fn_ev); +} + +} // namespace dpnp::extensions::py_internal diff --git a/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.cpp b/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.cpp new file mode 100644 index 00000000000..3f88f735a71 --- 
/dev/null +++ b/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.cpp @@ -0,0 +1,87 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include "dpctl4pybind11.hpp" + +#include +#include +#include + +#include "elementwise_functions_type_utils.hpp" + +// dpctl tensor headers +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; +namespace td_ns = dpctl::tensor::type_dispatch; + +namespace dpnp::extensions::py_internal::type_utils +{ +py::dtype _dtype_from_typenum(td_ns::typenum_t dst_typenum_t) +{ + switch (dst_typenum_t) { + case td_ns::typenum_t::BOOL: + return py::dtype("?"); + case td_ns::typenum_t::INT8: + return py::dtype("i1"); + case td_ns::typenum_t::UINT8: + return py::dtype("u1"); + case td_ns::typenum_t::INT16: + return py::dtype("i2"); + case td_ns::typenum_t::UINT16: + return py::dtype("u2"); + case td_ns::typenum_t::INT32: + return py::dtype("i4"); + case td_ns::typenum_t::UINT32: + return py::dtype("u4"); + case td_ns::typenum_t::INT64: + return py::dtype("i8"); + case td_ns::typenum_t::UINT64: + return py::dtype("u8"); + case td_ns::typenum_t::HALF: + return py::dtype("f2"); + case td_ns::typenum_t::FLOAT: + return py::dtype("f4"); + case td_ns::typenum_t::DOUBLE: + return py::dtype("f8"); + case td_ns::typenum_t::CFLOAT: + return py::dtype("c8"); + case td_ns::typenum_t::CDOUBLE: + return py::dtype("c16"); + default: + throw py::value_error("Unrecognized dst_typeid"); + } +} + +int _result_typeid(int arg_typeid, const int *fn_output_id) +{ + if (arg_typeid < 0 || arg_typeid >= td_ns::num_types) { + throw py::value_error("Input typeid " + std::to_string(arg_typeid) + + " is outside of expected bounds."); + } + + return fn_output_id[arg_typeid]; +} +} // namespace dpnp::extensions::py_internal::type_utils diff --git a/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.hpp b/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.hpp new file mode 100644 index 00000000000..ede4ea35fad --- /dev/null +++ 
b/dpnp/backend/extensions/elementwise_functions/elementwise_functions_type_utils.hpp @@ -0,0 +1,47 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include "dpctl4pybind11.hpp" +#include +#include +#include + +// dpctl tensor headers +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; +namespace td_ns = dpctl::tensor::type_dispatch; + +namespace dpnp::extensions::py_internal::type_utils +{ +/*! 
@brief Produce dtype from a type number */ +extern py::dtype _dtype_from_typenum(td_ns::typenum_t); + +/*! @brief Lookup typeid of the result from typeid of + * argument and the mapping table */ +extern int _result_typeid(int, const int *); +} // namespace dpnp::extensions::py_internal::type_utils diff --git a/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.cpp b/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.cpp new file mode 100644 index 00000000000..a3ab0b99b7a --- /dev/null +++ b/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.cpp @@ -0,0 +1,205 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include "dpctl4pybind11.hpp" + +#include +#include + +#include "simplify_iteration_space.hpp" + +// dpctl tensor headers +#include "utils/strided_iters.hpp" + +namespace dpnp::extensions::py_internal +{ +namespace py = pybind11; +namespace st_ns = dpctl::tensor::strides; + +void simplify_iteration_space(int &nd, + const py::ssize_t *const &shape, + std::vector<py::ssize_t> const &src_strides, + std::vector<py::ssize_t> const &dst_strides, + // output + std::vector<py::ssize_t> &simplified_shape, + std::vector<py::ssize_t> &simplified_src_strides, + std::vector<py::ssize_t> &simplified_dst_strides, + py::ssize_t &src_offset, + py::ssize_t &dst_offset) +{ + if (nd > 1) { + // Simplify iteration space to reduce dimensionality + // and improve access pattern + simplified_shape.reserve(nd); + simplified_shape.insert(std::begin(simplified_shape), shape, + shape + nd); + assert(simplified_shape.size() == static_cast<std::size_t>(nd)); + + simplified_src_strides.reserve(nd); + simplified_src_strides.insert(std::end(simplified_src_strides), + std::begin(src_strides), + std::end(src_strides)); + assert(simplified_src_strides.size() == static_cast<std::size_t>(nd)); + + simplified_dst_strides.reserve(nd); + simplified_dst_strides.insert(std::end(simplified_dst_strides), + std::begin(dst_strides), + std::end(dst_strides)); + assert(simplified_dst_strides.size() == static_cast<std::size_t>(nd)); + + int contracted_nd = st_ns::simplify_iteration_two_strides( + nd, 
simplified_shape.data(), simplified_src_strides.data(), + simplified_dst_strides.data(), + src_offset, // modified by reference + dst_offset // modified by reference + ); + simplified_shape.resize(contracted_nd); + simplified_src_strides.resize(contracted_nd); + simplified_dst_strides.resize(contracted_nd); + + nd = contracted_nd; + } + else if (nd == 1) { + src_offset = 0; + dst_offset = 0; + // Populate vectors + simplified_shape.reserve(nd); + simplified_shape.push_back(shape[0]); + assert(simplified_shape.size() == static_cast<std::size_t>(nd)); + + simplified_src_strides.reserve(nd); + simplified_dst_strides.reserve(nd); + + if (src_strides[0] < 0 && dst_strides[0] < 0) { + simplified_src_strides.push_back(-src_strides[0]); + simplified_dst_strides.push_back(-dst_strides[0]); + if (shape[0] > 1) { + src_offset += (shape[0] - 1) * src_strides[0]; + dst_offset += (shape[0] - 1) * dst_strides[0]; + } + } + else { + simplified_src_strides.push_back(src_strides[0]); + simplified_dst_strides.push_back(dst_strides[0]); + } + + assert(simplified_src_strides.size() == static_cast<std::size_t>(nd)); + assert(simplified_dst_strides.size() == static_cast<std::size_t>(nd)); + } +} + +void simplify_iteration_space_3( + int &nd, + const py::ssize_t *const &shape, + // src1 + std::vector<py::ssize_t> const &src1_strides, + // src2 + std::vector<py::ssize_t> const &src2_strides, + // dst + std::vector<py::ssize_t> const &dst_strides, + // output + std::vector<py::ssize_t> &simplified_shape, + std::vector<py::ssize_t> &simplified_src1_strides, + std::vector<py::ssize_t> &simplified_src2_strides, + std::vector<py::ssize_t> &simplified_dst_strides, + py::ssize_t &src1_offset, + py::ssize_t &src2_offset, + py::ssize_t &dst_offset) +{ + if (nd > 1) { + // Simplify iteration space to reduce dimensionality + // and improve access pattern + simplified_shape.reserve(nd); + simplified_shape.insert(std::end(simplified_shape), shape, shape + nd); + assert(simplified_shape.size() == static_cast<std::size_t>(nd)); + + simplified_src1_strides.reserve(nd); + simplified_src1_strides.insert(std::end(simplified_src1_strides), + 
std::begin(src1_strides), + std::end(src1_strides)); + assert(simplified_src1_strides.size() == static_cast<std::size_t>(nd)); + + simplified_src2_strides.reserve(nd); + simplified_src2_strides.insert(std::end(simplified_src2_strides), + std::begin(src2_strides), + std::end(src2_strides)); + assert(simplified_src2_strides.size() == static_cast<std::size_t>(nd)); + + simplified_dst_strides.reserve(nd); + simplified_dst_strides.insert(std::end(simplified_dst_strides), + std::begin(dst_strides), + std::end(dst_strides)); + assert(simplified_dst_strides.size() == static_cast<std::size_t>(nd)); + + int contracted_nd = st_ns::simplify_iteration_three_strides( + nd, simplified_shape.data(), simplified_src1_strides.data(), + simplified_src2_strides.data(), simplified_dst_strides.data(), + src1_offset, // modified by reference + src2_offset, // modified by reference + dst_offset // modified by reference + ); + simplified_shape.resize(contracted_nd); + simplified_src1_strides.resize(contracted_nd); + simplified_src2_strides.resize(contracted_nd); + simplified_dst_strides.resize(contracted_nd); + + nd = contracted_nd; + } + else if (nd == 1) { + src1_offset = 0; + src2_offset = 0; + dst_offset = 0; + // Populate vectors + simplified_shape.reserve(nd); + simplified_shape.push_back(shape[0]); + assert(simplified_shape.size() == static_cast<std::size_t>(nd)); + + simplified_src1_strides.reserve(nd); + simplified_src2_strides.reserve(nd); + simplified_dst_strides.reserve(nd); + + if ((src1_strides[0] < 0) && (src2_strides[0] < 0) && + (dst_strides[0] < 0)) { + simplified_src1_strides.push_back(-src1_strides[0]); + simplified_src2_strides.push_back(-src2_strides[0]); + simplified_dst_strides.push_back(-dst_strides[0]); + if (shape[0] > 1) { + src1_offset += src1_strides[0] * (shape[0] - 1); + src2_offset += src2_strides[0] * (shape[0] - 1); + dst_offset += dst_strides[0] * (shape[0] - 1); + } + } + else { + simplified_src1_strides.push_back(src1_strides[0]); + simplified_src2_strides.push_back(src2_strides[0]); + 
simplified_dst_strides.push_back(dst_strides[0]); + } + + assert(simplified_src1_strides.size() == static_cast<std::size_t>(nd)); + assert(simplified_src2_strides.size() == static_cast<std::size_t>(nd)); + assert(simplified_dst_strides.size() == static_cast<std::size_t>(nd)); + } +} +} // namespace dpnp::extensions::py_internal diff --git a/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.hpp b/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.hpp new file mode 100644 index 00000000000..111050ae59a --- /dev/null +++ b/dpnp/backend/extensions/elementwise_functions/simplify_iteration_space.hpp @@ -0,0 +1,61 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include +#include + +namespace dpnp::extensions::py_internal +{ +namespace py = pybind11; + +void simplify_iteration_space(int &, + const py::ssize_t *const &, + std::vector<py::ssize_t> const &, + std::vector<py::ssize_t> const &, + std::vector<py::ssize_t> &, + std::vector<py::ssize_t> &, + std::vector<py::ssize_t> &, + py::ssize_t &, + py::ssize_t &); + +void simplify_iteration_space_3(int &, + const py::ssize_t *const &, + // src1 + std::vector<py::ssize_t> const &, + // src2 + std::vector<py::ssize_t> const &, + // dst + std::vector<py::ssize_t> const &, + // output + std::vector<py::ssize_t> &, + std::vector<py::ssize_t> &, + std::vector<py::ssize_t> &, + std::vector<py::ssize_t> &, + py::ssize_t &, + py::ssize_t &, + py::ssize_t &); +} // namespace dpnp::extensions::py_internal diff --git a/dpnp/backend/extensions/vm/CMakeLists.txt b/dpnp/backend/extensions/vm/CMakeLists.txt index 1fa895f4e69..ba1e46ea0ed 100644 --- a/dpnp/backend/extensions/vm/CMakeLists.txt +++ b/dpnp/backend/extensions/vm/CMakeLists.txt @@ -23,12 +23,54 @@ # THE POSSIBILITY OF SUCH DAMAGE. 
# ***************************************************************************** +set(_elementwise_sources + ${CMAKE_CURRENT_SOURCE_DIR}/abs.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/acos.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/acosh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/add.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/asin.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/asinh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/atan.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/atan2.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/atanh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/cbrt.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/ceil.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/conj.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/cos.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/cosh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/div.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/exp.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/exp2.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/expm1.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/floor.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/hypot.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/ln.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/log10.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/log1p.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/log2.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/mul.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/pow.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/rint.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/sin.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/sinh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/sqr.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/sqrt.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/sub.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/tan.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/tanh.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/trunc.cpp +) -set(python_module_name _vm_impl) set(_module_src + # TODO: remove sources from `elementwise_functions` folder + ${CMAKE_CURRENT_SOURCE_DIR}/../elementwise_functions/elementwise_functions_type_utils.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/../elementwise_functions/simplify_iteration_space.cpp ${CMAKE_CURRENT_SOURCE_DIR}/vm_py.cpp + ${_elementwise_sources} ) +set(python_module_name _vm_impl) + pybind11_add_module(${python_module_name} MODULE ${_module_src}) add_sycl_to_target(TARGET ${python_module_name} SOURCES ${_module_src}) 
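The `simplify_iteration_space` helpers added above delegate the actual dimensionality contraction to dpctl's `simplify_iteration_two_strides` / `simplify_iteration_three_strides`. The core idea — fuse two adjacent dimensions whenever both operands' strides are "nested", i.e. `stride[i] == shape[i + 1] * stride[i + 1]` — can be sketched in isolation. The helper below (`collapse_two_strides`) is a hypothetical illustration for the common contiguous case, not the dpctl implementation:

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch (hypothetical, not the dpctl implementation): fuse
// adjacent dimensions of a two-operand iteration space when the strides of
// BOTH operands are nested, i.e. stride[i] == shape[i + 1] * stride[i + 1].
// Returns the contracted number of dimensions.
inline int collapse_two_strides(std::vector<long> &shape,
                                std::vector<long> &src_strides,
                                std::vector<long> &dst_strides)
{
    int nd = static_cast<int>(shape.size());
    for (int i = nd - 2; i >= 0; --i) {
        bool src_nested = (src_strides[i] == shape[i + 1] * src_strides[i + 1]);
        bool dst_nested = (dst_strides[i] == shape[i + 1] * dst_strides[i + 1]);
        if (src_nested && dst_nested) {
            // dimensions i and i+1 are traversed as one contiguous run:
            // merge them into a single larger dimension
            shape[i] *= shape[i + 1];
            shape.erase(shape.begin() + i + 1);
            src_strides[i] = src_strides[i + 1];
            src_strides.erase(src_strides.begin() + i + 1);
            dst_strides[i] = dst_strides[i + 1];
            dst_strides.erase(dst_strides.begin() + i + 1);
        }
    }
    return static_cast<int>(shape.size());
}
```

For example, a C-contiguous 3x4 pair (shape `{3, 4}`, strides `{4, 1}` for both operands) collapses to a single dimension of length 12 with unit stride, which is exactly what lets the caller above take the fast contiguous dispatch path after simplification.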
diff --git a/dpnp/backend/extensions/vm/abs.cpp b/dpnp/backend/extensions/vm/abs.cpp new file mode 100644 index 00000000000..7eb7086de85 --- /dev/null +++ b/dpnp/backend/extensions/vm/abs.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "abs.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::abs function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template <typename T> +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry<T, std::complex<double>, double>, + td_ns::TypeMapResultEntry<T, std::complex<float>, float>, + td_ns::TypeMapResultEntry<T, double>, + td_ns::TypeMapResultEntry<T, float>, + td_ns::DefaultResultEntry<void>>::result_type; +}; + +template <typename T> +static sycl::event abs_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector<sycl::event> &depends) +{ + tu_ns::validate_type_for_device<T>(exec_q); + + std::int64_t n = static_cast<std::int64_t>(in_n); + const T *a = reinterpret_cast<const T *>(in_a); + + using resTy = typename OutputType<T>::value_type; + resTy *y = reinterpret_cast<resTy *>(out_y); + + return mkl_vm::abs(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(abs); +} // namespace impl + +void init_abs(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector<sycl::event>; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto abs_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{}); + }; + m.def("_abs", abs_pyapi, + "Call `abs` function from OneMKL VM library to compute " + "the absolute value of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto abs_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_abs_to_call", abs_need_to_call_pyapi, + "Check input arguments to answer if `abs` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/abs.hpp b/dpnp/backend/extensions/vm/abs.hpp index bb5e55010b4..9e074bc1ac8 100644 --- a/dpnp/backend/extensions/vm/abs.hpp +++ b/dpnp/backend/extensions/vm/abs.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template <typename T> -sycl::event abs_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector<sycl::event> &depends) -{ - type_utils::validate_type_for_device<T>(exec_q); - - const T *a = reinterpret_cast<const T *>(in_a); - using resTy = typename types::AbsOutputType<T>::value_type; - resTy *y = reinterpret_cast<resTy *>(out_y); - - return mkl_vm::abs(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template <typename fnT, typename T> -struct AbsContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AbsOutputType<T>::value_type, void>) - { - return nullptr; - } - else { - return abs_contig_impl<T>; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_abs(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/acos.cpp b/dpnp/backend/extensions/vm/acos.cpp new file mode 100644 index 00000000000..ab744bf99c4 --- /dev/null +++ b/dpnp/backend/extensions/vm/acos.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// 
Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "acos.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::acos function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template <typename T> +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry<T, std::complex<double>>, + td_ns::TypeMapResultEntry<T, std::complex<float>>, + td_ns::TypeMapResultEntry<T, double>, + td_ns::TypeMapResultEntry<T, float>, + td_ns::DefaultResultEntry<void>>::result_type; +}; + +template <typename T> +static sycl::event acos_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector<sycl::event> &depends) +{ + tu_ns::validate_type_for_device<T>(exec_q); + + std::int64_t n = static_cast<std::int64_t>(in_n); + const T *a = reinterpret_cast<const T *>(in_a); + + using resTy = typename OutputType<T>::value_type; + resTy *y = reinterpret_cast<resTy *>(out_y); + + return mkl_vm::acos(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(acos); +} // namespace impl + +void init_acos(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector<sycl::event>; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto acos_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{}); + }; + m.def("_acos", acos_pyapi, + "Call `acos` function from OneMKL VM library to compute " + "the inverse cosine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto acos_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_acos_to_call", acos_need_to_call_pyapi, + "Check input arguments to answer if `acos` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/acos.hpp b/dpnp/backend/extensions/vm/acos.hpp index 029a9d9c886..2bfb2a71d6b 100644 --- a/dpnp/backend/extensions/vm/acos.hpp +++ b/dpnp/backend/extensions/vm/acos.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template <typename T> -sycl::event acos_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector<sycl::event> &depends) -{ - type_utils::validate_type_for_device<T>(exec_q); - - const T *a = reinterpret_cast<const T *>(in_a); - using resTy = typename types::AcosOutputType<T>::value_type; - resTy *y = reinterpret_cast<resTy *>(out_y); - - return mkl_vm::acos(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template <typename fnT, typename T> -struct AcosContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AcosOutputType<T>::value_type, void>) - { - return nullptr; - } - else { - return acos_contig_impl<T>; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_acos(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/acosh.cpp b/dpnp/backend/extensions/vm/acosh.cpp new file mode 100644 index 00000000000..2cab39313d2 --- /dev/null +++ b/dpnp/backend/extensions/vm/acosh.cpp @@ -0,0 +1,138 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "acosh.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::acosh function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event acosh_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::acosh(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(acosh); +} // namespace impl + +void init_acosh(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto acosh_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_acosh", acosh_pyapi, + "Call `acosh` function from OneMKL VM library to compute " + "the inverse hyperbolic cosine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto acosh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + 
return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_acosh_to_call", acosh_need_to_call_pyapi, + "Check input arguments to answer if `acosh` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/acosh.hpp b/dpnp/backend/extensions/vm/acosh.hpp index 9f86ae589cf..6cfde12cbcb 100644 --- a/dpnp/backend/extensions/vm/acosh.hpp +++ b/dpnp/backend/extensions/vm/acosh.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event acosh_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::AcoshOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::acosh(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct AcoshContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AcoshOutputType::value_type, void>) - { - return nullptr; - } - else { - return acosh_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_acosh(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/add.cpp b/dpnp/backend/extensions/vm/add.cpp new file mode 100644 index 00000000000..c43f07bbcde --- /dev/null +++ b/dpnp/backend/extensions/vm/add.cpp @@ -0,0 +1,171 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "add.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::add function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<double>,
+                                        T2,
+                                        std::complex<double>,
+                                        std::complex<double>>,
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<float>,
+                                        T2,
+                                        std::complex<float>,
+                                        std::complex<float>>,
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event add_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   ssize_t a_offset,
+                                   const char *in_b,
+                                   ssize_t b_offset,
+                                   char *out_y,
+                                   ssize_t out_offset,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::add(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing 1st input vector of size n
+                       b, // pointer `b` containing 2nd input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(add);
+} // namespace impl
+
+void init_add(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto add_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                         const arrayT &src2, const arrayT &dst,
+                         const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_add", add_pyapi,
+          "Call `add` function from OneMKL VM library to perform element "
+          "by element addition of vector `src1` and vector `src2` "
+          "into resulting vector `dst`",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto add_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                      const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_add_to_call", add_need_to_call_pyapi,
+          "Check input arguments to determine whether `add` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/add.hpp b/dpnp/backend/extensions/vm/add.hpp
index 47ff60ed96a..824fb649f2d 100644
--- a/dpnp/backend/extensions/vm/add.hpp
+++ b/dpnp/backend/extensions/vm/add.hpp
@@ -25,58 +25,11 @@
 #pragma once
 
-#include
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event add_contig_impl(sycl::queue exec_q,
-                            const std::int64_t n,
-                            const char *in_a,
-                            const char *in_b,
-                            char *out_y,
-                            const std::vector
&depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - const T *b = reinterpret_cast(in_b); - using resTy = typename types::AddOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::add(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct AddContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AddOutputType::value_type, void>) - { - return nullptr; - } - else { - return add_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_add(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/asin.cpp b/dpnp/backend/extensions/vm/asin.cpp new file mode 100644 index 00000000000..afbb868e8cc --- /dev/null +++ b/dpnp/backend/extensions/vm/asin.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "asin.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::asin function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event asin_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::asin(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(asin); +} // namespace impl + +void init_asin(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto asin_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_asin", asin_pyapi, + "Call `asin` function from OneMKL VM library to compute " + "the inverse sine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto asin_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_asin_to_call", asin_need_to_call_pyapi, + "Check input arguments to answer if `asin` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/asin.hpp b/dpnp/backend/extensions/vm/asin.hpp index 5e44aa5bde6..a37bff38fbc 100644 --- a/dpnp/backend/extensions/vm/asin.hpp +++ b/dpnp/backend/extensions/vm/asin.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event asin_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::AsinOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::asin(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct AsinContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AsinOutputType::value_type, void>) - { - return nullptr; - } - else { - return asin_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_asin(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/asinh.cpp b/dpnp/backend/extensions/vm/asinh.cpp new file mode 100644 index 00000000000..0f70c3cb501 --- /dev/null +++ b/dpnp/backend/extensions/vm/asinh.cpp @@ -0,0 +1,138 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "asinh.hpp" +#include "common.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::asinh function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event asinh_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::asinh(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(asinh); +} // namespace impl + +void init_asinh(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto asinh_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_asinh", asinh_pyapi, + "Call `asinh` function from OneMKL VM library to compute " + "the inverse hyperbolic sine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto asinh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + 
return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_asinh_to_call", asinh_need_to_call_pyapi, + "Check input arguments to answer if `asinh` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/asinh.hpp b/dpnp/backend/extensions/vm/asinh.hpp index 58e2815e3f7..ad40f0d4efb 100644 --- a/dpnp/backend/extensions/vm/asinh.hpp +++ b/dpnp/backend/extensions/vm/asinh.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event asinh_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::AsinhOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::asinh(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct AsinhContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::AsinhOutputType::value_type, void>) - { - return nullptr; - } - else { - return asinh_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_asinh(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/atan.cpp b/dpnp/backend/extensions/vm/atan.cpp new file mode 100644 index 00000000000..59f7064ef15 --- /dev/null +++ b/dpnp/backend/extensions/vm/atan.cpp @@ -0,0 +1,138 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "atan.hpp"
+#include "common.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::atan<T> function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event atan_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::atan(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(atan);
+} // namespace impl
+
+void init_atan(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto atan_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_atan", atan_pyapi,
+          "Call `atan` function from OneMKL VM library to compute "
+          "the inverse tangent of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto atan_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                       const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst, output_typeid_vector, contig_dispatch_vector);
+    };
+    m.def("_mkl_atan_to_call", atan_need_to_call_pyapi,
+          "Check input arguments to answer if `atan` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/atan.hpp b/dpnp/backend/extensions/vm/atan.hpp
index b36abc16138..90547e92c8d 100644
--- a/dpnp/backend/extensions/vm/atan.hpp
+++ b/dpnp/backend/extensions/vm/atan.hpp
@@ -25,55 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event atan_contig_impl(sycl::queue exec_q,
-                             const std::int64_t n,
-                             const char *in_a,
-                             char *out_y,
-                             const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    using resTy = typename types::AtanOutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::atan(exec_q,
-                        n, // number of elements to be calculated
-                        a, // pointer `a` containing input vector of size n
-                        y, // pointer `y` to the output vector of size n
-                        depends);
-}
-
-template <typename fnT, typename T>
-struct AtanContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::AtanOutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return atan_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_atan(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/atan2.cpp b/dpnp/backend/extensions/vm/atan2.cpp
new file mode 100644
index 00000000000..30bb59c9c42
--- /dev/null
+++ b/dpnp/backend/extensions/vm/atan2.cpp
@@ -0,0 +1,160 @@
+//*****************************************************************************
+// Copyright (c) 2024, Intel Corporation
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+// THE POSSIBILITY OF SUCH DAMAGE.
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "atan2.hpp"
+#include "common.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::atan2<T> function.
+ *
+ * @tparam T Type of input vectors `a` and `b` and of result vector `y`.
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event atan2_contig_impl(sycl::queue &exec_q,
+                                     std::size_t in_n,
+                                     const char *in_a,
+                                     ssize_t a_offset,
+                                     const char *in_b,
+                                     ssize_t b_offset,
+                                     char *out_y,
+                                     ssize_t out_offset,
+                                     const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::atan2(exec_q,
+                         n, // number of elements to be calculated
+                         a, // pointer `a` containing 1st input vector of size n
+                         b, // pointer `b` containing 2nd input vector of size n
+                         y, // pointer `y` to the output vector of size n
+                         depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(atan2);
+} // namespace impl
+
+void init_atan2(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto atan2_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                           const arrayT &src2, const arrayT &dst,
+                           const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_atan2", atan2_pyapi,
+          "Call `atan2` function from OneMKL VM library to compute element "
+          "by element inverse tangent of `x1/x2`",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto atan2_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                        const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_atan2_to_call", atan2_need_to_call_pyapi,
+          "Check input arguments to answer if `atan2` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/atan2.hpp b/dpnp/backend/extensions/vm/atan2.hpp
index 19a66e877ac..cd0e259914c 100644
--- a/dpnp/backend/extensions/vm/atan2.hpp
+++ b/dpnp/backend/extensions/vm/atan2.hpp
@@ -25,58 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event atan2_contig_impl(sycl::queue exec_q,
-                              const std::int64_t n,
-                              const char *in_a,
-                              const char *in_b,
-                              char *out_y,
-                              const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    const T *b = reinterpret_cast<const T *>(in_b);
-    using resTy = typename types::Atan2OutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::atan2(exec_q,
-                         n, // number of elements to be calculated
-                         a, // pointer `a` containing 1st input vector of size n
-                         b, // pointer `b` containing 2nd input vector of size n
-                         y, // pointer `y` to the output vector of size n
-                         depends);
-}
-
-template <typename fnT, typename T>
-struct Atan2ContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::Atan2OutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return atan2_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_atan2(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/atanh.cpp b/dpnp/backend/extensions/vm/atanh.cpp
new file mode 100644
index 00000000000..bd32d25f2a6
--- /dev/null
+++ b/dpnp/backend/extensions/vm/atanh.cpp
@@ -0,0 +1,138 @@
+//*****************************************************************************
+// Copyright (c) 2024, Intel Corporation
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+// THE POSSIBILITY OF SUCH DAMAGE.
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "atanh.hpp"
+#include "common.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::atanh<T> function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event atanh_contig_impl(sycl::queue &exec_q,
+                                     std::size_t in_n,
+                                     const char *in_a,
+                                     char *out_y,
+                                     const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::atanh(exec_q,
+                         n, // number of elements to be calculated
+                         a, // pointer `a` containing input vector of size n
+                         y, // pointer `y` to the output vector of size n
+                         depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(atanh);
+} // namespace impl
+
+void init_atanh(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto atanh_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                           const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_atanh", atanh_pyapi,
+          "Call `atanh` function from OneMKL VM library to compute "
+          "the inverse hyperbolic tangent of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto atanh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                        const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst, output_typeid_vector, contig_dispatch_vector);
+    };
+    m.def("_mkl_atanh_to_call", atanh_need_to_call_pyapi,
+          "Check input arguments to answer if `atanh` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/atanh.hpp b/dpnp/backend/extensions/vm/atanh.hpp
index 9764df84ce3..afe404adf9b 100644
--- a/dpnp/backend/extensions/vm/atanh.hpp
+++ b/dpnp/backend/extensions/vm/atanh.hpp
@@ -25,55 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event atanh_contig_impl(sycl::queue exec_q,
-                              const std::int64_t n,
-                              const char *in_a,
-                              char *out_y,
-                              const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    using resTy = typename types::AtanhOutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::atanh(exec_q,
-                         n, // number of elements to be calculated
-                         a, // pointer `a` containing input vector of size n
-                         y, // pointer `y` to the output vector of size n
-                         depends);
-}
-
-template <typename fnT, typename T>
-struct AtanhContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::AtanhOutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return atanh_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_atanh(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/cbrt.cpp b/dpnp/backend/extensions/vm/cbrt.cpp
new file mode 100644
index 00000000000..88bc8282418
--- /dev/null
+++ b/dpnp/backend/extensions/vm/cbrt.cpp
@@ -0,0 +1,136 @@
+//*****************************************************************************
+// Copyright (c) 2024, Intel Corporation
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+// THE POSSIBILITY OF SUCH DAMAGE.
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "cbrt.hpp"
+#include "common.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::cbrt<T> function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type =
+        typename std::disjunction<td_ns::TypeMapResultEntry<T, double>,
+                                  td_ns::TypeMapResultEntry<T, float>,
+                                  td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event cbrt_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::cbrt(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(cbrt);
+} // namespace impl
+
+void init_cbrt(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto cbrt_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_cbrt", cbrt_pyapi,
+          "Call `cbrt` function from OneMKL VM library to compute "
+          "the element-wise cube root of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto cbrt_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                       const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst, output_typeid_vector, contig_dispatch_vector);
+    };
+    m.def("_mkl_cbrt_to_call", cbrt_need_to_call_pyapi,
+          "Check input arguments to answer if `cbrt` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/cbrt.hpp b/dpnp/backend/extensions/vm/cbrt.hpp
index 5c0a0adc53e..d4eb052a65b 100644
--- a/dpnp/backend/extensions/vm/cbrt.hpp
+++ b/dpnp/backend/extensions/vm/cbrt.hpp
@@ -25,55 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event cbrt_contig_impl(sycl::queue exec_q,
-                             const std::int64_t n,
-                             const char *in_a,
-                             char *out_y,
-                             const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    using resTy = typename types::CbrtOutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::cbrt(exec_q,
-                        n, // number of elements to be calculated
-                        a, // pointer `a` containing input vector of size n
-                        y, // pointer `y` to the output vector of size n
-                        depends);
-}
-
-template <typename fnT, typename T>
-struct CbrtContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::CbrtOutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return cbrt_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_cbrt(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/ceil.cpp b/dpnp/backend/extensions/vm/ceil.cpp
new file mode 100644
index 00000000000..14e7234a54c
--- /dev/null
+++ b/dpnp/backend/extensions/vm/ceil.cpp
@@ -0,0 +1,136 @@
+//*****************************************************************************
+// Copyright (c) 2024, Intel Corporation
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+// THE POSSIBILITY OF SUCH DAMAGE.
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "ceil.hpp"
+#include "common.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::ceil<T> function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type =
+        typename std::disjunction<td_ns::TypeMapResultEntry<T, double>,
+                                  td_ns::TypeMapResultEntry<T, float>,
+                                  td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event ceil_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::ceil(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(ceil);
+} // namespace impl
+
+void init_ceil(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto ceil_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_ceil", ceil_pyapi,
+          "Call `ceil` function from OneMKL VM library to compute "
+          "the ceiling of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto ceil_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                       const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst, output_typeid_vector, contig_dispatch_vector);
+    };
+    m.def("_mkl_ceil_to_call", ceil_need_to_call_pyapi,
+          "Check input arguments to answer if `ceil` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/ceil.hpp b/dpnp/backend/extensions/vm/ceil.hpp
index fd4f3a8680c..dd9006d1b18 100644
--- a/dpnp/backend/extensions/vm/ceil.hpp
+++ b/dpnp/backend/extensions/vm/ceil.hpp
@@ -25,55 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event ceil_contig_impl(sycl::queue exec_q,
-                             const std::int64_t n,
-                             const char *in_a,
-                             char *out_y,
-                             const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    using resTy = typename types::CeilOutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::ceil(exec_q,
-                        n, // number of elements to be calculated
-                        a, // pointer `a` containing input vector of size n
-                        y, // pointer `y` to the output vector of size n
-                        depends);
-}
-
-template <typename fnT, typename T>
-struct CeilContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::CeilOutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return ceil_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_ceil(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/common.hpp b/dpnp/backend/extensions/vm/common.hpp
index b53b9b0881c..74e9f81fa0f 100644
--- a/dpnp/backend/extensions/vm/common.hpp
+++ b/dpnp/backend/extensions/vm/common.hpp
@@ -25,252 +25,46 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
 #include <oneapi/mkl.hpp>
+#include <pybind11/pybind11.h>
 #include <sycl/sycl.hpp>
-#include <type_traits>
 #include <vector>
 
 // dpctl tensor headers
 #include "utils/memory_overlap.hpp"
 #include "utils/type_dispatch.hpp"
-#include "utils/type_utils.hpp"
 
 #include "dpnp_utils.hpp"
 
 static_assert(INTEL_MKL_VERSION >= __INTEL_MKL_2023_2_0_VERSION_REQUIRED,
               "OneMKL does not meet minimum version requirement");
 
-// OneMKL namespace with VM functions
-namespace mkl_vm = oneapi::mkl::vm;
-
-// dpctl namespace for type utils
-namespace type_utils = dpctl::tensor::type_utils;
-
-namespace dpnp
-{
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-typedef sycl::event (*unary_impl_fn_ptr_t)(sycl::queue,
-                                           const std::int64_t,
-                                           const char *,
-                                           char *,
-                                           const std::vector<sycl::event> &);
-
-typedef sycl::event (*binary_impl_fn_ptr_t)(sycl::queue,
-                                            const std::int64_t,
-                                            const char *,
-                                            const char *,
-                                            char *,
-                                            const std::vector<sycl::event> &);
-
-namespace dpctl_td_ns = dpctl::tensor::type_dispatch;
 namespace py = pybind11;
+namespace td_ns = dpctl::tensor::type_dispatch;
 
-template <typename dispatchT>
-std::pair<sycl::event, sycl::event>
-    unary_ufunc(sycl::queue exec_q,
-                dpctl::tensor::usm_ndarray src,
-                dpctl::tensor::usm_ndarray dst, // dst = op(src), elementwise
-                const std::vector<sycl::event> &depends,
-                const dispatchT &dispatch_vector)
+namespace dpnp::extensions::vm::py_internal
 {
-    // check type_nums
-    int src_typenum = src.get_typenum();
-    auto array_types = dpctl_td_ns::usm_ndarray_types();
-    int src_typeid = array_types.typenum_to_lookup_id(src_typenum);
-
-    // check that queues are compatible
-    if (!dpctl::utils::queues_are_compatible(exec_q, {src, dst})) {
-        throw py::value_error(
-            "Execution queue is not compatible with allocation queues.");
-    }
-
-    // check that dimensions are the same
-    int dst_nd = dst.get_ndim();
-    if (dst_nd != src.get_ndim()) {
-        throw py::value_error(
-            "Input and output arrays have have different dimensions.");
-    }
-
-    // check that shapes are the same
-    const py::ssize_t *src_shape = src.get_shape_raw();
-    const py::ssize_t *dst_shape = dst.get_shape_raw();
-    bool shapes_equal(true);
-    size_t src_nelems(1);
-
-    for (int i = 0; i < dst_nd; ++i) {
-        src_nelems *= static_cast<size_t>(src_shape[i]);
-        shapes_equal = shapes_equal && (src_shape[i] == dst_shape[i]);
-    }
-    if (!shapes_equal) {
-        throw py::value_error("Input and output arrays have different shapes.");
-    }
-
-    // if nelems is zero, return
-    if (src_nelems == 0) {
-        return std::make_pair(sycl::event(), sycl::event());
-    }
-
-    // ensure that output is ample enough to accommodate all elements
-    auto dst_offsets = dst.get_minmax_offsets();
-    // destination must be ample enough to accommodate all elements
-    {
-        size_t range =
-            static_cast<size_t>(dst_offsets.second - dst_offsets.first);
-        if (range + 1 < src_nelems) {
-            throw py::value_error(
-                "Destination array can not accommodate all the elements "
-                "of source array.");
-        }
-    }
-
-    // check memory overlap
-    auto const &overlap = dpctl::tensor::overlap::MemoryOverlap();
-    if (overlap(src, dst)) {
-        throw py::value_error("Arrays index overlapping segments of memory.");
-    }
-
-    const char *src_data = src.get_data();
-    char *dst_data = dst.get_data();
-
-    // handle contiguous inputs
-    bool is_src_c_contig = src.is_c_contiguous();
-    bool is_dst_c_contig = dst.is_c_contiguous();
-
-    bool all_c_contig = (is_src_c_contig && is_dst_c_contig);
-    if (!all_c_contig) {
-        throw py::value_error("Input and outpur arrays must be C-contiguous.");
-    }
-
-    auto dispatch_fn = dispatch_vector[src_typeid];
-    if (dispatch_fn == nullptr) {
-        throw py::value_error("No implementation is defined for ufunc.");
-    }
-    sycl::event comp_ev =
-        dispatch_fn(exec_q, src_nelems, src_data, dst_data, depends);
-
-    sycl::event ht_ev =
-        dpctl::utils::keep_args_alive(exec_q, {src, dst}, {comp_ev});
-    return std::make_pair(ht_ev, comp_ev);
-}
-
-template <typename dispatchT>
-std::pair<sycl::event, sycl::event> binary_ufunc(
-    sycl::queue exec_q,
-    dpctl::tensor::usm_ndarray src1,
-    dpctl::tensor::usm_ndarray src2,
-    dpctl::tensor::usm_ndarray dst, // dst = op(src1, src2), elementwise
-    const std::vector<sycl::event> &depends,
-    const dispatchT &dispatch_vector)
+template <typename output_typesT, typename contig_dispatchT>
+bool need_to_call_unary_ufunc(sycl::queue &exec_q,
+                              const dpctl::tensor::usm_ndarray &src,
+                              const dpctl::tensor::usm_ndarray &dst,
+                              const output_typesT &output_type_vec,
+                              const contig_dispatchT &contig_dispatch_vector)
 {
     // check type_nums
-    int src1_typenum = src1.get_typenum();
-    int src2_typenum = src2.get_typenum();
-
-    auto array_types = dpctl_td_ns::usm_ndarray_types();
-    int src1_typeid = array_types.typenum_to_lookup_id(src1_typenum);
-    int src2_typeid = array_types.typenum_to_lookup_id(src2_typenum);
-
-    if (src1_typeid != src2_typeid) {
-        throw py::value_error("Input arrays have different types.");
-    }
-
-    // check that queues are compatible
-    if (!dpctl::utils::queues_are_compatible(exec_q, {src1, src2, dst})) {
-        throw py::value_error(
-            "Execution queue is not compatible with allocation queues.");
-    }
-
-    // check shapes, broadcasting is assumed done by caller
-    // check that dimensions are the same
-    int dst_nd = dst.get_ndim();
-    if (dst_nd != src1.get_ndim() || dst_nd != src2.get_ndim()) {
-        throw py::value_error("Array dimensions are not the same.");
-    }
-
-    // check that shapes are the same
-    const py::ssize_t *src1_shape = src1.get_shape_raw();
-    const py::ssize_t *src2_shape = src2.get_shape_raw();
-    const py::ssize_t *dst_shape = dst.get_shape_raw();
-    bool shapes_equal(true);
-    size_t src_nelems(1);
-
-    for (int i = 0; i < dst_nd; ++i) {
-        src_nelems *= static_cast<size_t>(src1_shape[i]);
-        shapes_equal = shapes_equal && (src1_shape[i] == dst_shape[i] &&
-                                        src2_shape[i] == dst_shape[i]);
-    }
-    if (!shapes_equal) {
-        throw py::value_error("Array shapes are not the same.");
-    }
-
-    // if nelems is zero, return
-    if (src_nelems == 0) {
-        return std::make_pair(sycl::event(), sycl::event());
-    }
-
-    // ensure that output is ample enough to accommodate all elements
-    auto dst_offsets = dst.get_minmax_offsets();
-    // destination must be ample enough to accommodate all elements
-    {
-        size_t range =
-            static_cast<size_t>(dst_offsets.second - dst_offsets.first);
-        if (range + 1 < src_nelems) {
-            throw py::value_error(
-                "Destination array can not accommodate all
the " - "elements of source array."); - } - } - - // check memory overlap - auto const &overlap = dpctl::tensor::overlap::MemoryOverlap(); - if (overlap(src1, dst) || overlap(src2, dst)) { - throw py::value_error("Arrays index overlapping segments of memory."); - } - - const char *src1_data = src1.get_data(); - const char *src2_data = src2.get_data(); - char *dst_data = dst.get_data(); - - // handle contiguous inputs - bool is_src1_c_contig = src1.is_c_contiguous(); - bool is_src2_c_contig = src2.is_c_contiguous(); - bool is_dst_c_contig = dst.is_c_contiguous(); + int src_typenum = src.get_typenum(); + int dst_typenum = dst.get_typenum(); - bool all_c_contig = - (is_src1_c_contig && is_src2_c_contig && is_dst_c_contig); - if (!all_c_contig) { - throw py::value_error("Input and outpur arrays must be C-contiguous."); - } + auto array_types = td_ns::usm_ndarray_types(); + int src_typeid = array_types.typenum_to_lookup_id(src_typenum); + int dst_typeid = array_types.typenum_to_lookup_id(dst_typenum); - auto dispatch_fn = dispatch_vector[src1_typeid]; - if (dispatch_fn == nullptr) { - throw py::value_error("No implementation is defined for ufunc."); + // check that types are supported + int func_output_typeid = output_type_vec[src_typeid]; + if (dst_typeid != func_output_typeid) { + return false; } - sycl::event comp_ev = dispatch_fn(exec_q, src_nelems, src1_data, src2_data, - dst_data, depends); - - sycl::event ht_ev = - dpctl::utils::keep_args_alive(exec_q, {src1, src2, dst}, {comp_ev}); - return std::make_pair(ht_ev, comp_ev); -} - -template -bool need_to_call_unary_ufunc(sycl::queue exec_q, - dpctl::tensor::usm_ndarray src, - dpctl::tensor::usm_ndarray dst, - const dispatchT &dispatch_vector) -{ - // check type_nums - int src_typenum = src.get_typenum(); - auto array_types = dpctl_td_ns::usm_ndarray_types(); - int src_typeid = array_types.typenum_to_lookup_id(src_typenum); // OneMKL VM functions perform a copy on host if no double type support if 
(!exec_q.get_device().has(sycl::aspect::fp64)) { @@ -338,26 +132,35 @@ bool need_to_call_unary_ufunc(sycl::queue exec_q, } // MKL function is not defined for the type - if (dispatch_vector[src_typeid] == nullptr) { + if (contig_dispatch_vector[src_typeid] == nullptr) { return false; } return true; } -template -bool need_to_call_binary_ufunc(sycl::queue exec_q, - dpctl::tensor::usm_ndarray src1, - dpctl::tensor::usm_ndarray src2, - dpctl::tensor::usm_ndarray dst, - const dispatchT &dispatch_vector) +template +bool need_to_call_binary_ufunc(sycl::queue &exec_q, + const dpctl::tensor::usm_ndarray &src1, + const dpctl::tensor::usm_ndarray &src2, + const dpctl::tensor::usm_ndarray &dst, + const output_typesT &output_type_table, + const contig_dispatchT &contig_dispatch_table) { // check type_nums int src1_typenum = src1.get_typenum(); int src2_typenum = src2.get_typenum(); + int dst_typenum = dst.get_typenum(); - auto array_types = dpctl_td_ns::usm_ndarray_types(); + auto array_types = td_ns::usm_ndarray_types(); int src1_typeid = array_types.typenum_to_lookup_id(src1_typenum); int src2_typeid = array_types.typenum_to_lookup_id(src2_typenum); + int dst_typeid = array_types.typenum_to_lookup_id(dst_typenum); + + // check that types are supported + int output_typeid = output_type_table[src1_typeid][src2_typeid]; + if (output_typeid != dst_typeid) { + return false; + } // types must be the same if (src1_typeid != src2_typeid) { @@ -434,23 +237,110 @@ bool need_to_call_binary_ufunc(sycl::queue exec_q, } // MKL function is not defined for the type - if (dispatch_vector[src1_typeid] == nullptr) { + if (contig_dispatch_table[src1_typeid] == nullptr) { return false; } return true; } +/** + * @brief A macro used to define factories and a populating unary functions + * to dispatch to a callback with proper OneMKL function within VM extension + * scope. 
+ */ +#define MACRO_POPULATE_DISPATCH_VECTORS(__name__) \ + template \ + struct ContigFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v::value_type, \ + void>) { \ + return nullptr; \ + } \ + else { \ + return __name__##_contig_impl; \ + } \ + } \ + }; \ + \ + template \ + struct TypeMapFactory \ + { \ + std::enable_if_t::value, int> get() \ + { \ + using rT = typename OutputType::value_type; \ + return td_ns::GetTypeid{}.get(); \ + } \ + }; \ + \ + static void populate_dispatch_vectors(void) \ + { \ + py_internal::init_ufunc_dispatch_vector( \ + output_typeid_vector); \ + py_internal::init_ufunc_dispatch_vector( \ + contig_dispatch_vector); \ + }; + +/** + * @brief A macro used to define factories and a populating binary functions + * to dispatch to a callback with proper OneMKL function within VM extension + * scope. + */ +#define MACRO_POPULATE_DISPATCH_TABLES(__name__) \ + template \ + struct ContigFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v< \ + typename OutputType::value_type, void>) \ + { \ + return nullptr; \ + } \ + else { \ + return __name__##_contig_impl; \ + } \ + } \ + }; \ + \ + template \ + struct TypeMapFactory \ + { \ + std::enable_if_t::value, int> get() \ + { \ + using rT = typename OutputType::value_type; \ + return td_ns::GetTypeid{}.get(); \ + } \ + }; \ + \ + static void populate_dispatch_tables(void) \ + { \ + py_internal::init_ufunc_dispatch_table( \ + output_typeid_vector); \ + py_internal::init_ufunc_dispatch_table( \ + contig_dispatch_vector); \ + }; + template - typename factoryT> + typename factoryT, + int _num_types = td_ns::num_types> void init_ufunc_dispatch_vector(dispatchT dispatch_vector[]) { - dpctl_td_ns::DispatchVectorBuilder - contig; - contig.populate_dispatch_vector(dispatch_vector); + td_ns::DispatchVectorBuilder dvb; + dvb.populate_dispatch_vector(dispatch_vector); +} + +template + typename factoryT, + int _num_types = td_ns::num_types> +void init_ufunc_dispatch_table(dispatchT 
dispatch_table[][_num_types]) +{ + td_ns::DispatchTableBuilder dtb; + dtb.populate_dispatch_table(dispatch_table); } -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::vm::py_internal diff --git a/dpnp/backend/extensions/vm/conj.cpp b/dpnp/backend/extensions/vm/conj.cpp new file mode 100644 index 00000000000..edfb4384dad --- /dev/null +++ b/dpnp/backend/extensions/vm/conj.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
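The factory/builder machinery introduced above (`ContigFactory`, `TypeMapFactory`, `init_ufunc_dispatch_vector`) boils down to a type-id-indexed function-pointer table populated once at module load. The following is a minimal, self-contained sketch of that pattern, independent of dpctl/SYCL; all names (`ContigFactorySketch`, `need_to_call_sketch`, the fake type ids) are illustrative only, not the real dpctl API:

```cpp
#include <array>
#include <cassert>
#include <utility>

// A unary implementation slot, standing in for unary_contig_impl_fn_ptr_t
using unary_fn = double (*)(double);

constexpr int num_types = 3; // pretend slots: 0 = float, 1 = double, 2 = unsupported

static double negate_impl(double v) { return -v; }

// Factory queried once per type id: returns an implementation for supported
// ids and nullptr otherwise, mirroring the ContigFactory in the macro above
template <int TypeId>
struct ContigFactorySketch
{
    unary_fn get()
    {
        if constexpr (TypeId == 2) { // pretend type id 2 has no MKL support
            return nullptr;
        }
        else {
            return negate_impl;
        }
    }
};

// Populate the dispatch vector by instantiating the factory for every type id,
// mirroring init_ufunc_dispatch_vector / populate_dispatch_vector
template <int... Ids>
void populate(std::array<unary_fn, num_types> &vec,
              std::integer_sequence<int, Ids...>)
{
    ((vec[Ids] = ContigFactorySketch<Ids>{}.get()), ...);
}

std::array<unary_fn, num_types> make_dispatch_vector()
{
    std::array<unary_fn, num_types> vec{};
    populate(vec, std::make_integer_sequence<int, num_types>{});
    return vec;
}

// Decision helper mirroring the "MKL function is not defined for the type"
// check in need_to_call_unary_ufunc: usable only if a slot is registered
bool need_to_call_sketch(const std::array<unary_fn, num_types> &vec, int type_id)
{
    return vec[type_id] != nullptr;
}
```

The real code differs in that the builder walks dpctl's full type list and the slots hold contiguous-kernel entry points, but the lookup-then-fallback decision is the same.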
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "conj.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::conj function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event conj_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::conj(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(conj); +} // namespace impl + +void init_conj(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto conj_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_conj", conj_pyapi, + "Call `conj` function from OneMKL VM library to compute " + "the conjugate of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto conj_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, 
output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_conj_to_call", conj_need_to_call_pyapi, + "Check input arguments to answer if `conj` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/conj.hpp b/dpnp/backend/extensions/vm/conj.hpp index af3acb3466e..0ce61082ab6 100644 --- a/dpnp/backend/extensions/vm/conj.hpp +++ b/dpnp/backend/extensions/vm/conj.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event conj_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::ConjOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::conj(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct ConjContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::ConjOutputType::value_type, void>) - { - return nullptr; - } - else { - return conj_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_conj(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/cos.cpp b/dpnp/backend/extensions/vm/cos.cpp new file mode 100644 index 00000000000..e7925cc3298 --- /dev/null +++ b/dpnp/backend/extensions/vm/cos.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights 
reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
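Each new `.cpp` file in this patch declares an `OutputType` factory built on `std::disjunction` over `td_ns::TypeMapResultEntry` entries, resolving the result type at compile time (with `void` meaning "unsupported"). A self-contained sketch of that resolution, using simplified stand-in helpers (`MapEntry`, `DefaultEntry` are illustrative names, not the real dpctl ones):

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Stand-in for td_ns::TypeMapResultEntry: "true" when T matches MatchT,
// carrying the mapped result type
template <typename T, typename MatchT, typename ResT>
struct MapEntry : std::is_same<T, MatchT>
{
    using result_type = ResT;
};

// Stand-in for td_ns::DefaultResultEntry: always true, used as the fallback
template <typename ResT>
struct DefaultEntry : std::true_type
{
    using result_type = ResT;
};

// std::disjunction derives from the first entry whose ::value is true, so
// ::result_type below is the mapped output type; void marks "no MKL support"
template <typename T>
struct OutputTypeSketch
{
    using value_type = typename std::disjunction<
        MapEntry<T, double, double>,
        MapEntry<T, float, float>,
        MapEntry<T, std::complex<float>, std::complex<float>>,
        DefaultEntry<void>>::result_type;
};

static_assert(std::is_same_v<OutputTypeSketch<double>::value_type, double>);
static_assert(std::is_same_v<OutputTypeSketch<int>::value_type, void>);
```

This is why the `ContigFactory` in the dispatch macros can test `std::is_same_v<typename OutputType<T>::value_type, void>` to return `nullptr` for unsupported types.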
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "cos.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::cos function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event cos_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::cos(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(cos); +} // namespace impl + +void init_cos(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto cos_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_cos", cos_pyapi, + "Call `cos` function from OneMKL VM library to compute " + "the cosine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto cos_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_cos_to_call", cos_need_to_call_pyapi, + "Check input arguments to answer if `cos` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/cos.hpp b/dpnp/backend/extensions/vm/cos.hpp index a085123ca14..59c92ad0fd8 100644 --- a/dpnp/backend/extensions/vm/cos.hpp +++ b/dpnp/backend/extensions/vm/cos.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event cos_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::CosOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::cos(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct CosContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::CosOutputType::value_type, void>) - { - return nullptr; - } - else { - return cos_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_cos(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/cosh.cpp b/dpnp/backend/extensions/vm/cosh.cpp new file mode 100644 index 00000000000..bb883c97c33 --- /dev/null +++ b/dpnp/backend/extensions/vm/cosh.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// 
Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "cosh.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::cosh function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event cosh_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::cosh(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(cosh); +} // namespace impl + +void init_cosh(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto cosh_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_cosh", cosh_pyapi, + "Call `cosh` function from OneMKL VM library to compute " + "the hyperbolic cosine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto cosh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_cosh_to_call", cosh_need_to_call_pyapi, + "Check input arguments to answer if `cosh` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/cosh.hpp b/dpnp/backend/extensions/vm/cosh.hpp index 301a2fbeb22..030ef945823 100644 --- a/dpnp/backend/extensions/vm/cosh.hpp +++ b/dpnp/backend/extensions/vm/cosh.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event cosh_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::CoshOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::cosh(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct CoshContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::CoshOutputType::value_type, void>) - { - return nullptr; - } - else { - return cosh_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_cosh(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/div.cpp b/dpnp/backend/extensions/vm/div.cpp new file mode 100644 index 00000000000..8cdb547feb4 --- /dev/null +++ b/dpnp/backend/extensions/vm/div.cpp @@ -0,0 +1,171 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
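For binary functions such as `div`, the dispatch data is two-dimensional: `output_type_table[src1_typeid][src2_typeid]` yields the expected output type id, and `need_to_call_binary_ufunc` additionally requires the destination's type id to match it. A minimal sketch of that check, with made-up type ids and a toy table (the real table is built by `MACRO_POPULATE_DISPATCH_TABLES` over dpctl's type list):

```cpp
#include <array>
#include <cassert>

constexpr int num_types = 3;

// output_type_table[src1][src2] -> output type id, or -1 for unsupported pairs
using type_table = std::array<std::array<int, num_types>, num_types>;

constexpr type_table make_output_type_table()
{
    type_table t{};
    for (int i = 0; i < num_types; ++i) {
        for (int j = 0; j < num_types; ++j) {
            // toy rule: only same-typed inputs are supported, output keeps the type
            t[i][j] = (i == j) ? i : -1;
        }
    }
    return t;
}

// Mirrors the type portion of need_to_call_binary_ufunc: the MKL path applies
// only when the input pair is supported AND dst matches the mapped output type
bool can_use_mkl_sketch(int src1_id, int src2_id, int dst_id)
{
    constexpr type_table table = make_output_type_table();
    int out_id = table[src1_id][src2_id];
    return out_id != -1 && out_id == dst_id;
}
```

The real check also verifies queue compatibility, shapes, contiguity, and a non-null slot in the contiguous dispatch table, but the table lookup above is the gatekeeping idea.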
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "div.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::div function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::BinaryTypeMapResultEntry, + T2, + std::complex, + std::complex>, + td_ns::BinaryTypeMapResultEntry, + T2, + std::complex, + std::complex>, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event div_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + ssize_t a_offset, + const char *in_b, + ssize_t b_offset, + char *out_y, + ssize_t out_offset, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + tu_ns::validate_type_for_device(exec_q); + + if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) { + throw std::runtime_error("Arrays offsets have to be equals to 0"); + } + + std::int64_t n = static_cast(in_n); + const T1 *a = reinterpret_cast(in_a); + const T2 *b = reinterpret_cast(in_b); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::div(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing 1st input vector of size n + b, // pointer `b` containing 2nd input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types][td_ns::num_types]; +static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types] + [td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(div); +} // namespace impl + +void init_div(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_tables(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto 
div_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                        const arrayT &src2, const arrayT &dst,
+                        const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_div", div_pyapi,
+          "Call `div` function from OneMKL VM library to perform "
+          "element-by-element division of vector `src1` by vector `src2` "
+          "into resulting vector `dst`",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto div_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                      const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_div_to_call", div_need_to_call_pyapi,
+          "Check input arguments to answer if `div` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/div.hpp b/dpnp/backend/extensions/vm/div.hpp
index c1306660484..8095f0bb2cb 100644
--- a/dpnp/backend/extensions/vm/div.hpp
+++ b/dpnp/backend/extensions/vm/div.hpp
@@ -25,58 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event div_contig_impl(sycl::queue exec_q,
-                            const std::int64_t n,
-                            const char *in_a,
-                            const char *in_b,
-                            char *out_y,
-                            const std::vector<sycl::event>
&depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - const T *b = reinterpret_cast(in_b); - using resTy = typename types::DivOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::div(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct DivContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::DivOutputType::value_type, void>) - { - return nullptr; - } - else { - return div_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_div(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/exp.cpp b/dpnp/backend/extensions/vm/exp.cpp new file mode 100644 index 00000000000..b7f8d4422d1 --- /dev/null +++ b/dpnp/backend/extensions/vm/exp.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "exp.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::exp function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event exp_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::exp(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(exp); +} // namespace impl + +void init_exp(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto exp_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_exp", exp_pyapi, + "Call `exp` function from OneMKL VM library to compute " + "the natural (base-e) exponential of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto exp_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_exp_to_call", exp_need_to_call_pyapi, + "Check input arguments to answer if `exp` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/exp.hpp b/dpnp/backend/extensions/vm/exp.hpp index 936b6a5a0ce..a1d88998fd4 100644 --- a/dpnp/backend/extensions/vm/exp.hpp +++ b/dpnp/backend/extensions/vm/exp.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event exp_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::ExpOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::exp(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct ExpContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::ExpOutputType::value_type, void>) - { - return nullptr; - } - else { - return exp_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_exp(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/exp2.cpp b/dpnp/backend/extensions/vm/exp2.cpp new file mode 100644 index 00000000000..8b5d7a7c5ff --- /dev/null +++ b/dpnp/backend/extensions/vm/exp2.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// 
Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "exp2.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::exp2 function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event exp2_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::exp2(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(exp2); +} // namespace impl + +void init_exp2(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto exp2_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_exp2", exp2_pyapi, + "Call `exp2` function from OneMKL VM library to compute " + "the base-2 exponential of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto exp2_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, 
contig_dispatch_vector); + }; + m.def("_mkl_exp2_to_call", exp2_need_to_call_pyapi, + "Check input arguments to answer if `exp2` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/exp2.hpp b/dpnp/backend/extensions/vm/exp2.hpp index 362897fdbe6..fe0694c5181 100644 --- a/dpnp/backend/extensions/vm/exp2.hpp +++ b/dpnp/backend/extensions/vm/exp2.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event exp2_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::Exp2OutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::exp2(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct Exp2ContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::Exp2OutputType::value_type, void>) - { - return nullptr; - } - else { - return exp2_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_exp2(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/expm1.cpp b/dpnp/backend/extensions/vm/expm1.cpp new file mode 100644 index 00000000000..b27668ba7c4 --- /dev/null +++ b/dpnp/backend/extensions/vm/expm1.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "expm1.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::expm1 function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event expm1_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::expm1(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(expm1); +} // namespace impl + +void init_expm1(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto expm1_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_expm1", expm1_pyapi, + "Call `expm1` function from OneMKL VM library to compute " + "the subtraction of 1 from the exponential of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto expm1_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, 
output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_expm1_to_call", expm1_need_to_call_pyapi, + "Check input arguments to answer if `expm1` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/expm1.hpp b/dpnp/backend/extensions/vm/expm1.hpp index d0a94bca8e9..7719d4948b4 100644 --- a/dpnp/backend/extensions/vm/expm1.hpp +++ b/dpnp/backend/extensions/vm/expm1.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event expm1_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::Expm1OutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::expm1(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct Expm1ContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::Expm1OutputType::value_type, void>) - { - return nullptr; - } - else { - return expm1_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_expm1(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/floor.cpp b/dpnp/backend/extensions/vm/floor.cpp new file mode 100644 index 00000000000..8a32f40e0ff --- /dev/null +++ b/dpnp/backend/extensions/vm/floor.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel 
Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "floor.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::floor function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event floor_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::floor(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(floor); +} // namespace impl + +void init_floor(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto floor_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_floor", floor_pyapi, + "Call `floor` function from OneMKL VM library to compute " + "the floor of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto floor_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, 
contig_dispatch_vector); + }; + m.def("_mkl_floor_to_call", floor_need_to_call_pyapi, + "Check input arguments to answer if `floor` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/floor.hpp b/dpnp/backend/extensions/vm/floor.hpp index c138b8b6678..4cc85f2bb89 100644 --- a/dpnp/backend/extensions/vm/floor.hpp +++ b/dpnp/backend/extensions/vm/floor.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event floor_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::FloorOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::floor(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct FloorContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::FloorOutputType::value_type, void>) - { - return nullptr; - } - else { - return floor_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_floor(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/hypot.cpp b/dpnp/backend/extensions/vm/hypot.cpp new file mode 100644 index 00000000000..42dd8127111 --- /dev/null +++ b/dpnp/backend/extensions/vm/hypot.cpp @@ -0,0 +1,160 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights 
reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "hypot.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::hypot function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event hypot_contig_impl(sycl::queue &exec_q,
+                                     std::size_t in_n,
+                                     const char *in_a,
+                                     ssize_t a_offset,
+                                     const char *in_b,
+                                     ssize_t b_offset,
+                                     char *out_y,
+                                     ssize_t out_offset,
+                                     const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::hypot(exec_q,
+                         n, // number of elements to be calculated
+                         a, // pointer `a` containing 1st input vector of size n
+                         b, // pointer `b` containing 2nd input vector of size n
+                         y, // pointer `y` to the output vector of size n
+                         depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(hypot);
+} // namespace impl
+
+void init_hypot(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto hypot_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                           const arrayT &src2, const arrayT &dst,
+                           const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_hypot", hypot_pyapi,
+          "Call `hypot` function from OneMKL VM library to compute "
+          "the element-wise square root of the sum of squares",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto hypot_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                        const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_hypot_to_call", hypot_need_to_call_pyapi,
+          "Check input arguments to answer if `hypot` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/hypot.hpp b/dpnp/backend/extensions/vm/hypot.hpp
index 19dd4345c36..f7a171556d0 100644
--- a/dpnp/backend/extensions/vm/hypot.hpp
+++ b/dpnp/backend/extensions/vm/hypot.hpp
@@ -25,58 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event hypot_contig_impl(sycl::queue exec_q,
-                              const std::int64_t n,
-                              const char *in_a,
-                              const char *in_b,
-                              char *out_y,
-                              const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    const T *b = reinterpret_cast<const T *>(in_b);
-    using resTy = typename
types::HypotOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::hypot(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct HypotContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::HypotOutputType::value_type, void>) - { - return nullptr; - } - else { - return hypot_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_hypot(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/ln.cpp b/dpnp/backend/extensions/vm/ln.cpp new file mode 100644 index 00000000000..2eb321a3777 --- /dev/null +++ b/dpnp/backend/extensions/vm/ln.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "ln.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::ln function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event ln_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::ln(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(ln); +} // namespace impl + +void init_ln(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto ln_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_ln", ln_pyapi, + "Call `ln` function from OneMKL VM library to compute " + "the natural logarithm of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto ln_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_ln_to_call", ln_need_to_call_pyapi, + "Check input arguments to answer if `ln` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/ln.hpp b/dpnp/backend/extensions/vm/ln.hpp index 574cc8fa33c..7dadf76b2fd 100644 --- a/dpnp/backend/extensions/vm/ln.hpp +++ b/dpnp/backend/extensions/vm/ln.hpp @@ -25,54 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event ln_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::LnOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::ln(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct LnContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::LnOutputType::value_type, void>) { - return nullptr; - } - else { - return ln_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_ln(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log10.cpp b/dpnp/backend/extensions/vm/log10.cpp new file mode 100644 index 00000000000..e685e5fce60 --- /dev/null +++ b/dpnp/backend/extensions/vm/log10.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 
2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "log10.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::log10 function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event log10_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::log10(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(log10); +} // namespace impl + +void init_log10(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto log10_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_log10", log10_pyapi, + "Call `log10` function from OneMKL VM library to compute " + "the base-10 logarithm of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto log10_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_log10_to_call", log10_need_to_call_pyapi, + "Check input arguments to answer if `log10` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log10.hpp b/dpnp/backend/extensions/vm/log10.hpp index dc030817cda..c62ae122d35 100644 --- a/dpnp/backend/extensions/vm/log10.hpp +++ b/dpnp/backend/extensions/vm/log10.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event log10_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::Log10OutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::log10(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct Log10ContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::Log10OutputType::value_type, void>) - { - return nullptr; - } - else { - return log10_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_log10(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log1p.cpp b/dpnp/backend/extensions/vm/log1p.cpp new file mode 100644 index 00000000000..2db1491e5eb --- /dev/null +++ b/dpnp/backend/extensions/vm/log1p.cpp @@ -0,0 +1,136 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "log1p.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::log1p function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event log1p_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::log1p(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(log1p); +} // namespace impl + +void init_log1p(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto log1p_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_log1p", log1p_pyapi, + "Call `log1p` function from OneMKL VM library to compute " + "the natural logarithm of 1 plus vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto log1p_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, 
contig_dispatch_vector); + }; + m.def("_mkl_log1p_to_call", log1p_need_to_call_pyapi, + "Check input arguments to answer if `log1p` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log1p.hpp b/dpnp/backend/extensions/vm/log1p.hpp index 39ab1b3a21c..7cbfb1fe187 100644 --- a/dpnp/backend/extensions/vm/log1p.hpp +++ b/dpnp/backend/extensions/vm/log1p.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event log1p_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::Log1pOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::log1p(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct Log1pContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::Log1pOutputType::value_type, void>) - { - return nullptr; - } - else { - return log1p_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_log1p(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log2.cpp b/dpnp/backend/extensions/vm/log2.cpp new file mode 100644 index 00000000000..a6800185c25 --- /dev/null +++ b/dpnp/backend/extensions/vm/log2.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights 
reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "log2.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::log2 function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event log2_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::log2(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(log2); +} // namespace impl + +void init_log2(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto log2_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_log2", log2_pyapi, + "Call `log2` function from OneMKL VM library to compute " + "the base-2 logarithm of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto log2_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, 
contig_dispatch_vector); + }; + m.def("_mkl_log2_to_call", log2_need_to_call_pyapi, + "Check input arguments to answer if `log2` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/log2.hpp b/dpnp/backend/extensions/vm/log2.hpp index 2c419ac8ab2..34dd1a92136 100644 --- a/dpnp/backend/extensions/vm/log2.hpp +++ b/dpnp/backend/extensions/vm/log2.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event log2_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::Log2OutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::log2(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct Log2ContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::Log2OutputType::value_type, void>) - { - return nullptr; - } - else { - return log2_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_log2(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/mul.cpp b/dpnp/backend/extensions/vm/mul.cpp new file mode 100644 index 00000000000..34007fbc07c --- /dev/null +++ b/dpnp/backend/extensions/vm/mul.cpp @@ -0,0 +1,171 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "mul.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::mul function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<double>,
+                                        T2,
+                                        std::complex<double>,
+                                        std::complex<double>>,
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<float>,
+                                        T2,
+                                        std::complex<float>,
+                                        std::complex<float>>,
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event mul_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   ssize_t a_offset,
+                                   const char *in_b,
+                                   ssize_t b_offset,
+                                   char *out_y,
+                                   ssize_t out_offset,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::mul(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing 1st input vector of size n
+                       b, // pointer `b` containing 2nd input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(mul);
+} // namespace impl
+
+void init_mul(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto mul_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                         const arrayT &src2, const arrayT &dst,
+                         const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_mul", mul_pyapi,
+          "Call `mul` function from OneMKL VM library to perform element "
+          "by element multiplication of vector `src1` by vector `src2`, "
+          "writing the result to vector `dst`",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto mul_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                      const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_mul_to_call", mul_need_to_call_pyapi,
+          "Check input arguments to answer if `mul` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/mul.hpp b/dpnp/backend/extensions/vm/mul.hpp
index 39ea8eec20a..4dd138aea52 100644
--- a/dpnp/backend/extensions/vm/mul.hpp
+++ b/dpnp/backend/extensions/vm/mul.hpp
@@ -25,58 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event mul_contig_impl(sycl::queue exec_q,
-                            const std::int64_t n,
-                            const char *in_a,
-                            const char *in_b,
-                            char *out_y,
-                            const
std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - const T *b = reinterpret_cast(in_b); - using resTy = typename types::MulOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::mul(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct MulContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::MulOutputType::value_type, void>) - { - return nullptr; - } - else { - return mul_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_mul(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/pow.cpp b/dpnp/backend/extensions/vm/pow.cpp new file mode 100644 index 00000000000..65acd2ece44 --- /dev/null +++ b/dpnp/backend/extensions/vm/pow.cpp @@ -0,0 +1,171 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "pow.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::pow function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<double>,
+                                        T2,
+                                        std::complex<double>,
+                                        std::complex<double>>,
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<float>,
+                                        T2,
+                                        std::complex<float>,
+                                        std::complex<float>>,
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event pow_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   ssize_t a_offset,
+                                   const char *in_b,
+                                   ssize_t b_offset,
+                                   char *out_y,
+                                   ssize_t out_offset,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::pow(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing 1st input vector of size n
+                       b, // pointer `b` containing 2nd input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(pow);
+} // namespace impl
+
+void init_pow(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto pow_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                         const arrayT &src2, const arrayT &dst,
+                         const event_vecT &depends = {}) {
+        return py_int::py_binary_ufunc(
+            src1, src2, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{},
+            // no support of C-contig row with broadcasting in OneMKL
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{},
+            td_ns::NullPtrTable<
+                impl::
+                    binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{});
+    };
+    m.def("_pow", pow_pyapi,
+          "Call `pow` function from OneMKL VM library to perform element "
+          "by element exponentiation of vector `src1` raised to the power "
+          "of vector `src2` to resulting vector `dst`",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"), py::arg("depends") = py::list());
+
+    auto pow_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1,
+                                      const arrayT &src2, const arrayT &dst) {
+        return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst,
+                                                      output_typeid_vector,
+                                                      contig_dispatch_vector);
+    };
+    m.def("_mkl_pow_to_call", pow_need_to_call_pyapi,
+          "Check input arguments to answer if `pow` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"),
+          py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/pow.hpp b/dpnp/backend/extensions/vm/pow.hpp
index f5e946914bf..ef6770d1065 100644
--- a/dpnp/backend/extensions/vm/pow.hpp
+++ b/dpnp/backend/extensions/vm/pow.hpp
@@ -25,58 +25,11 @@
 
 #pragma once
 
-#include <CL/sycl.hpp>
+#include <pybind11/pybind11.h>
 
-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;
 
-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event pow_contig_impl(sycl::queue exec_q,
-                            const std::int64_t n,
-                            const char *in_a,
-                            const char *in_b,
-                            char
*out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - const T *b = reinterpret_cast(in_b); - using resTy = typename types::PowOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::pow(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct PowContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::PowOutputType::value_type, void>) - { - return nullptr; - } - else { - return pow_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_pow(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/rint.cpp b/dpnp/backend/extensions/vm/rint.cpp new file mode 100644 index 00000000000..ee0edbecd23 --- /dev/null +++ b/dpnp/backend/extensions/vm/rint.cpp @@ -0,0 +1,136 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "rint.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::rint function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event rint_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::rint(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(rint); +} // namespace impl + +void init_rint(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto rint_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_round", rint_pyapi, + "Call `rint` function from OneMKL VM library to compute " + "the rounded value of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto rint_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, 
contig_dispatch_vector); + }; + m.def("_mkl_round_to_call", rint_need_to_call_pyapi, + "Check input arguments to answer if `rint` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/round.hpp b/dpnp/backend/extensions/vm/rint.hpp similarity index 53% rename from dpnp/backend/extensions/vm/round.hpp rename to dpnp/backend/extensions/vm/rint.hpp index a2ae3b3bc52..ce493368788 100644 --- a/dpnp/backend/extensions/vm/round.hpp +++ b/dpnp/backend/extensions/vm/rint.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event round_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::RoundOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::rint(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct RoundContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::RoundOutputType::value_type, void>) - { - return nullptr; - } - else { - return round_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_rint(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sin.cpp b/dpnp/backend/extensions/vm/sin.cpp new file mode 100644 index 00000000000..55d9f8ed301 --- /dev/null +++ b/dpnp/backend/extensions/vm/sin.cpp @@ -0,0 +1,138 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "sin.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::sin function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry>, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event sin_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + char *out_y, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + + std::int64_t n = static_cast(in_n); + const T *a = reinterpret_cast(in_a); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::sin(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types]; +static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(sin); +} // namespace impl + +void init_sin(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_vectors(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto sin_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst, const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrVector{}); + }; + m.def("_sin", sin_pyapi, + "Call `sin` function from OneMKL VM library to compute " + "the sine of vector elements", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), + py::arg("depends") = py::list()); + + auto sin_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src, + const arrayT &dst) { + return 
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_sin_to_call", sin_need_to_call_pyapi, + "Check input arguments to answer if `sin` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sin.hpp b/dpnp/backend/extensions/vm/sin.hpp index 0af14c68c87..dcda488e728 100644 --- a/dpnp/backend/extensions/vm/sin.hpp +++ b/dpnp/backend/extensions/vm/sin.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event sin_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::SinOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::sin(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct SinContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::SinOutputType::value_type, void>) - { - return nullptr; - } - else { - return sin_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_sin(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sinh.cpp b/dpnp/backend/extensions/vm/sinh.cpp new file mode 100644 index 00000000000..f8ddbc580eb --- /dev/null +++ b/dpnp/backend/extensions/vm/sinh.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// 
Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "sinh.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::sinh function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event sinh_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::sinh(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(sinh);
+} // namespace impl
+
+void init_sinh(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto sinh_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_sinh", sinh_pyapi,
+          "Call `sinh` function from OneMKL VM library to compute "
+          "the hyperbolic sine of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto sinh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                       const arrayT &dst) {
+        return
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_sinh_to_call", sinh_need_to_call_pyapi, + "Check input arguments to answer if `sinh` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sinh.hpp b/dpnp/backend/extensions/vm/sinh.hpp index 6fe53423c53..92f1e740a62 100644 --- a/dpnp/backend/extensions/vm/sinh.hpp +++ b/dpnp/backend/extensions/vm/sinh.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event sinh_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::SinhOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::sinh(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct SinhContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::SinhOutputType::value_type, void>) - { - return nullptr; - } - else { - return sinh_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_sinh(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sqr.cpp b/dpnp/backend/extensions/vm/sqr.cpp new file mode 100644 index 00000000000..f42427ea00f --- /dev/null +++ b/dpnp/backend/extensions/vm/sqr.cpp @@ -0,0 +1,136 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "sqr.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::sqr function. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type =
+        typename std::disjunction<td_ns::TypeMapResultEntry<T, double>,
+                                  td_ns::TypeMapResultEntry<T, float>,
+                                  td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event sqr_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   char *out_y,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::sqr(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(sqr);
+} // namespace impl
+
+void init_sqr(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto sqr_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                         const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_sqr", sqr_pyapi,
+          "Call `sqr` function from OneMKL VM library to perform "
+          "element-by-element squaring of vector `src`, writing results "
+          "to vector `dst`",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto sqr_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                      const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst,
output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_sqr_to_call", sqr_need_to_call_pyapi, + "Check input arguments to answer if `sqr` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sqr.hpp b/dpnp/backend/extensions/vm/sqr.hpp index 8f1d4ac44fd..2fe78ceead6 100644 --- a/dpnp/backend/extensions/vm/sqr.hpp +++ b/dpnp/backend/extensions/vm/sqr.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event sqr_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::SqrOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::sqr(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct SqrContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::SqrOutputType::value_type, void>) - { - return nullptr; - } - else { - return sqr_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_sqr(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sqrt.cpp b/dpnp/backend/extensions/vm/sqrt.cpp new file mode 100644 index 00000000000..70ebbf298fd --- /dev/null +++ b/dpnp/backend/extensions/vm/sqrt.cpp @@ -0,0 +1,139 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "common.hpp"
+#include "sqrt.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::sqrt function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event sqrt_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::sqrt(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(sqrt);
+} // namespace impl
+
+void init_sqrt(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto sqrt_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_sqrt", sqrt_pyapi,
+          "Call `sqrt` function from OneMKL VM library to perform "
+          "element-by-element extraction of the square root "
+          "of vector `src`, writing results to vector `dst`",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto sqrt_need_to_call_pyapi = [&](sycl::queue &exec_q, const
arrayT &src, + const arrayT &dst) { + return py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_sqrt_to_call", sqrt_need_to_call_pyapi, + "Check input arguments to answer if `sqrt` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sqrt.hpp b/dpnp/backend/extensions/vm/sqrt.hpp index e3984133628..08d37049580 100644 --- a/dpnp/backend/extensions/vm/sqrt.hpp +++ b/dpnp/backend/extensions/vm/sqrt.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event sqrt_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::SqrtOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::sqrt(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct SqrtContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::SqrtOutputType::value_type, void>) - { - return nullptr; - } - else { - return sqrt_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_sqrt(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sub.cpp b/dpnp/backend/extensions/vm/sub.cpp new file mode 100644 index 00000000000..4ec1bdc36b5 --- /dev/null +++ b/dpnp/backend/extensions/vm/sub.cpp @@ -0,0 +1,171 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "common.hpp"
+#include "sub.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::sub function.
+ *
+ * @tparam T Type of input vectors `a` and `b` and of result vector `y`.
+ */
+template <typename T1, typename T2>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<double>,
+                                        T2,
+                                        std::complex<double>,
+                                        std::complex<double>>,
+        td_ns::BinaryTypeMapResultEntry<T1,
+                                        std::complex<float>,
+                                        T2,
+                                        std::complex<float>,
+                                        std::complex<float>>,
+        td_ns::BinaryTypeMapResultEntry<T1, double, T2, double, double>,
+        td_ns::BinaryTypeMapResultEntry<T1, float, T2, float, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T1, typename T2>
+static sycl::event sub_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   ssize_t a_offset,
+                                   const char *in_b,
+                                   ssize_t b_offset,
+                                   char *out_y,
+                                   ssize_t out_offset,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T1>(exec_q);
+    tu_ns::validate_type_for_device<T2>(exec_q);
+
+    if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) {
+        throw std::runtime_error("Array offsets have to be equal to 0");
+    }
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T1 *a = reinterpret_cast<const T1 *>(in_a);
+    const T2 *b = reinterpret_cast<const T2 *>(in_b);
+
+    using resTy = typename OutputType<T1, T2>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::sub(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing 1st input vector of size n
+                       b, // pointer `b` containing 2nd input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::binary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t;
+using ew_cmn_ns::binary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types][td_ns::num_types];
+static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types]
+                                                         [td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_TABLES(sub);
+} // namespace impl
+
+void init_sub(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_tables();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto
sub_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { + return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrTable{}, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_sub", sub_pyapi, + "Call `sub` function from OneMKL VM library to performs element " + "by element subtraction of vector `src1` by vector `src2` " + "to resulting vector `dst`", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("depends") = py::list()); + + auto sub_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst) { + return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst, + output_typeid_vector, + contig_dispatch_vector); + }; + m.def("_mkl_sub_to_call", sub_need_to_call_pyapi, + "Check input arguments to answer if `sub` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/sub.hpp b/dpnp/backend/extensions/vm/sub.hpp index e1a2464b867..059a78dcbda 100644 --- a/dpnp/backend/extensions/vm/sub.hpp +++ b/dpnp/backend/extensions/vm/sub.hpp @@ -25,58 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event sub_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - const char *in_b, - char *out_y, - const std::vector 
&depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - const T *b = reinterpret_cast(in_b); - using resTy = typename types::SubOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::sub(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing 1st input vector of size n - b, // pointer `b` containing 2nd input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct SubContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::SubOutputType::value_type, void>) - { - return nullptr; - } - else { - return sub_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_sub(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/tan.cpp b/dpnp/backend/extensions/vm/tan.cpp new file mode 100644 index 00000000000..250c3838722 --- /dev/null +++ b/dpnp/backend/extensions/vm/tan.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+// THE POSSIBILITY OF SUCH DAMAGE.
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "common.hpp"
+#include "tan.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::tan function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event tan_contig_impl(sycl::queue &exec_q,
+                                   std::size_t in_n,
+                                   const char *in_a,
+                                   char *out_y,
+                                   const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::tan(exec_q,
+                       n, // number of elements to be calculated
+                       a, // pointer `a` containing input vector of size n
+                       y, // pointer `y` to the output vector of size n
+                       depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(tan);
+} // namespace impl
+
+void init_tan(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto tan_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                         const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_tan", tan_pyapi,
+          "Call `tan` function from OneMKL VM library to compute "
+          "the tangent of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto tan_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                      const arrayT &dst) {
+        return
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_tan_to_call", tan_need_to_call_pyapi, + "Check input arguments to answer if `tan` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/tan.hpp b/dpnp/backend/extensions/vm/tan.hpp index d759ea46fe1..6fcfed9f816 100644 --- a/dpnp/backend/extensions/vm/tan.hpp +++ b/dpnp/backend/extensions/vm/tan.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event tan_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::TanOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::tan(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct TanContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::TanOutputType::value_type, void>) - { - return nullptr; - } - else { - return tan_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_tan(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/tanh.cpp b/dpnp/backend/extensions/vm/tanh.cpp new file mode 100644 index 00000000000..d0e9ecc1669 --- /dev/null +++ b/dpnp/backend/extensions/vm/tanh.cpp @@ -0,0 +1,138 @@ +//***************************************************************************** +// 
Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//*****************************************************************************
+
+#include <oneapi/mkl.hpp>
+#include <sycl/sycl.hpp>
+
+#include "dpctl4pybind11.hpp"
+
+#include "common.hpp"
+#include "tanh.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::tanh function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type = typename std::disjunction<
+        td_ns::TypeMapResultEntry<T, std::complex<double>>,
+        td_ns::TypeMapResultEntry<T, std::complex<float>>,
+        td_ns::TypeMapResultEntry<T, double>,
+        td_ns::TypeMapResultEntry<T, float>,
+        td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event tanh_contig_impl(sycl::queue &exec_q,
+                                    std::size_t in_n,
+                                    const char *in_a,
+                                    char *out_y,
+                                    const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::tanh(exec_q,
+                        n, // number of elements to be calculated
+                        a, // pointer `a` containing input vector of size n
+                        y, // pointer `y` to the output vector of size n
+                        depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(tanh);
+} // namespace impl
+
+void init_tanh(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto tanh_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                          const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_tanh", tanh_pyapi,
+          "Call `tanh` function from OneMKL VM library to compute "
+          "the hyperbolic tangent of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto tanh_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                       const arrayT &dst) {
+        return
py_internal::need_to_call_unary_ufunc( + exec_q, src, dst, output_typeid_vector, contig_dispatch_vector); + }; + m.def("_mkl_tanh_to_call", tanh_need_to_call_pyapi, + "Check input arguments to answer if `tanh` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/tanh.hpp b/dpnp/backend/extensions/vm/tanh.hpp index 98909685ff2..9afbe1eb480 100644 --- a/dpnp/backend/extensions/vm/tanh.hpp +++ b/dpnp/backend/extensions/vm/tanh.hpp @@ -25,55 +25,11 @@ #pragma once -#include +#include -#include "common.hpp" -#include "types_matrix.hpp" +namespace py = pybind11; -namespace dpnp +namespace dpnp::extensions::vm { -namespace backend -{ -namespace ext -{ -namespace vm -{ -template -sycl::event tanh_contig_impl(sycl::queue exec_q, - const std::int64_t n, - const char *in_a, - char *out_y, - const std::vector &depends) -{ - type_utils::validate_type_for_device(exec_q); - - const T *a = reinterpret_cast(in_a); - using resTy = typename types::TanhOutputType::value_type; - resTy *y = reinterpret_cast(out_y); - - return mkl_vm::tanh(exec_q, - n, // number of elements to be calculated - a, // pointer `a` containing input vector of size n - y, // pointer `y` to the output vector of size n - depends); -} - -template -struct TanhContigFactory -{ - fnT get() - { - if constexpr (std::is_same_v< - typename types::TanhOutputType::value_type, void>) - { - return nullptr; - } - else { - return tanh_contig_impl; - } - } -}; -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp +void init_tanh(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/trunc.cpp b/dpnp/backend/extensions/vm/trunc.cpp new file mode 100644 index 00000000000..f47da825719 --- /dev/null +++ b/dpnp/backend/extensions/vm/trunc.cpp @@ -0,0 +1,136 @@ 
+//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//*****************************************************************************
+
+#include
+#include
+
+#include "dpctl4pybind11.hpp"
+
+#include "common.hpp"
+#include "trunc.hpp"
+
+// include a local copy of elementwise common header from dpctl tensor:
+// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp
+// TODO: replace by including dpctl header once available
+#include "../elementwise_functions/elementwise_functions.hpp"
+
+// dpctl tensor headers
+#include "kernels/elementwise_functions/common.hpp"
+#include "utils/type_dispatch.hpp"
+#include "utils/type_utils.hpp"
+
+namespace dpnp::extensions::vm
+{
+namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common;
+namespace py = pybind11;
+namespace py_int = dpnp::extensions::py_internal;
+namespace td_ns = dpctl::tensor::type_dispatch;
+namespace tu_ns = dpctl::tensor::type_utils;
+
+namespace impl
+{
+// OneMKL namespace with VM functions
+namespace mkl_vm = oneapi::mkl::vm;
+
+/**
+ * @brief A factory to define pairs of supported types for which
+ * MKL VM library provides support in oneapi::mkl::vm::trunc function.
+ *
+ * @tparam T Type of input vector `a` and of result vector `y`.
+ */
+template <typename T>
+struct OutputType
+{
+    using value_type =
+        typename std::disjunction<td_ns::TypeMapResultEntry<T, double>,
+                                  td_ns::TypeMapResultEntry<T, float>,
+                                  td_ns::DefaultResultEntry<void>>::result_type;
+};
+
+template <typename T>
+static sycl::event trunc_contig_impl(sycl::queue &exec_q,
+                                     std::size_t in_n,
+                                     const char *in_a,
+                                     char *out_y,
+                                     const std::vector<sycl::event> &depends)
+{
+    tu_ns::validate_type_for_device<T>(exec_q);
+
+    std::int64_t n = static_cast<std::int64_t>(in_n);
+    const T *a = reinterpret_cast<const T *>(in_a);
+
+    using resTy = typename OutputType<T>::value_type;
+    resTy *y = reinterpret_cast<resTy *>(out_y);
+
+    return mkl_vm::trunc(exec_q,
+                         n, // number of elements to be calculated
+                         a, // pointer `a` containing input vector of size n
+                         y, // pointer `y` to the output vector of size n
+                         depends);
+}
+
+using ew_cmn_ns::unary_contig_impl_fn_ptr_t;
+using ew_cmn_ns::unary_strided_impl_fn_ptr_t;
+
+static int output_typeid_vector[td_ns::num_types];
+static unary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types];
+
+MACRO_POPULATE_DISPATCH_VECTORS(trunc);
+} // namespace impl
+
+void init_trunc(py::module_ m)
+{
+    using arrayT = dpctl::tensor::usm_ndarray;
+    using event_vecT = std::vector<sycl::event>;
+
+    impl::populate_dispatch_vectors();
+    using impl::contig_dispatch_vector;
+    using impl::output_typeid_vector;
+
+    auto trunc_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                           const arrayT &dst, const event_vecT &depends = {}) {
+        return py_int::py_unary_ufunc(
+            src, dst, exec_q, depends, output_typeid_vector,
+            contig_dispatch_vector,
+            // no support of strided implementation in OneMKL
+            td_ns::NullPtrVector<impl::unary_strided_impl_fn_ptr_t>{});
+    };
+    m.def("_trunc", trunc_pyapi,
+          "Call `trunc` function from OneMKL VM library to compute "
+          "the truncated value of vector elements",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"),
+          py::arg("depends") = py::list());
+
+    auto trunc_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src,
+                                        const arrayT &dst) {
+        return py_internal::need_to_call_unary_ufunc(
+            exec_q, src, dst, output_typeid_vector, contig_dispatch_vector);
+    };
+    m.def("_mkl_trunc_to_call", trunc_need_to_call_pyapi,
+          "Check input arguments to answer if `trunc` function from "
+          "OneMKL VM library can be used",
+          py::arg("sycl_queue"), py::arg("src"), py::arg("dst"));
+}
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/trunc.hpp b/dpnp/backend/extensions/vm/trunc.hpp
index c06c7cf566f..0b430fd1efc 100644
--- a/dpnp/backend/extensions/vm/trunc.hpp
+++ b/dpnp/backend/extensions/vm/trunc.hpp
@@ -25,55 +25,11 @@
 #pragma once

-#include
+#include

-#include "common.hpp"
-#include "types_matrix.hpp"
+namespace py = pybind11;

-namespace dpnp
+namespace dpnp::extensions::vm
 {
-namespace backend
-{
-namespace ext
-{
-namespace vm
-{
-template <typename T>
-sycl::event trunc_contig_impl(sycl::queue exec_q,
-                              const std::int64_t n,
-                              const char *in_a,
-                              char *out_y,
-                              const std::vector<sycl::event> &depends)
-{
-    type_utils::validate_type_for_device<T>(exec_q);
-
-    const T *a = reinterpret_cast<const T *>(in_a);
-    using resTy = typename types::TruncOutputType<T>::value_type;
-    resTy *y = reinterpret_cast<resTy *>(out_y);
-
-    return mkl_vm::trunc(exec_q,
-                         n, // number of elements to be calculated
-                         a, // pointer `a` containing input vector of size n
-                         y, // pointer `y` to the output vector of size n
-                         depends);
-}
-
-template <typename fnT, typename T>
-struct TruncContigFactory
-{
-    fnT get()
-    {
-        if constexpr (std::is_same_v<
-                          typename types::TruncOutputType<T>::value_type, void>)
-        {
-            return nullptr;
-        }
-        else {
-            return trunc_contig_impl<T>;
-        }
-    }
-};
-} // namespace vm
-} // namespace ext
-} // namespace backend
-} // namespace dpnp
+void init_trunc(py::module_ m);
+} // namespace dpnp::extensions::vm
diff --git a/dpnp/backend/extensions/vm/types_matrix.hpp b/dpnp/backend/extensions/vm/types_matrix.hpp
deleted file mode 100644
index 5b4ccb8fdf6..00000000000
--- a/dpnp/backend/extensions/vm/types_matrix.hpp
+++ /dev/null
@@ -1,659 +0,0 @@
-//*****************************************************************************
-// Copyright (c) 2023-2024, Intel
Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -#pragma once - -#include - -// dpctl tensor headers -#include "utils/type_dispatch.hpp" - -// dpctl namespace for types dispatching -namespace dpctl_td_ns = dpctl::tensor::type_dispatch; - -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace vm -{ -namespace types -{ -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::abs function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct AbsOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, double>, - dpctl_td_ns::TypeMapResultEntry, float>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::acos function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct AcosOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::acosh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct AcoshOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::add function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
- */ -template -struct AddOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::asin function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct AsinOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::asinh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct AsinhOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::atan function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct AtanOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::atan2 function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. - */ -template -struct Atan2OutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::atanh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct AtanhOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::cbrt function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct CbrtOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::ceil function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct CeilOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::conj function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct ConjOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::cos function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct CosOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::cosh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct CoshOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::div function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
- */ -template -struct DivOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::exp function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct ExpOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::exp2 function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct Exp2OutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::expm1 function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct Expm1OutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::floor function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct FloorOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::hypot function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. - */ -template -struct HypotOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::ln function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct LnOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::log10 function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct Log10OutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::log1p function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct Log1pOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::log2 function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct Log2OutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::mul function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. - */ -template -struct MulOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::pow function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. - */ -template -struct PowOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::rint function. 
- * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct RoundOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::sin function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct SinOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::sinh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct SinhOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::sqr function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct SqrOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::sqrt function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct SqrtOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::sub function. - * - * @tparam T Type of input vectors `a` and `b` and of result vector `y`. - */ -template -struct SubOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - T, - std::complex, - std::complex>, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::BinaryTypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::tan function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct TanOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::tanh function. - * - * @tparam T Type of input vector `a` and of result vector `y`. 
- */ -template -struct TanhOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry>, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -/** - * @brief A factory to define pairs of supported types for which - * MKL VM library provides support in oneapi::mkl::vm::trunc function. - * - * @tparam T Type of input vector `a` and of result vector `y`. - */ -template -struct TruncOutputType -{ - using value_type = typename std::disjunction< - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::TypeMapResultEntry, - dpctl_td_ns::DefaultResultEntry>::result_type; -}; - -} // namespace types -} // namespace vm -} // namespace ext -} // namespace backend -} // namespace dpnp diff --git a/dpnp/backend/extensions/vm/vm_py.cpp b/dpnp/backend/extensions/vm/vm_py.cpp index 74d2ae67794..791a8f6d656 100644 --- a/dpnp/backend/extensions/vm/vm_py.cpp +++ b/dpnp/backend/extensions/vm/vm_py.cpp @@ -27,9 +27,6 @@ // //***************************************************************************** -#include -#include - #include "abs.hpp" #include "acos.hpp" #include "acosh.hpp" @@ -41,7 +38,6 @@ #include "atanh.hpp" #include "cbrt.hpp" #include "ceil.hpp" -#include "common.hpp" #include "conj.hpp" #include "cos.hpp" #include "cosh.hpp" @@ -57,7 +53,7 @@ #include "log2.hpp" #include "mul.hpp" #include "pow.hpp" -#include "round.hpp" +#include "rint.hpp" #include "sin.hpp" #include "sinh.hpp" #include "sqr.hpp" @@ -66,1047 +62,44 @@ #include "tan.hpp" #include "tanh.hpp" #include "trunc.hpp" -#include "types_matrix.hpp" - -namespace py = pybind11; -namespace vm_ext = dpnp::backend::ext::vm; -using vm_ext::binary_impl_fn_ptr_t; -using vm_ext::unary_impl_fn_ptr_t; - -static unary_impl_fn_ptr_t abs_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t acos_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t 
acosh_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t add_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t asin_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t asinh_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t atan_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t atan2_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t atanh_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t cbrt_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t ceil_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t conj_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t cos_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t cosh_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t div_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t exp_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t exp2_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t expm1_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t floor_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t hypot_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t ln_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t log10_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t log1p_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t log2_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t mul_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t pow_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t round_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t sin_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t sinh_dispatch_vector[dpctl_td_ns::num_types]; -static 
unary_impl_fn_ptr_t sqr_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t sqrt_dispatch_vector[dpctl_td_ns::num_types]; -static binary_impl_fn_ptr_t sub_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t tan_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t tanh_dispatch_vector[dpctl_td_ns::num_types]; -static unary_impl_fn_ptr_t trunc_dispatch_vector[dpctl_td_ns::num_types]; +namespace vm_ns = dpnp::extensions::vm; PYBIND11_MODULE(_vm_impl, m) { - using arrayT = dpctl::tensor::usm_ndarray; - using event_vecT = std::vector; - - // UnaryUfunc: ==== Abs(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - abs_dispatch_vector); - - auto abs_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - abs_dispatch_vector); - }; - m.def("_abs", abs_pyapi, - "Call `abs` function from OneMKL VM library to compute " - "the absolute value of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto abs_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - abs_dispatch_vector); - }; - m.def("_mkl_abs_to_call", abs_need_to_call_pyapi, - "Check input arguments to answer if `abs` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Acos(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - acos_dispatch_vector); - - auto acos_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - acos_dispatch_vector); - }; - m.def("_acos", acos_pyapi, - "Call `acos` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = 
py::list()); - - auto acos_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - acos_dispatch_vector); - }; - m.def("_mkl_acos_to_call", acos_need_to_call_pyapi, - "Check input arguments to answer if `acos` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Acosh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - acosh_dispatch_vector); - - auto acosh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - acosh_dispatch_vector); - }; - m.def("_acosh", acosh_pyapi, - "Call `acosh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto acosh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - acosh_dispatch_vector); - }; - m.def("_mkl_acosh_to_call", acosh_need_to_call_pyapi, - "Check input arguments to answer if `acosh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Add(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - add_dispatch_vector); - - auto add_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - add_dispatch_vector); - }; - m.def("_add", add_pyapi, - "Call `add` function from OneMKL VM library to performs element " - "by element addition of vector `src1` by vector `src2` " - "to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto add_need_to_call_pyapi = [&](sycl::queue exec_q, 
arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - add_dispatch_vector); - }; - m.def("_mkl_add_to_call", add_need_to_call_pyapi, - "Check input arguments to answer if `add` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Asin(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - asin_dispatch_vector); - - auto asin_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - asin_dispatch_vector); - }; - m.def("_asin", asin_pyapi, - "Call `asin` function from OneMKL VM library to compute " - "inverse sine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto asin_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - asin_dispatch_vector); - }; - m.def("_mkl_asin_to_call", asin_need_to_call_pyapi, - "Check input arguments to answer if `asin` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Asinh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - asinh_dispatch_vector); - - auto asinh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - asinh_dispatch_vector); - }; - m.def("_asinh", asinh_pyapi, - "Call `asinh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto asinh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - asinh_dispatch_vector); - }; - 
m.def("_mkl_asinh_to_call", asinh_need_to_call_pyapi, - "Check input arguments to answer if `asinh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Atan(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - atan_dispatch_vector); - - auto atan_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - atan_dispatch_vector); - }; - m.def("_atan", atan_pyapi, - "Call `atan` function from OneMKL VM library to compute " - "inverse tangent of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto atan_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - atan_dispatch_vector); - }; - m.def("_mkl_atan_to_call", atan_need_to_call_pyapi, - "Check input arguments to answer if `atan` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Atan2(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - atan2_dispatch_vector); - - auto atan2_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - atan2_dispatch_vector); - }; - m.def("_atan2", atan2_pyapi, - "Call `atan2` function from OneMKL VM library to compute element " - "by element inverse tangent of `x1/x2`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto atan2_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - atan2_dispatch_vector); - }; - m.def("_mkl_atan2_to_call", atan2_need_to_call_pyapi, - "Check input arguments to answer if 
`atan2` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Atanh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - atanh_dispatch_vector); - - auto atanh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - atanh_dispatch_vector); - }; - m.def("_atanh", atanh_pyapi, - "Call `atanh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto atanh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - atanh_dispatch_vector); - }; - m.def("_mkl_atanh_to_call", atanh_need_to_call_pyapi, - "Check input arguments to answer if `atanh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Cbrt(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - cbrt_dispatch_vector); - - auto cbrt_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - cbrt_dispatch_vector); - }; - m.def("_cbrt", cbrt_pyapi, - "Call `cbrt` function from OneMKL VM library to compute " - "the element-wise cube root of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto cbrt_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - cbrt_dispatch_vector); - }; - m.def("_mkl_cbrt_to_call", cbrt_need_to_call_pyapi, - "Check input arguments to answer if `cbrt` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: 
==== Ceil(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - ceil_dispatch_vector); - - auto ceil_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - ceil_dispatch_vector); - }; - m.def("_ceil", ceil_pyapi, - "Call `ceil` function from OneMKL VM library to compute " - "ceiling of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto ceil_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - ceil_dispatch_vector); - }; - m.def("_mkl_ceil_to_call", ceil_need_to_call_pyapi, - "Check input arguments to answer if `ceil` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Conj(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - conj_dispatch_vector); - - auto conj_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - conj_dispatch_vector); - }; - m.def("_conj", conj_pyapi, - "Call `conj` function from OneMKL VM library to compute " - "conjugate of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto conj_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - conj_dispatch_vector); - }; - m.def("_mkl_conj_to_call", conj_need_to_call_pyapi, - "Check input arguments to answer if `conj` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Cos(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - cos_dispatch_vector); - - auto cos_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return 
vm_ext::unary_ufunc(exec_q, src, dst, depends, - cos_dispatch_vector); - }; - m.def("_cos", cos_pyapi, - "Call `cos` function from OneMKL VM library to compute " - "cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto cos_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - cos_dispatch_vector); - }; - m.def("_mkl_cos_to_call", cos_need_to_call_pyapi, - "Check input arguments to answer if `cos` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Cosh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - cosh_dispatch_vector); - - auto cosh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - cosh_dispatch_vector); - }; - m.def("_cosh", cosh_pyapi, - "Call `cosh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto cosh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - cosh_dispatch_vector); - }; - m.def("_mkl_cosh_to_call", cosh_need_to_call_pyapi, - "Check input arguments to answer if `cosh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Div(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - div_dispatch_vector); - - auto div_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - div_dispatch_vector); - }; - m.def("_div", div_pyapi, - "Call `div` function from OneMKL VM library to performs element " - "by 
element division of vector `src1` by vector `src2` " - "to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto div_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - div_dispatch_vector); - }; - m.def("_mkl_div_to_call", div_need_to_call_pyapi, - "Check input arguments to answer if `div` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Exp(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - exp_dispatch_vector); - - auto exp_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - exp_dispatch_vector); - }; - m.def("_exp", exp_pyapi, - "Call `exp` function from OneMKL VM library to compute " - "natural (base-e) exponential of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto exp_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - exp_dispatch_vector); - }; - m.def("_mkl_exp_to_call", exp_need_to_call_pyapi, - "Check input arguments to answer if `exp` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== exp2(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - exp2_dispatch_vector); - - auto exp2_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - exp2_dispatch_vector); - }; - m.def("_exp2", exp2_pyapi, - "Call `exp2` function from OneMKL VM library to compute " - "the element-wise base-2 exponential of vector elements", - py::arg("sycl_queue"), 
py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto exp2_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - exp2_dispatch_vector); - }; - m.def("_mkl_exp2_to_call", exp2_need_to_call_pyapi, - "Check input arguments to answer if `exp2` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== expm1(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - expm1_dispatch_vector); - - auto expm1_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - expm1_dispatch_vector); - }; - m.def("_expm1", expm1_pyapi, - "Call `expm1` function from OneMKL VM library to compute " - "subtraction of 1 from the exponential of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto expm1_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - expm1_dispatch_vector); - }; - m.def("_mkl_expm1_to_call", expm1_need_to_call_pyapi, - "Check input arguments to answer if `expm1` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Floor(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - floor_dispatch_vector); - - auto floor_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - floor_dispatch_vector); - }; - m.def("_floor", floor_pyapi, - "Call `floor` function from OneMKL VM library to compute " - "floor of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto floor_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) 
{ - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - floor_dispatch_vector); - }; - m.def("_mkl_floor_to_call", floor_need_to_call_pyapi, - "Check input arguments to answer if `floor` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Hypot(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - hypot_dispatch_vector); - - auto hypot_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - hypot_dispatch_vector); - }; - m.def("_hypot", hypot_pyapi, - "Call `hypot` function from OneMKL VM library to compute element " - "by element hypotenuse of `x`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto hypot_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - hypot_dispatch_vector); - }; - m.def("_mkl_hypot_to_call", hypot_need_to_call_pyapi, - "Check input arguments to answer if `hypot` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Ln(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - ln_dispatch_vector); - - auto ln_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - ln_dispatch_vector); - }; - m.def("_ln", ln_pyapi, - "Call `ln` function from OneMKL VM library to compute " - "natural logarithm of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto ln_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - ln_dispatch_vector); - }; - 
m.def("_mkl_ln_to_call", ln_need_to_call_pyapi, - "Check input arguments to answer if `ln` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Log10(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - log10_dispatch_vector); - - auto log10_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - log10_dispatch_vector); - }; - m.def("_log10", log10_pyapi, - "Call `log10` function from OneMKL VM library to compute " - "base-10 logarithm of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto log10_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - log10_dispatch_vector); - }; - m.def("_mkl_log10_to_call", log10_need_to_call_pyapi, - "Check input arguments to answer if `log10` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Log1p(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - log1p_dispatch_vector); - - auto log1p_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - log1p_dispatch_vector); - }; - m.def("_log1p", log1p_pyapi, - "Call `log1p` function from OneMKL VM library to compute " - "natural logarithm of 1 plus vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto log1p_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - log1p_dispatch_vector); - }; - m.def("_mkl_log1p_to_call", log1p_need_to_call_pyapi, - "Check input arguments to answer if `log1p` function from " - "OneMKL VM library can be used", - 
py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Log2(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - log2_dispatch_vector); - - auto log2_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - log2_dispatch_vector); - }; - m.def("_log2", log2_pyapi, - "Call `log2` function from OneMKL VM library to compute " - "base-2 logarithm of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto log2_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - log2_dispatch_vector); - }; - m.def("_mkl_log2_to_call", log2_need_to_call_pyapi, - "Check input arguments to answer if `log2` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Mul(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - mul_dispatch_vector); - - auto mul_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - mul_dispatch_vector); - }; - m.def("_mul", mul_pyapi, - "Call `mul` function from OneMKL VM library to performs element " - "by element multiplication of vector `src1` by vector `src2` " - "to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto mul_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - mul_dispatch_vector); - }; - m.def("_mkl_mul_to_call", mul_need_to_call_pyapi, - "Check input arguments to answer if `mul` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - 
py::arg("dst")); - } - - // BinaryUfunc: ==== Pow(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - pow_dispatch_vector); - - auto pow_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, src1, src2, dst, depends, - pow_dispatch_vector); - }; - m.def("_pow", pow_pyapi, - "Call `pow` function from OneMKL VM library to performs element " - "by element exponentiation of vector `src1` raised to the power " - "of vector `src2` to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto pow_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - pow_dispatch_vector); - }; - m.def("_mkl_pow_to_call", pow_need_to_call_pyapi, - "Check input arguments to answer if `pow` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Round(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - round_dispatch_vector); - - auto round_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - round_dispatch_vector); - }; - m.def("_round", round_pyapi, - "Call `rint` function from OneMKL VM library to compute " - "the rounded value of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto round_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - round_dispatch_vector); - }; - m.def("_mkl_round_to_call", round_need_to_call_pyapi, - "Check input arguments to answer if `rint` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), 
py::arg("dst")); - } - - // UnaryUfunc: ==== Sin(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - sin_dispatch_vector); - - auto sin_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - sin_dispatch_vector); - }; - m.def("_sin", sin_pyapi, - "Call `sin` function from OneMKL VM library to compute " - "sine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto sin_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - sin_dispatch_vector); - }; - m.def("_mkl_sin_to_call", sin_need_to_call_pyapi, - "Check input arguments to answer if `sin` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Sinh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - sinh_dispatch_vector); - - auto sinh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - sinh_dispatch_vector); - }; - m.def("_sinh", sinh_pyapi, - "Call `sinh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto sinh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - sinh_dispatch_vector); - }; - m.def("_mkl_sinh_to_call", sinh_need_to_call_pyapi, - "Check input arguments to answer if `sinh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Sqr(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - sqr_dispatch_vector); - - auto sqr_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const 
event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - sqr_dispatch_vector); - }; - m.def( - "_sqr", sqr_pyapi, - "Call `sqr` from OneMKL VM library to performs element by element " - "operation of squaring of vector `src` to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto sqr_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - sqr_dispatch_vector); - }; - m.def("_mkl_sqr_to_call", sqr_need_to_call_pyapi, - "Check input arguments to answer if `sqr` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Sqrt(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - sqrt_dispatch_vector); - - auto sqrt_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - sqrt_dispatch_vector); - }; - m.def( - "_sqrt", sqrt_pyapi, - "Call `sqrt` from OneMKL VM library to performs element by element " - "operation of extracting the square root " - "of vector `src` to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto sqrt_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - sqrt_dispatch_vector); - }; - m.def("_mkl_sqrt_to_call", sqrt_need_to_call_pyapi, - "Check input arguments to answer if `sqrt` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // BinaryUfunc: ==== Sub(x1, x2) ==== - { - vm_ext::init_ufunc_dispatch_vector( - sub_dispatch_vector); - - auto sub_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return vm_ext::binary_ufunc(exec_q, 
src1, src2, dst, depends, - sub_dispatch_vector); - }; - m.def("_sub", sub_pyapi, - "Call `sub` function from OneMKL VM library to performs element " - "by element subtraction of vector `src1` by vector `src2` " - "to resulting vector `dst`", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst"), py::arg("depends") = py::list()); - - auto sub_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src1, - arrayT src2, arrayT dst) { - return vm_ext::need_to_call_binary_ufunc(exec_q, src1, src2, dst, - sub_dispatch_vector); - }; - m.def("_mkl_sub_to_call", sub_need_to_call_pyapi, - "Check input arguments to answer if `sub` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), - py::arg("dst")); - } - - // UnaryUfunc: ==== Tan(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - tan_dispatch_vector); - - auto tan_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - tan_dispatch_vector); - }; - m.def("_tan", tan_pyapi, - "Call `tan` function from OneMKL VM library to compute " - "tangent of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto tan_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - tan_dispatch_vector); - }; - m.def("_mkl_tan_to_call", tan_need_to_call_pyapi, - "Check input arguments to answer if `tan` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Tanh(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - tanh_dispatch_vector); - - auto tanh_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - tanh_dispatch_vector); - }; - m.def("_tanh", tanh_pyapi, - "Call 
`tanh` function from OneMKL VM library to compute " - "inverse cosine of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto tanh_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - tanh_dispatch_vector); - }; - m.def("_mkl_tanh_to_call", tanh_need_to_call_pyapi, - "Check input arguments to answer if `tanh` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } - - // UnaryUfunc: ==== Trunc(x) ==== - { - vm_ext::init_ufunc_dispatch_vector( - trunc_dispatch_vector); - - auto trunc_pyapi = [&](sycl::queue exec_q, arrayT src, arrayT dst, - const event_vecT &depends = {}) { - return vm_ext::unary_ufunc(exec_q, src, dst, depends, - trunc_dispatch_vector); - }; - m.def("_trunc", trunc_pyapi, - "Call `trunc` function from OneMKL VM library to compute " - "the truncated value of vector elements", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst"), - py::arg("depends") = py::list()); - - auto trunc_need_to_call_pyapi = [&](sycl::queue exec_q, arrayT src, - arrayT dst) { - return vm_ext::need_to_call_unary_ufunc(exec_q, src, dst, - trunc_dispatch_vector); - }; - m.def("_mkl_trunc_to_call", trunc_need_to_call_pyapi, - "Check input arguments to answer if `trunc` function from " - "OneMKL VM library can be used", - py::arg("sycl_queue"), py::arg("src"), py::arg("dst")); - } + vm_ns::init_abs(m); + vm_ns::init_acos(m); + vm_ns::init_acosh(m); + vm_ns::init_add(m); + vm_ns::init_asin(m); + vm_ns::init_asinh(m); + vm_ns::init_atan(m); + vm_ns::init_atan2(m); + vm_ns::init_atanh(m); + vm_ns::init_cbrt(m); + vm_ns::init_ceil(m); + vm_ns::init_conj(m); + vm_ns::init_cos(m); + vm_ns::init_cosh(m); + vm_ns::init_div(m); + vm_ns::init_exp(m); + vm_ns::init_exp2(m); + vm_ns::init_expm1(m); + vm_ns::init_floor(m); + vm_ns::init_hypot(m); + vm_ns::init_ln(m); + 
vm_ns::init_log10(m); + vm_ns::init_log1p(m); + vm_ns::init_log2(m); + vm_ns::init_mul(m); + vm_ns::init_pow(m); + vm_ns::init_rint(m); + vm_ns::init_sin(m); + vm_ns::init_sinh(m); + vm_ns::init_sqr(m); + vm_ns::init_sqrt(m); + vm_ns::init_sub(m); + vm_ns::init_tan(m); + vm_ns::init_tanh(m); + vm_ns::init_trunc(m); } From 6a737c49919baf52f2186182019410b17a7c5080 Mon Sep 17 00:00:00 2001 From: vlad-perevezentsev Date: Fri, 14 Jun 2024 18:35:37 +0200 Subject: [PATCH 20/49] Handling warnings in pytest (#1845) * Enable loading of warning plugin in pytest * Fix DeprecationWarning in test_histogram.py * Ignore DeprecationWarning for pkg_resources * Fix SyntaxWarning in test_ndarray_math.py * Deprecate numpy_cupy_array_list_equal * Fix DeprecationWarning in test_mathematical.py * Avoid FutureWarning for rcond parameter of numpy.linalg.lstsq * Fix DeprecationWarning in cupy test_elementwise.py * Skip test_msort_zero_dim - not implemented * Ignore RuntimeWarning for numpy.arccosh * Fix DeprecationWarning for numpy.fromstring * Add test_digitize_inf to TestDigitize * Fix DeprecationWarning: Converting np.integer to a dtype is deprecated * Fix ComplexWarning in 2 ways * Fix RuntimeWarning by reducing shape in TestNansumNanprodLong * Handle DeprecationWarning in test_dparray.py * Skip test_lexsort_one_dim/_two_dim - not implemented * Handle RuntimeWarning in test_linspace_float_underflow * Fix DeprecationWarning in test_round_halfway_uint * Update test_linspace to avoid DeprecationWarning * Use pytest.mark.usefixtures('suppress_complex_warning') * Handle RuntimeWarning: divide by zero in test_reciprocal * Fix DeprecationWarning in test_mathematical.py * Handle RuntimeWarning in test_from_dlpack * Add the fixture to test_from_dlpack_with_dpt instead of test_from_dlpack --- setup.cfg | 12 ++++++++--- tests/test_arraycreation.py | 6 +++--- tests/test_arraymanipulation.py | 1 + tests/test_dparray.py | 6 ++++++ tests/test_histogram.py | 20 +++++++++++++------ tests/test_linalg.py | 
12 ++++++++--- tests/test_mathematical.py | 2 +- tests/test_random_state.py | 17 ++++++---------- tests/test_sycl_queue.py | 7 +++++-- tests/test_umath.py | 1 + tests/test_usm_type.py | 2 +- .../cupy/binary_tests/test_elementwise.py | 6 +++--- .../cupy/core_tests/test_ndarray_math.py | 8 ++++---- .../cupy/creation_tests/test_ranges.py | 6 +++--- .../cupy/logic_tests/test_comparison.py | 8 ++++---- .../cupy/math_tests/test_sumprod.py | 14 ++++++++++++- 16 files changed, 83 insertions(+), 45 deletions(-) diff --git a/setup.cfg b/setup.cfg index 387884acef0..60c8b1f6372 100644 --- a/setup.cfg +++ b/setup.cfg @@ -6,14 +6,20 @@ ignore = E201 # By default, tests marked as slow will be deselected. # To run all tests, use -m "slow and not slow". # To run only slow tests, use -m "slow". -addopts = -m "not slow" -p no:warnings --tb=short --strict-markers +addopts = -m "not slow" --tb=short --strict-markers norecursedirs = tests_perf testpaths = tests markers = slow: marks tests as slow (deselect with '-m "not slow"') multi_gpu: marks tests that require a specified number of GPUs - # Added due to -p no:warnings to avoid errors with --strict-markers - filterwarnings: mark to filter warnings during tests +filterwarnings = + # pkg_resources + ignore:pkg_resources is deprecated as an API:DeprecationWarning + # NumPy arccosh + # Undefined behavior depends on the backend: + # NumPy with OpenBLAS for np.array[1.0] does not raise a warning + # while numpy with OneMKL raises RuntimeWarning + ignore:invalid value encountered in arccosh:RuntimeWarning [versioneer] VCS = git diff --git a/tests/test_arraycreation.py b/tests/test_arraycreation.py index 1f98aa2de6f..ca91ec6f699 100644 --- a/tests/test_arraycreation.py +++ b/tests/test_arraycreation.py @@ -516,6 +516,7 @@ def test_vander_seq(sequence): assert_allclose(vander_func(numpy, sequence), vander_func(dpnp, sequence)) +@pytest.mark.usefixtures("suppress_complex_warning") @pytest.mark.parametrize( "shape", [(), 0, (0,), (2, 0, 3), (3, 
2)], @@ -531,6 +532,7 @@ def test_full(shape, fill_value, dtype, order): assert_array_equal(func(numpy), func(dpnp)) +@pytest.mark.usefixtures("suppress_complex_warning") @pytest.mark.parametrize( "array", [[], 0, [1, 2, 3], [[1, 2], [3, 4]]], @@ -709,9 +711,7 @@ def test_linspace(start, stop, num, dtype, retstep): if numpy.issubdtype(dtype, dpnp.integer): assert_allclose(res_np, res_dp, rtol=1) else: - if dtype is None and not has_support_aspect64(): - dtype = dpnp.float32 - assert_allclose(res_np, res_dp, rtol=1e-06, atol=dpnp.finfo(dtype).eps) + assert_dtype_allclose(res_dp, res_np) @pytest.mark.parametrize( diff --git a/tests/test_arraymanipulation.py b/tests/test_arraymanipulation.py index 69116ef8692..12f14bf4109 100644 --- a/tests/test_arraymanipulation.py +++ b/tests/test_arraymanipulation.py @@ -846,6 +846,7 @@ def test_asfarray(dtype, data): assert_array_equal(result, expected) +@pytest.mark.usefixtures("suppress_complex_warning") @pytest.mark.parametrize("dtype", get_all_dtypes()) @pytest.mark.parametrize("data", [[1.0, 2.0, 3.0]], ids=["[1., 2., 3.]"]) @pytest.mark.parametrize("data_dtype", get_all_dtypes(no_none=True)) diff --git a/tests/test_dparray.py b/tests/test_dparray.py index 874493f4e95..ac9757c580a 100644 --- a/tests/test_dparray.py +++ b/tests/test_dparray.py @@ -220,6 +220,9 @@ def test_print_dpnp_zero_shape(): assert result == expected +# Numpy will raise an error when converting a.ndim > 0 to a scalar +# TODO: Discuss dpnp behavior according to these future changes +@pytest.mark.filterwarnings("ignore::DeprecationWarning") @pytest.mark.parametrize("func", [bool, float, int, complex]) @pytest.mark.parametrize("shape", [tuple(), (1,), (1, 1), (1, 1, 1)]) @pytest.mark.parametrize( @@ -231,6 +234,9 @@ def test_scalar_type_casting(func, shape, dtype): assert func(numpy_array) == func(dpnp_array) +# Numpy will raise an error when converting a.ndim > 0 to a scalar +# TODO: Discuss dpnp behavior according to these future changes 
+@pytest.mark.filterwarnings("ignore::DeprecationWarning") @pytest.mark.parametrize( "method", ["__bool__", "__float__", "__int__", "__complex__"] ) diff --git a/tests/test_histogram.py b/tests/test_histogram.py index 7601d67c54a..da58a4ac2f8 100644 --- a/tests/test_histogram.py +++ b/tests/test_histogram.py @@ -38,11 +38,6 @@ class TestDigitize: numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9]), numpy.array([1, 4, 6, 7]), ), - # Infinity values - ( - numpy.array([-numpy.inf, -1, 0, 1, numpy.inf]), - numpy.array([-2, -1, 0, 1, 2]), - ), # Repeated elements (numpy.array([1, 2, 2, 3, 3, 3, 4, 5]), numpy.array([1, 2, 3, 4])), ], @@ -57,6 +52,18 @@ def test_digitize(self, x, bins, dtype, right): expected = numpy.digitize(x, bins, right=right) assert_dtype_allclose(result, expected) + @pytest.mark.parametrize("dtype", get_float_dtypes()) + @pytest.mark.parametrize("right", [True, False]) + def test_digitize_inf(self, dtype, right): + x = numpy.array([-numpy.inf, -1, 0, 1, numpy.inf], dtype=dtype) + bins = numpy.array([-2, -1, 0, 1, 2], dtype=dtype) + x_dp = dpnp.array(x) + bins_dp = dpnp.array(bins) + + result = dpnp.digitize(x_dp, bins_dp, right=right) + expected = numpy.digitize(x, bins, right=right) + assert_dtype_allclose(result, expected) + @pytest.mark.parametrize( "dtype_x", get_all_dtypes(no_bool=True, no_complex=True) ) @@ -386,7 +393,8 @@ def test_infinite_edge(self, xp, inf_val): # both first and last ranges must be finite with assert_raises_regex( - ValueError, f"autodetected range of \[{min}, {max}\] is not finite" + ValueError, + f"autodetected range of \\[{min}, {max}\\] is not finite", ): xp.histogram(v) diff --git a/tests/test_linalg.py b/tests/test_linalg.py index a45b2826b3a..ec2a085d4d3 100644 --- a/tests/test_linalg.py +++ b/tests/test_linalg.py @@ -780,7 +780,9 @@ def test_lstsq(self, a_shape, b_shape, dtype): b_dp = inp.array(b_np) result = inp.linalg.lstsq(a_dp, b_dp) - expected = numpy.linalg.lstsq(a_np, b_np) + # if rcond is not set, FutureWarning is 
given. + # By default Numpy uses None for calculations + expected = numpy.linalg.lstsq(a_np, b_np, rcond=None) for param_dp, param_np in zip(result, expected): assert_dtype_allclose(param_dp, param_np) @@ -794,7 +796,9 @@ def test_lstsq_diff_type(self, a_dtype, b_dtype): a_dp = inp.array(a_np) b_dp = inp.array(b_np) - expected = numpy.linalg.lstsq(a_np, b_np) + # if rcond is not set, FutureWarning is given. + # By default Numpy uses None for calculations + expected = numpy.linalg.lstsq(a_np, b_np, rcond=None) result = inp.linalg.lstsq(a_dp, b_dp) for param_dp, param_np in zip(result, expected): @@ -813,7 +817,9 @@ def test_lstsq_empty(self, m, n, nrhs, dtype): b_dp = inp.array(b_np) result = inp.linalg.lstsq(a_dp, b_dp) - expected = numpy.linalg.lstsq(a_np, b_np) + # if rcond is not set, FutureWarning is given. + # By default Numpy uses None for calculations + expected = numpy.linalg.lstsq(a_np, b_np, rcond=None) for param_dp, param_np in zip(result, expected): assert_dtype_allclose(param_dp, param_np) diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index 4a86cdc081e..4a1ee63c8fc 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -91,7 +91,7 @@ def test_mode(self): d = dpnp.ones(100) k = dpnp.ones(3) default_mode = dpnp.convolve(d, k, mode="full") - full_mode = dpnp.convolve(d, k, mode="f") + full_mode = dpnp.convolve(d, k, mode="full") assert_array_equal(full_mode, default_mode) # integer mode with assert_raises(ValueError): diff --git a/tests/test_random_state.py b/tests/test_random_state.py index 70940501d2e..ed56dbdf730 100644 --- a/tests/test_random_state.py +++ b/tests/test_random_state.py @@ -239,7 +239,6 @@ def test_fallback(self, loc, scale): [ dpnp.float16, float, - dpnp.integer, dpnp.int64, dpnp.int32, dpnp.int, @@ -253,7 +252,6 @@ def test_fallback(self, loc, scale): ids=[ "dpnp.float16", "float", - "dpnp.integer", "dpnp.int64", "dpnp.int32", "dpnp.int", @@ -366,8 +364,8 @@ def test_wrong_dims(self): class 
TestRandInt: @pytest.mark.parametrize( "dtype", - [int, dpnp.int32, dpnp.int, dpnp.integer], - ids=["int", "dpnp.int32", "dpnp.int", "dpnp.integer"], + [int, dpnp.int32, dpnp.int], + ids=["int", "dpnp.int32", "dpnp.int"], ) @pytest.mark.parametrize( "usm_type", @@ -379,7 +377,7 @@ def test_distr(self, dtype, usm_type): low = 1 high = 10 - if dtype in (dpnp.int, dpnp.integer) and dtype != dpnp.dtype("int32"): + if dtype == dpnp.int and dtype != dpnp.dtype("int32"): pytest.skip( "dtype isn't alias on dpnp.int32 on the target OS, so there will be a fallback" ) @@ -566,11 +564,10 @@ def test_bounds_fallback(self, low, high): @pytest.mark.usefixtures("allow_fall_back_on_numpy") @pytest.mark.parametrize( "dtype", - [dpnp.int64, dpnp.int, dpnp.integer, dpnp.bool, dpnp.bool_, bool], + [dpnp.int64, dpnp.int, dpnp.bool, dpnp.bool_, bool], ids=[ "dpnp.int64", "dpnp.int", - "dpnp.integer", "dpnp.bool", "dpnp.bool_", "bool", @@ -582,7 +579,7 @@ def test_dtype_fallback(self, dtype): high = 37 if not dtype in {dpnp.bool_, bool} else 2 size = (3, 2, 5) - if dtype in (dpnp.int, dpnp.integer) and dtype == dpnp.dtype("int32"): + if dtype == dpnp.int and dtype == dpnp.dtype("int32"): pytest.skip( "dtype is alias on dpnp.int32 on the target OS, so no fallback here" ) @@ -1157,7 +1154,6 @@ def test_fallback(self, low, high): [ dpnp.float16, float, - dpnp.integer, dpnp.int64, dpnp.int, int, @@ -1170,7 +1166,6 @@ def test_fallback(self, low, high): ids=[ "dpnp.float16", "float", - "dpnp.integer", "dpnp.int64", "dpnp.int", "int", @@ -1182,7 +1177,7 @@ def test_fallback(self, low, high): ], ) def test_invalid_dtype(self, dtype): - if dtype in (dpnp.int, dpnp.integer) and dtype == dpnp.dtype("int32"): + if dtype == dpnp.int and dtype == dpnp.dtype("int32"): pytest.skip( "dtype is alias on dpnp.int32 on the target OS, so no error here" ) diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 8332f26949b..3073b8806e5 100644 --- a/tests/test_sycl_queue.py +++ 
b/tests/test_sycl_queue.py @@ -103,7 +103,7 @@ def vvsort(val, vec, size, xp): {"dtype": dpnp.int32}, ), pytest.param("fromiter", [[1, 2, 3, 4]], {"dtype": dpnp.int64}), - pytest.param("fromstring", ["1, 2"], {"dtype": int, "sep": " "}), + pytest.param("fromstring", ["1 2"], {"dtype": int, "sep": " "}), pytest.param("full", [(2, 2)], {"fill_value": 5}), pytest.param("eye", [4, 2], {}), pytest.param("geomspace", [1, 4, 8], {}), @@ -1686,6 +1686,7 @@ def test_from_dlpack(arr_dtype, shape, device): assert V.strides == W.strides +@pytest.mark.usefixtures("suppress_invalid_numpy_warnings") @pytest.mark.parametrize( "device", valid_devices, @@ -2112,7 +2113,9 @@ def test_lstsq(m, n, nrhs, device): b_dp = dpnp.array(b_np, device=device) result_dp = dpnp.linalg.lstsq(a_dp, b_dp) - result = numpy.linalg.lstsq(a_np, b_np) + # if rcond is not set, FutureWarning is given. + # By default Numpy uses None for calculations + result = numpy.linalg.lstsq(a_np, b_np, rcond=None) for param_dp, param_np in zip(result_dp, result): assert_dtype_allclose(param_dp, param_np) diff --git a/tests/test_umath.py b/tests/test_umath.py index 5a61079335f..c302cbba3a0 100644 --- a/tests/test_umath.py +++ b/tests/test_umath.py @@ -398,6 +398,7 @@ def test_invalid_out(self, out): class TestReciprocal: + @pytest.mark.usefixtures("suppress_divide_numpy_warnings") @pytest.mark.parametrize("dtype", get_float_complex_dtypes()) def test_reciprocal(self, dtype): np_array, expected = _get_numpy_arrays_1in_1out( diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index f66017ea6e2..77839f9b933 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -199,7 +199,7 @@ def test_array_creation_from_2d_array(func, args, usm_type_x, usm_type_y): "fromfunction", [(lambda i, j: i + j), (3, 3)], {"dtype": dp.int32} ), pytest.param("fromiter", [[1, 2, 3, 4]], {"dtype": dp.int64}), - pytest.param("fromstring", ["1, 2"], {"dtype": int, "sep": " "}), + pytest.param("fromstring", ["1 2"], {"dtype": int, 
"sep": " "}), pytest.param("full", [(2, 2)], {"fill_value": 5}), pytest.param("eye", [4, 2], {}), pytest.param("geomspace", [1, 4, 8], {}), diff --git a/tests/third_party/cupy/binary_tests/test_elementwise.py b/tests/third_party/cupy/binary_tests/test_elementwise.py index a2698366256..e756c454f15 100644 --- a/tests/third_party/cupy/binary_tests/test_elementwise.py +++ b/tests/third_party/cupy/binary_tests/test_elementwise.py @@ -7,14 +7,14 @@ class TestElementwise(unittest.TestCase): @testing.for_int_dtypes() @testing.numpy_cupy_array_equal() def check_unary_int(self, name, xp, dtype): - a = xp.array([-3, -2, -1, 0, 1, 2, 3], dtype=dtype) + a = xp.array([-3, -2, -1, 0, 1, 2, 3]).astype(dtype) return getattr(xp, name)(a) @testing.for_int_dtypes() @testing.numpy_cupy_array_equal() def check_binary_int(self, name, xp, dtype): - a = xp.array([-3, -2, -1, 0, 1, 2, 3], dtype=dtype) - b = xp.array([0, 1, 2, 3, 4, 5, 6], dtype=dtype) + a = xp.array([-3, -2, -1, 0, 1, 2, 3]).astype(dtype) + b = xp.array([0, 1, 2, 3, 4, 5, 6]).astype(dtype) return getattr(xp, name)(a, b) def test_bitwise_and(self): diff --git a/tests/third_party/cupy/core_tests/test_ndarray_math.py b/tests/third_party/cupy/core_tests/test_ndarray_math.py index 3233687789a..81caf2c8ceb 100644 --- a/tests/third_party/cupy/core_tests/test_ndarray_math.py +++ b/tests/third_party/cupy/core_tests/test_ndarray_math.py @@ -57,7 +57,7 @@ class TestRoundHalfway(unittest.TestCase): @testing.for_float_dtypes() @testing.numpy_cupy_allclose(atol=1e-5) def test_round_halfway_float(self, xp, dtype): - if self.decimals is -3 and dtype == numpy.float32: + if self.decimals == -3 and dtype == numpy.float32: pytest.skip( "Case with decimals=-3 and dtype float32 has divide error less than 1e-5" ) @@ -78,7 +78,7 @@ def test_round_halfway_float(self, xp, dtype): @testing.numpy_cupy_array_equal() def test_round_halfway_int(self, xp, dtype): # generate [..., -1.5, -0.5, 0.5, 1.5, ...] 
* 10^{-decimals} - if self.decimals is -3 and not has_support_aspect64(): + if self.decimals == -3 and not has_support_aspect64(): pytest.skip( "Case with decimals=-3 and dtype float32 has divide error less than 1e-5" ) @@ -96,7 +96,7 @@ def test_round_halfway_int(self, xp, dtype): @testing.numpy_cupy_array_equal() def test_round_halfway_uint(self, xp, dtype): # generate [0.5, 1.5, ...] * 10^{-decimals} - if self.decimals is -3 and not has_support_aspect64(): + if self.decimals == -3 and not has_support_aspect64(): pytest.skip( "Case with decimals=-3 and dtype float32 has divide error less than 1e-5" ) @@ -105,7 +105,7 @@ def test_round_halfway_uint(self, xp, dtype): a -= 1 scale = 10 ** abs(self.decimals) if self.decimals < 0: - a *= xp.array(scale, dtype=dtype) + a *= xp.array(scale).astype(dtype) a >>= 1 return a.round(self.decimals) diff --git a/tests/third_party/cupy/creation_tests/test_ranges.py b/tests/third_party/cupy/creation_tests/test_ranges.py index 2094f2ffc8e..92c81061b7a 100644 --- a/tests/third_party/cupy/creation_tests/test_ranges.py +++ b/tests/third_party/cupy/creation_tests/test_ranges.py @@ -176,9 +176,9 @@ def test_linspace_float_overflow(self, xp): def test_linspace_float_underflow(self, xp): # find minimum subnormal number dtype = cupy.default_float_type() - x = xp.finfo(dtype).min - while x / 2 > 0: - x /= 2 + # use .tiny instead of .min and while to get + # minimum subnormal number directly and avoid RuntimeWarning + x = xp.finfo(dtype).tiny return xp.linspace(0.0, x, 10, dtype=dtype) @testing.with_requires("numpy>=1.16") diff --git a/tests/third_party/cupy/logic_tests/test_comparison.py b/tests/third_party/cupy/logic_tests/test_comparison.py index b7dba2a219b..eed4c7f9b36 100644 --- a/tests/third_party/cupy/logic_tests/test_comparison.py +++ b/tests/third_party/cupy/logic_tests/test_comparison.py @@ -46,28 +46,28 @@ class TestComparisonOperator(unittest.TestCase): ] @testing.for_all_dtypes(no_complex=True) - 
@testing.numpy_cupy_array_list_equal() + @testing.numpy_cupy_array_equal() def test_binary_npscalar_array(self, xp, dtype): a = numpy.int16(3) b = testing.shaped_arange((2, 3), xp, dtype) return [op(a, b) for op in self.operators] @testing.for_all_dtypes(no_complex=True) - @testing.numpy_cupy_array_list_equal() + @testing.numpy_cupy_array_equal() def test_binary_pyscalar_array(self, xp, dtype): a = 3.0 b = testing.shaped_arange((2, 3), xp, dtype) return [op(a, b) for op in self.operators] @testing.for_all_dtypes(no_complex=True) - @testing.numpy_cupy_array_list_equal() + @testing.numpy_cupy_array_equal() def test_binary_array_npscalar(self, xp, dtype): a = testing.shaped_arange((2, 3), xp, dtype) b = numpy.float32(3.0) return [op(a, b) for op in self.operators] @testing.for_all_dtypes(no_complex=True) - @testing.numpy_cupy_array_list_equal() + @testing.numpy_cupy_array_equal() def test_binary_array_pyscalar(self, xp, dtype): a = testing.shaped_arange((2, 3), xp, dtype) b = 3 diff --git a/tests/third_party/cupy/math_tests/test_sumprod.py b/tests/third_party/cupy/math_tests/test_sumprod.py index f36086755e9..b1561260402 100644 --- a/tests/third_party/cupy/math_tests/test_sumprod.py +++ b/tests/third_party/cupy/math_tests/test_sumprod.py @@ -234,7 +234,18 @@ def _numpy_nanprod_implemented(self): ) def _test(self, xp, dtype): - a = testing.shaped_arange(self.shape, xp, dtype) + shape = self.shape + # Reduce the shape of the input array to avoid overflow warning + # for nanprod with float32, shape=(20, 30, 40), axis=0 and transpose_axes=False + if ( + self.func == "nanprod" + and dtype == xp.float32 + and self.shape == (20, 30, 40) + and self.axis == 0 + and not self.transpose_axes + ): + shape = (10, 20, 30) + a = testing.shaped_arange(shape, xp, dtype) if self.transpose_axes: a = a.transpose(2, 0, 1) if not issubclass(dtype, xp.integer): @@ -245,6 +256,7 @@ def _test(self, xp, dtype): @testing.for_all_dtypes(no_bool=True, no_float16=True) 
    @testing.numpy_cupy_allclose(type_check=has_support_aspect64())
     def test_nansum_all(self, xp, dtype):
+        dtype = xp.float32
         if (
             not self._numpy_nanprod_implemented()
             or not self._do_transposed_axis_test()

From 38fd39debebc741349587bcd254648cc9b1ca724 Mon Sep 17 00:00:00 2001
From: Natalia Polina
Date: Fri, 14 Jun 2024 11:07:34 -0700
Subject: [PATCH 21/49] Added device keyword argument to astype function
 (#1870)

* Added device keyword argument to astype function

* Added test for astype function

* address comments

---------

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 dpnp/dpnp_array.py       | 21 +++++++++++++++++++--
 dpnp/dpnp_iface.py       | 11 +++++++++--
 tests/test_sycl_queue.py | 19 +++++++++++++++++++
 3 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/dpnp/dpnp_array.py b/dpnp/dpnp_array.py
index fb8e1fcef12..fd2d06f7428 100644
--- a/dpnp/dpnp_array.py
+++ b/dpnp/dpnp_array.py
@@ -562,7 +562,15 @@ def asnumpy(self):

         return dpt.asnumpy(self._array_obj)

-    def astype(self, dtype, order="K", casting="unsafe", subok=True, copy=True):
+    def astype(
+        self,
+        dtype,
+        order="K",
+        casting="unsafe",
+        subok=True,
+        copy=True,
+        device=None,
+    ):
         """
         Copy the array with data type casting.

@@ -597,6 +605,13 @@ def astype(self, dtype, order="K", casting="unsafe", subok=True, copy=True):
             this is set to ``False``, and the `dtype`, `order`, and `subok`
             requirements are satisfied, the input array is returned instead
             of a copy.
+        device : {None, string, SyclDevice, SyclQueue}, optional
+            An array API concept of device where the output array is created.
+            The `device` can be ``None`` (the default), a oneAPI filter selector
+            string, an instance of :class:`dpctl.SyclDevice` corresponding to
+            a non-partitioned SYCL device, an instance of :class:`dpctl.SyclQueue`,
+            or a `Device` object returned by
+            :obj:`dpnp.dpnp_array.dpnp_array.device` property. Default: ``None``.
        Returns
         -------
@@ -626,7 +641,9 @@
                 f"subok={subok} is currently not supported"
             )

-        return dpnp.astype(self, dtype, order=order, casting=casting, copy=copy)
+        return dpnp.astype(
+            self, dtype, order=order, casting=casting, copy=copy, device=device
+        )

     # 'base',
     # 'byteswap',
diff --git a/dpnp/dpnp_iface.py b/dpnp/dpnp_iface.py
index 0dfd63dab21..49e7b41c01c 100644
--- a/dpnp/dpnp_iface.py
+++ b/dpnp/dpnp_iface.py
@@ -180,7 +180,7 @@ def asnumpy(a, order="C"):


 # pylint: disable=redefined-outer-name
-def astype(x1, dtype, order="K", casting="unsafe", copy=True):
+def astype(x1, dtype, order="K", casting="unsafe", copy=True, device=None):
     """
     Copy the array with data type casting.

@@ -213,6 +213,13 @@ def astype(x1, dtype, order="K", casting="unsafe", copy=True):
         By default, ``astype`` always returns a newly allocated array. If this
         is set to ``False``, and the `dtype`, `order`, and `subok` requirements
         are satisfied, the input array is returned instead of a copy.
+    device : {None, string, SyclDevice, SyclQueue}, optional
+        An array API concept of device where the output array is created.
+        The `device` can be ``None`` (the default), a oneAPI filter selector
+        string, an instance of :class:`dpctl.SyclDevice` corresponding to
+        a non-partitioned SYCL device, an instance of :class:`dpctl.SyclQueue`,
+        or a `Device` object returned by
+        :obj:`dpnp.dpnp_array.dpnp_array.device` property. Default: ``None``.
Returns ------- @@ -228,7 +235,7 @@ def astype(x1, dtype, order="K", casting="unsafe", copy=True): x1_obj = dpnp.get_usm_ndarray(x1) array_obj = dpt.astype( - x1_obj, dtype, order=order, casting=casting, copy=copy + x1_obj, dtype, order=order, casting=casting, copy=copy, device=device ) # return x1 if dpctl returns a zero copy of x1_obj diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 3073b8806e5..99334cfabfc 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -2211,3 +2211,22 @@ def test_histogram_bin_edges(weights, device): edges_queue = result_edges.sycl_queue assert_sycl_queue_equal(edges_queue, iv.sycl_queue) + + +@pytest.mark.parametrize( + "device_x", + valid_devices, + ids=[device.filter_string for device in valid_devices], +) +@pytest.mark.parametrize( + "device_y", + valid_devices, + ids=[device.filter_string for device in valid_devices], +) +def test_astype(device_x, device_y): + x = dpnp.array([1, 2, 3], dtype="i4", device=device_x) + y = dpnp.astype(x, dtype="f4") + assert_sycl_queue_equal(y.sycl_queue, x.sycl_queue) + sycl_queue = dpctl.SyclQueue(device_y) + y = dpnp.astype(x, dtype="f4", device=sycl_queue) + assert_sycl_queue_equal(y.sycl_queue, sycl_queue) From fe93c056e9998f55ce1711110704955fa2fc690c Mon Sep 17 00:00:00 2001 From: vtavana <120411540+vtavana@users.noreply.github.com> Date: Sat, 15 Jun 2024 05:03:00 -0500 Subject: [PATCH 22/49] resolve gh-1871 (#1872) * update returned result when out is defined with order F * address comments * add test for out keyword in einsum --------- Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> --- dpnp/dpnp_iface_linearalgebra.py | 1 - dpnp/dpnp_utils/dpnp_utils_linearalgebra.py | 101 ++++++++++++++------ tests/test_linalg.py | 16 ++++ tests/test_mathematical.py | 100 +++++++++++++++++-- tests/test_product.py | 8 ++ 5 files changed, 191 insertions(+), 35 deletions(-) diff --git a/dpnp/dpnp_iface_linearalgebra.py 
b/dpnp/dpnp_iface_linearalgebra.py index 1af952388a6..f674c96040a 100644 --- a/dpnp/dpnp_iface_linearalgebra.py +++ b/dpnp/dpnp_iface_linearalgebra.py @@ -821,7 +821,6 @@ def matmul( """ - dpnp.check_supported_arrays_type(x1, x2) if subok is False: raise NotImplementedError( "subok keyword argument is only supported by its default value." diff --git a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py index 0b9686771c3..c98acc2c81e 100644 --- a/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py +++ b/dpnp/dpnp_utils/dpnp_utils_linearalgebra.py @@ -33,6 +33,7 @@ import dpctl.tensor._tensor_elementwise_impl as tei import dpctl.tensor._tensor_impl as ti import numpy +from dpctl.utils import ExecutionPlacementError from numpy.core.numeric import normalize_axis_tuple import dpnp @@ -218,7 +219,9 @@ def _compute_size(start, shape): return ret -def _copy_array(x, dep_events, host_events, copy_flag=False, dtype=None): +def _copy_array( + x, dep_events, host_events, copy_flag=False, dtype=None, order="C" +): """ Creating a copy of input array if needed. @@ -236,7 +239,7 @@ def _copy_array(x, dep_events, host_events, copy_flag=False, dtype=None): copy = x.dtype != dtype if dtype is not None else False if copy: - x_copy = dpnp.empty_like(x, dtype=dtype, order="C") + x_copy = dpnp.empty_like(x, dtype=dtype, order=order) ht_copy_ev, copy_ev = ti._copy_usm_ndarray_into_usm_ndarray( src=dpnp.get_usm_ndarray(x), dst=x_copy.get_array(), @@ -248,7 +251,9 @@ def _copy_array(x, dep_events, host_events, copy_flag=False, dtype=None): return x -def _create_result_array(x1, x2, out, shape, dtype, usm_type, sycl_queue): +def _create_result_array( + x1, x2, out, shape, dtype, usm_type, sycl_queue, order="C" +): """ Create the result array. 
@@ -263,13 +268,12 @@ def _create_result_array(x1, x2, out, shape, dtype, usm_type, sycl_queue): x1_usm = dpnp.get_usm_ndarray(x1) x2_usm = dpnp.get_usm_ndarray(x2) out_usm = dpnp.get_usm_ndarray(out) - contig_flag = _define_contig_flag(out) + contig_flag, _, _ = _define_contig_flag(out) if ( out.dtype == dtype and out.shape == shape and out.usm_type == usm_type - and out.sycl_queue == sycl_queue and contig_flag and not ti._array_overlap(x1_usm, out_usm) and not ti._array_overlap(x2_usm, out_usm) @@ -279,6 +283,7 @@ def _create_result_array(x1, x2, out, shape, dtype, usm_type, sycl_queue): return dpnp.empty( shape, dtype=dtype, + order=order, usm_type=usm_type, sycl_queue=sycl_queue, ) @@ -295,14 +300,14 @@ def _define_contig_flag(x): x_strides = x.strides x_shape = x.shape if x.ndim < 2: - return True + return True, True, True x_strides = _standardize_strides_to_nonzero(x_strides, x_shape) x_is_c_contiguous = x_strides[-1] == 1 and x_strides[-2] == x_shape[-1] x_is_f_contiguous = x_strides[-2] == 1 and x_strides[-1] == x_shape[-2] if x_is_c_contiguous or x_is_f_contiguous: flag = True - return flag + return flag, x_is_c_contiguous, x_is_f_contiguous def _define_dim_flags(x, pos): @@ -746,17 +751,26 @@ def _gemm_batch_matmul(exec_q, x1, x2, res, dev_tasks_list): ) ht_tasks_list.append(ht_blas_ev) dpctl.SyclEvent.wait_for(ht_tasks_list) + res_shape = res.shape - if not row_major: - res = dpnp.reshape( - res.ravel(), (batch_size, res_shape[2], res_shape[1]) - ).transpose(0, 2, 1) + _, res_is_c_contig, res_is_f_contig = _define_contig_flag(res) + if row_major: + if res_is_f_contig: + res = dpnp.reshape( + dpnp.ravel(res, order="F"), + (res_shape[1], res_shape[2], batch_size), + ).transpose(2, 0, 1) + else: + if res_is_c_contig: + res = dpnp.reshape( + dpnp.ravel(res, order="C"), + (batch_size, res_shape[2], res_shape[1]), + ).transpose(0, 2, 1) if res_shape != orig_shape: res = res.reshape(orig_shape) - res = dpnp.ascontiguousarray(res) - return res + return 
dpnp.ascontiguousarray(res) def _gemm_matmul(exec_q, x1, x2, res, dev_tasks_list): @@ -769,14 +783,16 @@ def _gemm_matmul(exec_q, x1, x2, res, dev_tasks_list): ) ht_blas_ev.wait() - if not row_major: - # TODO: investigate the possibility of defining result - # array with "F" order for this case - res = dpnp.ascontiguousarray( - dpnp.reshape(res.ravel(), res.shape, order="F") - ) + if row_major: + if res.flags.f_contiguous is True: + # read data in "F" order and write it in "C" order + res = dpnp.reshape(dpnp.ravel(res, order="F"), res.shape, order="C") + else: + if res.flags.c_contiguous is True: + # read data in "C" order and write it in "F" order + res = dpnp.reshape(dpnp.ravel(res, order="C"), res.shape, order="F") - return res + return dpnp.ascontiguousarray(res) def _greedy_path(input_sets, output_set, idx_dict, memory_limit): @@ -1746,6 +1762,13 @@ def dpnp_dot(a, b, /, out=None, *, conjugate=False): ) res_usm_type, exec_q = get_usm_allocations([a, b]) + if ( + out is not None + and dpctl.utils.get_execution_queue((exec_q, out.sycl_queue)) is None + ): + raise ExecutionPlacementError( + "Input and output allocation queues are not compatible" + ) # Determine the appropriate data types dot_dtype, res_dtype = _compute_res_dtype(a, b, sycl_queue=exec_q) @@ -1812,6 +1835,12 @@ def dpnp_einsum( arrays.append(a) res_usm_type, exec_q = get_usm_allocations(arrays) + if out is not None: + dpnp.check_supported_arrays_type(out) + if dpctl.utils.get_execution_queue((exec_q, out.sycl_queue)) is None: + raise ExecutionPlacementError( + "Input and output allocation queues are not compatible" + ) result_dtype = dpnp.result_type(*arrays) if dtype is None else dtype for id, a in enumerate(operands): if dpnp.isscalar(a): @@ -2056,10 +2085,17 @@ def dpnp_matmul( """ - x1_ndim = x1.ndim - x2_ndim = x2.ndim + dpnp.check_supported_arrays_type(x1, x2) res_usm_type, exec_q = get_usm_allocations([x1, x2]) + if out is not None: + dpnp.check_supported_arrays_type(out) + if 
dpctl.utils.get_execution_queue((exec_q, out.sycl_queue)) is None: + raise ExecutionPlacementError( + "Input and output allocation queues are not compatible" + ) + x1_ndim = x1.ndim + x2_ndim = x2.ndim if axes is not None: axes = _validate_axes(x1, x2, axes) @@ -2072,7 +2108,6 @@ def dpnp_matmul( x2 = dpnp.moveaxis(x2, axes_x2, (-2, -1)) if x2_ndim != 1 else x2 out_orig = out if out is not None: - dpnp.check_supported_arrays_type(out) # out that is passed to the backend should have the correct shape if len(axes_res) == 2: out = dpnp.moveaxis(out, axes_res, (-2, -1)) @@ -2161,8 +2196,18 @@ def dpnp_matmul( res = dpnp_dot(x1, x2, out=out) res_shape = res.shape else: + x1_contig_flag, _, x1_f = _define_contig_flag(x1) + x2_contig_flag, _, x2_f = _define_contig_flag(x2) + res_order = "F" if (x1_f and x2_f and call_flag == "gemm") else "C" res = _create_result_array( - x1, x2, out, res_shape, compute_dtype, res_usm_type, exec_q + x1, + x2, + out, + res_shape, + compute_dtype, + res_usm_type, + exec_q, + res_order, ) # calculate result @@ -2175,21 +2220,21 @@ def dpnp_matmul( # their base (last 2-dimensions) to be c-contiguous or f-contiguous dep_events_list = [] host_tasks_list = [] - contig_flag = _define_contig_flag(x1) x1 = _copy_array( x1, dep_events_list, host_tasks_list, - copy_flag=not contig_flag, + copy_flag=not x1_contig_flag, dtype=compute_dtype, + order=res_order, ) - contig_flag = _define_contig_flag(x2) x2 = _copy_array( x2, dep_events_list, host_tasks_list, - copy_flag=not contig_flag, + copy_flag=not x2_contig_flag, dtype=compute_dtype, + order=res_order, ) if call_flag == "gemv": diff --git a/tests/test_linalg.py b/tests/test_linalg.py index ec2a085d4d3..48a4891034c 100644 --- a/tests/test_linalg.py +++ b/tests/test_linalg.py @@ -613,12 +613,28 @@ def test_einsum_trivial_cases(self): expected = numpy.einsum("i,i,i", b_np, b_np, b_np, optimize="greedy") assert_dtype_allclose(result, expected) + def test_einsum_out(self): + a = inp.ones((5, 5)) + a_np = 
a.asnumpy() + out = inp.empty((5,)) + out_np = out.asnumpy() + result = inp.einsum("ii->i", a, out=out) + assert result is out + expected = numpy.einsum("ii->i", a_np, out=out_np) + assert_dtype_allclose(result, expected) + def test_einsum_error(self): a = inp.ones((5, 5)) # unknown keyword argument with pytest.raises(TypeError): inp.einsum("ii->i", a, copy=False) + a = inp.ones((5, 5)) + out = inp.empty((5,), sycl_queue=dpctl.SyclQueue()) + # inconsistent sycl_queue + with pytest.raises(ExecutionPlacementError): + inp.einsum("ii->i", a, out=out) + # unknown value for optimize keyword with pytest.raises(TypeError): inp.einsum("ii->i", a, optimize="average") diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index 4a1ee63c8fc..6cf52e91deb 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -2,6 +2,7 @@ import dpctl.tensor as dpt import numpy import pytest +from dpctl.utils import ExecutionPlacementError from numpy.testing import ( assert_allclose, assert_almost_equal, @@ -2975,20 +2976,99 @@ def test_matmul_strided_vec_mat(self, shape, incx, incy, transpose): assert result is out assert_dtype_allclose(result, expected) + @pytest.mark.parametrize( + "order1, order2, out_order", + [ + ("C", "C", "C"), + ("C", "C", "F"), + ("C", "F", "C"), + ("C", "F", "F"), + ("F", "C", "C"), + ("F", "C", "F"), + ("F", "F", "F"), + ("F", "F", "C"), + ], + ) @pytest.mark.parametrize( "dtype", get_all_dtypes(no_none=True, no_bool=True) ) - def test_matmul_out(self, dtype): - a1 = numpy.arange(5 * 4, dtype=dtype).reshape(5, 4) - a2 = numpy.arange(7 * 4, dtype=dtype).reshape(4, 7) + def test_matmul_out1(self, order1, order2, out_order, dtype): + # test gemm with out keyword + a1 = numpy.arange(20, dtype=dtype).reshape(5, 4, order=order1) + a2 = numpy.arange(28, dtype=dtype).reshape(4, 7, order=order2) b1 = dpnp.asarray(a1) b2 = dpnp.asarray(a2) - dpnp_out = dpnp.empty((5, 7), dtype=dtype) + dpnp_out = dpnp.empty((5, 7), dtype=dtype, 
             order=out_order)
         result = dpnp.matmul(b1, b2, out=dpnp_out)
-        expected = numpy.matmul(a1, a2)
         assert result is dpnp_out
+
+        out = numpy.empty((5, 7), dtype=dtype, order=out_order)
+        expected = numpy.matmul(a1, a2, out=out)
+        assert result.flags.c_contiguous == expected.flags.c_contiguous
+        assert result.flags.f_contiguous == expected.flags.f_contiguous
+        assert_dtype_allclose(result, expected)
+
+    @pytest.mark.parametrize("trans", [True, False])
+    @pytest.mark.parametrize(
+        "dtype", get_all_dtypes(no_none=True, no_bool=True)
+    )
+    def test_matmul_out2(self, trans, dtype):
+        # test gemm_batch with out keyword
+        # the base of input arrays is c-contiguous
+        # the base of output array is c-contiguous or f-contiguous
+        a1 = numpy.arange(24, dtype=dtype).reshape(2, 3, 4)
+        a2 = numpy.arange(40, dtype=dtype).reshape(2, 4, 5)
+        b1 = dpnp.asarray(a1)
+        b2 = dpnp.asarray(a2)
+
+        if trans:
+            dpnp_out = dpnp.empty((2, 5, 3), dtype=dtype).transpose(0, 2, 1)
+            out = numpy.empty((2, 5, 3), dtype=dtype).transpose(0, 2, 1)
+        else:
+            dpnp_out = dpnp.empty((2, 3, 5), dtype=dtype)
+            out = numpy.empty((2, 3, 5), dtype=dtype)
+
+        result = dpnp.matmul(b1, b2, out=dpnp_out)
+        assert result is dpnp_out
+
+        expected = numpy.matmul(a1, a2, out=out)
+        assert result.flags.c_contiguous == expected.flags.c_contiguous
+        assert result.flags.f_contiguous == expected.flags.f_contiguous
+        assert_dtype_allclose(result, expected)
+
+    @pytest.mark.parametrize("trans", [True, False])
+    @pytest.mark.parametrize(
+        "dtype", get_all_dtypes(no_none=True, no_bool=True)
+    )
+    def test_matmul_out3(self, trans, dtype):
+        # test gemm_batch with out keyword
+        # the base of input arrays is f-contiguous
+        # the base of output array is c-contiguous or f-contiguous
+        a1 = numpy.arange(24, dtype=dtype).reshape(2, 4, 3)
+        a2 = numpy.arange(40, dtype=dtype).reshape(2, 5, 4)
+        b1 = dpnp.asarray(a1)
+        b2 = dpnp.asarray(a2)
+
+        a1 = numpy.asarray(a1).transpose(0, 2, 1)
+        a2 = numpy.asarray(a2).transpose(0, 2, 1)
+        b1 = b1.transpose(0, 2, 1)
+        b2 = b2.transpose(0, 2, 1)
+
+        if trans:
+            dpnp_out = dpnp.empty((2, 5, 3), dtype=dtype).transpose(0, 2, 1)
+            out = numpy.empty((2, 5, 3), dtype=dtype).transpose(0, 2, 1)
+        else:
+            dpnp_out = dpnp.empty((2, 3, 5), dtype=dtype)
+            out = numpy.empty((2, 3, 5), dtype=dtype)
+
+        result = dpnp.matmul(b1, b2, out=dpnp_out)
+        assert result is dpnp_out
+
+        expected = numpy.matmul(a1, a2, out=out)
+        assert result.flags.c_contiguous == expected.flags.c_contiguous
+        assert result.flags.f_contiguous == expected.flags.f_contiguous
         assert_dtype_allclose(result, expected)

     @pytest.mark.parametrize(
@@ -3000,6 +3080,9 @@ def test_matmul_out(self, dtype):
         ],
     )
     def test_matmul_out_0D(self, out_shape):
+        # for matmul of 0-D arrays with out keyword,
+        # NumPy repeats the data to match the shape
+        # of output array
         a = numpy.arange(3)
         b = dpnp.asarray(a)

@@ -3107,10 +3190,15 @@ def test_invalid_dtype(self, dtype):
     def test_exe_q(self):
         x1 = dpnp.ones((5, 4), sycl_queue=dpctl.SyclQueue())
         x2 = dpnp.ones((4, 7), sycl_queue=dpctl.SyclQueue())
-
         with pytest.raises(ValueError):
             dpnp.matmul(x1, x2)

+        x1 = dpnp.ones((5, 4))
+        x2 = dpnp.ones((4, 7))
+        out = dpnp.empty((5, 7), sycl_queue=dpctl.SyclQueue())
+        with pytest.raises(ExecutionPlacementError):
+            dpnp.matmul(x1, x2, out=out)
+
     def test_matmul_casting(self):
         a1 = dpnp.arange(2 * 4, dtype=dpnp.float32).reshape(2, 4)
         a2 = dpnp.arange(4 * 3).reshape(4, 3)
diff --git a/tests/test_product.py b/tests/test_product.py
index d9463a1546c..a15e82f6d90 100644
--- a/tests/test_product.py
+++ b/tests/test_product.py
@@ -1,6 +1,7 @@
 import dpctl
 import numpy
 import pytest
+from dpctl.utils import ExecutionPlacementError
 from numpy.testing import assert_raises

 import dpnp
@@ -473,6 +474,12 @@ def test_dot_sycl_queue_error(self):
         with pytest.raises(ValueError):
             dpnp.dot(a, b)

+        a = dpnp.ones((5,))
+        b = dpnp.ones((5,))
+        out = dpnp.empty((), sycl_queue=dpctl.SyclQueue())
+        with pytest.raises(ExecutionPlacementError):
+            dpnp.dot(a, b, out=out)
+
     @pytest.mark.parametrize("ia", [1, dpnp.ones((), dtype=dpnp.float32)])
     def test_dot_out_error_scalar(self, ia):
         a = ia if dpnp.isscalar(ia) else ia.asnumpy()
@@ -487,6 +494,7 @@ def test_dot_out_error_scalar(self, ia):
         # output shape is incorrect
         dp_out = dpnp.empty((2,), dtype=dpnp.int32)
+        out = numpy.empty((2,), dtype=numpy.int32)
         assert_raises(ValueError, dpnp.dot, ia, ib, out=dp_out)
         assert_raises(ValueError, numpy.dot, a, b, out=out)

From de71047cc6b5f29363c1bda2caa14af387e180aa Mon Sep 17 00:00:00 2001
From: Anton <100830759+antonwolfy@users.noreply.github.com>
Date: Sat, 15 Jun 2024 13:28:11 +0200
Subject: [PATCH 23/49] Update docstrings for ufuncs (#1881)

---
 dpnp/dpnp_iface_bitwise.py       |  24 +++----
 dpnp/dpnp_iface_logic.py         |  53 ++++++++--------
 dpnp/dpnp_iface_mathematical.py  | 100 ++++++++++++++---------
 dpnp/dpnp_iface_trigonometric.py | 106 +++++++++++++++----------------
 4 files changed, 142 insertions(+), 141 deletions(-)

diff --git a/dpnp/dpnp_iface_bitwise.py b/dpnp/dpnp_iface_bitwise.py
index 91560732d3f..21ee7cc3d82 100644
--- a/dpnp/dpnp_iface_bitwise.py
+++ b/dpnp/dpnp_iface_bitwise.py
@@ -70,12 +70,12 @@
 x2 : {dpnp.ndarray, usm_ndarray}
     Second input array, also expected to have integer or boolean data type.
-out : {None, dpnp.ndarray}, optional
+out : {None, dpnp.ndarray, usm_ndarray}, optional
     Output array to populate.
     Array must have the correct shape and the expected data type.
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -199,12 +199,12 @@ x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have integer or boolean data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -261,12 +261,12 @@ ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have integer or boolean data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -331,12 +331,12 @@ x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have integer data type. Each element must be greater than or equal to 0. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- out : dpnp.ndarray @@ -389,12 +389,12 @@ x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have integer data type. Each element must be greater than or equal to 0. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. 
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- diff --git a/dpnp/dpnp_iface_logic.py b/dpnp/dpnp_iface_logic.py index dad2dd78039..d780cf578bf 100644 --- a/dpnp/dpnp_iface_logic.py +++ b/dpnp/dpnp_iface_logic.py @@ -317,12 +317,12 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -386,12 +386,12 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -449,12 +449,12 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. 
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -553,12 +553,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -612,12 +612,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -665,12 +665,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -722,12 +722,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. 
-out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -739,6 +739,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ----------- Parameters `where` and `subok` are supported with their default values. Otherwise ``NotImplementedError`` exception will be raised. + See Also -------- :obj:`dpnp.greater` : Return the truth value of (x1 > x2) element-wise. @@ -784,12 +785,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -847,12 +848,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array. x2 : {dpnp.ndarray, usm_ndarray} Second input array. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -909,12 +910,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ---------- x : {dpnp.ndarray, usm_ndarray} Input array. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -964,12 +965,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array. x2 : {dpnp.ndarray, usm_ndarray} Second input array. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1029,12 +1030,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array. x2 : {dpnp.ndarray, usm_ndarray} Second input array. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1092,12 +1093,12 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
Returns ------- diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index b0d0c7b6123..fb3496709df 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -340,12 +340,12 @@ def _gradient_num_diff_edges( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -408,12 +408,12 @@ def _gradient_num_diff_edges( First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -477,12 +477,12 @@ def _gradient_num_diff_edges( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a complex-valued floating-point data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -534,11 +534,10 @@ def around(x, /, decimals=0, out=None): Number of decimal places to round to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point. 
- out : {None, dpnp.ndarray}, optional + out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. - Returns ------- out : dpnp.ndarray @@ -556,6 +555,7 @@ def around(x, /, decimals=0, out=None): Notes ----- This function works the same as :obj:`dpnp.round`. + """ return round(x, decimals, out) @@ -570,12 +570,12 @@ def around(x, /, decimals=0, out=None): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a real-valued data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -631,7 +631,7 @@ def clip(a, a_min, a_max, *, out=None, order="K", **kwargs): output. Its type is preserved. order : {"C", "F", "A", "K", None}, optional Memory layout of the newly output array, if parameter `out` is `None`. - If `order` is ``None``, the default value "K" will be used. + If `order` is ``None``, the default value ``"K"`` will be used. Returns ------- @@ -696,12 +696,12 @@ def clip(a, a_min, a_max, *, out=None, order="K", **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -772,7 +772,7 @@ def convolve(a, v, mode="full"): Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. 
- Default: "K". + Default: ``"K"``. Returns ------- @@ -1240,12 +1240,12 @@ def diff(a, n=1, axis=-1, prepend=None, append=None): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1393,12 +1393,12 @@ def fabs(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a real-valued data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1452,12 +1452,12 @@ def fabs(x1, **kwargs): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2056,12 +2056,12 @@ def gradient(f, *varargs, axis=None, edge_order=1): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. 
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2113,12 +2113,12 @@ def gradient(f, *varargs, axis=None, edge_order=1): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2185,12 +2185,12 @@ def gradient(f, *varargs, axis=None, edge_order=1): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2344,12 +2344,12 @@ def modf(x1, **kwargs): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2410,12 +2410,12 @@ def modf(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. 
-out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2465,12 +2465,12 @@ def modf(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2527,12 +2527,12 @@ def modf(x1, **kwargs): First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2553,7 +2553,6 @@ def modf(x1, **kwargs): :obj:`dpnp.fmin` : Element-wise minimum of array elements. :obj:`dpnp.fmod` : Calculate the element-wise remainder of division. - Examples -------- >>> import dpnp as dp @@ -2708,12 +2707,12 @@ def prod( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
Returns ------- @@ -2791,12 +2790,12 @@ def prod( First input array, expected to have a real-valued data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have a real-valued data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2806,6 +2805,7 @@ def prod( array is determined by the Type Promotion Rules. Limitations +----------- Parameters `where` and `subok` are supported with their default values. Keyword argument `kwargs` is currently unsupported. Otherwise ``NotImplementedError`` exception will be raised. @@ -2857,12 +2857,12 @@ def prod( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -2916,7 +2916,7 @@ def prod( decimals : int, optional Number of decimal places to round to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. @@ -2972,12 +2972,12 @@ def prod( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. 
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -3026,12 +3026,12 @@ def prod( ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -3079,12 +3079,12 @@ def prod( First input array, expected to have numeric data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -3362,12 +3362,12 @@ def trapz(y1, x1=None, dx=1.0, axis=-1): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a real-valued data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
Returns ------- diff --git a/dpnp/dpnp_iface_trigonometric.py b/dpnp/dpnp_iface_trigonometric.py index 64c110190bf..d38af96ea2c 100644 --- a/dpnp/dpnp_iface_trigonometric.py +++ b/dpnp/dpnp_iface_trigonometric.py @@ -118,12 +118,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -172,12 +172,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -226,12 +226,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -280,12 +280,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type.. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -334,12 +334,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type.. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -396,12 +396,12 @@ def _get_accumulation_res_dt(a, dtype, _out): x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have a real-valued floating-point data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -463,12 +463,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -517,12 +517,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a real-valued data type. 
-out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -568,12 +568,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -621,12 +621,12 @@ def _get_accumulation_res_dt(a, dtype, _out): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -817,12 +817,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -869,12 +869,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a floating-point data type. 
-out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -924,12 +924,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -986,12 +986,12 @@ def degrees(x1, **kwargs): First input array, expected to have a real-valued data type. x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have a real-valued data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1046,12 +1046,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
Returns ------- @@ -1100,12 +1100,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1159,12 +1159,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1218,12 +1218,12 @@ def degrees(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1284,12 +1284,12 @@ def degrees(x1, **kwargs): x2 : {dpnp.ndarray, usm_ndarray} Second input array, also expected to have a real-valued floating-point data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
Returns ------- @@ -1423,12 +1423,12 @@ def logsumexp(x, /, *, axis=None, dtype=None, keepdims=False, out=None): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have a real-valued floating-point data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1558,7 +1558,7 @@ def reduce_hypot(x, /, *, axis=None, dtype=None, keepdims=False, out=None): Array must have the correct shape and the expected data type. order : ({'C', 'F', 'A', 'K'}, optional): Memory layout of the newly output array, if parameter `out` is `None`. - Default: "K" + Default: ``"K"`` Returns ------- @@ -1657,12 +1657,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1710,12 +1710,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1762,12 +1762,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array. 
-out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1817,12 +1817,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1871,12 +1871,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. Returns ------- @@ -1924,12 +1924,12 @@ def radians(x1, **kwargs): ---------- x : {dpnp.ndarray, usm_ndarray} Input array, expected to have numeric data type. -out : {None, dpnp.ndarray}, optional +out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. - Default: "K". + Default: ``"K"``. 
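These docstrings also describe the `order` keyword, which takes effect only when `out` is ``None``: ``"K"`` (the default) preserves the input's memory layout where possible, while ``"C"`` or ``"F"`` forces a layout. NumPy ufuncs expose the same keyword, used here as a runnable stand-in:

```python
import numpy as np  # NumPy ufuncs accept the same `order` keyword as dpnp

a = np.asfortranarray(np.ones((2, 3)))  # F-contiguous input

y_k = np.negative(a)                    # order="K" (default): layout follows input
y_c = np.negative(a, order="C")         # force a C-contiguous result

assert y_k.flags.f_contiguous
assert y_c.flags.c_contiguous
```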
Returns ------- From 79eded160dc3abab3c983943d0c770127ff78e39 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Sat, 15 Jun 2024 15:16:01 +0200 Subject: [PATCH 24/49] mod is an alias of remainder (#1882) --- dpnp/dpnp_iface_mathematical.py | 65 ++++----------------------------- 1 file changed, 8 insertions(+), 57 deletions(-) diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index fb3496709df..dc27384a917 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -2245,63 +2245,6 @@ def gradient(f, *varargs, axis=None, edge_order=1): ) -def mod( - x1, - x2, - /, - out=None, - *, - where=True, - order="K", - dtype=None, - subok=True, - **kwargs, -): - """ - Compute element-wise remainder of division. - - For full documentation refer to :obj:`numpy.mod`. - - Returns - ------- - out : dpnp.ndarray - The element-wise remainder of the quotient `floor_divide(x1, x2)`. - - Limitations - ----------- - Parameters `x1` and `x2` are supported as either scalar, - :class:`dpnp.ndarray` or :class:`dpctl.tensor.usm_ndarray`, but both `x1` - and `x2` can not be scalars at the same time. - Parameters `where`, `dtype` and `subok` are supported with their default - values. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by supported DPNP :ref:`Data types`. - - See Also - -------- - :obj:`dpnp.fmod` : Calculate the element-wise remainder of division - :obj:`dpnp.remainder` : Remainder complementary to floor_divide. - :obj:`dpnp.divide` : Standard division. - - Notes - ----- - This function works the same as :obj:`dpnp.remainder`. - - """ - - return dpnp.remainder( - x1, - x2, - out=out, - where=where, - order=order, - dtype=dtype, - subok=subok, - **kwargs, - ) - - def modf(x1, **kwargs): """ Return the fractional and integral parts of an array, element-wise. 
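The hunk above deletes the hand-written `dpnp.mod` wrapper; the rest of this patch replaces it with a plain alias, since `mod` and `remainder` are the same operation. NumPy exposes the identical aliasing, shown here as a stand-in:

```python
import numpy as np  # NumPy mirrors the aliasing introduced by this patch

assert np.mod is np.remainder  # one function object, two names

# remainder takes the sign of the divisor, matching Python's % operator
assert np.remainder(7, 3) == 1
assert np.remainder(-7, 3) == 2
```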
@@ -2818,6 +2761,12 @@ def prod( :obj:`dpnp.floor_divide` : Compute the largest integer smaller or equal to the division of the inputs. :obj:`dpnp.mod` : Calculate the element-wise remainder of division. +Notes +----- +Returns ``0`` when `x2` is ``0`` and both `x1` and `x2` are (arrays of) +integers. +:obj:`dpnp.mod` is an alias of :obj:`dpnp.remainder`. + Examples -------- >>> import dpnp as np @@ -2843,6 +2792,8 @@ def prod( binary_inplace_fn=ti._remainder_inplace, ) +mod = remainder + _RINT_DOCSTRING = """ Rounds each element `x_i` of the input array `x` to From 2d50ce1b949c97dde002510c2342e7febe15350c Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Sat, 15 Jun 2024 19:04:59 +0200 Subject: [PATCH 25/49] Rework implementation of `dpnp.fabs` function (#1878) * Preparation to reuse common dpctl f/w for VM functions * PoC to decouple abs implementation to separate source file * Reuse typedef for function pointer from dpctl.tensor * Define populating vectors by a separate macro * Move implementation of utility functions from headers to source to resolve link issues * Separated implementation of acos function * Separated implementation of acosh function * Use function to simplify strides from dpctl tensor headers * PoC to decouple add implementation to separate source file * Separated implementation of asin function * Separated implementation of asinh function * Separated implementation of atan, atan2, atanh functions * Resolve issue with calling MKL function for undefined types * Separated implementation of cbrt, ceil, conj, cos and cosh functions * Separated implementation of div, exp, exp2, expm1, floor and hypot functions * Separated implementation of ln, log1p, log2 and log10 functions * Separated implementation of mul, pow, rint, sin and sinh functions * Separated implementation of sqr, sqrt, sub, tan, tanh and trunc functions * Removed unused header with types matrix * Remove unused functions * Use passing by reference
in unary and binary funcs * Implement dpnp.fabs function * Create an instance of DPNPUnaryFunc for fabs * Enable and add relating tests * Decouple populate logic to a macro * Resolve compilation failure on Win * Update dpnp/dpnp_iface_mathematical.py Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --------- Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --- dpnp/CMakeLists.txt | 1 + dpnp/backend/extensions/ufunc/CMakeLists.txt | 79 +++++++++++ .../ufunc/elementwise_functions/common.cpp | 41 ++++++ .../ufunc/elementwise_functions/common.hpp | 35 +++++ .../ufunc/elementwise_functions/fabs.cpp | 128 ++++++++++++++++++ .../ufunc/elementwise_functions/fabs.hpp | 35 +++++ .../ufunc/elementwise_functions/populate.hpp | 122 +++++++++++++++++ dpnp/backend/extensions/ufunc/ufunc_py.cpp | 36 +++++ dpnp/backend/extensions/vm/add.cpp | 6 +- dpnp/backend/extensions/vm/atan2.cpp | 6 +- dpnp/backend/extensions/vm/div.cpp | 6 +- dpnp/backend/extensions/vm/hypot.cpp | 6 +- dpnp/backend/extensions/vm/mul.cpp | 6 +- dpnp/backend/extensions/vm/pow.cpp | 6 +- dpnp/backend/extensions/vm/sub.cpp | 6 +- dpnp/backend/include/dpnp_iface_fptr.hpp | 28 ++-- dpnp/backend/kernels/dpnp_krnl_elemwise.cpp | 9 -- .../kernels/elementwise_functions/fabs.hpp | 49 +++++++ dpnp/dpnp_algo/dpnp_algo.pxd | 1 - dpnp/dpnp_algo/dpnp_algo_mathematical.pxi | 5 - dpnp/dpnp_iface_mathematical.py | 69 ++++++---- tests/skipped_tests.tbl | 92 ------------- tests/skipped_tests_gpu.tbl | 90 ------------ tests/skipped_tests_gpu_no_fp64.tbl | 7 - tests/test_usm_type.py | 1 + .../third_party/cupy/math_tests/test_misc.py | 10 +- 26 files changed, 610 insertions(+), 270 deletions(-) create mode 100644 dpnp/backend/extensions/ufunc/CMakeLists.txt create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/common.hpp create mode 100644 
dpnp/backend/extensions/ufunc/elementwise_functions/fabs.cpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fabs.hpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp create mode 100644 dpnp/backend/extensions/ufunc/ufunc_py.cpp create mode 100644 dpnp/backend/kernels/elementwise_functions/fabs.hpp diff --git a/dpnp/CMakeLists.txt b/dpnp/CMakeLists.txt index 9c79d5af385..d9c95b62c0b 100644 --- a/dpnp/CMakeLists.txt +++ b/dpnp/CMakeLists.txt @@ -60,6 +60,7 @@ add_subdirectory(backend/extensions/blas) add_subdirectory(backend/extensions/lapack) add_subdirectory(backend/extensions/vm) add_subdirectory(backend/extensions/sycl_ext) +add_subdirectory(backend/extensions/ufunc) add_subdirectory(dpnp_algo) add_subdirectory(dpnp_utils) diff --git a/dpnp/backend/extensions/ufunc/CMakeLists.txt b/dpnp/backend/extensions/ufunc/CMakeLists.txt new file mode 100644 index 00000000000..7f9a240271b --- /dev/null +++ b/dpnp/backend/extensions/ufunc/CMakeLists.txt @@ -0,0 +1,79 @@ +# ***************************************************************************** +# Copyright (c) 2024, Intel Corporation +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# - Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# - Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +# THE POSSIBILITY OF SUCH DAMAGE. +# ***************************************************************************** + +set(_elementwise_sources + ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/common.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fabs.cpp +) + +set(python_module_name _ufunc_impl) + +set(_module_src + # TODO: remove sources from `elementwise_functions` folder + ${CMAKE_CURRENT_SOURCE_DIR}/../elementwise_functions/elementwise_functions_type_utils.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/../elementwise_functions/simplify_iteration_space.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/ufunc_py.cpp + ${_elementwise_sources} +) + +pybind11_add_module(${python_module_name} MODULE ${_module_src}) +add_sycl_to_target(TARGET ${python_module_name} SOURCES ${_module_src}) + +if (WIN32) + if (${CMAKE_VERSION} VERSION_LESS "3.27") + # this is a work-around for target_link_options inserting option after -link option, cause + # linker to ignore it. 
+ set(CMAKE_CXX_LINK_FLAGS "${CMAKE_CXX_LINK_FLAGS} -fsycl-device-code-split=per_kernel") + endif() +endif() + +set_target_properties(${python_module_name} PROPERTIES CMAKE_POSITION_INDEPENDENT_CODE ON) + +target_include_directories(${python_module_name} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/../../) + +target_include_directories(${python_module_name} PUBLIC ${Dpctl_INCLUDE_DIR}) +target_include_directories(${python_module_name} PUBLIC ${Dpctl_TENSOR_INCLUDE_DIR}) + +if (WIN32) + target_compile_options(${python_module_name} PRIVATE + /clang:-fno-approx-func + /clang:-fno-finite-math-only + ) +else() + target_compile_options(${python_module_name} PRIVATE + -fno-approx-func + -fno-finite-math-only + ) +endif() + +target_link_options(${python_module_name} PUBLIC -fsycl-device-code-split=per_kernel) + +if (DPNP_GENERATE_COVERAGE) + target_link_options(${python_module_name} PRIVATE -fprofile-instr-generate -fcoverage-mapping) +endif() + +install(TARGETS ${python_module_name} + DESTINATION "dpnp/backend/extensions/ufunc" +) diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp new file mode 100644 index 00000000000..44173fc764f --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp @@ -0,0 +1,41 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include + +#include "fabs.hpp" + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +/** + * @brief Add elementwise functions to Python module + */ +void init_elementwise_functions(py::module_ m) +{ + init_fabs(m); +} +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/common.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/common.hpp new file mode 100644 index 00000000000..345ff14308e --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/common.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. 
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +void init_elementwise_functions(py::module_); +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.cpp new file mode 100644 index 00000000000..7588e133473 --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.cpp @@ -0,0 +1,128 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include + +#include "dpctl4pybind11.hpp" + +#include "fabs.hpp" +#include "kernels/elementwise_functions/fabs.hpp" +#include "populate.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +namespace impl +{ +/** + * @brief A factory to define pairs of supported types for which + * sycl::fabs function is available. + * + * @tparam T Type of input vector `a` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = + typename std::disjunction, + td_ns::TypeMapResultEntry, + td_ns::TypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +using dpnp::kernels::fabs::FabsFunctor; + +template +using ContigFunctor = ew_cmn_ns::UnaryContigFunctor, + vec_sz, + n_vecs, + enable_sg_loadstore>; + +template +using StridedFunctor = ew_cmn_ns:: + UnaryStridedFunctor>; + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +static unary_contig_impl_fn_ptr_t fabs_contig_dispatch_vector[td_ns::num_types]; +static int fabs_output_typeid_vector[td_ns::num_types]; +static unary_strided_impl_fn_ptr_t + fabs_strided_dispatch_vector[td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_VECTORS(fabs); +} // namespace impl + +void init_fabs(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + { + impl::populate_fabs_dispatch_vectors(); + using impl::fabs_contig_dispatch_vector; + using impl::fabs_output_typeid_vector; + using impl::fabs_strided_dispatch_vector; + + auto fabs_pyapi = [&](const arrayT &src, const arrayT &dst, + sycl::queue &exec_q, + const event_vecT &depends = {}) { + return py_int::py_unary_ufunc( + src, dst, exec_q, depends, fabs_output_typeid_vector, + fabs_contig_dispatch_vector, fabs_strided_dispatch_vector); + }; + m.def("_fabs", fabs_pyapi, "", py::arg("src"), py::arg("dst"), + py::arg("sycl_queue"), py::arg("depends") = py::list()); + + auto fabs_result_type_pyapi = [&](const py::dtype &dtype) { + return py_int::py_unary_ufunc_result_type( + dtype, fabs_output_typeid_vector); + }; + m.def("_fabs_result_type", fabs_result_type_pyapi); + } +} +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.hpp new file mode 100644 index 00000000000..f4a070747ac --- /dev/null +++ 
b/dpnp/backend/extensions/ufunc/elementwise_functions/fabs.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +void init_fabs(py::module_ m); +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp new file mode 100644 index 00000000000..6261fcc08eb --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp @@ -0,0 +1,122 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +/** + * @brief A macro used to define factories and a populating universal functions. + */ +#define MACRO_POPULATE_DISPATCH_VECTORS(__name__) \ + template \ + class __name__##_contig_kernel; \ + \ + template \ + sycl::event __name__##_contig_impl( \ + sycl::queue &exec_q, size_t nelems, const char *arg_p, char *res_p, \ + const std::vector &depends = {}) \ + { \ + return ew_cmn_ns::unary_contig_impl( \ + exec_q, nelems, arg_p, res_p, depends); \ + } \ + \ + template \ + struct ContigFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v::value_type, \ + void>) { \ + fnT fn = nullptr; \ + return fn; \ + } \ + else { \ + fnT fn = __name__##_contig_impl; \ + return fn; \ + } \ + } \ + }; \ + \ + template \ + struct TypeMapFactory \ + { \ + std::enable_if_t::value, int> get() \ + { \ + using rT = typename OutputType::value_type; \ + return td_ns::GetTypeid{}.get(); \ + } \ + }; \ + \ + template \ + class __name__##_strided_kernel; \ + \ + template \ + sycl::event __name__##_strided_impl( \ + sycl::queue &exec_q, size_t nelems, int nd, \ + const py::ssize_t *shape_and_strides, const char *arg_p, \ + py::ssize_t arg_offset, char *res_p, py::ssize_t res_offset, \ + const std::vector &depends, \ + const std::vector &additional_depends) \ + { \ + return ew_cmn_ns::unary_strided_impl< \ + argTy, OutputType, StridedFunctor, 
__name__##_strided_kernel>( \ + exec_q, nelems, nd, shape_and_strides, arg_p, arg_offset, res_p, \ + res_offset, depends, additional_depends); \ + } \ + \ + template \ + struct StridedFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v::value_type, \ + void>) { \ + fnT fn = nullptr; \ + return fn; \ + } \ + else { \ + fnT fn = __name__##_strided_impl; \ + return fn; \ + } \ + } \ + }; \ + \ + void populate_##__name__##_dispatch_vectors(void) \ + { \ + td_ns::DispatchVectorBuilder \ + dvb1; \ + dvb1.populate_dispatch_vector(__name__##_contig_dispatch_vector); \ + \ + td_ns::DispatchVectorBuilder \ + dvb2; \ + dvb2.populate_dispatch_vector(__name__##_strided_dispatch_vector); \ + \ + td_ns::DispatchVectorBuilder \ + dvb3; \ + dvb3.populate_dispatch_vector(__name__##_output_typeid_vector); \ + }; diff --git a/dpnp/backend/extensions/ufunc/ufunc_py.cpp b/dpnp/backend/extensions/ufunc/ufunc_py.cpp new file mode 100644 index 00000000000..3618bce2cec --- /dev/null +++ b/dpnp/backend/extensions/ufunc/ufunc_py.cpp @@ -0,0 +1,36 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include + +#include "elementwise_functions/common.hpp" + +namespace py = pybind11; +namespace ufunc_ns = dpnp::extensions::ufunc; + +PYBIND11_MODULE(_ufunc_impl, m) +{ + ufunc_ns::init_elementwise_functions(m); +} diff --git a/dpnp/backend/extensions/vm/add.cpp b/dpnp/backend/extensions/vm/add.cpp index c43f07bbcde..c174bf73a99 100644 --- a/dpnp/backend/extensions/vm/add.cpp +++ b/dpnp/backend/extensions/vm/add.cpp @@ -83,11 +83,11 @@ template static sycl::event add_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector &depends) { tu_ns::validate_type_for_device(exec_q); diff --git a/dpnp/backend/extensions/vm/atan2.cpp b/dpnp/backend/extensions/vm/atan2.cpp index 30bb59c9c42..4820a9623f0 100644 --- a/dpnp/backend/extensions/vm/atan2.cpp +++ b/dpnp/backend/extensions/vm/atan2.cpp @@ -73,11 +73,11 @@ template static sycl::event atan2_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector &depends) { tu_ns::validate_type_for_device(exec_q); diff --git 
a/dpnp/backend/extensions/vm/div.cpp b/dpnp/backend/extensions/vm/div.cpp index 8cdb547feb4..5fb7122a76c 100644 --- a/dpnp/backend/extensions/vm/div.cpp +++ b/dpnp/backend/extensions/vm/div.cpp @@ -83,11 +83,11 @@ template <typename T> static sycl::event div_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector<sycl::event> &depends) { tu_ns::validate_type_for_device<T>(exec_q); diff --git a/dpnp/backend/extensions/vm/hypot.cpp b/dpnp/backend/extensions/vm/hypot.cpp index 42dd8127111..50ca178c37c 100644 --- a/dpnp/backend/extensions/vm/hypot.cpp +++ b/dpnp/backend/extensions/vm/hypot.cpp @@ -73,11 +73,11 @@ template <typename T> static sycl::event hypot_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector<sycl::event> &depends) { tu_ns::validate_type_for_device<T>(exec_q); diff --git a/dpnp/backend/extensions/vm/mul.cpp b/dpnp/backend/extensions/vm/mul.cpp index 34007fbc07c..de59d087f51 100644 --- a/dpnp/backend/extensions/vm/mul.cpp +++ b/dpnp/backend/extensions/vm/mul.cpp @@ -83,11 +83,11 @@ template <typename T> static sycl::event mul_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector<sycl::event> &depends) { tu_ns::validate_type_for_device<T>(exec_q); diff --git a/dpnp/backend/extensions/vm/pow.cpp b/dpnp/backend/extensions/vm/pow.cpp index 65acd2ece44..491b86f7946 100644 --- a/dpnp/backend/extensions/vm/pow.cpp +++ b/dpnp/backend/extensions/vm/pow.cpp @@ -83,11 +83,11 @@ template <typename T> static sycl::event pow_contig_impl(sycl::queue &exec_q, std::size_t in_n, 
const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector<sycl::event> &depends) { tu_ns::validate_type_for_device<T>(exec_q); diff --git a/dpnp/backend/extensions/vm/sub.cpp b/dpnp/backend/extensions/vm/sub.cpp index 4ec1bdc36b5..8bfc477bfa7 100644 --- a/dpnp/backend/extensions/vm/sub.cpp +++ b/dpnp/backend/extensions/vm/sub.cpp @@ -83,11 +83,11 @@ template <typename T> static sycl::event sub_contig_impl(sycl::queue &exec_q, std::size_t in_n, const char *in_a, - ssize_t a_offset, + py::ssize_t a_offset, const char *in_b, - ssize_t b_offset, + py::ssize_t b_offset, char *out_y, - ssize_t out_offset, + py::ssize_t out_offset, const std::vector<sycl::event> &depends) { tu_ns::validate_type_for_device<T>(exec_q); diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index d8e6f8b26e8..0f6ef51bc7c 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -117,21 +117,19 @@ enum class DPNPFuncName : size_t DPNP_FN_DOT, /**< Used in numpy.dot() impl */ DPNP_FN_DOT_EXT, /**< Used in numpy.dot() impl, requires extra parameters */ DPNP_FN_EDIFF1D, /**< Used in numpy.ediff1d() impl */ - DPNP_FN_EDIFF1D_EXT, /**< Used in numpy.ediff1d() impl, requires extra - parameters */ - DPNP_FN_EIG, /**< Used in numpy.linalg.eig() impl */ - DPNP_FN_EIGVALS, /**< Used in numpy.linalg.eigvals() impl */ - DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ - DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra - parameters */ - DPNP_FN_EYE, /**< Used in numpy.eye() impl */ - DPNP_FN_EXP, /**< Used in numpy.exp() impl */ - DPNP_FN_EXP2, /**< Used in numpy.exp2() impl */ - DPNP_FN_EXPM1, /**< Used in numpy.expm1() impl */ - DPNP_FN_FABS, /**< Used in numpy.fabs() impl */ - DPNP_FN_FABS_EXT, /**< Used in numpy.fabs() impl, requires extra parameters - */ - DPNP_FN_FFT_FFT, /**< Used in 
numpy.fft.fft() impl */ + DPNP_FN_EDIFF1D_EXT, /**< Used in numpy.ediff1d() impl, requires extra + parameters */ + DPNP_FN_EIG, /**< Used in numpy.linalg.eig() impl */ + DPNP_FN_EIGVALS, /**< Used in numpy.linalg.eigvals() impl */ + DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ + DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra + parameters */ + DPNP_FN_EYE, /**< Used in numpy.eye() impl */ + DPNP_FN_EXP, /**< Used in numpy.exp() impl */ + DPNP_FN_EXP2, /**< Used in numpy.exp2() impl */ + DPNP_FN_EXPM1, /**< Used in numpy.expm1() impl */ + DPNP_FN_FABS, /**< Used in numpy.fabs() impl */ + DPNP_FN_FFT_FFT, /**< Used in numpy.fft.fft() impl */ DPNP_FN_FFT_FFT_EXT, /**< Used in numpy.fft.fft() impl, requires extra parameters */ DPNP_FN_FFT_RFFT, /**< Used in numpy.fft.rfft() impl */ diff --git a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp index a69a875fc1e..122a3ccdedd 100644 --- a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp @@ -462,15 +462,6 @@ static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_FABS][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_fabs_c_default<double, double>}; - fmap[DPNPFuncName::DPNP_FN_FABS_EXT][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_fabs_c_ext<int32_t, double>}; - fmap[DPNPFuncName::DPNP_FN_FABS_EXT][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_fabs_c_ext<int64_t, double>}; - fmap[DPNPFuncName::DPNP_FN_FABS_EXT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_fabs_c_ext<float, float>}; - fmap[DPNPFuncName::DPNP_FN_FABS_EXT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_fabs_c_ext<double, double>}; - fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_INT][eft_INT] = { eft_DBL, (void *)dpnp_floor_c_default<int32_t, double>}; fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_LNG][eft_LNG] = { diff --git a/dpnp/backend/kernels/elementwise_functions/fabs.hpp b/dpnp/backend/kernels/elementwise_functions/fabs.hpp new file mode 100644 index 00000000000..525cfc5bfe6 --- /dev/null +++ 
b/dpnp/backend/kernels/elementwise_functions/fabs.hpp @@ -0,0 +1,49 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include <sycl/sycl.hpp> + +namespace dpnp::kernels::fabs +{ +template <typename argT, typename resT> +struct FabsFunctor +{ + // is function constant for given argT + using is_constant = typename std::false_type; + // constant value, if constant + // constexpr resT constant_value = resT{}; + // is function defined for sycl::vec + using supports_vec = typename std::false_type; + // do both argT and resT support subgroup store/load operation + using supports_sg_loadstore = typename std::true_type; + + resT operator()(const argT &x) const + { + return sycl::fabs(x); + } +}; +} // namespace dpnp::kernels::fabs diff --git a/dpnp/dpnp_algo/dpnp_algo.pxd b/dpnp/dpnp_algo/dpnp_algo.pxd index a82a96ed0c5..f6df42981a9 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pxd +++ b/dpnp/dpnp_algo/dpnp_algo.pxd @@ -40,7 -40,6 @@ cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncName": # need this na DPNP_FN_DEGREES_EXT DPNP_FN_EDIFF1D_EXT DPNP_FN_ERF_EXT - DPNP_FN_FABS_EXT DPNP_FN_FFT_FFT_EXT DPNP_FN_FFT_RFFT_EXT DPNP_FN_FMOD_EXT diff --git a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi index 2b8d63c6d2d..405037da782 100644 --- a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi +++ b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi @@ -37,7 +37,6 @@ and the rest of the library __all__ += [ "dpnp_ediff1d", - "dpnp_fabs", "dpnp_fmod", "dpnp_fmax", "dpnp_fmin", @@ -110,10 +109,6 @@ cpdef utils.dpnp_descriptor dpnp_ediff1d(utils.dpnp_descriptor x1): return result -cpdef utils.dpnp_descriptor dpnp_fabs(utils.dpnp_descriptor x1): - return call_fptr_1in_1out_strides(DPNP_FN_FABS_EXT, x1) - - cpdef utils.dpnp_descriptor dpnp_fmod(utils.dpnp_descriptor x1_obj, utils.dpnp_descriptor x2_obj, object dtype=None, diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index dc27384a917..2f34be46312 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -55,12 +55,12 
@@ ) import dpnp +import dpnp.backend.extensions.ufunc._ufunc_impl as ufi import dpnp.backend.extensions.vm._vm_impl as vmi from .backend.extensions.sycl_ext import _sycl_ext_impl from .dpnp_algo import ( dpnp_ediff1d, - dpnp_fabs, dpnp_fmax, dpnp_fmin, dpnp_fmod, @@ -1347,39 +1347,54 @@ def ediff1d(x1, to_end=None, to_begin=None): return call_origin(numpy.ediff1d, x1, to_end=to_end, to_begin=to_begin) -def fabs(x1, **kwargs): - """ - Compute the absolute values element-wise. +_FABS_DOCSTRING = """ +Compute the absolute values element-wise. - For full documentation refer to :obj:`numpy.fabs`. +This function returns the absolute values (positive magnitude) of the data in +`x`. Complex values are not handled; use :obj:`dpnp.absolute` to find the +absolute values of complex data. - Limitations - ----------- - Parameter `x1` is supported as :class:`dpnp.ndarray`. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by supported DPNP :ref:`Data types`. +For full documentation refer to :obj:`numpy.fabs`. - See Also - -------- - :obj:`dpnp.absolute` : Calculate the absolute value element-wise. +Parameters +---------- +x : {dpnp.ndarray, usm_ndarray} + The array of numbers for which the absolute values are required. +out : {None, dpnp.ndarray, usm_ndarray}, optional + Output array to populate. + Array must have the correct shape and the expected data type. +order : {"C", "F", "A", "K"}, optional + Memory layout of the newly created output array, if parameter `out` is + ``None``. Default: ``"K"``. - Examples - -------- - >>> import dpnp as np - >>> result = np.fabs(np.array([1, -2, 6, -9])) - >>> [x for x in result] - [1.0, 2.0, 6.0, 9.0] +Returns +------- +out : dpnp.ndarray + The absolute values of `x`; the returned values are always floats. 
+ If `x` does not have a floating point data type, the returned array + will have a data type that depends on the capabilities of the device + on which the array resides. - """ +See Also +-------- +:obj:`dpnp.absolute` : Absolute values including `complex` types. - x1_desc = dpnp.get_dpnp_descriptor( - x1, copy_when_strides=False, copy_when_nondefault_queue=False - ) - if x1_desc: - return dpnp_fabs(x1_desc).get_pyobj() +Examples +-------- +>>> import dpnp as np +>>> a = np.array([-1.2, 1.2]) +>>> np.fabs(a) +array([1.2, 1.2]) +""" - return call_origin(numpy.fabs, x1, **kwargs) +fabs = DPNPUnaryFunc( + "fabs", + ufi._fabs_result_type, + ufi._fabs, + _FABS_DOCSTRING, + mkl_fn_to_call=vmi._mkl_abs_to_call, + mkl_impl_fn=vmi._abs, +) _FLOOR_DOCSTRING = """ diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl index 5e012b3a496..c86b0d848c5 100644 --- a/tests/skipped_tests.tbl +++ b/tests/skipped_tests.tbl @@ -36,7 +36,6 @@ tests/third_party/cupy/fft_tests/test_fft.py::TestFftn_param_23_{axes=None, norm tests/third_party/intel/test_zero_copy_test1.py::test_dpnp_interaction_with_dpctl_memory tests/test_strides.py::test_strides_1arg[(10,)-None-degrees] -tests/test_strides.py::test_strides_1arg[(10,)-None-fabs] tests/test_strides.py::test_strides_1arg[(10,)-None-radians] tests/test_umath.py::test_umaths[('divmod', 'ii')] @@ -260,12 +259,6 @@ tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_inf_ar tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[nan] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[posinf] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[neginf] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_0_{mode='valid'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_1_{mode='same'}] 
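The contract the new fabs docstring describes (results are always floating point, complex input is rejected in favor of absolute) can be mimicked with a small pure-Python helper for reference; `ref_fabs` is a hypothetical illustration, not part of dpnp:

```python
import math


def ref_fabs(values):
    """Element-wise absolute value returning floats, rejecting complex.

    Mirrors the documented dpnp.fabs contract: the results are always
    floating point, and complex input raises an error (use absolute()
    for complex data instead).
    """
    out = []
    for v in values:
        if isinstance(v, complex):
            raise TypeError(
                "fabs is not defined for complex input; use absolute() instead"
            )
        out.append(math.fabs(v))
    return out


print(ref_fabs([1, -2, 6, -9]))  # -> [1.0, 2.0, 6.0, 9.0]
```

Note that integer input comes back as floats, matching the "returned values are always floats" wording in the docstring above.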
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_2_{mode='full'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_0_{mode='valid'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_1_{mode='same'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_2_{mode='full'}] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_scalar_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_copy @@ -292,91 +285,6 @@ tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_interp_inf_to_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_heaviside tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_heaviside_nan_inf -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_0_{mode='valid', shape1=(), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_1_{mode='valid', shape1=(), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_2_{mode='valid', shape1=(), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_3_{mode='valid', shape1=(), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_4_{mode='valid', shape1=(), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_5_{mode='valid', shape1=(5,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_6_{mode='valid', shape1=(5,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_7_{mode='valid', shape1=(5,), 
shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_8_{mode='valid', shape1=(5,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_9_{mode='valid', shape1=(5,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_10_{mode='valid', shape1=(6,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_11_{mode='valid', shape1=(6,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_12_{mode='valid', shape1=(6,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_13_{mode='valid', shape1=(6,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_14_{mode='valid', shape1=(6,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_15_{mode='valid', shape1=(20,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_16_{mode='valid', shape1=(20,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_17_{mode='valid', shape1=(20,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_18_{mode='valid', shape1=(20,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_19_{mode='valid', shape1=(20,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_20_{mode='valid', shape1=(21,), shape2=()}] 
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_21_{mode='valid', shape1=(21,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_22_{mode='valid', shape1=(21,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_23_{mode='valid', shape1=(21,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_24_{mode='valid', shape1=(21,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_25_{mode='same', shape1=(), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_26_{mode='same', shape1=(), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_27_{mode='same', shape1=(), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_28_{mode='same', shape1=(), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_29_{mode='same', shape1=(), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_30_{mode='same', shape1=(5,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_31_{mode='same', shape1=(5,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_32_{mode='same', shape1=(5,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_33_{mode='same', shape1=(5,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_34_{mode='same', 
shape1=(5,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_35_{mode='same', shape1=(6,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_36_{mode='same', shape1=(6,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_37_{mode='same', shape1=(6,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_38_{mode='same', shape1=(6,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_39_{mode='same', shape1=(6,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_40_{mode='same', shape1=(20,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_41_{mode='same', shape1=(20,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_42_{mode='same', shape1=(20,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_43_{mode='same', shape1=(20,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_44_{mode='same', shape1=(20,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_45_{mode='same', shape1=(21,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_46_{mode='same', shape1=(21,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_47_{mode='same', shape1=(21,), shape2=(6,)}] 
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_48_{mode='same', shape1=(21,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_49_{mode='same', shape1=(21,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_50_{mode='full', shape1=(), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_51_{mode='full', shape1=(), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_52_{mode='full', shape1=(), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_53_{mode='full', shape1=(), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_54_{mode='full', shape1=(), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_55_{mode='full', shape1=(5,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_56_{mode='full', shape1=(5,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_57_{mode='full', shape1=(5,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_58_{mode='full', shape1=(5,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_59_{mode='full', shape1=(5,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_60_{mode='full', shape1=(6,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_61_{mode='full', shape1=(6,), 
shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_62_{mode='full', shape1=(6,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_63_{mode='full', shape1=(6,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_64_{mode='full', shape1=(6,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_65_{mode='full', shape1=(20,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_66_{mode='full', shape1=(20,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_67_{mode='full', shape1=(20,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_68_{mode='full', shape1=(20,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_69_{mode='full', shape1=(20,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_70_{mode='full', shape1=(21,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_71_{mode='full', shape1=(21,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_72_{mode='full', shape1=(21,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_73_{mode='full', shape1=(21,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_74_{mode='full', shape1=(21,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[valid] 
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[same] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[full] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[valid] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[same] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[full] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[valid] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[same] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[full] - tests/third_party/cupy/math_tests/test_rounding.py::TestRounding::test_fix tests/third_party/cupy/math_tests/test_trigonometric.py::TestUnwrap::test_unwrap_1dim_with_discont diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index e14b954abe6..45b41f2dafb 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -310,12 +310,6 @@ tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_inf_ar tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[nan] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[posinf] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_broadcast[neginf] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_0_{mode='valid'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_1_{mode='same'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_empty[_param_2_{mode='full'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_0_{mode='valid'}] 
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_1_{mode='same'}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveInvalid::test_convolve_ndim[_param_2_{mode='full'}] tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_scalar_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_copy @@ -341,90 +335,6 @@ tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_interp_size1 tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_interp_inf_to_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_heaviside tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_heaviside_nan_inf -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_0_{mode='valid', shape1=(), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_1_{mode='valid', shape1=(), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_2_{mode='valid', shape1=(), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_3_{mode='valid', shape1=(), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_4_{mode='valid', shape1=(), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_5_{mode='valid', shape1=(5,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_6_{mode='valid', shape1=(5,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_7_{mode='valid', shape1=(5,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_8_{mode='valid', shape1=(5,), 
shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_9_{mode='valid', shape1=(5,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_10_{mode='valid', shape1=(6,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_11_{mode='valid', shape1=(6,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_12_{mode='valid', shape1=(6,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_13_{mode='valid', shape1=(6,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_14_{mode='valid', shape1=(6,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_15_{mode='valid', shape1=(20,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_16_{mode='valid', shape1=(20,), shape2=(5,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_17_{mode='valid', shape1=(20,), shape2=(6,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_18_{mode='valid', shape1=(20,), shape2=(20,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_19_{mode='valid', shape1=(20,), shape2=(21,)}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_20_{mode='valid', shape1=(21,), shape2=()}] -tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_21_{mode='valid', shape1=(21,), shape2=(5,)}] 
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_22_{mode='valid', shape1=(21,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_23_{mode='valid', shape1=(21,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_24_{mode='valid', shape1=(21,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_25_{mode='same', shape1=(), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_26_{mode='same', shape1=(), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_27_{mode='same', shape1=(), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_28_{mode='same', shape1=(), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_29_{mode='same', shape1=(), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_30_{mode='same', shape1=(5,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_31_{mode='same', shape1=(5,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_32_{mode='same', shape1=(5,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_33_{mode='same', shape1=(5,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_34_{mode='same', shape1=(5,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_35_{mode='same', shape1=(6,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_36_{mode='same', shape1=(6,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_37_{mode='same', shape1=(6,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_38_{mode='same', shape1=(6,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_39_{mode='same', shape1=(6,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_40_{mode='same', shape1=(20,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_41_{mode='same', shape1=(20,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_42_{mode='same', shape1=(20,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_43_{mode='same', shape1=(20,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_44_{mode='same', shape1=(20,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_45_{mode='same', shape1=(21,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_46_{mode='same', shape1=(21,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_47_{mode='same', shape1=(21,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_48_{mode='same', shape1=(21,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_49_{mode='same', shape1=(21,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_50_{mode='full', shape1=(), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_51_{mode='full', shape1=(), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_52_{mode='full', shape1=(), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_53_{mode='full', shape1=(), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_54_{mode='full', shape1=(), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_55_{mode='full', shape1=(5,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_56_{mode='full', shape1=(5,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_57_{mode='full', shape1=(5,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_58_{mode='full', shape1=(5,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_59_{mode='full', shape1=(5,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_60_{mode='full', shape1=(6,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_61_{mode='full', shape1=(6,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_62_{mode='full', shape1=(6,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_63_{mode='full', shape1=(6,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_64_{mode='full', shape1=(6,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_65_{mode='full', shape1=(20,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_66_{mode='full', shape1=(20,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_67_{mode='full', shape1=(20,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_68_{mode='full', shape1=(20,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_69_{mode='full', shape1=(20,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_70_{mode='full', shape1=(21,), shape2=()}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_71_{mode='full', shape1=(21,), shape2=(5,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_72_{mode='full', shape1=(21,), shape2=(6,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_73_{mode='full', shape1=(21,), shape2=(20,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolveShapeCombination::test_convolve[_param_74_{mode='full', shape1=(21,), shape2=(21,)}]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[valid]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[same]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_non_contiguous[full]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[valid]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[same]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_large_non_contiguous[full]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[valid]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[same]
-tests/third_party/cupy/math_tests/test_misc.py::TestConvolve::test_convolve_diff_types[full]
 
 tests/third_party/cupy/math_tests/test_rounding.py::TestRounding::test_fix
diff --git a/tests/skipped_tests_gpu_no_fp64.tbl b/tests/skipped_tests_gpu_no_fp64.tbl
index c209c876df6..44e4c856b77 100644
--- a/tests/skipped_tests_gpu_no_fp64.tbl
+++ b/tests/skipped_tests_gpu_no_fp64.tbl
@@ -1,12 +1,5 @@
-tests/test_strides.py::test_strides_1arg[(10,)-int32-fabs]
-tests/test_strides.py::test_strides_1arg[(10,)-int64-fabs]
-tests/test_strides.py::test_strides_1arg[(10,)-None-fabs]
-
 tests/test_umath.py::test_umaths[('floor_divide', 'ff')]
 
-tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fabs
-tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fabs_negative
-
 tests/third_party/cupy/math_tests/test_trigonometric.py::TestUnwrap::test_unwrap_1dim
 
 tests/third_party/cupy/random_tests/test_distributions.py::TestDistributionsBeta_param_6_{a_shape=(3, 2), b_shape=(3, 2), shape=(4, 3, 2)}::test_beta
diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py
index 77839f9b933..4f7314ff2db 100644
--- a/tests/test_usm_type.py
+++ b/tests/test_usm_type.py
@@ -538,6 +538,7 @@ def test_norm(usm_type, ord, axis):
         pytest.param("exp", [1.0, 2.0, 4.0, 7.0]),
         pytest.param("exp2", [0.0, 1.0, 2.0]),
         pytest.param("expm1", [1.0e-10, 1.0, 2.0, 4.0, 7.0]),
+        pytest.param("fabs", [-1.2, 1.2]),
         pytest.param("floor", [-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0]),
         pytest.param("gradient", [1, 2, 4, 7, 11, 16]),
         pytest.param("histogram_bin_edges", [0, 0, 0, 1, 2, 3, 3, 4, 5]),
diff --git a/tests/third_party/cupy/math_tests/test_misc.py b/tests/third_party/cupy/math_tests/test_misc.py
index dd7fe9dcc1a..62717803aca 100644
--- a/tests/third_party/cupy/math_tests/test_misc.py
+++ b/tests/third_party/cupy/math_tests/test_misc.py
@@ -26,6 +26,7 @@ def check_binary(self, name, xp, dtype, no_bool=False):
 
     @testing.for_dtypes(["?", "b", "h", "i", "q", "e", "f", "d", "F", "D"])
     @testing.numpy_cupy_allclose(atol=1e-5)
+    # TODO: remove no_complex=True, once adapted to numpy 2.0
     def check_unary_negative(
         self, name, xp, dtype, no_bool=False, no_complex=False
     ):
@@ -184,13 +185,13 @@ def test_absolute_negative(self):
         self.check_unary_negative("absolute")
 
     @testing.for_all_dtypes(no_complex=True)
-    @testing.numpy_cupy_allclose(atol=1e-5)
+    @testing.numpy_cupy_allclose(atol=1e-5, type_check=has_support_aspect64())
     def test_fabs(self, xp, dtype):
         a = xp.array([2, 3, 4], dtype=dtype)
         return xp.fabs(a)
 
     @testing.for_all_dtypes(no_complex=True)
-    @testing.numpy_cupy_allclose(atol=1e-5)
+    @testing.numpy_cupy_allclose(atol=1e-5, type_check=has_support_aspect64())
     def test_fabs_negative(self, xp, dtype):
         a = xp.array([-2.0, -4.0, 0.0, 4.0], dtype=dtype)
         return xp.fabs(a)
@@ -198,7 +199,7 @@ def test_fabs_negative(self):
     def test_sign(self):
         self.check_unary("sign", no_bool=True)
 
-    # TODO: remove no_comlex=True, when numpy 2.0.0 will release
+    # TODO: remove no_complex=True, once adapted to numpy 2.0
     def test_sign_negative(self):
         self.check_unary_negative("sign", no_bool=True, no_complex=True)
 
@@ -504,6 +505,7 @@ def test_heaviside_nan_inf(self, xp, dtype_1, dtype_2):
         }
     )
 )
+@pytest.mark.skip("convolve() is not implemented yet")
 class TestConvolveShapeCombination:
     @testing.for_all_dtypes(no_float16=True)
     @testing.numpy_cupy_allclose(rtol=1e-3)
@@ -513,6 +515,7 @@ def test_convolve(self, xp, dtype):
         return xp.convolve(a, b, mode=self.mode)
 
 
+@pytest.mark.skip("convolve() is not implemented yet")
 @pytest.mark.parametrize("mode", ["valid", "same", "full"])
 class TestConvolve:
     @testing.for_all_dtypes(no_float16=True)
@@ -537,6 +540,7 @@ def test_convolve_diff_types(self, xp, dtype1, dtype2, mode):
         return xp.convolve(a, b, mode=mode)
 
 
+@pytest.mark.skip("convolve() is not implemented yet")
 @testing.parameterize(*testing.product({"mode": ["valid", "same", "full"]}))
 class TestConvolveInvalid:
     @testing.for_all_dtypes()

From a01f21f8dcb1b3d843caed7811d14cf87592209b Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 16 Jun 2024 11:11:54 +0200
Subject: [PATCH 26/49] Bump github/codeql-action from 3.25.8 to 3.25.10 (#1885)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.25.8 to 3.25.10.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/2e230e8fe0ad3a14a340ad0815ddb96d599d2aff...23acc5c183826b7a8a97bce3cecc52db901f8251)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 .github/workflows/openssf-scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml
index 5d0d13d45fb..09b21df1350 100644
--- a/.github/workflows/openssf-scorecard.yml
+++ b/.github/workflows/openssf-scorecard.yml
@@ -68,6 +68,6 @@ jobs:
 
       # Upload the results to GitHub's code scanning dashboard.
       - name: "Upload to code-scanning"
-        uses: github/codeql-action/upload-sarif@2e230e8fe0ad3a14a340ad0815ddb96d599d2aff # v3.25.8
+        uses: github/codeql-action/upload-sarif@23acc5c183826b7a8a97bce3cecc52db901f8251 # v3.25.10
         with:
           sarif_file: results.sarif

From af601c60d1c043b4056c661107d8e108a4a247bd Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 16 Jun 2024 18:36:14 +0200
Subject: [PATCH 27/49] Bump actions/checkout from 4.1.6 to 4.1.7 (#1886)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.6 to 4.1.7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/a5ac7e51b41094c92402da3b24376905380afc29...692973e3d937129bcbf40652eb9f2f61becf3332)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 .github/workflows/build-sphinx.yml | 4 ++--
 .github/workflows/conda-package.yml | 4 ++--
 .github/workflows/generate_coverage.yaml | 2 +-
 .github/workflows/openssf-scorecard.yml | 2 +-
 .github/workflows/pre-commit.yml | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/build-sphinx.yml b/.github/workflows/build-sphinx.yml
index 02d4be09541..13c84de50e7 100644
--- a/.github/workflows/build-sphinx.yml
+++ b/.github/workflows/build-sphinx.yml
@@ -91,7 +91,7 @@ jobs:
           sudo apt-get install -y nvidia-cuda-toolkit clinfo
 
       - name: Checkout repo
-        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+        uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           fetch-depth: 0
 
@@ -221,7 +221,7 @@
     runs-on: ubuntu-20.04
 
     steps:
-      - uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+      - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           fetch-depth: 0

diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml
index 83c657a77c5..8f474e5398e 100644
--- a/.github/workflows/conda-package.yml
+++ b/.github/workflows/conda-package.yml
@@ -90,7 +90,7 @@ jobs:
         access_token: ${{ github.token }}
 
       - name: Checkout DPNP repo
-        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+        uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           fetch-depth: 0
 
@@ -515,7 +515,7 @@
         run: mamba install anaconda-client
 
       - name: Checkout repo
-        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+        uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           repository: IntelPython/devops-tools
           fetch-depth: 0

diff --git a/.github/workflows/generate_coverage.yaml b/.github/workflows/generate_coverage.yaml
index 22ec13da23a..1fa71fb479d 100644
--- a/.github/workflows/generate_coverage.yaml
+++ b/.github/workflows/generate_coverage.yaml
@@ -32,7 +32,7 @@ jobs:
         access_token: ${{ github.token }}
 
      - name: Checkout repo
-        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+        uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           fetch-depth: 0

diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml
index 09b21df1350..803f20d284b 100644
--- a/.github/workflows/openssf-scorecard.yml
+++ b/.github/workflows/openssf-scorecard.yml
@@ -33,7 +33,7 @@ jobs:
 
     steps:
       - name: "Checkout code"
-        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+        uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
         with:
           persist-credentials: false

diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml
index 0a5f91f89bf..3b2f1c9e215 100644
--- a/.github/workflows/pre-commit.yml
+++ b/.github/workflows/pre-commit.yml
@@ -26,7 +26,7 @@ jobs:
           pylint
 
     - name: Checkout DPNP repo
-      uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
+      uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
 
     - name: Set up python
       uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1.0

From a813fae646ec784e0be994568ae63c342ee6288c Mon Sep 17 00:00:00 2001
From: vtavana <120411540+vtavana@users.noreply.github.com>
Date: Mon, 17 Jun 2024 06:10:05 -0500
Subject: [PATCH 28/49] update BLAS extension routines (#1884)

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 dpnp/backend/extensions/blas/blas_py.cpp | 57 +++++------
 dpnp/backend/extensions/blas/dot.hpp | 21 ++---
 dpnp/backend/extensions/blas/dot_common.hpp | 44 ++++-----
 dpnp/backend/extensions/blas/dotc.hpp | 21 ++---
 dpnp/backend/extensions/blas/dotu.hpp | 21 ++---
 dpnp/backend/extensions/blas/gemm.cpp |
73 +++++++------- dpnp/backend/extensions/blas/gemm.hpp | 27 ++---- dpnp/backend/extensions/blas/gemm_batch.cpp | 94 +++++++++---------- dpnp/backend/extensions/blas/gemv.cpp | 62 ++++++------ dpnp/backend/extensions/blas/gemv.hpp | 31 +++--- dpnp/backend/extensions/blas/types_matrix.hpp | 16 +--- 11 files changed, 192 insertions(+), 275 deletions(-) diff --git a/dpnp/backend/extensions/blas/blas_py.cpp b/dpnp/backend/extensions/blas/blas_py.cpp index b5d83375f23..54fde4f4fea 100644 --- a/dpnp/backend/extensions/blas/blas_py.cpp +++ b/dpnp/backend/extensions/blas/blas_py.cpp @@ -37,17 +37,17 @@ #include "gemm.hpp" #include "gemv.hpp" -namespace blas_ext = dpnp::backend::ext::blas; +namespace blas_ns = dpnp::extensions::blas; namespace py = pybind11; -namespace dot_ext = blas_ext::dot; -using dot_ext::dot_impl_fn_ptr_t; +namespace dot_ns = blas_ns::dot; +using dot_ns::dot_impl_fn_ptr_t; // populate dispatch vectors and tables void init_dispatch_vectors_tables(void) { - blas_ext::init_gemm_batch_dispatch_table(); - blas_ext::init_gemm_dispatch_table(); - blas_ext::init_gemv_dispatch_vector(); + blas_ns::init_gemm_batch_dispatch_table(); + blas_ns::init_gemm_dispatch_table(); + blas_ns::init_gemv_dispatch_vector(); } static dot_impl_fn_ptr_t dot_dispatch_vector[dpctl_td_ns::num_types]; @@ -62,14 +62,15 @@ PYBIND11_MODULE(_blas_impl, m) using event_vecT = std::vector; { - dot_ext::init_dot_dispatch_vector( + dot_ns::init_dot_dispatch_vector( dot_dispatch_vector); - auto dot_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return dot_ext::dot_func(exec_q, src1, src2, dst, depends, - dot_dispatch_vector); + auto dot_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { + return dot_ns::dot_func(exec_q, src1, src2, dst, depends, + dot_dispatch_vector); }; m.def("_dot", dot_pyapi, @@ -80,14 +81,15 @@ PYBIND11_MODULE(_blas_impl, m) } { - 
dot_ext::init_dot_dispatch_vector( + dot_ns::init_dot_dispatch_vector( dotc_dispatch_vector); - auto dotc_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return dot_ext::dot_func(exec_q, src1, src2, dst, depends, - dotc_dispatch_vector); + auto dotc_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { + return dot_ns::dot_func(exec_q, src1, src2, dst, depends, + dotc_dispatch_vector); }; m.def("_dotc", dotc_pyapi, @@ -99,14 +101,15 @@ PYBIND11_MODULE(_blas_impl, m) } { - dot_ext::init_dot_dispatch_vector( + dot_ns::init_dot_dispatch_vector( dotu_dispatch_vector); - auto dotu_pyapi = [&](sycl::queue exec_q, arrayT src1, arrayT src2, - arrayT dst, const event_vecT &depends = {}) { - return dot_ext::dot_func(exec_q, src1, src2, dst, depends, - dotu_dispatch_vector); + auto dotu_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { + return dot_ns::dot_func(exec_q, src1, src2, dst, depends, + dotu_dispatch_vector); }; m.def("_dotu", dotu_pyapi, @@ -117,7 +120,7 @@ PYBIND11_MODULE(_blas_impl, m) } { - m.def("_gemm", &blas_ext::gemm, + m.def("_gemm", &blas_ns::gemm, "Call `gemm` from OneMKL BLAS library to compute " "the matrix-matrix product with 2-D matrices.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("matrixB"), @@ -125,7 +128,7 @@ PYBIND11_MODULE(_blas_impl, m) } { - m.def("_gemm_batch", &blas_ext::gemm_batch, + m.def("_gemm_batch", &blas_ns::gemm_batch, "Call `gemm_batch` from OneMKL BLAS library to compute " "the matrix-matrix product for a batch of 2-D matrices.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("matrixB"), @@ -133,7 +136,7 @@ PYBIND11_MODULE(_blas_impl, m) } { - m.def("_gemv", &blas_ext::gemv, + m.def("_gemv", &blas_ns::gemv, "Call `gemv` from OneMKL BLAS library to compute " "the matrix-vector product with a general 
matrix.", py::arg("sycl_queue"), py::arg("matrixA"), py::arg("vectorX"), diff --git a/dpnp/backend/extensions/blas/dot.hpp b/dpnp/backend/extensions/blas/dot.hpp index 7e665b1f74d..e700f983097 100644 --- a/dpnp/backend/extensions/blas/dot.hpp +++ b/dpnp/backend/extensions/blas/dot.hpp @@ -27,13 +27,7 @@ #include "dot_common.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace type_utils = dpctl::tensor::type_utils; @@ -41,17 +35,17 @@ namespace type_utils = dpctl::tensor::type_utils; template static sycl::event dot_impl(sycl::queue &exec_q, const std::int64_t n, - char *vectorX, + const char *vectorX, const std::int64_t incx, - char *vectorY, + const char *vectorY, const std::int64_t incy, char *result, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); - T *x = reinterpret_cast(vectorX); - T *y = reinterpret_cast(vectorY); + const T *x = reinterpret_cast(vectorX); + const T *y = reinterpret_cast(vectorY); T *res = reinterpret_cast(result); std::stringstream error_msg; @@ -99,7 +93,4 @@ struct DotContigFactory } } }; -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/dot_common.hpp b/dpnp/backend/extensions/blas/dot_common.hpp index 15e7c694f74..4ee8201338c 100644 --- a/dpnp/backend/extensions/blas/dot_common.hpp +++ b/dpnp/backend/extensions/blas/dot_common.hpp @@ -36,21 +36,13 @@ #include "types_matrix.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas -{ -namespace dot +namespace dpnp::extensions::blas::dot { typedef sycl::event (*dot_impl_fn_ptr_t)(sycl::queue &, const std::int64_t, - char *, + const char *, const std::int64_t, - char *, + const char *, const std::int64_t, char *, const std::vector &); @@ -61,9 +53,9 @@ namespace py = pybind11; template std::pair dot_func(sycl::queue 
&exec_q, - dpctl::tensor::usm_ndarray vectorX, - dpctl::tensor::usm_ndarray vectorY, - dpctl::tensor::usm_ndarray result, + const dpctl::tensor::usm_ndarray &vectorX, + const dpctl::tensor::usm_ndarray &vectorY, + const dpctl::tensor::usm_ndarray &result, const std::vector &depends, const dispatchT &dot_dispatch_vector) { @@ -109,22 +101,22 @@ std::pair "USM allocations are not compatible with the execution queue."); } - size_t src_nelems = 1; + const int src_nelems = 1; dpctl::tensor::validation::CheckWritable::throw_if_not_writable(result); dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(result, src_nelems); - py::ssize_t x_size = vectorX.get_size(); - py::ssize_t y_size = vectorY.get_size(); + const py::ssize_t x_size = vectorX.get_size(); + const py::ssize_t y_size = vectorY.get_size(); const std::int64_t n = x_size; if (x_size != y_size) { throw py::value_error("The size of the first input array must be " "equal to the size of the second input array."); } - int vectorX_typenum = vectorX.get_typenum(); - int vectorY_typenum = vectorY.get_typenum(); - int result_typenum = result.get_typenum(); + const int vectorX_typenum = vectorX.get_typenum(); + const int vectorY_typenum = vectorY.get_typenum(); + const int result_typenum = result.get_typenum(); if (result_typenum != vectorX_typenum || result_typenum != vectorY_typenum) { @@ -132,7 +124,7 @@ std::pair } auto array_types = dpctl_td_ns::usm_ndarray_types(); - int type_id = array_types.typenum_to_lookup_id(vectorX_typenum); + const int type_id = array_types.typenum_to_lookup_id(vectorX_typenum); dot_impl_fn_ptr_t dot_fn = dot_dispatch_vector[type_id]; if (dot_fn == nullptr) { @@ -144,8 +136,8 @@ std::pair char *y_typeless_ptr = vectorY.get_data(); char *r_typeless_ptr = result.get_data(); - std::vector x_stride = vectorX.get_strides_vector(); - std::vector y_stride = vectorY.get_strides_vector(); + const std::vector x_stride = vectorX.get_strides_vector(); + const std::vector y_stride = 
vectorY.get_strides_vector(); const int x_elemsize = vectorX.get_elemsize(); const int y_elemsize = vectorY.get_elemsize(); @@ -184,8 +176,4 @@ void init_dot_dispatch_vector(dispatchT dot_dispatch_vector[]) contig; contig.populate_dispatch_vector(dot_dispatch_vector); } -} // namespace dot -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas::dot diff --git a/dpnp/backend/extensions/blas/dotc.hpp b/dpnp/backend/extensions/blas/dotc.hpp index 8ca78c20343..417c832bf06 100644 --- a/dpnp/backend/extensions/blas/dotc.hpp +++ b/dpnp/backend/extensions/blas/dotc.hpp @@ -27,13 +27,7 @@ #include "dot_common.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace type_utils = dpctl::tensor::type_utils; @@ -41,17 +35,17 @@ namespace type_utils = dpctl::tensor::type_utils; template static sycl::event dotc_impl(sycl::queue &exec_q, const std::int64_t n, - char *vectorX, + const char *vectorX, const std::int64_t incx, - char *vectorY, + const char *vectorY, const std::int64_t incy, char *result, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); - T *x = reinterpret_cast(vectorX); - T *y = reinterpret_cast(vectorY); + const T *x = reinterpret_cast(vectorX); + const T *y = reinterpret_cast(vectorY); T *res = reinterpret_cast(result); std::stringstream error_msg; @@ -100,7 +94,4 @@ struct DotcContigFactory } }; -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/dotu.hpp b/dpnp/backend/extensions/blas/dotu.hpp index 832e99fff5e..51c30735d22 100644 --- a/dpnp/backend/extensions/blas/dotu.hpp +++ b/dpnp/backend/extensions/blas/dotu.hpp @@ -27,13 +27,7 @@ #include "dot_common.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace 
dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace type_utils = dpctl::tensor::type_utils; @@ -41,17 +35,17 @@ namespace type_utils = dpctl::tensor::type_utils; template static sycl::event dotu_impl(sycl::queue &exec_q, const std::int64_t n, - char *vectorX, + const char *vectorX, const std::int64_t incx, - char *vectorY, + const char *vectorY, const std::int64_t incy, char *result, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); - T *x = reinterpret_cast(vectorX); - T *y = reinterpret_cast(vectorY); + const T *x = reinterpret_cast(vectorX); + const T *y = reinterpret_cast(vectorY); T *res = reinterpret_cast(result); std::stringstream error_msg; @@ -99,7 +93,4 @@ struct DotuContigFactory } } }; -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/gemm.cpp b/dpnp/backend/extensions/blas/gemm.cpp index c1005f797b1..f47f8ebe7ae 100644 --- a/dpnp/backend/extensions/blas/gemm.cpp +++ b/dpnp/backend/extensions/blas/gemm.cpp @@ -35,13 +35,7 @@ #include "dpnp_utils.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace py = pybind11; @@ -53,13 +47,13 @@ typedef sycl::event (*gemm_impl_fn_ptr_t)(sycl::queue &, const std::int64_t, const std::int64_t, const std::int64_t, - char *, + const char *, const std::int64_t, - char *, + const char *, const std::int64_t, char *, const std::int64_t, - bool, + const bool, const std::vector &); static gemm_impl_fn_ptr_t gemm_dispatch_table[dpctl_td_ns::num_types] @@ -72,20 +66,20 @@ static sycl::event gemm_impl(sycl::queue &exec_q, const std::int64_t m, const std::int64_t n, const std::int64_t k, - char *matrixA, + const char *matrixA, const std::int64_t lda, - char *matrixB, + const char *matrixB, const std::int64_t ldb, char *resultC, const std::int64_t ldc, - bool 
is_row_major, + const bool is_row_major, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); type_utils::validate_type_for_device(exec_q); - Tab *a = reinterpret_cast(matrixA); - Tab *b = reinterpret_cast(matrixB); + const Tab *a = reinterpret_cast(matrixA); + const Tab *b = reinterpret_cast(matrixB); Tc *res = reinterpret_cast(resultC); std::stringstream error_msg; @@ -95,10 +89,10 @@ static sycl::event gemm_impl(sycl::queue &exec_q, try { auto gemm_func = [&](sycl::queue &q, oneapi::mkl::transpose transA, - oneapi::mkl::transpose transB, std::int64_t m, std::int64_t n, - std::int64_t k, Tab alpha, const Tab *a, std::int64_t lda, - const Tab *b, std::int64_t ldb, Tab beta, Tc *c, - std::int64_t ldc, + oneapi::mkl::transpose transB, const std::int64_t m, + const std::int64_t n, const std::int64_t k, Tab alpha, + const Tab *a, const std::int64_t lda, const Tab *b, + const std::int64_t ldb, Tab beta, Tc *c, const std::int64_t ldc, const std::vector &deps) -> sycl::event { if (is_row_major) { return mkl_blas::row_major::gemm(q, transA, transB, m, n, k, @@ -152,9 +146,9 @@ static sycl::event gemm_impl(sycl::queue &exec_q, std::tuple gemm(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray matrixB, - dpctl::tensor::usm_ndarray resultC, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &matrixB, + const dpctl::tensor::usm_ndarray &resultC, const std::vector &depends) { const int matrixA_nd = matrixA.get_ndim(); @@ -204,17 +198,17 @@ std::tuple "the number of columns in result array."); } - size_t src_nelems = m * n; + const std::size_t src_nelems = m * n; dpctl::tensor::validation::CheckWritable::throw_if_not_writable(resultC); dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(resultC, src_nelems); - bool is_matrixA_f_contig = matrixA.is_f_contiguous(); - bool is_matrixB_f_contig = matrixB.is_f_contiguous(); - bool is_resultC_f_contig = resultC.is_f_contiguous(); - bool 
is_matrixA_c_contig = matrixA.is_c_contiguous(); - bool is_matrixB_c_contig = matrixB.is_c_contiguous(); - bool is_resultC_c_contig = resultC.is_c_contiguous(); + const bool is_matrixA_f_contig = matrixA.is_f_contiguous(); + const bool is_matrixB_f_contig = matrixB.is_f_contiguous(); + const bool is_resultC_f_contig = resultC.is_f_contiguous(); + const bool is_matrixA_c_contig = matrixA.is_c_contiguous(); + const bool is_matrixB_c_contig = matrixB.is_c_contiguous(); + const bool is_resultC_c_contig = resultC.is_c_contiguous(); if (!is_matrixA_f_contig and !is_matrixA_c_contig) { throw py::value_error( @@ -267,17 +261,19 @@ std::tuple } const std::int64_t ldc = is_row_major ? n : m; - int matrixA_typenum = matrixA.get_typenum(); - int matrixB_typenum = matrixB.get_typenum(); - int resultC_typenum = resultC.get_typenum(); + const int matrixA_typenum = matrixA.get_typenum(); + const int matrixB_typenum = matrixB.get_typenum(); + const int resultC_typenum = resultC.get_typenum(); if (matrixA_typenum != matrixB_typenum) { throw py::value_error("matrixA and matrixB must be of the same type."); } auto array_types = dpctl_td_ns::usm_ndarray_types(); - int matrixAB_type_id = array_types.typenum_to_lookup_id(matrixA_typenum); - int resultC_type_id = array_types.typenum_to_lookup_id(resultC_typenum); + const int matrixAB_type_id = + array_types.typenum_to_lookup_id(matrixA_typenum); + const int resultC_type_id = + array_types.typenum_to_lookup_id(resultC_typenum); gemm_impl_fn_ptr_t gemm_fn = gemm_dispatch_table[matrixAB_type_id][resultC_type_id]; @@ -286,8 +282,8 @@ std::tuple "Types of input matrices and result matrix are mismatched."); } - char *a_typeless_ptr = matrixA.get_data(); - char *b_typeless_ptr = matrixB.get_data(); + const char *a_typeless_ptr = matrixA.get_data(); + const char *b_typeless_ptr = matrixB.get_data(); char *r_typeless_ptr = resultC.get_data(); sycl::event gemm_ev = gemm_fn(exec_q, transA, transB, m, n, k, @@ -321,7 +317,4 @@ void 
init_gemm_dispatch_table(void) contig; contig.populate_dispatch_table(gemm_dispatch_table); } -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/gemm.hpp b/dpnp/backend/extensions/blas/gemm.hpp index 6e3a5840269..ee14400ae25 100644 --- a/dpnp/backend/extensions/blas/gemm.hpp +++ b/dpnp/backend/extensions/blas/gemm.hpp @@ -25,36 +25,27 @@ #pragma once -#include #include +#include #include -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { extern std::tuple gemm(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray matrixB, - dpctl::tensor::usm_ndarray resultC, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &matrixB, + const dpctl::tensor::usm_ndarray &resultC, const std::vector &depends); extern std::tuple gemm_batch(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray matrixB, - dpctl::tensor::usm_ndarray resultC, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &matrixB, + const dpctl::tensor::usm_ndarray &resultC, const std::vector &depends); extern void init_gemm_dispatch_table(void); extern void init_gemm_batch_dispatch_table(void); -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/gemm_batch.cpp b/dpnp/backend/extensions/blas/gemm_batch.cpp index 689ef77b786..640cc779120 100644 --- a/dpnp/backend/extensions/blas/gemm_batch.cpp +++ b/dpnp/backend/extensions/blas/gemm_batch.cpp @@ -35,13 +35,7 @@ #include "dpnp_utils.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace py = pybind11; @@ -56,15 +50,15 @@ typedef sycl::event 
(*gemm_batch_impl_fn_ptr_t)( const std::int64_t, const std::int64_t, const std::int64_t, - size_t, - size_t, - size_t, + const std::int64_t, + const std::int64_t, + const std::int64_t, oneapi::mkl::transpose, oneapi::mkl::transpose, + const char *, + const char *, char *, - char *, - char *, - bool, + const bool, const std::vector &); static gemm_batch_impl_fn_ptr_t @@ -79,22 +73,22 @@ static sycl::event gemm_batch_impl(sycl::queue &exec_q, const std::int64_t lda, const std::int64_t ldb, const std::int64_t ldc, - size_t stridea, - size_t strideb, - size_t stridec, + const std::int64_t stridea, + const std::int64_t strideb, + const std::int64_t stridec, oneapi::mkl::transpose transA, oneapi::mkl::transpose transB, - char *matrixA, - char *matrixB, + const char *matrixA, + const char *matrixB, char *resultC, - bool is_row_major, + const bool is_row_major, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); type_utils::validate_type_for_device(exec_q); - Tab *a = reinterpret_cast(matrixA); - Tab *b = reinterpret_cast(matrixB); + const Tab *a = reinterpret_cast(matrixA); + const Tab *b = reinterpret_cast(matrixB); Tc *res = reinterpret_cast(resultC); std::stringstream error_msg; @@ -104,11 +98,13 @@ static sycl::event gemm_batch_impl(sycl::queue &exec_q, try { auto gemm_batch_func = [&](sycl::queue &q, oneapi::mkl::transpose transA, - oneapi::mkl::transpose transB, std::int64_t m, std::int64_t n, - std::int64_t k, Tab alpha, const Tab *a, std::int64_t lda, - std::int64_t stridea, const Tab *b, std::int64_t ldb, - std::int64_t strideb, Tab beta, Tc *c, std::int64_t ldc, - std::int64_t stridec, std::int64_t batch_size, + oneapi::mkl::transpose transB, const std::int64_t m, + const std::int64_t n, const std::int64_t k, Tab alpha, + const Tab *a, const std::int64_t lda, + const std::int64_t stridea, const Tab *b, + const std::int64_t ldb, const std::int64_t strideb, Tab beta, + Tc *c, const std::int64_t ldc, const std::int64_t stridec, + const 
std::int64_t batch_size, const std::vector &deps) -> sycl::event { if (is_row_major) { return mkl_blas::row_major::gemm_batch( @@ -172,9 +168,10 @@ void standardize_strides_to_nonzero(std::vector &strides, // When shape of an array along any particular dimension is 1, the stride // along that dimension is undefined. This function standardize the strides // by calculating the non-zero value of the strides. - std::size_t ndim = strides.size(); - bool has_zero_stride = std::accumulate(strides.begin(), strides.end(), 1, - std::multiplies{}) == 0; + const std::size_t ndim = strides.size(); + const bool has_zero_stride = + std::accumulate(strides.begin(), strides.end(), 1, + std::multiplies{}) == 0; if (has_zero_stride) { for (std::size_t i = 0; i < ndim - 1; ++i) { @@ -196,9 +193,9 @@ void standardize_strides_to_zero(std::vector &strides, // instead of copying the array into the additional dimension for batch // multiplication, we choose to use zero as the stride between different // matrices. Therefore, the same array is used repeatedly. 
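The comment above describes the stride-standardization step used before batched GEMM: when a dimension has extent 1 its stride is undefined (often zero), and the helper rewrites it to the value a C-contiguous layout would have. A Python sketch of that logic (a hypothetical mirror of `standardize_strides_to_nonzero`; the loop body past `ndim - 1` is not fully visible in the hunk, so the trailing-product fill is an assumption):

```python
from functools import reduce
from operator import mul


def standardize_strides_to_nonzero(strides, shape):
    """Replace undefined (zero) strides of size-1 dimensions with the
    values a C-contiguous array of `shape` would have."""
    ndim = len(strides)
    # any zero factor means at least one stride is undefined
    has_zero_stride = reduce(mul, strides, 1) == 0
    if not has_zero_stride:
        return list(strides)
    out = list(strides)
    for i in range(ndim - 1):
        # stride of axis i spans all elements of the trailing axes
        out[i] = reduce(mul, shape[i + 1:], 1)
    out[-1] = 1
    return out
```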
- std::size_t ndim = strides.size(); + const std::size_t ndim = strides.size(); - for (size_t i = 0; i < ndim; ++i) { + for (std::size_t i = 0; i < ndim; ++i) { if (shape[i] <= 1) { strides[i] = 0; } @@ -207,9 +204,9 @@ void standardize_strides_to_zero(std::vector &strides, std::tuple gemm_batch(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray matrixB, - dpctl::tensor::usm_ndarray resultC, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &matrixB, + const dpctl::tensor::usm_ndarray &resultC, const std::vector &depends = {}) { const int matrixA_nd = matrixA.get_ndim(); @@ -257,7 +254,7 @@ std::tuple throw py::value_error("The number of columns in B must be equal to " "the number of columns in result array."); } - std::int64_t src_nelems = batch_size * m * n; + const std::int64_t src_nelems = batch_size * m * n; dpctl::tensor::validation::CheckWritable::throw_if_not_writable(resultC); dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(resultC, src_nelems); @@ -274,8 +271,10 @@ std::tuple standardize_strides_to_nonzero(a_stride, a_shape); standardize_strides_to_nonzero(b_stride, b_shape); - bool A_base_is_f_contig = a_stride[1] == 1 && a_stride[2] == a_shape[1]; - bool B_base_is_f_contig = b_stride[1] == 1 && b_stride[2] == b_shape[1]; + const bool A_base_is_f_contig = + a_stride[1] == 1 && a_stride[2] == a_shape[1]; + const bool B_base_is_f_contig = + b_stride[1] == 1 && b_stride[2] == b_shape[1]; bool is_row_major = true; if (A_base_is_f_contig && B_base_is_f_contig) { @@ -317,17 +316,19 @@ std::tuple } const std::int64_t ldc = is_row_major ? 
n : m; - int matrixA_typenum = matrixA.get_typenum(); - int matrixB_typenum = matrixB.get_typenum(); - int resultC_typenum = resultC.get_typenum(); + const int matrixA_typenum = matrixA.get_typenum(); + const int matrixB_typenum = matrixB.get_typenum(); + const int resultC_typenum = resultC.get_typenum(); if (matrixA_typenum != matrixB_typenum) { throw py::value_error("matrixA and matrixB must be of the same type."); } auto array_types = dpctl_td_ns::usm_ndarray_types(); - int matrixAB_type_id = array_types.typenum_to_lookup_id(matrixA_typenum); - int resultC_type_id = array_types.typenum_to_lookup_id(resultC_typenum); + const int matrixAB_type_id = + array_types.typenum_to_lookup_id(matrixA_typenum); + const int resultC_type_id = + array_types.typenum_to_lookup_id(resultC_typenum); gemm_batch_impl_fn_ptr_t gemm_batch_fn = gemm_batch_dispatch_table[matrixAB_type_id][resultC_type_id]; @@ -336,8 +337,8 @@ std::tuple "Types of input matrices and result matrix are mismatched."); } - char *a_typeless_ptr = matrixA.get_data(); - char *b_typeless_ptr = matrixB.get_data(); + const char *a_typeless_ptr = matrixA.get_data(); + const char *b_typeless_ptr = matrixB.get_data(); char *r_typeless_ptr = resultC.get_data(); sycl::event gemm_batch_ev = @@ -374,7 +375,4 @@ void init_gemm_batch_dispatch_table(void) contig; contig.populate_dispatch_table(gemm_batch_dispatch_table); } -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/gemv.cpp b/dpnp/backend/extensions/blas/gemv.cpp index c325299aa03..7104c9023f8 100644 --- a/dpnp/backend/extensions/blas/gemv.cpp +++ b/dpnp/backend/extensions/blas/gemv.cpp @@ -35,13 +35,7 @@ #include "dpnp_utils.hpp" -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { namespace mkl_blas = oneapi::mkl::blas; namespace py = pybind11; @@ -51,13 +45,13 @@ typedef sycl::event 
(*gemv_impl_fn_ptr_t)(sycl::queue &, oneapi::mkl::transpose, const std::int64_t, const std::int64_t, - char *, + const char *, const std::int64_t, - char *, + const char *, const std::int64_t, char *, const std::int64_t, - bool, + const bool, const std::vector &); static gemv_impl_fn_ptr_t gemv_dispatch_vector[dpctl_td_ns::num_types]; @@ -67,19 +61,19 @@ static sycl::event gemv_impl(sycl::queue &exec_q, oneapi::mkl::transpose transA, const std::int64_t m, const std::int64_t n, - char *matrixA, + const char *matrixA, const std::int64_t lda, - char *vectorX, + const char *vectorX, const std::int64_t incx, char *vectorY, const std::int64_t incy, - bool is_row_major, + const bool is_row_major, const std::vector &depends) { type_utils::validate_type_for_device(exec_q); - T *a = reinterpret_cast(matrixA); - T *x = reinterpret_cast(vectorX); + const T *a = reinterpret_cast(matrixA); + const T *x = reinterpret_cast(vectorX); T *y = reinterpret_cast(vectorY); std::stringstream error_msg; @@ -88,9 +82,10 @@ static sycl::event gemv_impl(sycl::queue &exec_q, sycl::event gemv_event; try { auto gemv_func = - [&](sycl::queue &q, oneapi::mkl::transpose transA, std::int64_t m, - std::int64_t n, T alpha, const T *a, std::int64_t lda, - const T *x, std::int64_t incx, T beta, T *y, std::int64_t incy, + [&](sycl::queue &q, oneapi::mkl::transpose transA, + const std::int64_t m, const std::int64_t n, T alpha, const T *a, + const std::int64_t lda, const T *x, const std::int64_t incx, + T beta, T *y, const std::int64_t incy, const std::vector &deps) -> sycl::event { if (is_row_major) { return mkl_blas::row_major::gemv(q, transA, m, n, alpha, a, lda, @@ -141,10 +136,10 @@ static sycl::event gemv_impl(sycl::queue &exec_q, std::pair gemv(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray vectorX, - dpctl::tensor::usm_ndarray vectorY, - bool transpose, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &vectorX, + const 
dpctl::tensor::usm_ndarray &vectorY, + const bool transpose, const std::vector &depends) { const int matrixA_nd = matrixA.get_ndim(); @@ -173,8 +168,8 @@ std::pair "USM allocations are not compatible with the execution queue."); } - bool is_matrixA_f_contig = matrixA.is_f_contiguous(); - bool is_matrixA_c_contig = matrixA.is_c_contiguous(); + const bool is_matrixA_f_contig = matrixA.is_f_contiguous(); + const bool is_matrixA_c_contig = matrixA.is_c_contiguous(); if (!is_matrixA_f_contig and !is_matrixA_c_contig) { throw py::value_error( @@ -194,7 +189,7 @@ std::pair const std::int64_t lda = is_row_major ? n : m; oneapi::mkl::transpose transA; - size_t src_nelems; + std::size_t src_nelems; if (transpose) { transA = oneapi::mkl::transpose::T; src_nelems = n; @@ -223,9 +218,9 @@ std::pair dpctl::tensor::validation::AmpleMemory::throw_if_not_ample(vectorY, src_nelems); - int matrixA_typenum = matrixA.get_typenum(); - int vectorX_typenum = vectorX.get_typenum(); - int vectorY_typenum = vectorY.get_typenum(); + const int matrixA_typenum = matrixA.get_typenum(); + const int vectorX_typenum = vectorX.get_typenum(); + const int vectorY_typenum = vectorY.get_typenum(); if (matrixA_typenum != vectorX_typenum || matrixA_typenum != vectorY_typenum) { @@ -233,7 +228,7 @@ std::pair } auto array_types = dpctl_td_ns::usm_ndarray_types(); - int type_id = array_types.typenum_to_lookup_id(matrixA_typenum); + const int type_id = array_types.typenum_to_lookup_id(matrixA_typenum); gemv_impl_fn_ptr_t gemv_fn = gemv_dispatch_vector[type_id]; if (gemv_fn == nullptr) { @@ -245,8 +240,8 @@ std::pair char *x_typeless_ptr = vectorX.get_data(); char *y_typeless_ptr = vectorY.get_data(); - std::vector x_stride = vectorX.get_strides_vector(); - std::vector y_stride = vectorY.get_strides_vector(); + const std::vector x_stride = vectorX.get_strides_vector(); + const std::vector y_stride = vectorY.get_strides_vector(); const int x_elemsize = vectorX.get_elemsize(); const int y_elemsize = 
vectorY.get_elemsize(); const std::int64_t incx = x_stride[0]; @@ -289,7 +284,4 @@ void init_gemv_dispatch_vector(void) contig; contig.populate_dispatch_vector(gemv_dispatch_vector); } -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/gemv.hpp b/dpnp/backend/extensions/blas/gemv.hpp index 703f9c4cc0a..bb5aff87748 100644 --- a/dpnp/backend/extensions/blas/gemv.hpp +++ b/dpnp/backend/extensions/blas/gemv.hpp @@ -25,38 +25,29 @@ #pragma once -#include #include +#include #include -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas +namespace dpnp::extensions::blas { extern std::pair gemv(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray vectorX, - dpctl::tensor::usm_ndarray vectorY, - bool transpose, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &vectorX, + const dpctl::tensor::usm_ndarray &vectorY, + const bool transpose, const std::vector &depends); extern std::pair gemv_batch(sycl::queue &exec_q, - dpctl::tensor::usm_ndarray matrixA, - dpctl::tensor::usm_ndarray vectorX, - dpctl::tensor::usm_ndarray vectorY, - bool transpose, + const dpctl::tensor::usm_ndarray &matrixA, + const dpctl::tensor::usm_ndarray &vectorX, + const dpctl::tensor::usm_ndarray &vectorY, + const bool transpose, const std::vector &depends); extern void init_gemv_dispatch_vector(void); extern void init_gemv_batch_dispatch_vector(void); -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas diff --git a/dpnp/backend/extensions/blas/types_matrix.hpp b/dpnp/backend/extensions/blas/types_matrix.hpp index a33fa42b971..1d9bf637780 100644 --- a/dpnp/backend/extensions/blas/types_matrix.hpp +++ b/dpnp/backend/extensions/blas/types_matrix.hpp @@ -33,15 +33,7 @@ // dpctl namespace for operations with types namespace dpctl_td_ns = 
dpctl::tensor::type_dispatch; -namespace dpnp -{ -namespace backend -{ -namespace ext -{ -namespace blas -{ -namespace types +namespace dpnp::extensions::blas::types { /** * @brief A factory to define pairs of supported types for which @@ -190,8 +182,4 @@ struct GemvTypePairSupportFactory // fall-through dpctl_td_ns::NotDefinedEntry>::is_defined; }; -} // namespace types -} // namespace blas -} // namespace ext -} // namespace backend -} // namespace dpnp +} // namespace dpnp::extensions::blas::types From 96a2e4111fe025d66d4a6b795cd6ff5f45de8afd Mon Sep 17 00:00:00 2001 From: vlad-perevezentsev Date: Mon, 17 Jun 2024 14:34:33 +0200 Subject: [PATCH 29/49] Support usm_ndarray batched input for dpnp.linalg (#1880) * Add usm_ndarray input support for linalg * Add test_usm_ndarray_input_batch to test_linalg.py * Add usm_ndarray input support for dpnp_iface_linearalgebra * Add test_usm_ndarray_linearalgebra_batch to test_linalg.py * Apply comments --------- Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> --- dpnp/dpnp_iface_linearalgebra.py | 12 ++--- dpnp/linalg/dpnp_iface_linalg.py | 6 +-- dpnp/linalg/dpnp_utils_linalg.py | 20 ++++---- tests/test_linalg.py | 81 ++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+), 19 deletions(-) diff --git a/dpnp/dpnp_iface_linearalgebra.py b/dpnp/dpnp_iface_linearalgebra.py index f674c96040a..aef9203746f 100644 --- a/dpnp/dpnp_iface_linearalgebra.py +++ b/dpnp/dpnp_iface_linearalgebra.py @@ -892,13 +892,13 @@ def outer(a, b, out=None): dpnp.check_supported_arrays_type(a, b, scalar_type=True, all_scalars=False) if dpnp.isscalar(a): x1 = a - x2 = b.ravel()[None, :] + x2 = dpnp.ravel(b)[None, :] elif dpnp.isscalar(b): - x1 = a.ravel()[:, None] + x1 = dpnp.ravel(a)[:, None] x2 = b else: - x1 = a.ravel() - x2 = b.ravel() + x1 = dpnp.ravel(a) + x2 = dpnp.ravel(b) return dpnp.multiply.outer(x1, x2, out=out) @@ -1056,8 +1056,8 @@ def tensordot(a, b, axes=2): newshape_b = (n1, n2) oldb = [b_shape[axis] for 
axis in notin] - at = a.transpose(newaxes_a).reshape(newshape_a) - bt = b.transpose(newaxes_b).reshape(newshape_b) + at = dpnp.transpose(a, newaxes_a).reshape(newshape_a) + bt = dpnp.transpose(b, newaxes_b).reshape(newshape_b) res = dpnp.matmul(at, bt) return res.reshape(olda + oldb) diff --git a/dpnp/linalg/dpnp_iface_linalg.py b/dpnp/linalg/dpnp_iface_linalg.py index 72d79ad329d..5342daa1758 100644 --- a/dpnp/linalg/dpnp_iface_linalg.py +++ b/dpnp/linalg/dpnp_iface_linalg.py @@ -1354,7 +1354,7 @@ def tensorinv(a, ind=2): old_shape = a.shape inv_shape = old_shape[ind:] + old_shape[:ind] prod = numpy.prod(old_shape[ind:]) - a = a.reshape(prod, -1) + a = dpnp.reshape(a, (prod, -1)) a_inv = inv(a) return a_inv.reshape(*inv_shape) @@ -1428,7 +1428,7 @@ def tensorsolve(a, b, axes=None): "prod(a.shape[b.ndim:]) == prod(a.shape[:b.ndim])" ) - a = a.reshape(-1, prod) - b = b.ravel() + a = dpnp.reshape(a, (-1, prod)) + b = dpnp.ravel(b) res = solve(a, b) return res.reshape(old_shape) diff --git a/dpnp/linalg/dpnp_utils_linalg.py b/dpnp/linalg/dpnp_utils_linalg.py index 22aa396c7fe..10e297937ee 100644 --- a/dpnp/linalg/dpnp_utils_linalg.py +++ b/dpnp/linalg/dpnp_utils_linalg.py @@ -99,7 +99,7 @@ def _batched_eigh(a, UPLO, eigen_mode, w_type, v_type): is_cpu_device = a.sycl_device.has_aspect_cpu orig_shape = a.shape # get 3d input array by reshape - a = a.reshape(-1, orig_shape[-2], orig_shape[-1]) + a = dpnp.reshape(a, (-1, orig_shape[-2], orig_shape[-1])) a_usm_arr = dpnp.get_usm_ndarray(a) # allocate a memory for dpnp array of eigenvalues @@ -191,7 +191,7 @@ def _batched_inv(a, res_type): orig_shape = a.shape # get 3d input arrays by reshape - a = a.reshape(-1, orig_shape[-2], orig_shape[-1]) + a = dpnp.reshape(a, (-1, orig_shape[-2], orig_shape[-1])) batch_size = a.shape[0] a_usm_arr = dpnp.get_usm_ndarray(a) a_sycl_queue = a.sycl_queue @@ -280,11 +280,11 @@ def _batched_solve(a, b, exec_q, res_usm_type, res_type): if a.ndim > 3: # get 3d input arrays by reshape if 
a.ndim == b.ndim: - b = b.reshape(-1, b_shape[-2], b_shape[-1]) + b = dpnp.reshape(b, (-1, b_shape[-2], b_shape[-1])) else: - b = b.reshape(-1, b_shape[-1]) + b = dpnp.reshape(b, (-1, b_shape[-1])) - a = a.reshape(-1, a_shape[-2], a_shape[-1]) + a = dpnp.reshape(a, (-1, a_shape[-2], a_shape[-1])) a_usm_arr = dpnp.get_usm_ndarray(a) b_usm_arr = dpnp.get_usm_ndarray(b) @@ -386,7 +386,7 @@ def _batched_qr(a, mode="reduced"): a_sycl_queue = a.sycl_queue # get 3d input arrays by reshape - a = a.reshape(-1, m, n) + a = dpnp.reshape(a, (-1, m, n)) a = a.swapaxes(-2, -1) a_usm_arr = dpnp.get_usm_ndarray(a) @@ -537,7 +537,7 @@ def _batched_svd( if a.ndim > 3: # get 3d input arrays by reshape - a = a.reshape(prod(a.shape[:-2]), a.shape[-2], a.shape[-1]) + a = dpnp.reshape(a, (prod(a.shape[:-2]), a.shape[-2], a.shape[-1])) reshape = True batch_size = a.shape[0] @@ -830,7 +830,7 @@ def _lu_factor(a, res_type): if a.ndim > 2: orig_shape = a.shape # get 3d input arrays by reshape - a = a.reshape(-1, n, n) + a = dpnp.reshape(a, (-1, n, n)) batch_size = a.shape[0] a_usm_arr = dpnp.get_usm_ndarray(a) @@ -1743,7 +1743,7 @@ def dpnp_cholesky_batch(a, upper_lower, res_type): orig_shape = a.shape # get 3d input arrays by reshape - a = a.reshape(-1, n, n) + a = dpnp.reshape(a, (-1, n, n)) batch_size = a.shape[0] a_usm_arr = dpnp.get_usm_ndarray(a) @@ -2171,7 +2171,7 @@ def dpnp_matrix_power(a, n): # `result` will hold the final matrix power, # while `acc` serves as an accumulator for the intermediate matrix powers. 
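The comment above refers to the repeated-squaring loop in `dpnp_matrix_power`: `result` holds the answer so far and `acc` the successive squarings, so only O(log n) multiplications are needed. A self-contained sketch of that loop with a plain list-of-lists multiply standing in for `dpnp.matmul` (illustrative, not the actual dpnp code path):

```python
def matmul2(x, y):
    # tiny list-of-lists matrix multiply standing in for dpnp.matmul
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matrix_power(a, n, matmul=matmul2):
    """Raise square matrix `a` to integer power n >= 1 by binary
    exponentiation: consume one bit of n per iteration, squaring `acc`
    and folding it into `result` when the bit is set."""
    result = None
    acc = a
    while n > 0:
        n, bit = divmod(n, 2)
        if bit:
            result = acc if result is None else matmul(result, acc)
        if n > 0:
            acc = matmul(acc, acc)  # next power of two of `a`
    return result
```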
result = None - acc = a.copy() + acc = dpnp.copy(a) while n > 0: n, bit = divmod(n, 2) if bit: diff --git a/tests/test_linalg.py b/tests/test_linalg.py index 48a4891034c..b718a2cec87 100644 --- a/tests/test_linalg.py +++ b/tests/test_linalg.py @@ -57,6 +57,87 @@ def vvsort(val, vec, size, xp): vec[:, imax] = temp +# check linear algebra functions from dpnp.linalg +# with multidimensional usm_ndarray as input +@pytest.mark.parametrize( + "func, gen_kwargs, func_kwargs", + [ + pytest.param("cholesky", {"hermitian": True}, {}), + pytest.param("cond", {}, {}), + pytest.param("det", {}, {}), + pytest.param("eig", {}, {}), + pytest.param("eigh", {"hermitian": True}, {}), + pytest.param("eigvals", {}, {}), + pytest.param("eigvalsh", {"hermitian": True}, {}), + pytest.param("inv", {}, {}), + pytest.param("matrix_power", {}, {"n": 4}), + pytest.param("matrix_rank", {}, {}), + pytest.param("norm", {}, {}), + pytest.param("pinv", {}, {}), + pytest.param("qr", {}, {}), + pytest.param("slogdet", {}, {}), + pytest.param("solve", {}, {}), + pytest.param("svd", {}, {}), + pytest.param("tensorinv", {}, {"ind": 1}), + pytest.param("tensorsolve", {}, {}), + ], +) +def test_usm_ndarray_linalg_batch(func, gen_kwargs, func_kwargs): + shape = ( + (2, 2, 3, 3) if func not in ["tensorinv", "tensorsolve"] else (4, 2, 2) + ) + + if func == "tensorsolve": + shape_b = (4,) + dpt_args = [ + dpt.asarray( + generate_random_numpy_array(shape, seed_value=81, **gen_kwargs) + ), + dpt.asarray( + generate_random_numpy_array( + shape_b, seed_value=81, **gen_kwargs + ) + ), + ] + elif func in ["lstsq", "solve"]: + dpt_args = [ + dpt.asarray( + generate_random_numpy_array(shape, seed_value=81, **gen_kwargs) + ) + for _ in range(2) + ] + else: + dpt_args = [ + dpt.asarray(generate_random_numpy_array(shape, **gen_kwargs)) + ] + + result = getattr(inp.linalg, func)(*dpt_args, **func_kwargs) + + if isinstance(result, tuple): + for res in result: + assert isinstance(res, inp.ndarray) + else: + assert 
isinstance(result, inp.ndarray) + + +# check linear algebra functions from dpnp +# with multidimensional usm_ndarray as input +@pytest.mark.parametrize( + "func", ["dot", "inner", "kron", "matmul", "outer", "tensordot", "vdot"] +) +def test_usm_ndarray_linearalgebra_batch(func): + shape = (2, 2, 2, 2) + + dpt_args = [ + dpt.asarray(generate_random_numpy_array(shape, seed_value=81)) + for _ in range(2) + ] + + result = getattr(inp, func)(*dpt_args) + + assert isinstance(result, inp.ndarray) + + class TestCholesky: @pytest.mark.parametrize( "array", From edcaaa592eced4c77b22034304d48f2a5aa98510 Mon Sep 17 00:00:00 2001 From: Natalia Polina Date: Wed, 19 Jun 2024 03:25:39 -0700 Subject: [PATCH 30/49] Clean up legacy linalg implementation from the backend (#1887) * Clean up legacy linalg implementation from the backend * fix pre-commit --- dpnp/backend/CMakeLists.txt | 1 - dpnp/backend/kernels/dpnp_krnl_common.cpp | 502 ------------ dpnp/backend/kernels/dpnp_krnl_linalg.cpp | 914 ---------------------- dpnp/backend/src/dpnp_fptr.hpp | 1 - dpnp/backend/src/dpnp_iface_fptr.cpp | 1 - 5 files changed, 1419 deletions(-) delete mode 100644 dpnp/backend/kernels/dpnp_krnl_linalg.cpp diff --git a/dpnp/backend/CMakeLists.txt b/dpnp/backend/CMakeLists.txt index 2ce0dfd5c04..f1f5b447772 100644 --- a/dpnp/backend/CMakeLists.txt +++ b/dpnp/backend/CMakeLists.txt @@ -30,7 +30,6 @@ set(DPNP_SRC kernels/dpnp_krnl_elemwise.cpp kernels/dpnp_krnl_fft.cpp kernels/dpnp_krnl_indexing.cpp - kernels/dpnp_krnl_linalg.cpp kernels/dpnp_krnl_logic.cpp kernels/dpnp_krnl_manipulation.cpp kernels/dpnp_krnl_mathematical.cpp diff --git a/dpnp/backend/kernels/dpnp_krnl_common.cpp b/dpnp/backend/kernels/dpnp_krnl_common.cpp index 423851e4bfd..b1d864327e6 100644 --- a/dpnp/backend/kernels/dpnp_krnl_common.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_common.cpp @@ -38,69 +38,6 @@ namespace mkl_blas_cm = oneapi::mkl::blas::column_major; namespace mkl_blas_rm = oneapi::mkl::blas::row_major; namespace mkl_lapack 
= oneapi::mkl::lapack; -template -class dpnp_astype_c_kernel; - -template -DPCTLSyclEventRef dpnp_astype_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - void *result1, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, size); - const _DataType *array_in = input1_ptr.get_ptr(); - _ResultType *result = reinterpret_cast<_ResultType *>(result1); - - if ((array_in == nullptr) || (result == nullptr)) { - return event_ref; - } - - if (size == 0) { - return event_ref; - } - - sycl::range<1> gws(size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - size_t i = global_id[0]; - result[i] = array_in[i]; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_astype_c(const void *array1_in, void *result1, const size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_astype_c<_DataType, _ResultType>( - q_ref, array1_in, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_astype_default_c)(const void *, void *, const size_t) = - dpnp_astype_c<_DataType, _ResultType>; - template @@ -521,200 +458,6 @@ DPCTLSyclEventRef (*dpnp_dot_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_dot_c<_DataType_output, _DataType_input1, _DataType_input2>; -template -DPCTLSyclEventRef dpnp_eig_c(DPCTLSyclQueueRef q_ref, - const void *array_in, - void *result1, - void *result2, - size_t size, - const DPCTLEventVectorRef 
dep_event_vec_ref) -{ - // TODO this kernel works with square 2-D array only - - // Kernel Type for calculation is double type - // because interface requires float type but calculations are expected in - // double type - - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array_in, size * size, true); - DPNPC_ptr_adapter<_ResultType> result1_ptr(q_ref, result1, size, true, - true); - DPNPC_ptr_adapter<_ResultType> result2_ptr(q_ref, result2, size * size, - true, true); - const _DataType *array = input1_ptr.get_ptr(); - _ResultType *result_val = result1_ptr.get_ptr(); - _ResultType *result_vec = result2_ptr.get_ptr(); - - double *result_val_kern = reinterpret_cast( - sycl::malloc_shared(size * sizeof(double), q)); - double *result_vec_kern = reinterpret_cast( - sycl::malloc_shared(size * size * sizeof(double), q)); - - // type conversion. Also, math library requires copy memory because override - for (size_t it = 0; it < (size * size); ++it) { - result_vec_kern[it] = - array[it]; // TODO use memcpy_c or input1_ptr(array_in, size, true) - } - - const std::int64_t lda = std::max(1UL, size); - - const std::int64_t scratchpad_size = - mkl_lapack::syevd_scratchpad_size( - q, oneapi::mkl::job::vec, oneapi::mkl::uplo::upper, size, lda); - - // https://github.com/IntelPython/dpnp/issues/1005 - // Test tests/test_linalg.py::test_eig_arange raises 2 issues in dpnp_eig_c - // on CPU - // 1. Call of mkl_lapack::syevd_scratchpad_size returns wrong value - // that causes out of memory issue. - // 2. Call of the function oneapi::mkl::lapack::syevd causes segfault. 
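The legacy `dpnp_eig_c` kernel removed in this patch finished by converting LAPACK's column-major eigenvector buffer back to a row-major result with an explicit copy-plus-transpose loop. That post-processing step can be sketched on a flat buffer like so (illustrative only; the real kernel also performed the type conversion to `double`):

```python
def transpose_copy(flat, n):
    """Copy an n*n flat buffer while swapping row/column order, as the
    legacy kernel did for LAPACK's column-major eigenvectors."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[j * n + i] = flat[i * n + j]
    return out
```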
- // Example of the command to reproduce the issues: - // SYCL_DEVICE_FILTER=cpu pytest - // tests/test_linalg.py::test_eig_arange[2-float64] High-level reason of the - // issues is numpy is imported before dpnp in third party tests. Low-level - // reason of the issues could be related to MKL runtime library loaded - // during numpy import. - - double *scratchpad = reinterpret_cast( - sycl::malloc_shared(scratchpad_size * sizeof(double), q)); - - event = mkl_lapack::syevd( - q, // queue - oneapi::mkl::job::vec, // jobz - oneapi::mkl::uplo::upper, // uplo - size, // The order of the matrix A (0 <= n) - result_vec_kern, // will be overwritten with eigenvectors - lda, result_val_kern, scratchpad, scratchpad_size); - event.wait(); - - sycl::free(scratchpad, q); - - for (size_t it1 = 0; it1 < size; ++it1) { - result_val[it1] = - result_val_kern[it1]; // TODO use memcpy_c or dpnpc_transpose_c - for (size_t it2 = 0; it2 < size; ++it2) { - // copy + transpose - result_vec[it2 * size + it1] = result_vec_kern[it1 * size + it2]; - } - } - - sycl::free(result_val_kern, q); - sycl::free(result_vec_kern, q); - - return event_ref; -} - -template -void dpnp_eig_c(const void *array_in, void *result1, void *result2, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_eig_c<_DataType, _ResultType>( - q_ref, array_in, result1, result2, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_eig_default_c)(const void *, void *, void *, size_t) = - dpnp_eig_c<_DataType, _ResultType>; - -template -DPCTLSyclEventRef dpnp_eigvals_c(DPCTLSyclQueueRef q_ref, - const void *array_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // TODO this kernel works with square 2-D array only - - // Kernel Type for calculation is double type - // because interface requires float type but 
calculations are expected in - // double type - - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array_in, size * size, true); - DPNPC_ptr_adapter<_ResultType> result1_ptr(q_ref, result1, size, true, - true); - const _DataType *array = input1_ptr.get_ptr(); - _ResultType *result_val = result1_ptr.get_ptr(); - - double *result_val_kern = reinterpret_cast( - sycl::malloc_shared(size * sizeof(double), q)); - double *result_vec_kern = reinterpret_cast( - sycl::malloc_shared(size * size * sizeof(double), q)); - - // type conversion. Also, math library requires copy memory because override - for (size_t it = 0; it < (size * size); ++it) { - result_vec_kern[it] = array[it]; // TODO same as previous kernel - } - - const std::int64_t lda = std::max(1UL, size); - - const std::int64_t scratchpad_size = - mkl_lapack::syevd_scratchpad_size( - q, oneapi::mkl::job::vec, oneapi::mkl::uplo::upper, size, lda); - - double *scratchpad = reinterpret_cast( - sycl::malloc_shared(scratchpad_size * sizeof(double), q)); - - event = mkl_lapack::syevd(q, // queue - oneapi::mkl::job::vec, // jobz - oneapi::mkl::uplo::upper, // uplo - size, // The order of the matrix A (0 <= n) - result_vec_kern, lda, result_val_kern, scratchpad, - scratchpad_size); - event.wait(); - - sycl::free(scratchpad, q); - - for (size_t it1 = 0; it1 < size; ++it1) { - result_val[it1] = result_val_kern[it1]; - } - - sycl::free(result_val_kern, q); - - return event_ref; -} - -template -void dpnp_eigvals_c(const void *array_in, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_eigvals_c<_DataType, _ResultType>( - q_ref, array_in, result1, size, dep_event_vec_ref); - 
DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_eigvals_default_c)(const void *, - void *, - size_t) = dpnp_eigvals_c<_DataType, _ResultType>; - template class dpnp_initval_c_kernel; @@ -769,226 +512,8 @@ DPCTLSyclEventRef (*dpnp_initval_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_initval_c<_DataType>; -template -class dpnp_matmul_c_kernel; - -template -DPCTLSyclEventRef dpnp_matmul_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input1_size, - const size_t input1_ndim, - const shape_elem_type *input1_shape, - const shape_elem_type *input1_strides, - const void *input2_in, - const size_t input2_size, - const size_t input2_ndim, - const shape_elem_type *input2_shape, - const shape_elem_type *input2_strides, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - (void)result_size; - (void)result_ndim; - (void)result_shape; - (void)result_strides; - (void)input1_size; - (void)input1_ndim; - (void)input1_strides; - (void)input2_size; - (void)input2_ndim; - (void)input2_strides; - - DPCTLSyclEventRef event_ref = nullptr; - - size_t size_m = input1_shape[0]; - size_t size_n = input2_shape[1]; - size_t size_k = input1_shape[1]; - - if (!size_m || !size_n || !size_k) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - std::vector dep_events = cast_event_vector(dep_event_vec_ref); - sycl::event event; - - _DataType *array_1 = - reinterpret_cast<_DataType *>(const_cast(input1_in)); - _DataType *array_2 = - reinterpret_cast<_DataType *>(const_cast(input2_in)); - _DataType *result = reinterpret_cast<_DataType *>(result_out); - - if constexpr (std::is_same<_DataType, double>::value || - std::is_same<_DataType, float>::value) - { - // using std::max for these ldx variables is required by math library - const std::int64_t 
ld_array_2 = - std::max(1UL, size_n); // First dimensions of array_2 - const std::int64_t ld_array_1 = - std::max(1UL, size_k); // First dimensions of array_1 - const std::int64_t ld_result = - std::max(1UL, size_n); // Fast dimensions of result - - event = mkl_blas::gemm(q, oneapi::mkl::transpose::nontrans, - oneapi::mkl::transpose::nontrans, size_n, size_m, - size_k, _DataType(1), array_2, ld_array_2, - array_1, ld_array_1, _DataType(0), result, - ld_result, dep_events); - } - else { - // input1: M x K - // input2: K x N - // result: M x N - const size_t dim_m = - size_m; // shape1.front(); // First dimensions of array1 - const size_t dim_n = - size_n; // shape2.back(); // Last dimensions of array2 - const size_t dim_k = - size_k; // shape1.back(); // First dimensions of array2 - - sycl::range<2> gws(dim_m, dim_n); // dimensions are: "i" and "j" - - auto kernel_parallel_for_func = [=](sycl::id<2> global_id) { - size_t i = global_id[0]; // for (size_t i = 0; i < size; ++i) - { - size_t j = global_id[1]; // for (size_t j = 0; j < size; ++j) - { - _DataType acc = _DataType(0); - for (size_t k = 0; k < dim_k; ++k) { - const size_t index_1 = i * dim_k + k; - const size_t index_2 = k * dim_n + j; - acc += array_1[index_1] * array_2[index_2]; - } - const size_t index_result = i * dim_n + j; - result[index_result] = acc; - } - } - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.depends_on(dep_events); - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - } - - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_matmul_c(void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input1_size, - const size_t input1_ndim, - const shape_elem_type *input1_shape, - const shape_elem_type *input1_strides, - const void *input2_in, - const size_t 
input2_size, - const size_t input2_ndim, - const shape_elem_type *input2_shape, - const shape_elem_type *input2_strides) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_matmul_c<_DataType>( - q_ref, result_out, result_size, result_ndim, result_shape, - result_strides, input1_in, input1_size, input1_ndim, input1_shape, - input1_strides, input2_in, input2_size, input2_ndim, input2_shape, - input2_strides, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_matmul_default_c)(void *, - const size_t, - const size_t, - const shape_elem_type *, - const shape_elem_type *, - const void *, - const size_t, - const size_t, - const shape_elem_type *, - const shape_elem_type *, - const void *, - const size_t, - const size_t, - const shape_elem_type *, - const shape_elem_type *) = - dpnp_matmul_c<_DataType>; - void func_map_init_linalg(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_BLN][eft_INT] = { - eft_INT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_BLN][eft_LNG] = { - eft_LNG, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_BLN][eft_FLT] = { - eft_FLT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_BLN][eft_DBL] = { - eft_DBL, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_INT][eft_BLN] = { - eft_BLN, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_INT][eft_FLT] = { - eft_FLT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_INT][eft_DBL] = { - 
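For reference, the removed `dpnp_matmul_c` fallback path (taken when `_DataType` is neither `float` nor `double`, so `mkl_blas::gemm` cannot be used) accumulated each output element independently. A plain host-side C++ sketch of that row-major accumulation (hypothetical helper name, not part of the dpnp API):

```cpp
#include <cstddef>
#include <vector>

// Naive row-major matmul: a is (m x k), b is (k x n), result is (m x n).
// Mirrors the per-element accumulation of the deleted SYCL fallback kernel,
// which mapped the two outer loops onto a 2-D sycl::range.
template <typename T>
std::vector<T> naive_matmul(const std::vector<T> &a, const std::vector<T> &b,
                            std::size_t m, std::size_t k, std::size_t n)
{
    std::vector<T> result(m * n, T(0));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            result[i * n + j] = acc;
        }
    }
    return result;
}
```

In the deleted kernel, one work-item computed one `(i, j)` element of `result`, with the `k` loop run sequentially inside the work-item.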
eft_DBL, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_LNG][eft_BLN] = { - eft_BLN, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_LNG][eft_INT] = { - eft_INT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_LNG][eft_FLT] = { - eft_FLT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_FLT][eft_BLN] = { - eft_BLN, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_FLT][eft_INT] = { - eft_INT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_FLT][eft_LNG] = { - eft_LNG, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_DBL][eft_BLN] = { - eft_BLN, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_DBL][eft_INT] = { - eft_INT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_DBL][eft_LNG] = { - eft_LNG, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_DBL][eft_FLT] = { - eft_FLT, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_astype_default_c}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_C64][eft_C64] = { - eft_C64, - (void *) - dpnp_astype_default_c, std::complex>}; - fmap[DPNPFuncName::DPNP_FN_ASTYPE][eft_C128][eft_C128] = { - eft_C128, - (void *) - dpnp_astype_default_c, std::complex>}; fmap[DPNPFuncName::DPNP_FN_DOT][eft_INT][eft_INT] = { eft_INT, (void *)dpnp_dot_default_c}; @@ -1057,24 +582,6 @@ void func_map_init_linalg(func_map_t &fmap) 
fmap[DPNPFuncName::DPNP_FN_DOT_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_dot_ext_c}; - fmap[DPNPFuncName::DPNP_FN_EIG][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_eig_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIG][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_eig_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIG][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_eig_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIG][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_eig_default_c}; - - fmap[DPNPFuncName::DPNP_FN_EIGVALS][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_eigvals_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIGVALS][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_eigvals_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIGVALS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_eigvals_default_c}; - fmap[DPNPFuncName::DPNP_FN_EIGVALS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_eigvals_default_c}; - fmap[DPNPFuncName::DPNP_FN_INITVAL][eft_BLN][eft_BLN] = { eft_BLN, (void *)dpnp_initval_default_c}; fmap[DPNPFuncName::DPNP_FN_INITVAL][eft_INT][eft_INT] = { @@ -1103,14 +610,5 @@ void func_map_init_linalg(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_INITVAL_EXT][eft_C128][eft_C128] = { eft_C128, (void *)dpnp_initval_ext_c>}; - fmap[DPNPFuncName::DPNP_FN_MATMUL][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_matmul_default_c}; - fmap[DPNPFuncName::DPNP_FN_MATMUL][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_matmul_default_c}; - fmap[DPNPFuncName::DPNP_FN_MATMUL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_matmul_default_c}; - fmap[DPNPFuncName::DPNP_FN_MATMUL][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_matmul_default_c}; - return; } diff --git a/dpnp/backend/kernels/dpnp_krnl_linalg.cpp b/dpnp/backend/kernels/dpnp_krnl_linalg.cpp deleted file mode 100644 index 1dc2783d48c..00000000000 --- a/dpnp/backend/kernels/dpnp_krnl_linalg.cpp +++ /dev/null @@ -1,914 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel 
Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. 
-//***************************************************************************** - -#include -#include - -#include "dpnp_fptr.hpp" -#include "dpnp_utils.hpp" -#include "dpnpc_memory_adapter.hpp" -#include "queue_sycl.hpp" -#include - -namespace mkl_blas = oneapi::mkl::blas::row_major; -namespace mkl_lapack = oneapi::mkl::lapack; - -template -DPCTLSyclEventRef dpnp_cholesky_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - const size_t size, - const size_t data_size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - if (!data_size) { - return event_ref; - } - sycl::queue q = *(reinterpret_cast(q_ref)); - - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, size, true, true); - _DataType *in_array = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - size_t iters = size / (data_size * data_size); - - // math lib func overrides input - _DataType *in_a = reinterpret_cast<_DataType *>( - sycl::malloc_shared(data_size * data_size * sizeof(_DataType), q)); - - for (size_t k = 0; k < iters; ++k) { - for (size_t it = 0; it < data_size * data_size; ++it) { - in_a[it] = in_array[k * (data_size * data_size) + it]; - } - - const std::int64_t n = data_size; - - const std::int64_t lda = std::max(1UL, n); - - const std::int64_t scratchpad_size = - mkl_lapack::potrf_scratchpad_size<_DataType>( - q, oneapi::mkl::uplo::upper, n, lda); - - _DataType *scratchpad = reinterpret_cast<_DataType *>( - sycl::malloc_shared(scratchpad_size * sizeof(_DataType), q)); - - event = mkl_lapack::potrf(q, oneapi::mkl::uplo::upper, n, in_a, lda, - scratchpad, scratchpad_size); - - event.wait(); - - for (size_t i = 0; i < data_size; i++) { - bool arg = false; - for (size_t j = 0; j < data_size; j++) { - if (i == j - 1) { - arg = true; - } - if (arg) { - in_a[i * 
data_size + j] = 0; - } - } - } - - sycl::free(scratchpad, q); - - for (size_t t = 0; t < data_size * data_size; ++t) { - result[k * (data_size * data_size) + t] = in_a[t]; - } - } - - sycl::free(in_a, q); - - return event_ref; -} - -template -void dpnp_cholesky_c(void *array1_in, - void *result1, - const size_t size, - const size_t data_size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_cholesky_c<_DataType>( - q_ref, array1_in, result1, size, data_size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_cholesky_default_c)(void *, void *, const size_t, const size_t) = - dpnp_cholesky_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_det_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - const size_t input_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (!input_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - size_t n = shape[ndim - 1]; - size_t size_out = 1; - if (ndim != 2) { - for (size_t i = 0; i < ndim - 2; i++) { - size_out *= shape[i]; - } - } - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, input_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, size_out, true, - true); - _DataType *array_1 = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - _DataType *matrix = new _DataType[n * n]; - _DataType *elems = new _DataType[n * n]; - - for (size_t i = 0; i < size_out; i++) { - if (size_out > 1) { - for (size_t j = i * n * n; j < (i + 1) * n * n; j++) { - elems[j - i * n * n] = array_1[j]; - } - - for (size_t j = 0; j < n; j++) { - for (size_t k = 0; k < n; k++) { - 
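The removed `dpnp_cholesky_c` above called `oneapi::mkl::lapack::potrf` per batch element and then zeroed one triangle of the factor. As an illustrative stand-in for the factorization itself (textbook Cholesky–Banachiewicz, lower-triangular form; not the MKL call the kernel used):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Lower-triangular Cholesky factor of a symmetric positive-definite
// row-major n x n matrix a, so that a = l * l^T.
inline std::vector<double> cholesky_lower(const std::vector<double> &a,
                                          std::size_t n)
{
    std::vector<double> l(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = 0.0; // dot product of the already-computed rows
            for (std::size_t k = 0; k < j; ++k)
                sum += l[i * n + k] * l[j * n + k];
            if (i == j)
                l[i * n + j] = std::sqrt(a[i * n + i] - sum);
            else
                l[i * n + j] = (a[i * n + j] - sum) / l[j * n + j];
        }
    }
    return l;
}
```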
matrix[j * n + k] = elems[j * n + k]; - } - } - } - else { - for (size_t j = 0; j < n; j++) { - for (size_t k = 0; k < n; k++) { - matrix[j * n + k] = array_1[j * n + k]; - } - } - } - - _DataType det_val = 1; - for (size_t l = 0; l < n; l++) { - if (matrix[l * n + l] == 0) { - for (size_t j = l; j < n; j++) { - if (matrix[j * n + l] != 0) { - for (size_t k = l; k < n; k++) { - _DataType c = matrix[l * n + k]; - matrix[l * n + k] = -1 * matrix[j * n + k]; - matrix[j * n + k] = c; - } - break; - } - if (j == n - 1 and matrix[j * n + l] == 0) { - det_val = 0; - } - } - } - if (det_val != 0) { - for (size_t j = l + 1; j < n; j++) { - _DataType quotient = - -(matrix[j * n + l] / matrix[l * n + l]); - for (size_t k = l + 1; k < n; k++) { - matrix[j * n + k] += quotient * matrix[l * n + k]; - } - } - } - } - - if (det_val != 0) { - for (size_t l = 0; l < n; l++) { - det_val *= matrix[l * n + l]; - } - } - - result[i] = det_val; - } - - delete[] elems; - delete[] matrix; - return event_ref; -} - -template -void dpnp_det_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_det_c<_DataType>( - q_ref, array1_in, result1, shape, ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_det_default_c)(void *, void *, shape_elem_type *, size_t) = - dpnp_det_c<_DataType>; - -template -DPCTLSyclEventRef (*dpnp_det_ext_c)(DPCTLSyclQueueRef, - void *, - void *, - shape_elem_type *, - size_t, - const DPCTLEventVectorRef) = - dpnp_det_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_inv_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)ndim; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = 
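The removed `dpnp_det_c` computed each determinant by Gaussian elimination: when a pivot is zero, swap in a lower row with a sign flip (which leaves the determinant unchanged overall), eliminate below the pivot, then multiply the diagonal. A condensed host-side sketch of that same scheme (hypothetical helper name):

```cpp
#include <cstddef>
#include <vector>

// Determinant of a row-major n x n matrix via Gaussian elimination,
// following the deleted kernel's pivoting: a row swap combined with a
// sign flip preserves det, so the final diagonal product is the answer.
inline double det_gauss(std::vector<double> m, std::size_t n)
{
    for (std::size_t l = 0; l < n; ++l) {
        if (m[l * n + l] == 0.0) {
            std::size_t j = l;
            while (j < n && m[j * n + l] == 0.0)
                ++j;
            if (j == n)
                return 0.0; // whole column is zero: singular matrix
            for (std::size_t k = l; k < n; ++k) {
                const double c = m[l * n + k];
                m[l * n + k] = -m[j * n + k]; // swap + negate: det unchanged
                m[j * n + k] = c;
            }
        }
        for (std::size_t j = l + 1; j < n; ++j) {
            const double q = -(m[j * n + l] / m[l * n + l]);
            for (std::size_t k = l + 1; k < n; ++k)
                m[j * n + k] += q * m[l * n + k];
        }
    }
    double det = 1.0;
    for (std::size_t l = 0; l < n; ++l)
        det *= m[l * n + l];
    return det;
}
```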
nullptr; - - const size_t input_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (!input_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, input_size, true); - DPNPC_ptr_adapter<_ResultType> result_ptr(q_ref, result1, input_size, true, - true); - - _DataType *array_1 = input1_ptr.get_ptr(); - _ResultType *result = result_ptr.get_ptr(); - - size_t n = shape[0]; - - _ResultType *a_arr = new _ResultType[n * n]; - _ResultType *e_arr = new _ResultType[n * n]; - - for (size_t i = 0; i < n; ++i) { - for (size_t j = 0; j < n; ++j) { - a_arr[i * n + j] = array_1[i * n + j]; - if (i == j) { - e_arr[i * n + j] = 1; - } - else { - e_arr[i * n + j] = 0; - } - } - } - - for (size_t k = 0; k < n; ++k) { - if (a_arr[k * n + k] == 0) { - for (size_t i = k; i < n; ++i) { - if (a_arr[i * n + k] != 0) { - for (size_t j = 0; j < n; ++j) { - float c = a_arr[k * n + j]; - a_arr[k * n + j] = a_arr[i * n + j]; - a_arr[i * n + j] = c; - float c_e = e_arr[k * n + j]; - e_arr[k * n + j] = e_arr[i * n + j]; - e_arr[i * n + j] = c_e; - } - break; - } - } - } - - float temp = a_arr[k * n + k]; - - for (size_t j = 0; j < n; ++j) { - a_arr[k * n + j] = a_arr[k * n + j] / temp; - e_arr[k * n + j] = e_arr[k * n + j] / temp; - } - - for (size_t i = k + 1; i < n; ++i) { - temp = a_arr[i * n + k]; - for (size_t j = 0; j < n; j++) { - a_arr[i * n + j] = a_arr[i * n + j] - a_arr[k * n + j] * temp; - e_arr[i * n + j] = e_arr[i * n + j] - e_arr[k * n + j] * temp; - } - } - } - - for (size_t k = 0; k < n - 1; ++k) { - size_t ind_k = n - 1 - k; - for (size_t i = 0; i < ind_k; ++i) { - size_t ind_i = ind_k - 1 - i; - - float temp = a_arr[ind_i * n + ind_k]; - for (size_t j = 0; j < n; ++j) { - a_arr[ind_i * n + j] = - a_arr[ind_i * n + j] - a_arr[ind_k * n + j] * temp; - e_arr[ind_i * n + j] = - e_arr[ind_i * n + j] - e_arr[ind_k * n + j] * temp; - } - } - } - - for (size_t i = 0; i < n; ++i) 
{ - for (size_t j = 0; j < n; ++j) { - result[i * n + j] = e_arr[i * n + j]; - } - } - - delete[] a_arr; - delete[] e_arr; - return event_ref; -} - -template -void dpnp_inv_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_inv_c<_DataType, _ResultType>( - q_ref, array1_in, result1, shape, ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_inv_default_c)(void *, void *, shape_elem_type *, size_t) = - dpnp_inv_c<_DataType, _ResultType>; - -template -class dpnp_kron_c_kernel; - -template -DPCTLSyclEventRef dpnp_kron_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *array2_in, - void *result1, - shape_elem_type *in1_shape, - shape_elem_type *in2_shape, - shape_elem_type *res_shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - const size_t input1_size = std::accumulate( - in1_shape, in1_shape + ndim, 1, std::multiplies()); - const size_t input2_size = std::accumulate( - in2_shape, in2_shape + ndim, 1, std::multiplies()); - const size_t result_size = std::accumulate( - res_shape, res_shape + ndim, 1, std::multiplies()); - if (!(result_size && input1_size && input2_size)) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType1> input1_ptr(q_ref, array1_in, input1_size); - DPNPC_ptr_adapter<_DataType2> input2_ptr(q_ref, array2_in, input2_size); - DPNPC_ptr_adapter<_ResultType> result_ptr(q_ref, result1, result_size); - - _DataType1 *array1 = input1_ptr.get_ptr(); - _DataType2 *array2 = input2_ptr.get_ptr(); - _ResultType *result = result_ptr.get_ptr(); - - shape_elem_type *_in1_shape = reinterpret_cast( - sycl::malloc_shared(ndim * 
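The removed `dpnp_inv_c` inverted the matrix by Gauss–Jordan elimination on an identity-augmented copy: reduce `a` to the identity while applying the same row operations to `e`, then read the inverse off `e`. A compact single-sweep variant of that scheme (illustrative only; the deleted kernel used a forward pass followed by a separate backward pass, and worked in `float` intermediates):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Gauss-Jordan inversion of a row-major n x n matrix: eliminate every row
// except the pivot row in one sweep per column.
inline std::vector<double> gauss_jordan_inverse(std::vector<double> a,
                                                std::size_t n)
{
    std::vector<double> e(n * n, 0.0); // starts as the identity
    for (std::size_t i = 0; i < n; ++i)
        e[i * n + i] = 1.0;

    for (std::size_t k = 0; k < n; ++k) {
        if (a[k * n + k] == 0.0) { // swap in a row with a nonzero pivot
            for (std::size_t i = k + 1; i < n; ++i) {
                if (a[i * n + k] != 0.0) {
                    for (std::size_t j = 0; j < n; ++j) {
                        std::swap(a[k * n + j], a[i * n + j]);
                        std::swap(e[k * n + j], e[i * n + j]);
                    }
                    break;
                }
            }
        }
        const double piv = a[k * n + k];
        for (std::size_t j = 0; j < n; ++j) { // normalize the pivot row
            a[k * n + j] /= piv;
            e[k * n + j] /= piv;
        }
        for (std::size_t i = 0; i < n; ++i) { // clear column k elsewhere
            if (i == k)
                continue;
            const double f = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                a[i * n + j] -= a[k * n + j] * f;
                e[i * n + j] -= e[k * n + j] * f;
            }
        }
    }
    return e;
}
```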
sizeof(shape_elem_type), q)); - shape_elem_type *_in2_shape = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - - q.memcpy(_in1_shape, in1_shape, ndim * sizeof(shape_elem_type)).wait(); - q.memcpy(_in2_shape, in2_shape, ndim * sizeof(shape_elem_type)).wait(); - - shape_elem_type *in1_offsets = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - shape_elem_type *in2_offsets = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - shape_elem_type *res_offsets = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - - get_shape_offsets_inkernel(in1_shape, ndim, in1_offsets); - get_shape_offsets_inkernel(in2_shape, ndim, in2_offsets); - get_shape_offsets_inkernel(res_shape, ndim, res_offsets); - - sycl::range<1> gws(result_size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - const size_t idx = global_id[0]; - - size_t idx1 = 0; - size_t idx2 = 0; - size_t reminder = idx; - for (size_t axis = 0; axis < ndim; ++axis) { - const size_t res_axis = reminder / res_offsets[axis]; - reminder = reminder - res_axis * res_offsets[axis]; - - const size_t in1_axis = res_axis / _in2_shape[axis]; - const size_t in2_axis = res_axis - in1_axis * _in2_shape[axis]; - - idx1 += in1_axis * in1_offsets[axis]; - idx2 += in2_axis * in2_offsets[axis]; - } - - result[idx] = array1[idx1] * array2[idx2]; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for< - class dpnp_kron_c_kernel<_DataType1, _DataType2, _ResultType>>( - gws, kernel_parallel_for_func); - }; - - sycl::event event = q.submit(kernel_func); - - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_kron_c(void *array1_in, - void *array2_in, - void *result1, - shape_elem_type *in1_shape, - shape_elem_type *in2_shape, - shape_elem_type *res_shape, - size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef 
dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_kron_c<_DataType1, _DataType2, _ResultType>( - q_ref, array1_in, array2_in, result1, in1_shape, in2_shape, - res_shape, ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_kron_default_c)(void *, - void *, - void *, - shape_elem_type *, - shape_elem_type *, - shape_elem_type *, - size_t) = - dpnp_kron_c<_DataType1, _DataType2, _ResultType>; - -template -DPCTLSyclEventRef - dpnp_matrix_rank_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - const size_t input_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (!input_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, input_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, 1, true, true); - _DataType *array_1 = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - shape_elem_type elems = 1; - if (ndim > 1) { - elems = shape[0]; - for (size_t i = 1; i < ndim; i++) { - if (shape[i] < elems) { - elems = shape[i]; - } - } - } - - _DataType acc = 0; - for (size_t i = 0; i < static_cast(elems); i++) { - size_t ind = 0; - for (size_t j = 0; j < ndim; j++) { - ind += (shape[j] - 1) * i; - } - acc += array_1[ind]; - } - result[0] = acc; - - return event_ref; -} - -template -void dpnp_matrix_rank_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_matrix_rank_c<_DataType>( - q_ref, array1_in, result1, shape, ndim, dep_event_vec_ref); - 
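The removed `dpnp_kron_c` kernel derived, for every flat result index, the two source indices by dividing out the result strides axis by axis and splitting each coordinate by the second operand's extent on that axis. The same index arithmetic in standalone C++ (hypothetical helper; row-major strides are computed inline here instead of via `get_shape_offsets_inkernel`):

```cpp
#include <cstddef>
#include <vector>

// Kronecker product of two same-ndim arrays: for each axis, the result
// coordinate r splits into (r / extent_b, r % extent_b), addressing a and b
// respectively -- the per-work-item arithmetic of the deleted SYCL kernel.
template <typename T>
std::vector<T> naive_kron(const std::vector<T> &a, const std::vector<T> &b,
                          const std::vector<std::size_t> &shape_a,
                          const std::vector<std::size_t> &shape_b)
{
    const std::size_t ndim = shape_a.size();
    std::vector<std::size_t> res_shape(ndim), str_a(ndim), str_b(ndim),
        str_r(ndim);
    std::size_t size = 1;
    for (std::size_t ax = 0; ax < ndim; ++ax) {
        res_shape[ax] = shape_a[ax] * shape_b[ax];
        size *= res_shape[ax];
    }
    std::size_t sa = 1, sb = 1, sr = 1; // row-major strides, last axis = 1
    for (std::size_t ax = ndim; ax-- > 0;) {
        str_a[ax] = sa; sa *= shape_a[ax];
        str_b[ax] = sb; sb *= shape_b[ax];
        str_r[ax] = sr; sr *= res_shape[ax];
    }
    std::vector<T> result(size);
    for (std::size_t idx = 0; idx < size; ++idx) {
        std::size_t idx1 = 0, idx2 = 0, rem = idx;
        for (std::size_t ax = 0; ax < ndim; ++ax) {
            const std::size_t r = rem / str_r[ax]; // coordinate on this axis
            rem -= r * str_r[ax];
            idx1 += (r / shape_b[ax]) * str_a[ax];
            idx2 += (r % shape_b[ax]) * str_b[ax];
        }
        result[idx] = a[idx1] * b[idx2];
    }
    return result;
}
```

The deleted kernel ran the `idx` loop as one work-item per result element, with the shapes and strides staged in shared USM allocations.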
DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_matrix_rank_default_c)(void *, void *, shape_elem_type *, size_t) = - dpnp_matrix_rank_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_qr_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - void *result2, - void *result3, - size_t size_m, - size_t size_n, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - if (!size_m || !size_n) { - return event_ref; - } - sycl::queue q = *(reinterpret_cast(q_ref)); - - sycl::event event; - - DPNPC_ptr_adapter<_InputDT> input1_ptr(q_ref, array1_in, size_m * size_n, - true); - _InputDT *in_array = input1_ptr.get_ptr(); - - // math lib func overrides input - _ComputeDT *in_a = reinterpret_cast<_ComputeDT *>( - sycl::malloc_shared(size_m * size_n * sizeof(_ComputeDT), q)); - - for (size_t i = 0; i < size_m; ++i) { - for (size_t j = 0; j < size_n; ++j) { - // TODO transpose? 
use dpnp_transpose_c() - in_a[j * size_m + i] = in_array[i * size_n + j]; - } - } - - const size_t min_size_m_n = std::min(size_m, size_n); - DPNPC_ptr_adapter<_ComputeDT> result1_ptr( - q_ref, result1, size_m * min_size_m_n, true, true); - DPNPC_ptr_adapter<_ComputeDT> result2_ptr( - q_ref, result2, min_size_m_n * size_n, true, true); - DPNPC_ptr_adapter<_ComputeDT> result3_ptr(q_ref, result3, min_size_m_n, - true, true); - _ComputeDT *res_q = result1_ptr.get_ptr(); - _ComputeDT *res_r = result2_ptr.get_ptr(); - _ComputeDT *tau = result3_ptr.get_ptr(); - - const std::int64_t lda = size_m; - - const std::int64_t geqrf_scratchpad_size = - mkl_lapack::geqrf_scratchpad_size<_ComputeDT>(q, size_m, size_n, lda); - - _ComputeDT *geqrf_scratchpad = reinterpret_cast<_ComputeDT *>( - sycl::malloc_shared(geqrf_scratchpad_size * sizeof(_ComputeDT), q)); - - std::vector depends(1); - set_barrier_event(q, depends); - - event = mkl_lapack::geqrf(q, size_m, size_n, in_a, lda, tau, - geqrf_scratchpad, geqrf_scratchpad_size, depends); - event.wait(); - - if (!depends.empty()) { - verbose_print("oneapi::mkl::lapack::geqrf", depends.front(), event); - } - - sycl::free(geqrf_scratchpad, q); - - // R - size_t mrefl = min_size_m_n; - for (size_t i = 0; i < mrefl; ++i) { - for (size_t j = 0; j < size_n; ++j) { - if (j >= i) { - res_r[i * size_n + j] = in_a[j * size_m + i]; - } - else { - res_r[i * size_n + j] = _ComputeDT(0); - } - } - } - - // Q - const size_t nrefl = min_size_m_n; - const std::int64_t orgqr_scratchpad_size = - mkl_lapack::orgqr_scratchpad_size<_ComputeDT>(q, size_m, nrefl, nrefl, - lda); - - _ComputeDT *orgqr_scratchpad = reinterpret_cast<_ComputeDT *>( - sycl::malloc_shared(orgqr_scratchpad_size * sizeof(_ComputeDT), q)); - - set_barrier_event(q, depends); - - event = mkl_lapack::orgqr(q, size_m, nrefl, nrefl, in_a, lda, tau, - orgqr_scratchpad, orgqr_scratchpad_size, depends); - event.wait(); - - if (!depends.empty()) { - verbose_print("oneapi::mkl::lapack::orgqr", 
depends.front(), event); - } - - sycl::free(orgqr_scratchpad, q); - - for (size_t i = 0; i < size_m; ++i) { - for (size_t j = 0; j < nrefl; ++j) { - res_q[i * nrefl + j] = in_a[j * size_m + i]; - } - } - - sycl::free(in_a, q); - - return event_ref; -} - -template -void dpnp_qr_c(void *array1_in, - void *result1, - void *result2, - void *result3, - size_t size_m, - size_t size_n) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_qr_c<_InputDT, _ComputeDT>( - q_ref, array1_in, result1, result2, result3, size_m, size_n, - dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_qr_default_c)(void *, void *, void *, void *, size_t, size_t) = - dpnp_qr_c<_InputDT, _ComputeDT>; - -template -DPCTLSyclEventRef dpnp_svd_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - void *result2, - void *result3, - size_t size_m, - size_t size_n, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - sycl::queue q = *(reinterpret_cast(q_ref)); - - sycl::event event; - - DPNPC_ptr_adapter<_InputDT> input1_ptr( - q_ref, array1_in, size_m * size_n, - true); // TODO no need this if use dpnp_copy_to() - _InputDT *in_array = input1_ptr.get_ptr(); - - // math lib gesvd func overrides input - _ComputeDT *in_a = reinterpret_cast<_ComputeDT *>( - sycl::malloc_shared(size_m * size_n * sizeof(_ComputeDT), q)); - for (size_t it = 0; it < size_m * size_n; ++it) { - in_a[it] = in_array[it]; // TODO Type conversion. memcpy can not be used - // directly. dpnp_copy_to() ? 
-    }
-
-    DPNPC_ptr_adapter<_ComputeDT> result1_ptr(q_ref, result1, size_m * size_m,
-                                              true, true);
-    DPNPC_ptr_adapter<_SVDT> result2_ptr(q_ref, result2,
-                                         std::min(size_m, size_n), true, true);
-    DPNPC_ptr_adapter<_ComputeDT> result3_ptr(q_ref, result3, size_n * size_n,
-                                              true, true);
-    _ComputeDT *res_u = result1_ptr.get_ptr();
-    _SVDT *res_s = result2_ptr.get_ptr();
-    _ComputeDT *res_vt = result3_ptr.get_ptr();
-
-    const std::int64_t m = size_m;
-    const std::int64_t n = size_n;
-
-    const std::int64_t lda = std::max(1UL, n);
-    const std::int64_t ldu = std::max(1UL, m);
-    const std::int64_t ldvt = std::max(1UL, n);
-
-    const std::int64_t scratchpad_size =
-        mkl_lapack::gesvd_scratchpad_size<_ComputeDT>(
-            q, oneapi::mkl::jobsvd::vectors, oneapi::mkl::jobsvd::vectors, n, m,
-            lda, ldvt, ldu);
-
-    _ComputeDT *scratchpad = reinterpret_cast<_ComputeDT *>(
-        sycl::malloc_shared(scratchpad_size * sizeof(_ComputeDT), q));
-
-    event =
-        mkl_lapack::gesvd(q,
-                          oneapi::mkl::jobsvd::vectors, // onemkl::job jobu,
-                          oneapi::mkl::jobsvd::vectors, // onemkl::job jobvt,
-                          n, m, in_a, lda, res_s, res_vt, ldvt, res_u, ldu,
-                          scratchpad, scratchpad_size);
-
-    event.wait();
-
-    sycl::free(scratchpad, q);
-
-    return event_ref;
-}
-
-template <typename _InputDT, typename _ComputeDT, typename _SVDT>
-void dpnp_svd_c(void *array1_in,
-                void *result1,
-                void *result2,
-                void *result3,
-                size_t size_m,
-                size_t size_n)
-{
-    DPCTLSyclQueueRef q_ref = reinterpret_cast<DPCTLSyclQueueRef>(&DPNP_QUEUE);
-    DPCTLEventVectorRef dep_event_vec_ref = nullptr;
-    DPCTLSyclEventRef event_ref = dpnp_svd_c<_InputDT, _ComputeDT, _SVDT>(
-        q_ref, array1_in, result1, result2, result3, size_m, size_n,
-        dep_event_vec_ref);
-    DPCTLEvent_WaitAndThrow(event_ref);
-    DPCTLEvent_Delete(event_ref);
-}
-
-template <typename _InputDT, typename _ComputeDT, typename _SVDT>
-void (*dpnp_svd_default_c)(void *, void *, void *, void *, size_t, size_t) =
-    dpnp_svd_c<_InputDT, _ComputeDT, _SVDT>;
-
-void func_map_init_linalg_func(func_map_t &fmap)
-{
-    fmap[DPNPFuncName::DPNP_FN_CHOLESKY][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_cholesky_default_c<float>};
-    fmap[DPNPFuncName::DPNP_FN_CHOLESKY][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_cholesky_default_c<double>};
-
-    fmap[DPNPFuncName::DPNP_FN_DET][eft_INT][eft_INT] = {
-        eft_INT, (void *)dpnp_det_default_c<int32_t>};
-    fmap[DPNPFuncName::DPNP_FN_DET][eft_LNG][eft_LNG] = {
-        eft_LNG, (void *)dpnp_det_default_c<int64_t>};
-    fmap[DPNPFuncName::DPNP_FN_DET][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_det_default_c<float>};
-    fmap[DPNPFuncName::DPNP_FN_DET][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_det_default_c<double>};
-
-    fmap[DPNPFuncName::DPNP_FN_INV][eft_INT][eft_INT] = {
-        eft_DBL, (void *)dpnp_inv_default_c<int32_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_INV][eft_LNG][eft_LNG] = {
-        eft_DBL, (void *)dpnp_inv_default_c<int64_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_INV][eft_FLT][eft_FLT] = {
-        eft_DBL, (void *)dpnp_inv_default_c<float, double>};
-    fmap[DPNPFuncName::DPNP_FN_INV][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_inv_default_c<double, double>};
-
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_INT][eft_INT] = {
-        eft_INT, (void *)dpnp_kron_default_c<int32_t, int32_t, int32_t>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_INT][eft_LNG] = {
-        eft_LNG, (void *)dpnp_kron_default_c<int32_t, int64_t, int64_t>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_INT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_kron_default_c<int32_t, float, float>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_INT][eft_DBL] = {
-        eft_DBL, (void *)dpnp_kron_default_c<int32_t, double, double>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_INT][eft_C128] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<int32_t, std::complex<double>,
-    //     std::complex<double>>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_LNG][eft_INT] = {
-        eft_LNG, (void *)dpnp_kron_default_c<int64_t, int32_t, int64_t>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_LNG][eft_LNG] = {
-        eft_LNG, (void *)dpnp_kron_default_c<int64_t, int64_t, int64_t>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_LNG][eft_FLT] = {
-        eft_FLT, (void *)dpnp_kron_default_c<int64_t, float, float>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_LNG][eft_DBL] = {
-        eft_DBL, (void *)dpnp_kron_default_c<int64_t, double, double>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_LNG][eft_C128] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<int64_t, std::complex<double>,
-    //     std::complex<double>>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_FLT][eft_INT] = {
-        eft_FLT, (void *)dpnp_kron_default_c<float, int32_t, float>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_FLT][eft_LNG] = {
-        eft_FLT, (void *)dpnp_kron_default_c<float, int64_t, float>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_kron_default_c<float, float, float>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_FLT][eft_DBL] = {
-        eft_DBL, (void *)dpnp_kron_default_c<float, double, double>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_FLT][eft_C128] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<float, std::complex<double>,
-    //     std::complex<double>>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_DBL][eft_INT] = {
-        eft_DBL, (void *)dpnp_kron_default_c<double, int32_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_DBL][eft_LNG] = {
-        eft_DBL, (void *)dpnp_kron_default_c<double, int64_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_DBL][eft_FLT] = {
-        eft_DBL, (void *)dpnp_kron_default_c<double, float, double>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_kron_default_c<double, double, double>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_DBL][eft_C128] = {
-        eft_C128, (void *)dpnp_kron_default_c<double, std::complex<double>,
-                                              std::complex<double>>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_C128][eft_INT] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<std::complex<double>, int32_t,
-    //     std::complex<double>>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_C128][eft_LNG] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<std::complex<double>, int64_t,
-    //     std::complex<double>>};
-    // fmap[DPNPFuncName::DPNP_FN_KRON][eft_C128][eft_FLT] = {
-    //     eft_C128, (void*)dpnp_kron_default_c<std::complex<double>, float,
-    //     std::complex<double>>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_C128][eft_DBL] = {
-        eft_C128, (void *)dpnp_kron_default_c<std::complex<double>, double,
-                                              std::complex<double>>};
-    fmap[DPNPFuncName::DPNP_FN_KRON][eft_C128][eft_C128] = {
-        eft_C128,
-        (void *)dpnp_kron_default_c<std::complex<double>, std::complex<double>,
-                                    std::complex<double>>};
-
-    fmap[DPNPFuncName::DPNP_FN_MATRIX_RANK][eft_INT][eft_INT] = {
-        eft_INT, (void *)dpnp_matrix_rank_default_c<int32_t>};
-    fmap[DPNPFuncName::DPNP_FN_MATRIX_RANK][eft_LNG][eft_LNG] = {
-        eft_LNG, (void *)dpnp_matrix_rank_default_c<int64_t>};
-    fmap[DPNPFuncName::DPNP_FN_MATRIX_RANK][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_matrix_rank_default_c<float>};
-    fmap[DPNPFuncName::DPNP_FN_MATRIX_RANK][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_matrix_rank_default_c<double>};
-
-    fmap[DPNPFuncName::DPNP_FN_QR][eft_INT][eft_INT] = {
-        eft_DBL, (void *)dpnp_qr_default_c<int32_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_QR][eft_LNG][eft_LNG] = {
-        eft_DBL, (void *)dpnp_qr_default_c<int64_t, double>};
-    fmap[DPNPFuncName::DPNP_FN_QR][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_qr_default_c<float, float>};
-    fmap[DPNPFuncName::DPNP_FN_QR][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_qr_default_c<double, double>};
-    // fmap[DPNPFuncName::DPNP_FN_QR][eft_C128][eft_C128] = {
-    //     eft_C128, (void*)dpnp_qr_c<std::complex<double>,
-    //     std::complex<double>>};
-
-    fmap[DPNPFuncName::DPNP_FN_SVD][eft_INT][eft_INT] = {
-        eft_DBL, (void *)dpnp_svd_default_c<int32_t, double, double>};
-    fmap[DPNPFuncName::DPNP_FN_SVD][eft_LNG][eft_LNG] = {
-        eft_DBL, (void *)dpnp_svd_default_c<int64_t, double, double>};
-    fmap[DPNPFuncName::DPNP_FN_SVD][eft_FLT][eft_FLT] = {
-        eft_FLT, (void *)dpnp_svd_default_c<float, float, float>};
-    fmap[DPNPFuncName::DPNP_FN_SVD][eft_DBL][eft_DBL] = {
-        eft_DBL, (void *)dpnp_svd_default_c<double, double, double>};
-    fmap[DPNPFuncName::DPNP_FN_SVD][eft_C128][eft_C128] = {
-        eft_C128, (void *)dpnp_svd_default_c<std::complex<double>,
-                                             std::complex<double>, double>};
-
-    return;
-}
diff --git a/dpnp/backend/src/dpnp_fptr.hpp b/dpnp/backend/src/dpnp_fptr.hpp
index 022e844319d..20fc5305e9a 100644
--- a/dpnp/backend/src/dpnp_fptr.hpp
+++ b/dpnp/backend/src/dpnp_fptr.hpp
@@ -331,7 +331,6 @@ void func_map_init_elemwise(func_map_t &fmap);
 void func_map_init_fft_func(func_map_t &fmap);
 void func_map_init_indexing_func(func_map_t &fmap);
 void func_map_init_linalg(func_map_t &fmap);
-void func_map_init_linalg_func(func_map_t &fmap);
 void func_map_init_logic(func_map_t &fmap);
 void func_map_init_manipulation(func_map_t &fmap);
 void func_map_init_mathematical(func_map_t &fmap);
diff --git a/dpnp/backend/src/dpnp_iface_fptr.cpp b/dpnp/backend/src/dpnp_iface_fptr.cpp
index a0683d44a96..460896bfa2d 100644
--- a/dpnp/backend/src/dpnp_iface_fptr.cpp
+++ b/dpnp/backend/src/dpnp_iface_fptr.cpp
@@ -172,7 +172,6 @@ static func_map_t func_map_init()
 func_map_init_fft_func(fmap);
 func_map_init_indexing_func(fmap);
 func_map_init_linalg(fmap);
-func_map_init_linalg_func(fmap);
 func_map_init_logic(fmap);
 func_map_init_manipulation(fmap);
 func_map_init_mathematical(fmap);

From 3a742d174482d5f94d99c034b9178cee1b4c17ba Mon Sep 17 00:00:00 2001
From: Anton <100830759+antonwolfy@users.noreply.github.com>
Date: Tue, 25 Jun 2024 23:39:51 +0200
Subject: [PATCH 31/49] Remove the w/a which breaks the build on Windows with
 python `3.12` (#1896)

* Remove a temporary w/a to unblock Windows build

* Channel OVERRIDE_INTEL_IPO env. variable

Set variable in public CI to override using interprocedural optimization
in public CI to avoid insufficient resources failure during compilation
on Windows.
---
 conda-recipe/bld.bat   | 37 ++++++++++++++++++-------------------
 conda-recipe/meta.yaml |  1 +
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/conda-recipe/bld.bat b/conda-recipe/bld.bat
index 960b254bd39..922b6949d2f 100644
--- a/conda-recipe/bld.bat
+++ b/conda-recipe/bld.bat
@@ -1,14 +1,7 @@
 REM A workaround for activate-dpcpp.bat issue to be addressed in 2021.4
-set "LIB=%BUILD_PREFIX%\Library\lib;%BUILD_PREFIX%\compiler\lib;%LIB%"
+SET "LIB=%BUILD_PREFIX%\Library\lib;%BUILD_PREFIX%\compiler\lib;%LIB%"
 SET "INCLUDE=%BUILD_PREFIX%\include;%INCLUDE%"

-REM Since the 60.0.0 release, setuptools includes a local, vendored copy
-REM of distutils (from late copies of CPython) that is enabled by default.
-REM It breaks build for Windows, so use distutils from "stdlib" as before.
-REM @TODO: remove the setting, once transition to build backend on Windows
-REM to cmake is complete.
-SET "SETUPTOOLS_USE_DISTUTILS=stdlib"
-
 "%PYTHON%" setup.py clean --all

 set "MKLROOT=%PREFIX%/Library"
@@ -18,10 +11,15 @@ set "DPL_ROOT_HINT=%PREFIX%/Library"
 set "SKBUILD_ARGS=-G Ninja -- -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icx -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON"
 set "SKBUILD_ARGS=%SKBUILD_ARGS% -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON"

+REM Overriding IPO is useful for building in resources constrained VMs (public CI)
+if DEFINED OVERRIDE_INTEL_IPO (
+    set "SKBUILD_ARGS=%SKBUILD_ARGS% -DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=FALSE"
+)
+
 FOR %%V IN (14.0.0 14 15.0.0 15 16.0.0 16 17.0.0 17) DO @(
     REM set DIR_HINT if directory exists
     IF EXIST "%BUILD_PREFIX%\Library\lib\clang\%%V\" (
-        SET "SYCL_INCLUDE_DIR_HINT=%BUILD_PREFIX%\Library\lib\clang\%%V"
+       SET "SYCL_INCLUDE_DIR_HINT=%BUILD_PREFIX%\Library\lib\clang\%%V"
     )
 )
@@ -40,19 +38,20 @@ if EXIST "%PLATFORM_DIR%" (
 )

 if NOT "%WHEELS_OUTPUT_FOLDER%"=="" (
-    rem Install and assemble wheel package from the build bits
-    "%PYTHON%" setup.py install bdist_wheel %SKBUILD_ARGS%
-    if errorlevel 1 exit 1
-    copy dist\dpnp*.whl %WHEELS_OUTPUT_FOLDER%
-    if errorlevel 1 exit 1
+   rem Install and assemble wheel package from the build bits
+   "%PYTHON%" setup.py install bdist_wheel %SKBUILD_ARGS%
+   if errorlevel 1 exit 1
+   copy dist\dpnp*.whl %WHEELS_OUTPUT_FOLDER%
+   if errorlevel 1 exit 1
 ) ELSE (
-    rem Only install
-    "%PYTHON%" setup.py install %SKBUILD_ARGS%
-    if errorlevel 1 exit 1
+   rem Only install
+   "%PYTHON%" setup.py install %SKBUILD_ARGS%
+   if errorlevel 1 exit 1
 )

 rem copy back
 if EXIST "%PLATFORM_DIR%" (
-    copy /Y "%FN%" "%PLATFORM_DIR%\%FN%"
-    if errorlevel 1 exit 1
+    rem copy back
+    copy /Y "%FN%" "%PLATFORM_DIR%\%FN%"
+    if errorlevel 1 exit 1
 )
diff --git a/conda-recipe/meta.yaml b/conda-recipe/meta.yaml
index c10cd061345..6e12e122e17 100644
--- a/conda-recipe/meta.yaml
+++ b/conda-recipe/meta.yaml
@@ -42,6 +42,7 @@ build:
   include_recipe: False
   script_env:
     - WHEELS_OUTPUT_FOLDER
+    - OVERRIDE_INTEL_IPO # [win]

 test:
   requires:

From c78f28a97c0889b7630df479ca9aeeaab99c18fc Mon Sep 17 00:00:00 2001
From: vlad-perevezentsev
Date: Wed, 26 Jun 2024 18:56:30 +0200
Subject: [PATCH 32/49] Skip test_distr in TestNormal and TestRandN (#1899)

* Skip test_distr in TestNormal and TestRandN

* Add jira ticket number
---
 tests/test_random_state.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/test_random_state.py b/tests/test_random_state.py
index ed56dbdf730..2f9b76a43e8 100644
--- a/tests/test_random_state.py
+++ b/tests/test_random_state.py
@@ -35,6 +35,9 @@ def get_default_floating():

 class TestNormal:
+    # TODO: Temporary skip due to incorrect results in public CI
+    # (ARM architecture) with the new MKL package 2024.2.0 (SAT-7080)
+    @pytest.mark.skipif(is_cpu_device(), reason="SAT-7080")
     @pytest.mark.parametrize(
         "dtype",
         [dpnp.float32, dpnp.float64, dpnp.float, None],
@@ -605,6 +608,9 @@ def test_invalid_usm_type(self, usm_type):

 class TestRandN:
+    # TODO: Temporary skip due to incorrect results in public CI
+    # (ARM architecture) with the new MKL package 2024.2.0 (SAT-7080)
+    @pytest.mark.skipif(is_cpu_device(), reason="SAT-7080")
     @pytest.mark.parametrize(
         "usm_type",
         ["host", "device", "shared"],

From 323ce509c3d81f7325179e18e6ca5ac80d23b67d Mon Sep 17 00:00:00 2001
From: vlad-perevezentsev
Date: Wed, 26 Jun 2024 20:44:29 +0200
Subject: [PATCH 33/49] Support `out` parameter for `dpnp.all/any()` (#1893)

* Update dpnp.all/any with support out param

* Update cupy tests

* Add TestAllAny

* Update dpnp tests

* Apply comments

---------

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 dpnp/dpnp_iface_logic.py           | 194 +++++++++++-------
 tests/test_logic.py                | 104 +++++++---
 tests/test_sycl_queue.py           |   2 +
 tests/test_usm_type.py             |   2 +
 .../cupy/logic_tests/test_truth.py |   3 -
 5 files changed, 200 insertions(+), 105 deletions(-)

diff --git a/dpnp/dpnp_iface_logic.py b/dpnp/dpnp_iface_logic.py
index d780cf578bf..6dfa1a15dcc 100644
---
a/dpnp/dpnp_iface_logic.py +++ b/dpnp/dpnp_iface_logic.py @@ -46,7 +46,7 @@ import dpctl.tensor as dpt -import dpctl.tensor._tensor_elementwise_impl as ti +import dpctl.tensor._tensor_elementwise_impl as tei import numpy import dpnp @@ -76,25 +76,48 @@ ] -def all(x, /, axis=None, out=None, keepdims=False, *, where=True): +def all(a, /, axis=None, out=None, keepdims=False, *, where=True): """ Test whether all array elements along a given axis evaluate to True. For full documentation refer to :obj:`numpy.all`. + Parameters + ---------- + a : {dpnp.ndarray, usm_ndarray} + Input array. + axis : {None, int, tuple of ints}, optional + Axis or axes along which a logical AND reduction is performed. + The default is to perform a logical AND over all the dimensions + of the input array.`axis` may be negative, in which case it counts + from the last to the first axis. + Default: ``None``. + out : {None, dpnp.ndarray, usm_ndarray}, optional + Alternative output array in which to place the result. It must have + the same shape as the expected output but the type (of the returned + values) will be cast if necessary. + Default: ``None``. + keepdims : bool, optional + If ``True``, the reduced axes (dimensions) are included in the result + as singleton dimensions, so that the returned array remains + compatible with the input array according to Array Broadcasting + rules. Otherwise, if ``False``, the reduced axes are not included in + the returned array. + Default: ``False``. + Returns ------- out : dpnp.ndarray An array with a data type of `bool` - containing the results of the logical AND reduction. + containing the results of the logical AND reduction is returned + unless `out` is specified. Otherwise, a reference to `out` is returned. + The result has the same shape as `a` if `axis` is not ``None`` + or `a` is a 0-d array. Limitations ----------- - Parameters `x` is supported either as :class:`dpnp.ndarray` - or :class:`dpctl.tensor.usm_ndarray`. 
- Parameters `out` and `where` are supported with default value. - Input array data types are limited by supported DPNP :ref:`Data types`. - Otherwise the function will be executed sequentially on CPU. + Parameters `where` is only supported with its default value. + Otherwise ``NotImplementedError`` exception will be raised. See Also -------- @@ -105,7 +128,7 @@ def all(x, /, axis=None, out=None, keepdims=False, *, where=True): Notes ----- Not a Number (NaN), positive infinity and negative infinity - evaluate to `True` because these are not equal to zero. + evaluate to ``True`` because these are not equal to zero. Examples -------- @@ -125,22 +148,27 @@ def all(x, /, axis=None, out=None, keepdims=False, *, where=True): >>> np.all(x3) array(True) + >>> o = np.array(False) + >>> z = np.all(x2, out=o) + >>> z, o + (array(True), array(True)) + >>> # Check now that `z` is a reference to `o` + >>> z is o + True + >>> id(z), id(o) # identity of `z` and `o` + (139884456208480, 139884456208480) # may vary + """ - if dpnp.is_supported_array_type(x): - if out is not None: - pass - elif where is not True: - pass - else: - dpt_array = dpnp.get_usm_ndarray(x) - return dpnp_array._create_from_usm_ndarray( - dpt.all(dpt_array, axis=axis, keepdims=keepdims) - ) + dpnp.check_limitations(where=where) - return call_origin( - numpy.all, x, axis=axis, out=out, keepdims=keepdims, where=where + dpt_array = dpnp.get_usm_ndarray(a) + result = dpnp_array._create_from_usm_ndarray( + dpt.all(dpt_array, axis=axis, keepdims=keepdims) ) + # TODO: temporary solution until dpt.all supports out parameter + result = dpnp.get_result_array(result, out) + return result def allclose(a, b, rtol=1.0e-5, atol=1.0e-8, **kwargs): @@ -238,25 +266,48 @@ def allclose(a, b, rtol=1.0e-5, atol=1.0e-8, **kwargs): return call_origin(numpy.allclose, a, b, rtol=rtol, atol=atol, **kwargs) -def any(x, /, axis=None, out=None, keepdims=False, *, where=True): +def any(a, /, axis=None, out=None, keepdims=False, *, 
where=True): """ Test whether any array element along a given axis evaluates to True. For full documentation refer to :obj:`numpy.any`. + Parameters + ---------- + a : {dpnp.ndarray, usm_ndarray} + Input array. + axis : {None, int, tuple of ints}, optional + Axis or axes along which a logical OR reduction is performed. + The default is to perform a logical OR over all the dimensions + of the input array.`axis` may be negative, in which case it counts + from the last to the first axis. + Default: ``None``. + out : {None, dpnp.ndarray, usm_ndarray}, optional + Alternative output array in which to place the result. It must have + the same shape as the expected output but the type (of the returned + values) will be cast if necessary. + Default: ``None``. + keepdims : bool, optional + If ``True``, the reduced axes (dimensions) are included in the result + as singleton dimensions, so that the returned array remains + compatible with the input array according to Array Broadcasting + rules. Otherwise, if ``False``, the reduced axes are not included in + the returned array. + Default: ``False``. + Returns ------- out : dpnp.ndarray An array with a data type of `bool` - containing the results of the logical OR reduction. + containing the results of the logical OR reduction is returned + unless `out` is specified. Otherwise, a reference to `out` is returned. + The result has the same shape as `a` if `axis` is not ``None`` + or `a` is a 0-d array. Limitations ----------- - Parameters `x` is supported either as :class:`dpnp.ndarray` - or :class:`dpctl.tensor.usm_ndarray`. - Parameters `out` and `where` are supported with default value. - Input array data types are limited by supported DPNP :ref:`Data types`. - Otherwise the function will be executed sequentially on CPU. + Parameters `where` is only supported with its default value. + Otherwise ``NotImplementedError`` exception will be raised. 
See Also -------- @@ -267,7 +318,7 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): Notes ----- Not a Number (NaN), positive infinity and negative infinity evaluate - to `True` because these are not equal to zero. + to ``True`` because these are not equal to zero. Examples -------- @@ -279,30 +330,35 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): >>> np.any(x, axis=0) array([ True, True]) - >>> x2 = np.array([0, 0, 0]) + >>> x2 = np.array([-1, 0, 5]) >>> np.any(x2) - array(False) + array(True) >>> x3 = np.array([1.0, np.nan]) >>> np.any(x3) array(True) + >>> o = np.array(False) + >>> z = np.any(x2, out=o) + >>> z, o + (array(True), array(True)) + >>> # Check now that `z` is a reference to `o` + >>> z is o + True + >>> id(z), id(o) # identity of `z` and `o` + >>> (140053638309840, 140053638309840) # may vary + """ - if dpnp.is_supported_array_type(x): - if out is not None: - pass - elif where is not True: - pass - else: - dpt_array = dpnp.get_usm_ndarray(x) - return dpnp_array._create_from_usm_ndarray( - dpt.any(dpt_array, axis=axis, keepdims=keepdims) - ) + dpnp.check_limitations(where=where) - return call_origin( - numpy.any, x, axis=axis, out=out, keepdims=keepdims, where=where + dpt_array = dpnp.get_usm_ndarray(a) + result = dpnp_array._create_from_usm_ndarray( + dpt.any(dpt_array, axis=axis, keepdims=keepdims) ) + # TODO: temporary solution until dpt.any supports out parameter + result = dpnp.get_result_array(result, out) + return result _EQUAL_DOCSTRING = """ @@ -368,8 +424,8 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): equal = DPNPBinaryFunc( "equal", - ti._equal_result_type, - ti._equal, + tei._equal_result_type, + tei._equal, _EQUAL_DOCSTRING, ) @@ -431,8 +487,8 @@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): greater = DPNPBinaryFunc( "greater", - ti._greater_result_type, - ti._greater, + tei._greater_result_type, + tei._greater, _GREATER_DOCSTRING, ) @@ -495,8 +551,8 
@@ def any(x, /, axis=None, out=None, keepdims=False, *, where=True): greater_equal = DPNPBinaryFunc( "greater", - ti._greater_equal_result_type, - ti._greater_equal, + tei._greater_equal_result_type, + tei._greater_equal, _GREATER_EQUAL_DOCSTRING, ) @@ -597,8 +653,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): isfinite = DPNPUnaryFunc( "isfinite", - ti._isfinite_result_type, - ti._isfinite, + tei._isfinite_result_type, + tei._isfinite, _ISFINITE_DOCSTRING, ) @@ -650,8 +706,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): isinf = DPNPUnaryFunc( "isinf", - ti._isinf_result_type, - ti._isinf, + tei._isinf_result_type, + tei._isinf, _ISINF_DOCSTRING, ) @@ -704,8 +760,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): isnan = DPNPUnaryFunc( "isnan", - ti._isnan_result_type, - ti._isnan, + tei._isnan_result_type, + tei._isnan, _ISNAN_DOCSTRING, ) @@ -767,8 +823,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): less = DPNPBinaryFunc( "less", - ti._less_result_type, - ti._less, + tei._less_result_type, + tei._less, _LESS_DOCSTRING, ) @@ -830,8 +886,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): less_equal = DPNPBinaryFunc( "less_equal", - ti._less_equal_result_type, - ti._less_equal, + tei._less_equal_result_type, + tei._less_equal, _LESS_EQUAL_DOCSTRING, ) @@ -895,8 +951,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): logical_and = DPNPBinaryFunc( "logical_and", - ti._logical_and_result_type, - ti._logical_and, + tei._logical_and_result_type, + tei._logical_and, _LOGICAL_AND_DOCSTRING, ) @@ -947,8 +1003,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): logical_not = DPNPUnaryFunc( "logical_not", - ti._logical_not_result_type, - ti._logical_not, + tei._logical_not_result_type, + tei._logical_not, _LOGICAL_NOT_DOCSTRING, ) @@ -1012,8 +1068,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): logical_or = DPNPBinaryFunc( "logical_or", 
- ti._logical_or_result_type, - ti._logical_or, + tei._logical_or_result_type, + tei._logical_or, _LOGICAL_OR_DOCSTRING, ) @@ -1075,8 +1131,8 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): logical_xor = DPNPBinaryFunc( "logical_xor", - ti._logical_xor_result_type, - ti._logical_xor, + tei._logical_xor_result_type, + tei._logical_xor, _LOGICAL_XOR_DOCSTRING, ) @@ -1138,7 +1194,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): not_equal = DPNPBinaryFunc( "not_equal", - ti._not_equal_result_type, - ti._not_equal, + tei._not_equal_result_type, + tei._not_equal, _NOT_EQUAL_DOCSTRING, ) diff --git a/tests/test_logic.py b/tests/test_logic.py index 1b8e34a6fe8..e4f103e22c2 100644 --- a/tests/test_logic.py +++ b/tests/test_logic.py @@ -1,47 +1,85 @@ import numpy import pytest -from numpy.testing import assert_allclose, assert_equal +from numpy.testing import assert_allclose, assert_equal, assert_raises import dpnp from .helper import ( get_all_dtypes, get_float_complex_dtypes, - has_support_aspect64, ) -@pytest.mark.parametrize("type", get_all_dtypes()) -@pytest.mark.parametrize( - "shape", - [(0,), (4,), (2, 3), (2, 2, 2)], - ids=["(0,)", "(4,)", "(2,3)", "(2,2,2)"], -) -def test_all(type, shape): - size = 1 - for i in range(len(shape)): - size *= shape[i] - - for i in range(2**size): - t = i - - a = numpy.empty(size, dtype=type) - - for j in range(size): - a[j] = 0 if t % 2 == 0 else j + 1 - t = t >> 1 - - a = a.reshape(shape) - - ia = dpnp.array(a) - - np_res = numpy.all(a) - dpnp_res = dpnp.all(ia) - assert_allclose(dpnp_res, np_res) - - np_res = a.all() - dpnp_res = ia.all() - assert_allclose(dpnp_res, np_res) +class TestAllAny: + @pytest.mark.parametrize("func", ["all", "any"]) + @pytest.mark.parametrize("dtype", get_all_dtypes()) + @pytest.mark.parametrize("axis", [None, 0, 1, (0, 1)]) + @pytest.mark.parametrize("keepdims", [True, False]) + def test_all_any(self, func, dtype, axis, keepdims): + dp_array = dpnp.array([[0, 1, 2], [3, 
4, 0]], dtype=dtype) + np_array = dpnp.asnumpy(dp_array) + + expected = getattr(numpy, func)(np_array, axis=axis, keepdims=keepdims) + result = getattr(dpnp, func)(dp_array, axis=axis, keepdims=keepdims) + assert_allclose(result, expected) + + @pytest.mark.parametrize("func", ["all", "any"]) + @pytest.mark.parametrize("a_dtype", get_all_dtypes()) + @pytest.mark.parametrize("out_dtype", get_all_dtypes()) + def test_all_any_out(self, func, a_dtype, out_dtype): + dp_array = dpnp.array([[0, 1, 2], [3, 4, 0]], dtype=a_dtype) + np_array = dpnp.asnumpy(dp_array) + + expected = getattr(numpy, func)(np_array) + out = dpnp.empty(expected.shape, dtype=out_dtype) + result = getattr(dpnp, func)(dp_array, out=out) + assert out is result + assert_allclose(result, expected) + + @pytest.mark.parametrize("func", ["all", "any"]) + @pytest.mark.parametrize("axis", [None, 0, 1, (0, 1)]) + @pytest.mark.parametrize("shape", [(2, 3), (2, 0), (0, 3)]) + def test_all_any_empty(self, func, axis, shape): + dp_array = dpnp.empty(shape, dtype=dpnp.int64) + np_array = dpnp.asnumpy(dp_array) + + result = getattr(dpnp, func)(dp_array, axis=axis) + expected = getattr(numpy, func)(np_array, axis=axis) + assert_allclose(result, expected) + + @pytest.mark.parametrize("func", ["all", "any"]) + def test_all_any_scalar(self, func): + dp_array = dpnp.array(0) + np_array = dpnp.asnumpy(dp_array) + + result = getattr(dp_array, func)() + expected = getattr(np_array, func)() + assert_allclose(result, expected) + + @pytest.mark.parametrize("func", ["all", "any"]) + @pytest.mark.parametrize("axis", [None, 0, 1]) + @pytest.mark.parametrize("keepdims", [True, False]) + def test_all_any_nan_inf(self, func, axis, keepdims): + dp_array = dpnp.array([[dpnp.nan, 1, 2], [dpnp.inf, -dpnp.inf, 0]]) + np_array = dpnp.asnumpy(dp_array) + + expected = getattr(numpy, func)(np_array, axis=axis, keepdims=keepdims) + result = getattr(dpnp, func)(dp_array, axis=axis, keepdims=keepdims) + assert_allclose(result, expected) + + 
@pytest.mark.parametrize("func", ["all", "any"]) + def test_all_any_error(self, func): + def check_raises(func_name, exception, *args, **kwargs): + assert_raises( + exception, lambda: getattr(dpnp, func_name)(*args, **kwargs) + ) + + a = dpnp.arange(5) + # unsupported where parameter + check_raises(func, NotImplementedError, a, where=False) + # unsupported type + check_raises(func, TypeError, dpnp.asnumpy(a)) + check_raises(func, TypeError, [0, 1, 2, 3]) @pytest.mark.parametrize("dtype", get_all_dtypes(no_bool=True, no_complex=True)) diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 99334cfabfc..3349c013428 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -394,6 +394,8 @@ def test_meshgrid(device_x, device_y): @pytest.mark.parametrize( "func,data", [ + pytest.param("all", [-1.0, 0.0, 1.0]), + pytest.param("any", [-1.0, 0.0, 1.0]), pytest.param("average", [1.0, 2.0, 4.0, 7.0]), pytest.param("abs", [-1.2, 1.2]), pytest.param("angle", [[1.0 + 1.0j, 2.0 + 3.0j]]), diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index 4f7314ff2db..427151dcc51 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -510,6 +510,8 @@ def test_norm(usm_type, ord, axis): @pytest.mark.parametrize( "func,data", [ + pytest.param("all", [-1.0, 0.0, 1.0]), + pytest.param("any", [-1.0, 0.0, 1.0]), pytest.param("average", [1.0, 2.0, 4.0, 7.0]), pytest.param("abs", [-1.2, 1.2]), pytest.param("angle", [[1.0 + 1.0j, 2.0 + 3.0j]]), diff --git a/tests/third_party/cupy/logic_tests/test_truth.py b/tests/third_party/cupy/logic_tests/test_truth.py index e715aa24405..c76ccd48aa5 100644 --- a/tests/third_party/cupy/logic_tests/test_truth.py +++ b/tests/third_party/cupy/logic_tests/test_truth.py @@ -1,7 +1,6 @@ import unittest import numpy -import pytest from tests.third_party.cupy import testing @@ -47,7 +46,6 @@ def test_without_out(self, xp, dtype): x = xp.asarray(self.x).astype(dtype) return getattr(xp, self.f)(x, self.axis, None, 
self.keepdims) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_with_out(self, xp, dtype): @@ -80,7 +78,6 @@ def test_without_out(self, xp, dtype): x = xp.asarray(self.x).astype(dtype) return getattr(xp, self.f)(x, self.axis, None, self.keepdims) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_dtypes((*testing._loops._float_dtypes, numpy.bool_)) @testing.numpy_cupy_array_equal() def test_with_out(self, xp, dtype): From 73ace1269c5c45fdde499234529f88a421ac6380 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Wed, 26 Jun 2024 23:17:09 +0200 Subject: [PATCH 34/49] Rework implementation of `dpnp.fmod` function (#1883) * Preparation to reuse common dpctl f/w for VM functions * PoC to decouple abs implementation to separate source file * Reuse typedef for function poiter from dpctl.tensor * Define populating vectors by a separate macro * Move implementation of utility functions from headers to source to resolve link issues * Separated implementation of acos function * Separated implementation of acosh function * Use function to simplify strides from dpctl tensor headers * PoC to decouple add implementation to separate source file * Separated implementation of asin function * Separated implementation of asinh function * Separated implementation of atan, atan2, atanh functions * Resolve issue with calling MKL function for undefined types * Separated implementation of cbrt, ceil, conj, cos and cosh functions * Separated implementation of div, exp, exp2, expm1, floor and hypot functions * Separated implementation of ln, log1p, log2 and log10 functions * Separated implementation of mul, pow, rint, sin and sinh functions * Separated implementation of sqr, sqrt, sub, tan, tanh and trunc functions * Removed unused header with types matrix * Remove unused functions * Use passing by reference in unary and binary funcs * Implement dpnp.fabs 
function * Create an instance of DPNPUnaryFunc for fabs * Enable and add relating tests * Decouple populate logic to a macro * Resolve compilation failure on Win * Implement dpnp.fmod function * Add vector implementation and dedicated kernel for boolean inputs * Update python implementation part * Add MKL function to the VM extension * Add tests * Add a link to gh issue in arithmetic tests * Suppress divide warning * Resolve compilation warning * Updated docstring description of inputs per review comment --- dpnp/backend/extensions/ufunc/CMakeLists.txt | 1 + .../ufunc/elementwise_functions/common.cpp | 2 + .../ufunc/elementwise_functions/fmod.cpp | 193 ++++++++ .../ufunc/elementwise_functions/fmod.hpp | 35 ++ .../ufunc/elementwise_functions/populate.hpp | 110 ++++- dpnp/backend/extensions/vm/CMakeLists.txt | 1 + dpnp/backend/extensions/vm/fmod.cpp | 161 ++++++ dpnp/backend/extensions/vm/fmod.hpp | 35 ++ dpnp/backend/extensions/vm/vm_py.cpp | 2 + dpnp/backend/include/dpnp_iface_fptr.hpp | 10 +- dpnp/backend/kernels/dpnp_krnl_elemwise.cpp | 14 - .../kernels/elementwise_functions/fmod.hpp | 61 +++ dpnp/dpnp_algo/dpnp_algo.pxd | 1 - dpnp/dpnp_algo/dpnp_algo_mathematical.pxi | 9 - dpnp/dpnp_iface_bitwise.py | 43 +- dpnp/dpnp_iface_logic.py | 66 ++- dpnp/dpnp_iface_mathematical.py | 257 +++++----- dpnp/dpnp_iface_trigonometric.py | 49 +- tests/test_mathematical.py | 51 +- tests/test_usm_type.py | 1 + .../cupy/core_tests/test_ndarray_math.py | 3 +- .../cupy/math_tests/test_arithmetic.py | 458 ++++++++++++++---- 22 files changed, 1263 insertions(+), 300 deletions(-) create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmod.cpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmod.hpp create mode 100644 dpnp/backend/extensions/vm/fmod.cpp create mode 100644 dpnp/backend/extensions/vm/fmod.hpp create mode 100644 dpnp/backend/kernels/elementwise_functions/fmod.hpp diff --git a/dpnp/backend/extensions/ufunc/CMakeLists.txt 
b/dpnp/backend/extensions/ufunc/CMakeLists.txt index 7f9a240271b..1d140b06658 100644 --- a/dpnp/backend/extensions/ufunc/CMakeLists.txt +++ b/dpnp/backend/extensions/ufunc/CMakeLists.txt @@ -26,6 +26,7 @@ set(_elementwise_sources ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/common.cpp ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fabs.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fmod.cpp ) set(python_module_name _ufunc_impl) diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp index 44173fc764f..b915f9a299a 100644 --- a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp @@ -26,6 +26,7 @@ #include #include "fabs.hpp" +#include "fmod.hpp" namespace py = pybind11; @@ -37,5 +38,6 @@ namespace dpnp::extensions::ufunc void init_elementwise_functions(py::module_ m) { init_fabs(m); + init_fmod(m); } } // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.cpp new file mode 100644 index 00000000000..dbc215ec1f4 --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.cpp @@ -0,0 +1,193 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include + +#include "dpctl4pybind11.hpp" + +#include "fmod.hpp" +#include "kernels/elementwise_functions/fmod.hpp" +#include "populate.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +namespace impl +{ +/** + * @brief A factory to define pairs of supported types for which + * sycl::fmod function is available. 
+ * + * @tparam T1 Type of input vectors `a` + * @tparam T2 Type of input vectors `b` + */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +using dpnp::kernels::fmod::FmodFunctor; + +template +using ContigFunctor = + ew_cmn_ns::BinaryContigFunctor, + vec_sz, + n_vecs, + enable_sg_loadstore>; + +template +using StridedFunctor = + ew_cmn_ns::BinaryStridedFunctor>; + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static binary_contig_impl_fn_ptr_t fmod_contig_dispatch_table[td_ns::num_types] + [td_ns::num_types]; +static int fmod_output_typeid_table[td_ns::num_types][td_ns::num_types]; +static binary_strided_impl_fn_ptr_t + fmod_strided_dispatch_table[td_ns::num_types][td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(fmod); +} // namespace impl + +void init_fmod(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + { + impl::populate_fmod_dispatch_tables(); + using impl::fmod_contig_dispatch_table; + using impl::fmod_output_typeid_table; + using impl::fmod_strided_dispatch_table; + + auto fmod_pyapi = [&](const arrayT &src1, const arrayT &src2, + const arrayT &dst, sycl::queue &exec_q, + const event_vecT &depends = {}) { + return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, fmod_output_typeid_table, + 
fmod_contig_dispatch_table, fmod_strided_dispatch_table, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmod", fmod_pyapi, "", py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("sycl_queue"), + py::arg("depends") = py::list()); + + auto fmod_result_type_pyapi = [&](const py::dtype &dtype1, + const py::dtype &dtype2) { + return py_int::py_binary_ufunc_result_type( + dtype1, dtype2, fmod_output_typeid_table); + }; + m.def("_fmod_result_type", fmod_result_type_pyapi); + } +} +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.hpp new file mode 100644 index 00000000000..cfc61ba218f --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmod.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +void init_fmod(py::module_ m); +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp index 6261fcc08eb..0b3cc8dac15 100644 --- a/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/populate.hpp @@ -26,7 +26,8 @@ #pragma once /** - * @brief A macro used to define factories and a populating universal functions. + * @brief A macro used to define factories and populating functions for unary + * universal functions. 
*/ #define MACRO_POPULATE_DISPATCH_VECTORS(__name__) \ template \ + class __name__##_contig_kernel; \ + \ + template \ + sycl::event __name__##_contig_impl( \ + sycl::queue &exec_q, size_t nelems, const char *arg1_p, \ + py::ssize_t arg1_offset, const char *arg2_p, py::ssize_t arg2_offset, \ + char *res_p, py::ssize_t res_offset, \ + const std::vector &depends = {}) \ + { \ + return ew_cmn_ns::binary_contig_impl( \ + exec_q, nelems, arg1_p, arg1_offset, arg2_p, arg2_offset, res_p, \ + res_offset, depends); \ + } \ + \ + template \ + struct ContigFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v< \ + typename OutputType::value_type, void>) \ + { \ + \ + fnT fn = nullptr; \ + return fn; \ + } \ + else { \ + fnT fn = __name__##_contig_impl; \ + return fn; \ + } \ + } \ + }; \ + \ + template \ + struct TypeMapFactory \ + { \ + std::enable_if_t::value, int> get() \ + { \ + using rT = typename OutputType::value_type; \ + return td_ns::GetTypeid{}.get(); \ + } \ + }; \ + \ + template \ + class __name__##_strided_kernel; \ + \ + template \ + sycl::event __name__##_strided_impl( \ + sycl::queue &exec_q, size_t nelems, int nd, \ + const py::ssize_t *shape_and_strides, const char *arg1_p, \ + py::ssize_t arg1_offset, const char *arg2_p, py::ssize_t arg2_offset, \ + char *res_p, py::ssize_t res_offset, \ + const std::vector &depends, \ + const std::vector &additional_depends) \ + { \ + return ew_cmn_ns::binary_strided_impl( \ + exec_q, nelems, nd, shape_and_strides, arg1_p, arg1_offset, \ + arg2_p, arg2_offset, res_p, res_offset, depends, \ + additional_depends); \ + } \ + \ + template \ + struct StridedFactory \ + { \ + fnT get() \ + { \ + if constexpr (std::is_same_v< \ + typename OutputType::value_type, void>) \ + { \ + fnT fn = nullptr; \ + return fn; \ + } \ + else { \ + fnT fn = __name__##_strided_impl; \ + return fn; \ + } \ + } \ + }; \ + \ + void populate_##__name__##_dispatch_tables(void) \ + { \ + td_ns::DispatchTableBuilder \ + dvb1; \ + 
dvb1.populate_dispatch_table(__name__##_contig_dispatch_table); \ + \ + td_ns::DispatchTableBuilder \ + dvb2; \ + dvb2.populate_dispatch_table(__name__##_strided_dispatch_table); \ + \ + td_ns::DispatchTableBuilder \ + dvb3; \ + dvb3.populate_dispatch_table(__name__##_output_typeid_table); \ + }; diff --git a/dpnp/backend/extensions/vm/CMakeLists.txt b/dpnp/backend/extensions/vm/CMakeLists.txt index ba1e46ea0ed..de6262581f5 100644 --- a/dpnp/backend/extensions/vm/CMakeLists.txt +++ b/dpnp/backend/extensions/vm/CMakeLists.txt @@ -43,6 +43,7 @@ set(_elementwise_sources ${CMAKE_CURRENT_SOURCE_DIR}/exp2.cpp ${CMAKE_CURRENT_SOURCE_DIR}/expm1.cpp ${CMAKE_CURRENT_SOURCE_DIR}/floor.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/fmod.cpp ${CMAKE_CURRENT_SOURCE_DIR}/hypot.cpp ${CMAKE_CURRENT_SOURCE_DIR}/ln.cpp ${CMAKE_CURRENT_SOURCE_DIR}/log10.cpp diff --git a/dpnp/backend/extensions/vm/fmod.cpp b/dpnp/backend/extensions/vm/fmod.cpp new file mode 100644 index 00000000000..e985492de04 --- /dev/null +++ b/dpnp/backend/extensions/vm/fmod.cpp @@ -0,0 +1,161 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include +#include + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "fmod.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::fmod function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. 
+ */ +template +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::BinaryTypeMapResultEntry, + td_ns::BinaryTypeMapResultEntry, + td_ns::DefaultResultEntry>::result_type; +}; + +template +static sycl::event fmod_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + py::ssize_t a_offset, + const char *in_b, + py::ssize_t b_offset, + char *out_y, + py::ssize_t out_offset, + const std::vector &depends) +{ + tu_ns::validate_type_for_device(exec_q); + tu_ns::validate_type_for_device(exec_q); + + if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) { + throw std::runtime_error("Array offsets have to be equal to 0"); + } + + std::int64_t n = static_cast(in_n); + const T1 *a = reinterpret_cast(in_a); + const T2 *b = reinterpret_cast(in_b); + + using resTy = typename OutputType::value_type; + resTy *y = reinterpret_cast(out_y); + + return mkl_vm::fmod(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing 1st input vector of size n + b, // pointer `b` containing 2nd input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types][td_ns::num_types]; +static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types] + [td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(fmod); +} // namespace impl + +void init_fmod(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + + impl::populate_dispatch_tables(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto fmod_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { + 
return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrTable{}, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmod", fmod_pyapi, + "Call `fmod` function from OneMKL VM library to perform element " + "by element computation of the modulus function of vector `src1` " + "with respect to vector `src2`, writing results to vector `dst`", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("depends") = py::list()); + + auto fmod_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst) { + return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst, + output_typeid_vector, + contig_dispatch_vector); + }; + m.def("_mkl_fmod_to_call", fmod_need_to_call_pyapi, + "Check input arguments to answer if `fmod` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/fmod.hpp b/dpnp/backend/extensions/vm/fmod.hpp new file mode 100644 index 00000000000..492ac8f9889 --- /dev/null +++ b/dpnp/backend/extensions/vm/fmod.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2023-2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. 
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include + +namespace py = pybind11; + +namespace dpnp::extensions::vm +{ +void init_fmod(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/vm_py.cpp b/dpnp/backend/extensions/vm/vm_py.cpp index 791a8f6d656..b78ae51ddc3 100644 --- a/dpnp/backend/extensions/vm/vm_py.cpp +++ b/dpnp/backend/extensions/vm/vm_py.cpp @@ -46,6 +46,7 @@ #include "exp2.hpp" #include "expm1.hpp" #include "floor.hpp" +#include "fmod.hpp" #include "hypot.hpp" #include "ln.hpp" #include "log10.hpp" @@ -86,6 +87,7 @@ PYBIND11_MODULE(_vm_impl, m) vm_ns::init_exp2(m); vm_ns::init_expm1(m); vm_ns::init_floor(m); + vm_ns::init_fmod(m); vm_ns::init_hypot(m); vm_ns::init_ln(m); vm_ns::init_log10(m); diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index 0f6ef51bc7c..1172bcbe4f5 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -140,12 +140,10 @@ enum class DPNPFuncName : size_t DPNP_FN_FLOOR, /**< Used in numpy.floor() impl */ DPNP_FN_FLOOR_DIVIDE, /**< Used in numpy.floor_divide() impl */ DPNP_FN_FMOD, /**< Used in numpy.fmod() impl */ - DPNP_FN_FMOD_EXT, /**< Used in numpy.fmod() impl, requires extra parameters - */ - DPNP_FN_FULL, /**< Used in numpy.full() impl */ - DPNP_FN_FULL_LIKE, /**< Used in numpy.full_like() impl */ - DPNP_FN_HYPOT, /**< Used in numpy.hypot() impl */ - DPNP_FN_IDENTITY, /**< Used in numpy.identity() impl */ + DPNP_FN_FULL, /**< Used in numpy.full() impl */ + DPNP_FN_FULL_LIKE, /**< Used in numpy.full_like() impl */ + DPNP_FN_HYPOT, /**< Used in numpy.hypot() impl */ + DPNP_FN_IDENTITY, /**< Used in numpy.identity() impl */ DPNP_FN_INITVAL, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ DPNP_FN_INITVAL_EXT, /**< Used in numpy ones, ones_like, zeros, zeros_like diff --git a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp 
b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp index 122a3ccdedd..486851516dc 100644 --- a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp @@ -1401,20 +1401,6 @@ static void func_map_elemwise_2arg_3type_core(func_map_t &fmap) template static void func_map_elemwise_2arg_3type_short_core(func_map_t &fmap) { - ((fmap[DPNPFuncName::DPNP_FN_FMOD_EXT][FT1][FTs] = - {get_floating_res_type(), - (void *) - dpnp_fmod_c_ext()>, - func_type_map_t::find_type, - func_type_map_t::find_type>, - get_floating_res_type(), - (void *)dpnp_fmod_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, - func_type_map_t::find_type>}), - ...); ((fmap[DPNPFuncName::DPNP_FN_MAXIMUM_EXT][FT1][FTs] = {get_floating_res_type(), (void *)dpnp_maximum_c_ext< diff --git a/dpnp/backend/kernels/elementwise_functions/fmod.hpp b/dpnp/backend/kernels/elementwise_functions/fmod.hpp new file mode 100644 index 00000000000..e97b257cb06 --- /dev/null +++ b/dpnp/backend/kernels/elementwise_functions/fmod.hpp @@ -0,0 +1,61 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include + +namespace dpnp::kernels::fmod +{ +template +struct FmodFunctor +{ + using supports_sg_loadstore = typename std::true_type; + using supports_vec = std::negation< + std::conjunction, std::is_integral>>; + + resT operator()(const argT1 &in1, const argT2 &in2) const + { + if constexpr (std::is_integral::value && + std::is_integral::value) { + if (in2 == argT2(0)) { + return resT(0); + } + return in1 % in2; + } + else { + return sycl::fmod(in1, in2); + } + } + + template + sycl::vec + operator()(const sycl::vec &in1, + const sycl::vec &in2) const + { + return sycl::fmod(in1, in2); + } +}; +} // namespace dpnp::kernels::fmod diff --git a/dpnp/dpnp_algo/dpnp_algo.pxd b/dpnp/dpnp_algo/dpnp_algo.pxd index f6df42981a9..4e91151697c 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pxd +++ b/dpnp/dpnp_algo/dpnp_algo.pxd @@ -42,7 +42,6 @@ cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncName": # need this na DPNP_FN_ERF_EXT DPNP_FN_FFT_FFT_EXT DPNP_FN_FFT_RFFT_EXT - DPNP_FN_FMOD_EXT DPNP_FN_MAXIMUM_EXT DPNP_FN_MEDIAN_EXT DPNP_FN_MINIMUM_EXT diff --git a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi index 405037da782..fca1e6dc303 100644 --- a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi +++ b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi @@ -37,7 +37,6 @@ and the rest of the library __all__ += [ 
"dpnp_ediff1d", - "dpnp_fmod", "dpnp_fmax", "dpnp_fmin", "dpnp_modf", @@ -109,14 +108,6 @@ cpdef utils.dpnp_descriptor dpnp_ediff1d(utils.dpnp_descriptor x1): return result -cpdef utils.dpnp_descriptor dpnp_fmod(utils.dpnp_descriptor x1_obj, - utils.dpnp_descriptor x2_obj, - object dtype=None, - utils.dpnp_descriptor out=None, - object where=True): - return call_fptr_2in_1out_strides(DPNP_FN_FMOD_EXT, x1_obj, x2_obj, dtype, out, where) - - cpdef utils.dpnp_descriptor dpnp_fmax(utils.dpnp_descriptor x1_obj, utils.dpnp_descriptor x2_obj, object dtype=None, diff --git a/dpnp/dpnp_iface_bitwise.py b/dpnp/dpnp_iface_bitwise.py index 21ee7cc3d82..6a9c44b813e 100644 --- a/dpnp/dpnp_iface_bitwise.py +++ b/dpnp/dpnp_iface_bitwise.py @@ -65,14 +65,16 @@ Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have integer or boolean data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have integer or boolean data - type. + type. Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -132,14 +134,16 @@ Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have integer or boolean data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have integer or boolean data - type. + type. Both inputs `x1` and `x2` can not be scalars at the same time. 
out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -194,14 +198,16 @@ Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have integer or boolean data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have integer or boolean data - type. + type. Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -264,6 +270,7 @@ out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -326,14 +333,17 @@ Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have integer data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have integer data type. - Each element must be greater than or equal to 0. + Each element must be greater than or equal to ``0``. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -384,14 +394,17 @@ Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have integer data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have integer data type. - Each element must be greater than or equal to 0. + Each element must be greater than or equal to ``0``. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. diff --git a/dpnp/dpnp_iface_logic.py b/dpnp/dpnp_iface_logic.py index 6dfa1a15dcc..70f92830637 100644 --- a/dpnp/dpnp_iface_logic.py +++ b/dpnp/dpnp_iface_logic.py @@ -369,10 +369,12 @@ def any(a, /, axis=None, out=None, keepdims=False, *, where=True): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array have the correct shape and the expected data type. 
@@ -438,13 +440,16 @@ def any(a, /, axis=None, out=None, keepdims=False, *, where=True): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -501,13 +506,16 @@ def any(a, /, axis=None, out=None, keepdims=False, *, where=True): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -612,6 +620,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
@@ -671,6 +680,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -724,6 +734,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -774,13 +785,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -837,13 +851,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. 
+ Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -900,13 +917,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -969,6 +989,7 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1017,13 +1038,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1082,13 +1106,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1145,13 +1172,16 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index 2f34be46312..1fe7839f596 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -63,7 +63,6 @@ dpnp_ediff1d, dpnp_fmax, dpnp_fmin, - dpnp_fmod, dpnp_modf, dpnp_trapz, ) @@ -343,6 +342,7 @@ def _gradient_num_diff_edges( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -404,13 +404,16 @@ def _gradient_num_diff_edges( Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -480,6 +483,7 @@ def _gradient_num_diff_edges( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -537,6 +541,7 @@ def around(x, /, decimals=0, out=None): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. 
Returns ------- @@ -573,6 +578,7 @@ def around(x, /, decimals=0, out=None): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -699,6 +705,7 @@ def clip(a, a_min, a_max, *, out=None, order="K", **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -762,14 +769,17 @@ def convolve(a, v, mode="full"): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have a real floating-point data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have a real floating-point data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1236,13 +1246,16 @@ def diff(a, n=1, axis=-1, prepend=None, append=None): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. 
+ Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1363,6 +1376,7 @@ def ediff1d(x1, to_end=None, to_begin=None): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1411,6 +1425,7 @@ def ediff1d(x1, to_end=None, to_begin=None): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1463,13 +1478,16 @@ def ediff1d(x1, to_end=None, to_begin=None): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
@@ -1748,116 +1766,78 @@ def fmin(x1, x2, /, out=None, *, where=True, dtype=None, subok=True, **kwargs): ) -def fmod(x1, x2, /, out=None, *, where=True, dtype=None, subok=True, **kwargs): - """ - Returns the element-wise remainder of division. +_FMOD_DOCSTRING = """ +Calculates the remainder of division for each element `x1_i` of the input array +`x1` with the respective element `x2_i` of the input array `x2`. - For full documentation refer to :obj:`numpy.fmod`. +This function is equivalent to the Matlab(TM) ``rem`` function and should not +be confused with the Python modulus operator ``x1 % x2``. - Returns - ------- - out : dpnp.ndarray - The remainder of the division of `x1` by `x2`. +For full documentation refer to :obj:`numpy.fmod`. - Limitations - ----------- - Parameters `x1` and `x2` are supported as either scalar, - :class:`dpnp.ndarray` or :class:`dpctl.tensor.usm_ndarray`, but both `x1` - and `x2` can not be scalars at the same time. - Parameters `where`, `dtype` and `subok` are supported with their default - values. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by supported DPNP :ref:`Data types`. - - See Also - -------- - :obj:`dpnp.remainder` : Remainder complementary to floor_divide. - :obj:`dpnp.divide` : Standard division. - - Examples - -------- - >>> import dpnp as np - >>> a = np.array([-3, -2, -1, 1, 2, 3]) - >>> np.fmod(a, 2) - array([-1, 0, -1, 1, 0, 1]) - >>> np.remainder(a, 2) - array([1, 0, 1, 1, 0, 1]) - - >>> a = np.array([5, 3]) - >>> b = np.array([2, 2.]) - >>> np.fmod(a, b) - array([1., 1.]) - - >>> a = np.arange(-3, 3).reshape(3, 2) - >>> a - array([[-3, -2], - [-1, 0], - [ 1, 2]]) - >>> b = np.array([2, 2]) - >>> np.fmod(a, b) - array([[-1, 0], - [-1, 0], - [ 1, 0]]) +Parameters +---------- +x1 : {dpnp.ndarray, usm_ndarray, scalar} + First input array, expected to have a real-valued data type. 
+ Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} + Second input array, also expected to have a real-valued data type. + Both inputs `x1` and `x2` can not be scalars at the same time. +out : {None, dpnp.ndarray, usm_ndarray}, optional + Output array to populate. + Array must have the correct shape and the expected data type. + Default: ``None``. +order : {"C", "F", "A", "K"}, optional + Memory layout of the newly output array, if parameter `out` is ``None``. + Default: ``"K"``. - """ +Returns +------- +out : dpnp.ndarray + An array containing the element-wise remainders. The data type of the + returned array is determined by the Type Promotion Rules. - if kwargs: - pass - elif where is not True: - pass - elif dtype is not None: - pass - elif subok is not True: - pass - elif dpnp.isscalar(x1) and dpnp.isscalar(x2): - # at least either x1 or x2 has to be an array - pass - else: - # get USM type and queue to copy scalar from the host memory into - # a USM allocation - usm_type, queue = ( - get_usm_allocations([x1, x2]) - if dpnp.isscalar(x1) or dpnp.isscalar(x2) - else (None, None) - ) +Limitations +---------- +Parameters `where` and `subok` are supported with their default values. +Keyword argument `kwargs` is currently unsupported. +Otherwise ``NotImplementedError`` exception will be raised. 
- x1_desc = dpnp.get_dpnp_descriptor( - x1, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - x2_desc = dpnp.get_dpnp_descriptor( - x2, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - if x1_desc and x2_desc: - if out is not None: - if not dpnp.is_supported_array_type(out): - raise TypeError( - "return array must be of supported array type" - ) - out_desc = ( - dpnp.get_dpnp_descriptor( - out, copy_when_nondefault_queue=False - ) - or None - ) - else: - out_desc = None +See Also +-------- +:obj:`dpnp.remainder` : Equivalent to the Python ``%`` operator. +:obj:`dpnp.divide` : Standard division. - return dpnp_fmod( - x1_desc, x2_desc, dtype=dtype, out=out_desc, where=where - ).get_pyobj() +Examples +-------- +>>> import dpnp as np +>>> a = np.array([-3, -2, -1, 1, 2, 3]) +>>> np.fmod(a, 2) +array([-1, 0, -1, 1, 0, 1]) +>>> np.remainder(a, 2) +array([1, 0, 1, 1, 0, 1]) + +>>> np.fmod(np.array([5, 3]), np.array([2, 2.])) +array([1., 1.]) +>>> a = np.arange(-3, 3).reshape(3, 2) +>>> a +array([[-3, -2], + [-1, 0], + [ 1, 2]]) +>>> np.fmod(a, np.array([2, 2])) +array([[-1, 0], + [-1, 0], + [ 1, 0]]) +""" - return call_origin( - numpy.fmod, x1, x2, dtype=dtype, out=out, where=where, **kwargs - ) +fmod = DPNPBinaryFunc( + "fmod", + ufi._fmod_result_type, + ufi._fmod, + _FMOD_DOCSTRING, + mkl_fn_to_call=vmi._mkl_fmod_to_call, + mkl_impl_fn=vmi._fmod, +) def gradient(f, *varargs, axis=None, edge_order=1): @@ -2074,6 +2054,7 @@ def gradient(f, *varargs, axis=None, edge_order=1): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
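The rewritten ``_FMOD_DOCSTRING`` above keeps the key semantic distinction: `fmod` behaves like the C library `fmod` / Matlab ``rem`` (the result takes the sign of the dividend), while `remainder` behaves like Python's ``%`` operator (the result takes the sign of the divisor). A minimal NumPy sketch of that contrast, mirroring the docstring's own examples:

```python
import numpy as np

a = np.array([-3, -2, -1, 1, 2, 3])

# fmod: result carries the sign of the dividend (x1), like C fmod / Matlab rem
print(np.fmod(a, 2))       # [-1  0 -1  1  0  1]

# remainder: result carries the sign of the divisor (x2), like Python's %
print(np.remainder(a, 2))  # [1 0 1 1 0 1]
```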
@@ -2124,13 +2105,16 @@ def gradient(f, *varargs, axis=None, edge_order=1): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2196,13 +2180,16 @@ def gradient(f, *varargs, axis=None, edge_order=1): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2298,13 +2285,16 @@ def modf(x1, **kwargs): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. 
+ Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2371,6 +2361,7 @@ def modf(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2426,6 +2417,7 @@ def modf(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2481,13 +2473,16 @@ def modf(x1, **kwargs): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional - Output array to populate. Array must have the correct - shape and the expected data type. + Output array to populate. Array must have the correct shape and + the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2668,6 +2663,7 @@ def prod( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2744,13 +2740,16 @@ def prod( Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have a real-valued data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have a real-valued data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2826,6 +2825,7 @@ def prod( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2885,6 +2885,7 @@ def prod( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. Returns ------- @@ -2941,6 +2942,7 @@ def prod( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -2995,6 +2997,7 @@ def prod( out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. 
+ Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -3041,13 +3044,16 @@ def prod( Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have numeric data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -3331,6 +3337,7 @@ def trapz(y1, x1=None, dx=1.0, axis=-1): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. diff --git a/dpnp/dpnp_iface_trigonometric.py b/dpnp/dpnp_iface_trigonometric.py index d38af96ea2c..4d5703cfc6c 100644 --- a/dpnp/dpnp_iface_trigonometric.py +++ b/dpnp/dpnp_iface_trigonometric.py @@ -121,6 +121,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -175,6 +176,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -229,6 +231,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -282,7 +285,8 @@ def _get_accumulation_res_dt(a, dtype, _out): Input array, expected to have numeric data type. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. - Array must have the correct shape and the expected data type.. + Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -336,7 +340,8 @@ def _get_accumulation_res_dt(a, dtype, _out): Input array, expected to have numeric data type. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. - Array must have the correct shape and the expected data type.. + Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -390,15 +395,18 @@ def _get_accumulation_res_dt(a, dtype, _out): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have a real-valued floating-point data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have a real-valued floating-point data type. 
+ Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -466,6 +474,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -520,6 +529,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -571,6 +581,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -624,6 +635,7 @@ def _get_accumulation_res_dt(a, dtype, _out): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -820,6 +832,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. 
order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -872,6 +885,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -927,6 +941,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -982,13 +997,16 @@ def degrees(x1, **kwargs): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have a real-valued data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have a real-valued data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1049,6 +1067,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
@@ -1103,6 +1122,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1162,6 +1182,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1221,6 +1242,7 @@ def degrees(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1278,15 +1300,18 @@ def degrees(x1, **kwargs): Parameters ---------- -x1 : {dpnp.ndarray, usm_ndarray} +x1 : {dpnp.ndarray, usm_ndarray, scalar} First input array, expected to have a real-valued floating-point data type. -x2 : {dpnp.ndarray, usm_ndarray} + Both inputs `x1` and `x2` can not be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} Second input array, also expected to have a real-valued floating-point data type. + Both inputs `x1` and `x2` can not be scalars at the same time. out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1426,6 +1451,7 @@ def logsumexp(x, /, *, axis=None, dtype=None, keepdims=False, out=None): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. 
Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1556,6 +1582,7 @@ def reduce_hypot(x, /, *, axis=None, dtype=None, keepdims=False, out=None): out : ({None, dpnp.ndarray, usm_ndarray}, optional): Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : ({'C', 'F', 'A', 'K'}, optional): Memory layout of the newly output array, if parameter `out` is `None`. Default: ``"K"`` @@ -1660,6 +1687,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1713,6 +1741,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1765,6 +1794,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1820,6 +1850,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. 
@@ -1874,6 +1905,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. @@ -1927,6 +1959,7 @@ def radians(x1, **kwargs): out : {None, dpnp.ndarray, usm_ndarray}, optional Output array to populate. Array must have the correct shape and the expected data type. + Default: ``None``. order : {"C", "F", "A", "K"}, optional Memory layout of the newly output array, if parameter `out` is ``None``. Default: ``"K"``. diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index 6cf52e91deb..ae2c73748b5 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -1040,12 +1040,11 @@ def test_fmax(self, dtype, lhs, rhs): def test_fmin(self, dtype, lhs, rhs): self._test_mathematical("fmin", dtype, lhs, rhs, check_type=False) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @pytest.mark.parametrize( "dtype", get_all_dtypes(no_bool=True, no_complex=True) ) def test_fmod(self, dtype, lhs, rhs): - if rhs == 0.3: + if rhs == 0.3 and not has_support_aspect64(): """ Due to accuracy reason, the results are different for `float32` and `float64` >>> numpy.fmod(numpy.array([3.9], dtype=numpy.float32), 0.3) @@ -1053,7 +1052,7 @@ def test_fmod(self, dtype, lhs, rhs): >>> numpy.fmod(numpy.array([3.9], dtype=numpy.float64), 0.3) array([9.53674318e-08]) - On a gpu without support for `float64`, dpnp produces results similar to the second one. + On a gpu without fp64 support, dpnp produces results similar to the second one. 
""" pytest.skip("Due to accuracy reason, the results are different.") self._test_mathematical("fmod", dtype, lhs, rhs, check_type=False) @@ -1300,6 +1299,52 @@ def test_positive_boolean(): dpnp.positive(dpnp_a) +@pytest.mark.parametrize("dtype", get_float_dtypes(no_float16=False)) +def test_float_remainder_magnitude(dtype): + b = numpy.array(1.0, dtype=dtype) + a = numpy.nextafter(numpy.array(0.0, dtype=dtype), -b) + + ia = dpnp.array(a) + ib = dpnp.array(b) + + result = dpnp.remainder(ia, ib) + expected = numpy.remainder(a, b) + assert_equal(result, expected) + + result = dpnp.remainder(-ia, -ib) + expected = numpy.remainder(-a, -b) + assert_equal(result, expected) + + +@pytest.mark.usefixtures("suppress_divide_numpy_warnings") +@pytest.mark.usefixtures("suppress_invalid_numpy_warnings") +@pytest.mark.parametrize("func", ["remainder", "fmod"]) +@pytest.mark.parametrize("dtype", get_float_dtypes(no_float16=False)) +@pytest.mark.parametrize( + "lhs, rhs", + [ + pytest.param(1.0, 0.0, id="one-zero"), + pytest.param(1.0, numpy.inf, id="one-inf"), + pytest.param(numpy.inf, 1.0, id="inf-one"), + pytest.param(numpy.inf, numpy.inf, id="inf-inf"), + pytest.param(numpy.inf, 0.0, id="inf-zero"), + pytest.param(1.0, numpy.nan, id="one-nan"), + pytest.param(numpy.nan, 0.0, id="nan-zero"), + pytest.param(numpy.nan, 1.0, id="nan-one"), + ], +) +def test_float_remainder_fmod_nans_inf(func, dtype, lhs, rhs): + a = numpy.array(lhs, dtype=dtype) + b = numpy.array(rhs, dtype=dtype) + + ia = dpnp.array(a) + ib = dpnp.array(b) + + result = getattr(dpnp, func)(ia, ib) + expected = getattr(numpy, func)(a, b) + assert_equal(result, expected) + + class TestProd: @pytest.mark.parametrize("axis", [None, 0, 1, -1, 2, -2, (1, 2), (0, -2)]) @pytest.mark.parametrize("keepdims", [False, True]) diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index 427151dcc51..5dafcfb7582 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -627,6 +627,7 @@ def test_1in_1out(func, data, 
usm_type): pytest.param("dot", [3 + 2j, 4 + 1j, 5], [1, 2 + 3j, 3]), pytest.param("fmax", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), pytest.param("fmin", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), + pytest.param("fmod", [5, 3], [2, 2.0]), pytest.param( "gradient", [1, 2, 4, 7, 11, 16], [0.0, 1.0, 1.5, 3.5, 4.0, 6.0] ), diff --git a/tests/third_party/cupy/core_tests/test_ndarray_math.py b/tests/third_party/cupy/core_tests/test_ndarray_math.py index 81caf2c8ceb..40ae44174ae 100644 --- a/tests/third_party/cupy/core_tests/test_ndarray_math.py +++ b/tests/third_party/cupy/core_tests/test_ndarray_math.py @@ -3,7 +3,6 @@ import numpy import pytest -import dpnp as cupy from tests.helper import has_support_aspect64 from tests.third_party.cupy import testing @@ -87,7 +86,7 @@ def test_round_halfway_int(self, xp, dtype): a -= a.size + 1 scale = 10 ** abs(self.decimals) if self.decimals < 0: - a *= xp.array(scale, dtype=dtype) + a *= xp.array(scale).astype(dtype) a >>= 1 return a.round(self.decimals) diff --git a/tests/third_party/cupy/math_tests/test_arithmetic.py b/tests/third_party/cupy/math_tests/test_arithmetic.py index 36593a2a99e..7a7d1014388 100644 --- a/tests/third_party/cupy/math_tests/test_arithmetic.py +++ b/tests/third_party/cupy/math_tests/test_arithmetic.py @@ -1,29 +1,28 @@ import itertools -import unittest import warnings import numpy import pytest import dpnp as cupy -from tests.helper import has_support_aspect64 +from tests.helper import has_support_aspect16, has_support_aspect64 from tests.third_party.cupy import testing -float_types = list(testing._loops._float_dtypes) -complex_types = [] -signed_int_types = [numpy.int32, numpy.int64] -unsigned_int_types = [] +float_types = [numpy.float16, numpy.float32, numpy.float64] +complex_types = [numpy.complex64, numpy.complex128] +signed_int_types = [numpy.int8, numpy.int16, numpy.int32, numpy.int64] +unsigned_int_types = [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64] int_types = signed_int_types + 
unsigned_int_types -all_types = float_types + int_types + complex_types +all_types = [numpy.bool_] + float_types + int_types + complex_types +negative_types = [numpy.bool_] + float_types + signed_int_types + complex_types negative_types_wo_fp16 = ( [numpy.bool_] - + float_types + + [numpy.float32, numpy.float64] + [numpy.int16, numpy.int32, numpy.int64] + complex_types ) -negative_types = float_types + signed_int_types + complex_types -negative_no_complex_types = float_types + signed_int_types -no_complex_types = float_types + int_types +negative_no_complex_types = [numpy.bool_] + float_types + signed_int_types +no_complex_types = [numpy.bool_] + float_types + int_types @testing.parameterize( @@ -31,12 +30,7 @@ testing.product( { "nargs": [1], - "name": [ - "reciprocal", - "conj", - "conjugate", - "angle", - ], + "name": ["reciprocal", "conj", "conjugate", "angle"], } ) + testing.product( @@ -52,7 +46,6 @@ "floor_divide", "fmod", "remainder", - "mod", ], } ) @@ -128,47 +121,38 @@ class TestArithmeticUnary: @testing.numpy_cupy_allclose(atol=1e-5, type_check=has_support_aspect64()) def test_unary(self, xp): arg1 = self.arg1 - arg1 = xp.asarray(arg1) + if isinstance(arg1, numpy.ndarray): + arg1 = xp.asarray(arg1) if self.name in ("reciprocal") and xp is numpy: # In NumPy, for integer arguments with absolute value larger than 1 the result is always zero. # We need to convert the input data type to float then compare the output with DPNP. 
- if isinstance(arg1, numpy.ndarray) and numpy.issubdtype( - arg1.dtype, numpy.integer - ): - np_dtype = ( - numpy.float64 if has_support_aspect64() else numpy.float32 - ) + if numpy.issubdtype(arg1.dtype, numpy.integer): + if arg1.dtype.char in "bB": # int8 + np_dtype = numpy.float16 + elif arg1.dtype.char in "hH": # int16 + np_dtype = numpy.float32 + else: # int32, int64 + if has_support_aspect64(): + np_dtype = numpy.float64 + else: + np_dtype = numpy.float32 arg1 = xp.asarray(arg1, dtype=np_dtype) if self.name in {"angle"}: y = getattr(xp, self.name)(arg1, self.deg) - # In NumPy, for boolean arguments the output data type is always default floating data type. - # while data type of output in DPNP is determined by Type Promotion Rules. - if ( - isinstance(arg1, cupy.ndarray) - and cupy.issubdtype(arg1.dtype, cupy.bool) - and has_support_aspect64() - ): - y = y.astype(cupy.float64) + if isinstance(arg1, cupy.ndarray): + if arg1.dtype == cupy.bool and has_support_aspect64(): + # In NumPy, for boolean input the output data type is always default floating data type. + # while data type of output in DPNP is determined by Type Promotion Rules. + y = y.astype(cupy.float64) + elif arg1.dtype.char in "bBe" and has_support_aspect16(): + # In NumPy, for int8, uint8 and float16 inputs the output data type is always float16. + # while data type of output in DPNP is float32. + y = y.astype(cupy.float16) else: y = getattr(xp, self.name)(arg1) - # if self.name in ("real", "imag"): - # Some NumPy functions return Python scalars for Python scalar - # inputs. - # We need to convert them to arrays to compare with CuPy outputs. - # if xp is numpy and isinstance(arg1, (bool, int, float, complex)): - # y = xp.asarray(y) - - # TODO(niboshi): Fix this - # numpy.real and numpy.imag return Python int if the input is - # Python bool. CuPy should return an array of dtype.int32 or - # dtype.int64 (depending on the platform) in such cases, instead - # of an array of dtype.bool. 
- # if xp is cupy and isinstance(arg1, bool): - # y = y.astype(int) - return y @@ -210,9 +194,61 @@ def test_imag_nocomplex(self, xp, dtype): imag = xp.imag(x) return imag + @pytest.mark.skip("'dpnp_array' object has no attribute 'base' yet") + @testing.for_complex_dtypes() + @testing.numpy_cupy_array_equal() + def test_real_ndarray_complex(self, xp, dtype): + x = testing.shaped_arange(self.shape, xp, dtype=dtype) + x_ = x.copy() + real = x_.real + # real returns a view + assert real.base is x_ + x_ += 1 + 1j + testing.assert_array_equal(real, x.real + 1) + return real + + @pytest.mark.skip("'dpnp_array' object has no attribute 'base' yet") + @testing.for_complex_dtypes() + @testing.numpy_cupy_array_equal() + def test_real_complex(self, xp, dtype): + x = testing.shaped_arange(self.shape, xp, dtype=dtype) + x_ = x.copy() + real = xp.real(x_) + # real returns a view + assert real.base is x_ + x_ += 1 + 1j + testing.assert_array_equal(real, x.real + 1) + return real + + @pytest.mark.skip("'dpnp_array' object has no attribute 'base' yet") + @testing.for_complex_dtypes() + @testing.numpy_cupy_array_equal() + def test_imag_ndarray_complex(self, xp, dtype): + x = testing.shaped_arange(self.shape, xp, dtype=dtype) + x_ = x.copy() + imag = x_.imag + # imag returns a view + assert imag.base is x_ + x_ += 1 + 1j + testing.assert_array_equal(imag, x.imag + 1) + return imag + + @pytest.mark.skip("'dpnp_array' object has no attribute 'base' yet") + @testing.for_complex_dtypes() + @testing.numpy_cupy_array_equal() + def test_imag_complex(self, xp, dtype): + x = testing.shaped_arange(self.shape, xp, dtype=dtype) + x_ = x.copy() + imag = xp.imag(x_) + # imag returns a view + assert imag.base is x_ + x_ += 1 + 1j + testing.assert_array_equal(imag, x.imag + 1) + return imag + class ArithmeticBinaryBase: - @testing.numpy_cupy_allclose(atol=1e-4, type_check=False) + @testing.numpy_cupy_allclose(rtol=1e-4, type_check=has_support_aspect64()) def check_binary(self, xp): arg1 = self.arg1 
arg2 = self.arg2 @@ -221,15 +257,37 @@ def check_binary(self, xp): dtype1 = np1.dtype dtype2 = np2.dtype - # TODO(niboshi): Fix this: xp.add(0j, xp.array([2.], 'f')).dtype - # numpy => complex64 - # # cupy => complex128 - # if isinstance(arg1, complex): - # if dtype2 in (numpy.float16, numpy.float32): - # return xp.array(True) - - arg1 = xp.asarray(arg1) - arg2 = xp.asarray(arg2) + if xp.isscalar(arg1) and xp.isscalar(arg2): + pytest.skip("both scalar inputs is not supported") + + if self.name == "power": + # TODO(niboshi): Fix this: power(0, 1j) + # numpy => 1+0j + # cupy => 0j + if dtype2 in complex_types and (np1 == 0).any(): + return xp.array(True) + # TODO: Fix this: power(0j, 0) + # numpy => 1+0j + # cupy => nan+nanj + elif dtype1 in complex_types and (np2 == 0).any(): + return xp.array(True) + + if self.name in ("true_divide", "floor_divide", "fmod", "remainder"): + if dtype1.kind in "u" and xp.isscalar(arg2) and arg2 < 0: + # TODO: Fix this: array(3, dtype=uint) / -2 + # numpy => -1.5 + # cupy => 0.01181102 + pytest.skip("due to dpctl gh-1711") + if dtype2.kind in "u" and xp.isscalar(arg1) and arg1 < 0: + # TODO: Fix this: 2 / array(3, dtype=uint) + # numpy => -0.666667 + # cupy => 84.666667 + pytest.skip("due to dpctl gh-1711") + + if isinstance(arg1, numpy.ndarray): + arg1 = xp.asarray(arg1) + if isinstance(arg2, numpy.ndarray): + arg2 = xp.asarray(arg2) # Subtraction between booleans is not allowed. if ( @@ -255,15 +313,6 @@ def check_binary(self, xp): if dtype1 in (numpy.float16, numpy.float32): y = y.astype(numpy.complex64) - # NumPy returns an output array of another type than DPNP when input ones have different types. 
- if xp is numpy and dtype1 != dtype2: - is_array_arg1 = not xp.isscalar(arg1) - is_array_arg2 = not xp.isscalar(arg2) - - is_int_float = lambda _x, _y: numpy.issubdtype( - _x, numpy.integer - ) and numpy.issubdtype(_y, numpy.floating) - return y @@ -271,16 +320,17 @@ def check_binary(self, xp): *( testing.product( { + # TODO(unno): boolean subtract causes DeprecationWarning in numpy>=1.13 "arg1": [ testing.shaped_arange((2, 3), numpy, dtype=d) for d in all_types ] - + [0, 0.0, 2, 2.0], + + [0, 0.0, 0j, 2, 2.0, 2j, True, False], "arg2": [ testing.shaped_reverse_arange((2, 3), numpy, dtype=d) for d in all_types ] - + [0, 0.0, 2, 2.0], + + [0, 0.0, 0j, 2, 2.0, 2j, True, False], "name": ["add", "multiply", "power", "subtract"], } ) @@ -290,19 +340,18 @@ def check_binary(self, xp): numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) for d in negative_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 0j, 2, 2.0, 2j, -2, -2.0, -2j, True, False], "arg2": [ numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) for d in negative_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 0j, 2, 2.0, 2j, -2, -2.0, -2j, True, False], "name": ["divide", "true_divide", "subtract"], } ) ) ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") -class TestArithmeticBinary(ArithmeticBinaryBase, unittest.TestCase): +class TestArithmeticBinary(ArithmeticBinaryBase): def test_binary(self): self.use_dtype = False self.check_binary() @@ -311,19 +360,36 @@ def test_binary(self): @testing.parameterize( *( testing.product( + { + "arg1": [ + numpy.array([3, 2, 1, 1, 2, 3], dtype=d) + for d in unsigned_int_types + ] + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], + "arg2": [ + numpy.array([3, 2, 1, 1, 2, 3], dtype=d) + for d in unsigned_int_types + ] + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], + "name": ["true_divide"], + "dtype": [cupy.default_float_type()], + "use_dtype": [True, False], + } + ) + + testing.product( { "arg1": [ numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) - for d in int_types + for d in 
signed_int_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], "arg2": [ numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) - for d in int_types + for d in signed_int_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], "name": ["true_divide"], - "dtype": float_types, + "dtype": [cupy.default_float_type()], "use_dtype": [True, False], } ) @@ -340,7 +406,7 @@ def test_binary(self): ] + [0.0, 2.0, -2.0], "name": ["power", "true_divide", "subtract"], - "dtype": float_types, + "dtype": [cupy.default_float_type()], "use_dtype": [True, False], } ) @@ -350,14 +416,14 @@ def test_binary(self): testing.shaped_arange((2, 3), numpy, dtype=d) for d in no_complex_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], "arg2": [ testing.shaped_reverse_arange((2, 3), numpy, dtype=d) for d in no_complex_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], - "name": ["floor_divide", "fmod", "remainder", "mod"], - "dtype": float_types, + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], + "name": ["floor_divide", "fmod", "remainder"], + "dtype": [cupy.default_float_type()], "use_dtype": [True, False], } ) @@ -367,31 +433,229 @@ def test_binary(self): numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) for d in negative_no_complex_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], "arg2": [ numpy.array([-3, -2, -1, 1, 2, 3], dtype=d) for d in negative_no_complex_types ] - + [0, 0.0, 2, 2.0, -2, -2.0], - "name": ["floor_divide", "fmod", "remainder", "mod"], - "dtype": float_types, + + [0, 0.0, 2, 2.0, -2, -2.0, True, False], + "name": ["floor_divide", "fmod", "remainder"], + "dtype": [cupy.default_float_type()], "use_dtype": [True, False], } ) ) ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") -class TestArithmeticBinary2(ArithmeticBinaryBase, unittest.TestCase): +class TestArithmeticBinary2(ArithmeticBinaryBase): def test_binary(self): - if ( - self.use_dtype - and 
numpy.lib.NumpyVersion(numpy.__version__) < "1.10.0" - ): - raise unittest.SkipTest("NumPy>=1.10") self.check_binary() -class TestArithmeticModf(unittest.TestCase): +@pytest.mark.skip("'casting' keyword is not supported yet") +class UfuncTestBase: + @testing.numpy_cupy_allclose(accept_error=TypeError) + def check_casting_out(self, in0_type, in1_type, out_type, casting, xp): + a = testing.shaped_arange((2, 3), xp, in0_type) + b = testing.shaped_arange((2, 3), xp, in1_type) + c = xp.zeros((2, 3), out_type) + if casting != "unsafe": + # may raise TypeError + return xp.add(a, b, out=c, casting=casting) + + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + ret = xp.add(a, b, out=c, casting=casting) + ws = [w.category for w in ws] + assert all([w == numpy.ComplexWarning for w in ws]), str(ws) + return ret, xp.array(len(ws)) + + @testing.numpy_cupy_allclose(accept_error=TypeError) + def check_casting_dtype(self, in0_type, in1_type, dtype, casting, xp): + a = testing.shaped_arange((2, 3), xp, in0_type) + b = testing.shaped_arange((2, 3), xp, in1_type) + if casting != "unsafe": + # may raise TypeError + return xp.add(a, b, dtype=dtype, casting=casting) + + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + ret = xp.add(a, b, dtype=dtype, casting="unsafe") + ws = [w.category for w in ws] + assert all([w == numpy.ComplexWarning for w in ws]), str(ws) + return ret, xp.array(len(ws)) + + # delete this, once check_casting_dtype passes + @testing.numpy_cupy_allclose() + def check_casting_dtype_unsafe_ignore_warnings( + self, in0_type, in1_type, dtype, xp + ): + a = testing.shaped_arange((2, 3), xp, in0_type) + b = testing.shaped_arange((2, 3), xp, in1_type) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + return xp.add(a, b, dtype=dtype, casting="unsafe") + + +class TestUfunc(UfuncTestBase): + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + "safe", + "same_kind", + "unsafe", 
+ ], + ) + @testing.for_all_dtypes_combination(names=["in_type", "out_type"]) + def test_casting_out_only(self, in_type, out_type, casting): + self.check_casting_out(in_type, in_type, out_type, casting) + + @pytest.mark.parametrize( + "casting", + [ + pytest.param("no", marks=pytest.mark.skip("flaky xfail")), + pytest.param("equiv", marks=pytest.mark.skip("flaky xfail")), + "safe", + "same_kind", + "unsafe", + ], + ) + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "out_type"], full=False + ) + def test_casting_in_out(self, in0_type, in1_type, out_type, casting): + self.check_casting_out(in0_type, in1_type, out_type, casting) + + @pytest.mark.xfail() + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + ], + ) + @pytest.mark.parametrize( + ("in0_type", "in1_type", "out_type"), + [ + (numpy.int16, numpy.int32, numpy.int32), + ], + ) + def test_casting_in_xfail1(self, in0_type, in1_type, out_type, casting): + self.check_casting_out(in0_type, in1_type, out_type, casting) + + @pytest.mark.skip("flaky xfail") + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + "safe", + "same_kind", + "unsafe", + ], + ) + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "dtype"], full=False + ) + def test_casting_dtype(self, in0_type, in1_type, dtype, casting): + self.check_casting_dtype(in0_type, in1_type, dtype, casting) + + @pytest.mark.xfail() + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + ], + ) + @pytest.mark.parametrize( + ("in0_type", "in1_type", "dtype"), + [ + (numpy.int16, numpy.int32, numpy.int32), + ], + ) + def test_casting_dtype_xfail1(self, in0_type, in1_type, dtype, casting): + self.check_casting_dtype(in0_type, in1_type, dtype, casting) + + @pytest.mark.xfail() + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + "safe", + "same_kind", + ], + ) + @pytest.mark.parametrize( + ("in0_type", "in1_type", "dtype"), + [ + (numpy.int32, numpy.int32, numpy.bool_), + 
(numpy.float64, numpy.float64, numpy.int32), + ], + ) + def test_casting_dtype_xfail2(self, in0_type, in1_type, dtype, casting): + self.check_casting_dtype(in0_type, in1_type, dtype, casting) + + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "dtype"], full=False + ) + def test_casting_dtype_unsafe_ignore_warnings( + self, in0_type, in1_type, dtype + ): + self.check_casting_dtype_unsafe_ignore_warnings( + in0_type, in1_type, dtype + ) + + +@testing.slow +class TestUfuncSlow(UfuncTestBase): + @pytest.mark.parametrize( + "casting", + [ + pytest.param("no", marks=pytest.mark.xfail()), + pytest.param("equiv", marks=pytest.mark.xfail()), + "safe", + "same_kind", + "unsafe", + ], + ) + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "out_type"], full=True + ) + def test_casting_out(self, in0_type, in1_type, out_type, casting): + self.check_casting_out(in0_type, in1_type, out_type, casting) + + @pytest.mark.xfail() + @pytest.mark.parametrize( + "casting", + [ + "no", + "equiv", + "safe", + "same_kind", + "unsafe", + ], + ) + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "dtype"], full=True + ) + def test_casting_dtype(self, in0_type, in1_type, dtype, casting): + self.check_casting_dtype(in0_type, in1_type, dtype, casting) + + @testing.for_all_dtypes_combination( + names=["in0_type", "in1_type", "dtype"], full=True + ) + def test_casting_dtype_unsafe_ignore_warnings( + self, in0_type, in1_type, dtype + ): + self.check_casting_dtype_unsafe_ignore_warnings( + in0_type, in1_type, dtype + ) + + +class TestArithmeticModf: @testing.for_float_dtypes() @testing.numpy_cupy_allclose() def test_modf(self, xp, dtype): @@ -406,11 +670,9 @@ def test_modf(self, xp, dtype): @testing.parameterize( *testing.product({"xp": [numpy, cupy], "shape": [(3, 2), (), (3, 0, 2)]}) ) -class TestBoolSubtract(unittest.TestCase): +class TestBoolSubtract: def test_bool_subtract(self): xp = self.xp - if xp is numpy and not 
testing.numpy_satisfies(">=1.14.0"): - raise unittest.SkipTest("NumPy<1.14.0") shape = self.shape x = testing.shaped_random(shape, xp, dtype=numpy.bool_) y = testing.shaped_random(shape, xp, dtype=numpy.bool_) From 0b7c230a8c3880e628d5f2b860ecd67d1966e850 Mon Sep 17 00:00:00 2001 From: vlad-perevezentsev Date: Thu, 27 Jun 2024 00:49:13 +0200 Subject: [PATCH 35/49] Implement `dpnp.isneginf` and `dpnp.isposinf` (#1888) * Implement dpnp.isneginf() * Add tests for dpnp.isneginf() * Implement dpnp.isposinf() * Add tests for dpnp.isposinf() * Add new functions to gen docs * Add additional checks * Add test_infinity_sign_errors * Add sycl_queue/usm tests for logic functions * Update tests * Remove out dtype check * Add TODO with support different out dtype * Update test_logic_op_2in --------- Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> --- doc/reference/logic.rst | 2 + dpnp/dpnp_iface_logic.py | 146 ++++++++++++++++++ tests/test_logic.py | 47 ++++++ tests/test_sycl_queue.py | 83 ++++++++++ tests/test_usm_type.py | 34 ++-- .../cupy/logic_tests/test_content.py | 13 ++ 6 files changed, 314 insertions(+), 11 deletions(-) diff --git a/doc/reference/logic.rst b/doc/reference/logic.rst index f5b3e646e66..57133259c71 100644 --- a/doc/reference/logic.rst +++ b/doc/reference/logic.rst @@ -26,6 +26,8 @@ Infinities and NaNs dpnp.isfinite dpnp.isinf dpnp.isnan + dpnp.isneginf + dpnp.isposinf Array type testing diff --git a/dpnp/dpnp_iface_logic.py b/dpnp/dpnp_iface_logic.py index 70f92830637..8203d4e2ad1 100644 --- a/dpnp/dpnp_iface_logic.py +++ b/dpnp/dpnp_iface_logic.py @@ -66,6 +66,8 @@ "isfinite", "isinf", "isnan", + "isneginf", + "isposinf", "less", "less_equal", "logical_and", @@ -777,6 +779,150 @@ def isclose(x1, x2, rtol=1e-05, atol=1e-08, equal_nan=False): ) +def isneginf(x, out=None): + """ + Test element-wise for negative infinity, return result as bool array. + + For full documentation refer to :obj:`numpy.isneginf`. 
+ + Parameters + ---------- + x : {dpnp.ndarray, usm_ndarray} + Input array. + out : {None, dpnp.ndarray, usm_ndarray}, optional + A location into which the result is stored. If provided, it must have a + shape that the input broadcasts to and a boolean data type. + If not provided or ``None``, a freshly-allocated boolean array + is returned. + Default: ``None``. + + Returns + ------- + out : dpnp.ndarray + Boolean array of same shape as ``x``. + + See Also + -------- + :obj:`dpnp.isinf` : Test element-wise for positive or negative infinity. + :obj:`dpnp.isposinf` : Test element-wise for positive infinity, + return result as bool array. + :obj:`dpnp.isnan` : Test element-wise for NaN and + return result as a boolean array. + :obj:`dpnp.isfinite` : Test element-wise for finiteness. + + Examples + -------- + >>> import dpnp as np + >>> x = np.array(np.inf) + >>> np.isneginf(-x) + array(True) + >>> np.isneginf(x) + array(False) + + >>> x = np.array([-np.inf, 0., np.inf]) + >>> np.isneginf(x) + array([ True, False, False]) + + >>> x = np.array([-np.inf, 0., np.inf]) + >>> y = np.zeros(x.shape, dtype='bool') + >>> np.isneginf(x, y) + array([ True, False, False]) + >>> y + array([ True, False, False]) + + """ + + dpnp.check_supported_arrays_type(x) + + if out is not None: + dpnp.check_supported_arrays_type(out) + + x_dtype = x.dtype + if dpnp.issubdtype(x_dtype, dpnp.complexfloating): + raise TypeError( + f"This operation is not supported for {x_dtype} values " + "because it would be ambiguous." + ) + + is_inf = dpnp.isinf(x) + signbit = dpnp.signbit(x) + + # TODO: support different out dtype #1717(dpctl) + return dpnp.logical_and(is_inf, signbit, out=out) + + +def isposinf(x, out=None): + """ + Test element-wise for positive infinity, return result as bool array. + + For full documentation refer to :obj:`numpy.isposinf`. + + Parameters + ---------- + x : {dpnp.ndarray, usm_ndarray} + Input array. 
+ out : {None, dpnp.ndarray, usm_ndarray}, optional + A location into which the result is stored. If provided, it must have a + shape that the input broadcasts to and a boolean data type. + If not provided or ``None``, a freshly-allocated boolean array + is returned. + Default: ``None``. + + Returns + ------- + out : dpnp.ndarray + Boolean array of same shape as ``x``. + + See Also + -------- + :obj:`dpnp.isinf` : Test element-wise for positive or negative infinity. + :obj:`dpnp.isneginf` : Test element-wise for negative infinity, + return result as bool array. + :obj:`dpnp.isnan` : Test element-wise for NaN and + return result as a boolean array. + :obj:`dpnp.isfinite` : Test element-wise for finiteness. + + Examples + -------- + >>> import dpnp as np + >>> x = np.array(np.inf) + >>> np.isposinf(x) + array(True) + >>> np.isposinf(-x) + array(False) + + >>> x = np.array([-np.inf, 0., np.inf]) + >>> np.isposinf(x) + array([False, False, True]) + + >>> x = np.array([-np.inf, 0., np.inf]) + >>> y = np.zeros(x.shape, dtype='bool') + >>> np.isposinf(x, y) + array([False, False, True]) + >>> y + array([False, False, True]) + + """ + + dpnp.check_supported_arrays_type(x) + + if out is not None: + dpnp.check_supported_arrays_type(out) + + x_dtype = x.dtype + if dpnp.issubdtype(x_dtype, dpnp.complexfloating): + raise TypeError( + f"This operation is not supported for {x_dtype} values " + "because it would be ambiguous." + ) + + is_inf = dpnp.isinf(x) + signbit = ~dpnp.signbit(x) + + # TODO: support different out dtype #1717(dpctl) + return dpnp.logical_and(is_inf, signbit, out=out) + + _LESS_DOCSTRING = """ Computes the less-than test results for each element `x1_i` of the input array `x1` with the respective element `x2_i` of the input array `x2`. 
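The `dpnp.isneginf`/`dpnp.isposinf` implementations above are built by composing `isinf` with `signbit` through `logical_and`. The same composition reproduces `numpy.isneginf`, which makes for a quick host-side sanity check of the approach (NumPy shown here; the dpnp code path is identical in structure):

```python
import numpy as np

def isneginf_sketch(x, out=None):
    # negative infinity = value is infinite AND its sign bit is set;
    # `out` is forwarded to logical_and, mirroring the dpnp implementation
    return np.logical_and(np.isinf(x), np.signbit(x), out=out)

x = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf, np.nan])
assert np.array_equal(isneginf_sketch(x), np.isneginf(x))

# a preallocated boolean `out` is filled in place and returned
y = np.zeros(x.shape, dtype=bool)
res = isneginf_sketch(x, out=y)
assert res is y
```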
diff --git a/tests/test_logic.py b/tests/test_logic.py index e4f103e22c2..2b5e68d7d72 100644 --- a/tests/test_logic.py +++ b/tests/test_logic.py @@ -7,6 +7,7 @@ from .helper import ( get_all_dtypes, get_float_complex_dtypes, + get_float_dtypes, ) @@ -432,3 +433,49 @@ def test_finite(op, data, dtype): dpnp_res = getattr(dpnp, op)(x, out=dp_out) assert dp_out is dpnp_res assert_equal(dpnp_res, np_res) + + +@pytest.mark.parametrize("func", ["isneginf", "isposinf"]) +@pytest.mark.parametrize( + "data", + [ + [dpnp.inf, -1, 0, 1, dpnp.nan, -dpnp.inf], + [[dpnp.inf, dpnp.nan], [dpnp.nan, 0], [1, -dpnp.inf]], + ], + ids=[ + "1D array", + "2D array", + ], +) +@pytest.mark.parametrize("dtype", get_float_dtypes()) +def test_infinity_sign(func, data, dtype): + x = dpnp.asarray(data, dtype=dtype) + np_res = getattr(numpy, func)(x.asnumpy()) + dpnp_res = getattr(dpnp, func)(x) + assert_equal(dpnp_res, np_res) + + dp_out = dpnp.empty(np_res.shape, dtype=dpnp.bool) + dpnp_res = getattr(dpnp, func)(x, out=dp_out) + assert dp_out is dpnp_res + assert_equal(dpnp_res, np_res) + + +@pytest.mark.parametrize("func", ["isneginf", "isposinf"]) +def test_infinity_sign_errors(func): + data = [dpnp.inf, 0, -dpnp.inf] + + # unsupported data type + x = dpnp.asarray(data, dtype="c8") + x_np = dpnp.asnumpy(x) + assert_raises(TypeError, getattr(dpnp, func), x) + assert_raises(TypeError, getattr(numpy, func), x_np) + + # unsupported type + assert_raises(TypeError, getattr(dpnp, func), data) + assert_raises(TypeError, getattr(dpnp, func), x_np) + + # unsupported `out` data type + x = dpnp.asarray(data, dtype=dpnp.default_float_type()) + out = dpnp.empty_like(x, dtype="int32") + with pytest.raises(ValueError): + getattr(dpnp, func)(x, out=out) diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 3349c013428..378ecaf9b19 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -501,6 +501,40 @@ def test_1in_1out(func, data, device): 
assert_sycl_queue_equal(result_queue, expected_queue) +@pytest.mark.parametrize( + "op", + [ + "all", + "any", + "isfinite", + "isinf", + "isnan", + "isneginf", + "isposinf", + "logical_not", + ], +) +@pytest.mark.parametrize( + "device", + valid_devices, + ids=[device.filter_string for device in valid_devices], +) +def test_logic_op_1in(op, device): + x = dpnp.array( + [-dpnp.inf, -1.0, 0.0, 1.0, dpnp.inf, dpnp.nan], device=device + ) + result = getattr(dpnp, op)(x) + + x_orig = dpnp.asnumpy(x) + expected = getattr(numpy, op)(x_orig) + assert_dtype_allclose(result, expected) + + expected_queue = x.get_array().sycl_queue + result_queue = result.get_array().sycl_queue + + assert_sycl_queue_equal(result_queue, expected_queue) + + @pytest.mark.parametrize( "device", valid_devices, @@ -705,6 +739,55 @@ def test_2in_1out(func, data1, data2, device): assert_sycl_queue_equal(result.sycl_queue, x2.sycl_queue) +@pytest.mark.parametrize( + "op", + [ + "equal", + "greater", + "greater_equal", + # TODO: unblock when dpnp.isclose() is updated + # "isclose", + "less", + "less_equal", + "logical_and", + "logical_or", + "logical_xor", + "not_equal", + ], +) +@pytest.mark.parametrize( + "device", + valid_devices, + ids=[device.filter_string for device in valid_devices], +) +def test_logic_op_2in(op, device): + x1 = dpnp.array( + [-dpnp.inf, -1.0, 0.0, 1.0, dpnp.inf, dpnp.nan], device=device + ) + x2 = dpnp.array( + [dpnp.inf, 1.0, 0.0, -1.0, -dpnp.inf, dpnp.nan], device=device + ) + # Remove NaN value from input arrays because numpy raises RuntimeWarning + if op in [ + "greater", + "greater_equal", + "less", + "less_equal", + ]: + x1 = x1[:-1] + x2 = x2[:-1] + result = getattr(dpnp, op)(x1, x2) + + x1_orig = dpnp.asnumpy(x1) + x2_orig = dpnp.asnumpy(x2) + expected = getattr(numpy, op)(x1_orig, x2_orig) + + assert_dtype_allclose(result, expected) + + assert_sycl_queue_equal(result.sycl_queue, x1.sycl_queue) + assert_sycl_queue_equal(result.sycl_queue, x2.sycl_queue) + + 
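The `test_logic_op_2in` test above drops the NaN element before the ordering comparisons because some NumPy builds emit a `RuntimeWarning` when ordering NaNs. A host-side NumPy sketch of the comparisons it exercises, on the same mixed `+/-inf` inputs with NaN omitted:

```python
import numpy as np

x1 = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])
x2 = np.array([np.inf, 1.0, 0.0, -1.0, -np.inf])

eq = np.equal(x1, x2)        # only the 0.0 == 0.0 pair matches
gt = np.greater(x1, x2)      # inf compares greater than any finite value
lx = np.logical_xor(x1, x2)  # truthiness agrees elementwise, so all False

assert eq.tolist() == [False, False, True, False, False]
assert gt.tolist() == [False, False, False, True, True]
assert lx.tolist() == [False, False, False, False, False]
```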
@pytest.mark.parametrize( "func, data, scalar", [ diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index 5dafcfb7582..8d43bccd75a 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -357,20 +357,32 @@ def test_tril_triu(func, usm_type): @pytest.mark.parametrize( "op", [ - "equal", - "greater", - "greater_equal", - "less", - "less_equal", - "logical_and", - "logical_or", - "logical_xor", - "not_equal", + "all", + "any", + "isfinite", + "isinf", + "isnan", + "isneginf", + "isposinf", + "logical_not", ], - ids=[ +) +@pytest.mark.parametrize("usm_type_x", list_of_usm_types, ids=list_of_usm_types) +def test_coerced_usm_types_logic_op_1in(op, usm_type_x): + x = dp.arange(-10, 10, usm_type=usm_type_x) + res = getattr(dp, op)(x) + + assert x.usm_type == res.usm_type == usm_type_x + + +@pytest.mark.parametrize( + "op", + [ "equal", "greater", "greater_equal", + # TODO: unblock when dpnp.isclose() is updated + # "isclose", "less", "less_equal", "logical_and", @@ -381,7 +393,7 @@ def test_tril_triu(func, usm_type): ) @pytest.mark.parametrize("usm_type_x", list_of_usm_types, ids=list_of_usm_types) @pytest.mark.parametrize("usm_type_y", list_of_usm_types, ids=list_of_usm_types) -def test_coerced_usm_types_logic_op(op, usm_type_x, usm_type_y): +def test_coerced_usm_types_logic_op_2in(op, usm_type_x, usm_type_y): x = dp.arange(100, usm_type=usm_type_x) y = dp.arange(100, usm_type=usm_type_y)[::-1] diff --git a/tests/third_party/cupy/logic_tests/test_content.py b/tests/third_party/cupy/logic_tests/test_content.py index fe2446d68b2..3f0a88c6781 100644 --- a/tests/third_party/cupy/logic_tests/test_content.py +++ b/tests/third_party/cupy/logic_tests/test_content.py @@ -29,3 +29,16 @@ def test_isinf(self): def test_isnan(self): self.check_unary_nan("isnan") + + +class TestUfuncLike(unittest.TestCase): + @testing.numpy_cupy_array_equal() + def check_unary(self, name, xp): + a = xp.array([-3, xp.inf, -1, -xp.inf, 0, 1, 2, xp.nan]) + return getattr(xp, 
name)(a) + + def test_isneginf(self): + self.check_unary("isneginf") + + def test_isposinf(self): + self.check_unary("isposinf") From acb74b98d8e3c22c4c6880d156ef23c4d835efdf Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 27 Jun 2024 13:52:03 +0200 Subject: [PATCH 36/49] Remove MKL_VERSION_2024 variable from cmake files (#1889) * Get rid of MKL_VERSION_2024 variable * Return back a quite discovery --- CMakeLists.txt | 13 +++---------- dpnp/backend/CMakeLists.txt | 7 +------ dpnp/backend/extensions/blas/CMakeLists.txt | 6 +----- dpnp/backend/extensions/lapack/CMakeLists.txt | 6 +----- dpnp/backend/extensions/vm/CMakeLists.txt | 6 +----- 5 files changed, 7 insertions(+), 31 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 9d061b8020c..dfcb1667438 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -26,18 +26,11 @@ endif() set(MKL_ARCH "intel64") set(MKL_LINK "dynamic") set(MKL_THREADING "tbb_thread") -set(MKL_VERSION_2024 FALSE) +set(MKL_INTERFACE "ilp64") find_package(MKL QUIET) if(MKL_FOUND) - if(MKL_VERSION VERSION_GREATER_EQUAL "2024.0.0") - set(MKL_VERSION_2024 TRUE) - set(MKL_INTERFACE "ilp64") - find_package(MKL REQUIRED) - endif() -endif() - -if(NOT MKL_VERSION_2024) - set(MKL_INTERFACE_FULL "intel_ilp64") + find_package(MKL REQUIRED) +else() find_package(MKL REQUIRED PATHS ${CMAKE_SOURCE_DIR}/dpnp/backend/cmake/Modules NO_DEFAULT_PATH) endif() diff --git a/dpnp/backend/CMakeLists.txt b/dpnp/backend/CMakeLists.txt index f1f5b447772..f9eb1a35d28 100644 --- a/dpnp/backend/CMakeLists.txt +++ b/dpnp/backend/CMakeLists.txt @@ -84,12 +84,7 @@ if(DPNP_GENERATE_COVERAGE) target_link_options(${_trgt} PRIVATE -fprofile-instr-generate -fcoverage-mapping) endif() -if (MKL_VERSION_2024) - target_link_libraries(${_trgt} PUBLIC MKL::MKL_SYCL) -else() - target_link_libraries(${_trgt} PUBLIC MKL::MKL_DPCPP) -endif() - +target_link_libraries(${_trgt} PUBLIC MKL::MKL_SYCL) target_link_libraries(${_trgt} PUBLIC 
oneDPL) if (UNIX) diff --git a/dpnp/backend/extensions/blas/CMakeLists.txt b/dpnp/backend/extensions/blas/CMakeLists.txt index 8ef4e7d79e1..7e2ce831870 100644 --- a/dpnp/backend/extensions/blas/CMakeLists.txt +++ b/dpnp/backend/extensions/blas/CMakeLists.txt @@ -69,11 +69,7 @@ if (DPNP_GENERATE_COVERAGE) target_link_options(${python_module_name} PRIVATE -fprofile-instr-generate -fcoverage-mapping) endif() -if (MKL_VERSION_2024) - target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::BLAS) -else() - target_link_libraries(${python_module_name} PUBLIC MKL::MKL_DPCPP) -endif() +target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::BLAS) install(TARGETS ${python_module_name} DESTINATION "dpnp/backend/extensions/blas" diff --git a/dpnp/backend/extensions/lapack/CMakeLists.txt b/dpnp/backend/extensions/lapack/CMakeLists.txt index c25ef1d97bc..f21f61c84df 100644 --- a/dpnp/backend/extensions/lapack/CMakeLists.txt +++ b/dpnp/backend/extensions/lapack/CMakeLists.txt @@ -82,11 +82,7 @@ if (DPNP_GENERATE_COVERAGE) target_link_options(${python_module_name} PRIVATE -fprofile-instr-generate -fcoverage-mapping) endif() -if (MKL_VERSION_2024) - target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::LAPACK) -else() - target_link_libraries(${python_module_name} PUBLIC MKL::MKL_DPCPP) -endif() +target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::LAPACK) install(TARGETS ${python_module_name} DESTINATION "dpnp/backend/extensions/lapack" diff --git a/dpnp/backend/extensions/vm/CMakeLists.txt b/dpnp/backend/extensions/vm/CMakeLists.txt index de6262581f5..0a7646cfc57 100644 --- a/dpnp/backend/extensions/vm/CMakeLists.txt +++ b/dpnp/backend/extensions/vm/CMakeLists.txt @@ -109,11 +109,7 @@ if (DPNP_GENERATE_COVERAGE) target_link_options(${python_module_name} PRIVATE -fprofile-instr-generate -fcoverage-mapping) endif() -if (MKL_VERSION_2024) - target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::VM) -else() - 
target_link_libraries(${python_module_name} PUBLIC MKL::MKL_DPCPP) -endif() +target_link_libraries(${python_module_name} PUBLIC MKL::MKL_SYCL::VM) install(TARGETS ${python_module_name} DESTINATION "dpnp/backend/extensions/vm" From 090ae64568b0a415742003207883ad1ff019a79a Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 27 Jun 2024 15:25:01 +0200 Subject: [PATCH 37/49] Bump `test_basic.py` with the latest content (#1901) * Bump test_basic.py with the latest content * Add test scope to public CI --- .github/workflows/conda-package.yml | 1 + tests/skipped_tests.tbl | 18 -- tests/skipped_tests_gpu.tbl | 18 -- .../cupy/creation_tests/test_basic.py | 201 +++++++++++------- 4 files changed, 124 insertions(+), 114 deletions(-) diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml index 8f474e5398e..d11f1a3038c 100644 --- a/.github/workflows/conda-package.yml +++ b/.github/workflows/conda-package.yml @@ -49,6 +49,7 @@ env: test_umath.py test_usm_type.py third_party/cupy/core_tests + third_party/cupy/creation_tests third_party/cupy/indexing_tests/test_indexing.py third_party/cupy/lib_tests third_party/cupy/linalg_tests diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl index c86b0d848c5..37285be810f 100644 --- a/tests/skipped_tests.tbl +++ b/tests/skipped_tests.tbl @@ -89,24 +89,6 @@ tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_pa tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_param_7_{order='F', shape=(10, 20, 30, 40)}::test_cub_max tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_param_7_{order='F', shape=(10, 20, 30, 40)}::test_cub_min -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_0_{shape=4}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_1_{shape=(4,)}::test_empty_like_K_strides_reshape 
-tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_2_{shape=(4, 2)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_3_{shape=(4, 2, 3)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_4_{shape=(5, 4, 2, 3)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_huge_size -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_huge_size_fill0 -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_int_huge_size -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_int_huge_size_fill0 -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_invalid_order -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_K_strides -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_zero_sized_array_strides -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_full_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_ones_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_zeros_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_zeros_strides - tests/third_party/cupy/indexing_tests/test_generate.py::TestAxisConcatenator::test_AxisConcatenator_init1 tests/third_party/cupy/indexing_tests/test_generate.py::TestAxisConcatenator::test_len tests/third_party/cupy/indexing_tests/test_generate.py::TestC_::test_c_1 diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index 45b41f2dafb..55fd91b0def 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -115,24 +115,6 @@ 
tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_pa tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_param_7_{order='F', shape=(10, 20, 30, 40)}::test_cub_max tests/third_party/cupy/core_tests/test_ndarray_reduction.py::TestCubReduction_param_7_{order='F', shape=(10, 20, 30, 40)}::test_cub_min -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_0_{shape=4}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_1_{shape=(4,)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_2_{shape=(4, 2)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_3_{shape=(4, 2, 3)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasicReshape_param_4_{shape=(5, 4, 2, 3)}::test_empty_like_K_strides_reshape -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_huge_size -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_huge_size_fill0 -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_int_huge_size -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_int_huge_size_fill0 -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_invalid_order -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_K_strides -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_empty_zero_sized_array_strides -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_full_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_ones_like_subok -tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_zeros_like_subok 
-tests/third_party/cupy/creation_tests/test_basic.py::TestBasic::test_zeros_strides - tests/third_party/cupy/fft_tests/test_fft.py::TestFft2_param_1_{axes=None, norm=None, s=(1, None), shape=(3, 4)}::test_fft2 tests/third_party/cupy/fft_tests/test_fft.py::TestFft2_param_7_{axes=(), norm=None, s=None, shape=(3, 4)}::test_fft2 tests/third_party/cupy/fft_tests/test_fft.py::TestFft2_param_7_{axes=(), norm=None, s=None, shape=(3, 4)}::test_ifft2 diff --git a/tests/third_party/cupy/creation_tests/test_basic.py b/tests/third_party/cupy/creation_tests/test_basic.py index f2fe44b9fac..4623a39d383 100644 --- a/tests/third_party/cupy/creation_tests/test_basic.py +++ b/tests/third_party/cupy/creation_tests/test_basic.py @@ -1,4 +1,4 @@ -import unittest +import warnings import numpy import pytest @@ -7,7 +7,7 @@ from tests.third_party.cupy import testing -class TestBasic(unittest.TestCase): +class TestBasic: @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() @@ -20,19 +20,17 @@ def test_empty(self, xp, dtype, order): def test_empty_huge_size(self): a = cupy.empty((1024, 2048, 1024), dtype="b") a.fill(123) - self.assertTrue((a == 123).all()) + assert (a == 123).all() # Free huge memory for slow test del a - cupy.get_default_memory_pool().free_all_blocks() @testing.slow def test_empty_huge_size_fill0(self): a = cupy.empty((1024, 2048, 1024), dtype="b") a.fill(0) - self.assertTrue((a == 0).all()) + assert (a == 0).all() # Free huge memory for slow test del a - cupy.get_default_memory_pool().free_all_blocks() @testing.for_CF_orders() @testing.for_all_dtypes() @@ -42,6 +40,17 @@ def test_empty_scalar(self, xp, dtype, order): a.fill(0) return a + @pytest.mark.skip("passing 'None' into shape arguments is not supported") + @testing.with_requires("numpy>=1.20") + @testing.for_CF_orders() + @testing.for_all_dtypes() + @testing.numpy_cupy_array_equal() + def test_empty_scalar_none(self, xp, dtype, order): + with testing.assert_warns(DeprecationWarning): + 
a = xp.empty(None, dtype=dtype, order=order) + a.fill(0) + return a + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() @@ -54,7 +63,7 @@ def test_empty_int(self, xp, dtype, order): def test_empty_int_huge_size(self): a = cupy.empty(2**31, dtype="b") a.fill(123) - self.assertTrue((a == 123).all()) + assert (a == 123).all() # Free huge memory for slow test del a cupy.get_default_memory_pool().free_all_blocks() @@ -63,12 +72,12 @@ def test_empty_int_huge_size(self): def test_empty_int_huge_size_fill0(self): a = cupy.empty(2**31, dtype="b") a.fill(0) - self.assertTrue((a == 0).all()) + assert (a == 0).all() # Free huge memory for slow test del a cupy.get_default_memory_pool().free_all_blocks() - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like(self, xp, dtype, order): @@ -77,7 +86,7 @@ def test_empty_like(self, xp, dtype, order): b.fill(0) return b - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_contiguity(self, xp, dtype, order): @@ -85,12 +94,12 @@ def test_empty_like_contiguity(self, xp, dtype, order): b = xp.empty_like(a, order=order) b.fill(0) if order in ["f", "F"]: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous return b - @testing.for_orders("C") + @testing.for_orders("CF") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_contiguity2(self, xp, dtype, order): @@ -99,12 +108,12 @@ def test_empty_like_contiguity2(self, xp, dtype, order): b = xp.empty_like(a, order=order) b.fill(0) if order in ["c", "C"]: - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous else: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous return b - @testing.for_orders("C") + @testing.for_orders("CF") @testing.for_all_dtypes() 
@testing.numpy_cupy_array_equal() def test_empty_like_contiguity3(self, xp, dtype, order): @@ -114,16 +123,17 @@ def test_empty_like_contiguity3(self, xp, dtype, order): b = xp.empty_like(a, order=order) b.fill(0) if order in ["k", "K", None]: - self.assertFalse(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert not b.flags.f_contiguous elif order in ["f", "F"]: - self.assertFalse(b.flags.c_contiguous) - self.assertTrue(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert b.flags.c_contiguous + assert not b.flags.f_contiguous return b + @pytest.mark.skip("order 'K' is not supported") @testing.for_all_dtypes() def test_empty_like_K_strides(self, dtype): # test strides that are both non-contiguous and non-descending @@ -139,31 +149,37 @@ def test_empty_like_K_strides(self, dtype): bg.fill(0) # make sure NumPy and CuPy strides agree - self.assertEqual(b.strides, bg.strides) + assert b.strides == bg.strides return + @testing.with_requires("numpy>=1.19") @testing.for_all_dtypes() def test_empty_like_invalid_order(self, dtype): for xp in (numpy, cupy): a = testing.shaped_arange((2, 3, 4), xp, dtype) - with pytest.raises(TypeError): + with pytest.raises(ValueError): xp.empty_like(a, order="Q") + @pytest.mark.skip("subok keyword is not supported") def test_empty_like_subok(self): a = testing.shaped_arange((2, 3, 4), cupy) with pytest.raises(TypeError): cupy.empty_like(a, subok=True) + @pytest.mark.skip("strides for zero sized array is different") @testing.for_CF_orders() + @testing.with_requires("numpy>=1.23") def test_empty_zero_sized_array_strides(self, order): a = numpy.empty((1, 0, 2), dtype="d", order=order) b = cupy.empty((1, 0, 2), dtype="d", order=order) - self.assertEqual(b.strides, a.strides) + assert b.strides == a.strides + @pytest.mark.parametrize("offset", [1, -1, 1 << 63, -(1 
<< 63)]) + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() - def test_eye(self, xp, dtype): - return xp.eye(5, 4, k=1, dtype=dtype) + def test_eye(self, xp, dtype, order, offset): + return xp.eye(5, 4, offset, dtype, order=order) @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() @@ -182,6 +198,15 @@ def test_zeros(self, xp, dtype, order): def test_zeros_scalar(self, xp, dtype, order): return xp.zeros((), dtype=dtype, order=order) + @pytest.mark.skip("passing 'None' into shape arguments is not supported") + @testing.with_requires("numpy>=1.20") + @testing.for_CF_orders() + @testing.for_all_dtypes() + @testing.numpy_cupy_array_equal() + def test_zeros_scalar_none(self, xp, dtype, order): + with testing.assert_warns(DeprecationWarning): + return xp.zeros(None, dtype=dtype, order=order) + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() @@ -190,61 +215,80 @@ def test_zeros_int(self, xp, dtype, order): @testing.for_CF_orders() def test_zeros_strides(self, order): - a = numpy.zeros((2, 3), dtype="d", order=order) - b = cupy.zeros((2, 3), dtype="d", order=order) - self.assertEqual(b.strides, a.strides) + a = numpy.zeros((2, 3), dtype="f", order=order) + b = cupy.zeros((2, 3), dtype="f", order=order) + b_strides = tuple(x * b.itemsize for x in b.strides) + assert b_strides == a.strides - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_zeros_like(self, xp, dtype, order): a = xp.ndarray((2, 3, 4), dtype=dtype) return xp.zeros_like(a, order=order) + @pytest.mark.skip("subok keyword is not supported") def test_zeros_like_subok(self): a = cupy.ndarray((2, 3, 4)) with pytest.raises(TypeError): cupy.zeros_like(a, subok=True) + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() - def test_ones(self, xp, dtype): - return xp.ones((2, 3, 4), dtype=dtype) + def test_ones(self, xp, dtype, order): + 
return xp.ones((2, 3, 4), dtype=dtype, order=order) - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_ones_like(self, xp, dtype, order): a = xp.ndarray((2, 3, 4), dtype=dtype) return xp.ones_like(a, order=order) + @pytest.mark.skip("subok keyword is not supported") def test_ones_like_subok(self): a = cupy.ndarray((2, 3, 4)) with pytest.raises(TypeError): cupy.ones_like(a, subok=True) + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() - def test_full(self, xp, dtype): - return xp.full((2, 3, 4), 1, dtype=dtype) + def test_full(self, xp, dtype, order): + return xp.full((2, 3, 4), 1, dtype=dtype, order=order) + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() - def test_full_default_dtype(self, xp, dtype): - return xp.full((2, 3, 4), xp.array(1, dtype=dtype)) + def test_full_default_dtype(self, xp, dtype, order): + return xp.full((2, 3, 4), xp.array(1, dtype=dtype), order=order) - @testing.for_all_dtypes() + @testing.for_all_dtypes_combination(("dtype1", "dtype2")) @testing.numpy_cupy_array_equal() - def test_full_default_dtype_cpu_input(self, xp, dtype): - return xp.full((2, 3, 4), numpy.array(1, dtype=dtype)) + def test_full_dtypes_cpu_input(self, xp, dtype1, dtype2): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", numpy.ComplexWarning) + return xp.full( + (2, 3, 4), numpy.array(1, dtype=dtype1), dtype=dtype2 + ) - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_full_like(self, xp, dtype, order): a = xp.ndarray((2, 3, 4), dtype=dtype) return xp.full_like(a, 1, order=order) + @testing.for_all_dtypes_combination(("dtype1", "dtype2")) + @testing.numpy_cupy_array_equal() + def test_full_like_dtypes_cpu_input(self, xp, dtype1, dtype2): + a = xp.ndarray((2, 3, 4), dtype=dtype1) + with warnings.catch_warnings(): + 
warnings.simplefilter("ignore", numpy.ComplexWarning) + return xp.full_like(a, numpy.array(1, dtype=dtype1)) + + @pytest.mark.skip("subok keyword is not supported") def test_full_like_subok(self): a = cupy.ndarray((2, 3, 4)) with pytest.raises(TypeError): @@ -258,9 +302,9 @@ def test_full_like_subok(self): } ) ) -class TestBasicReshape(unittest.TestCase): +class TestBasicReshape: @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_reshape(self, xp, dtype, order): @@ -281,7 +325,7 @@ def test_empty_like_reshape_cupy_only(self, dtype, order): testing.assert_array_equal(b, c) @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_reshape_contiguity(self, xp, dtype, order): @@ -289,12 +333,12 @@ def test_empty_like_reshape_contiguity(self, xp, dtype, order): b = xp.empty_like(a, order=order, shape=self.shape) b.fill(0) if order in ["f", "F"]: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous return b - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() def test_empty_like_reshape_contiguity_cupy_only(self, dtype, order): a = testing.shaped_arange((2, 3, 4), cupy, dtype) @@ -303,13 +347,13 @@ def test_empty_like_reshape_contiguity_cupy_only(self, dtype, order): c = cupy.empty(self.shape) c.fill(0) if order in ["f", "F"]: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous testing.assert_array_equal(b, c) @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_orders("CF") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_reshape_contiguity2(self, xp, dtype, order): 
@@ -321,12 +365,12 @@ def test_empty_like_reshape_contiguity2(self, xp, dtype, order): if order in ["c", "C"] or ( order in ["k", "K", None] and len(shape) != a.ndim ): - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous else: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous return b - @testing.for_orders("C") + @testing.for_orders("CF") @testing.for_all_dtypes() def test_empty_like_reshape_contiguity2_cupy_only(self, dtype, order): a = testing.shaped_arange((2, 3, 4), cupy, dtype) @@ -339,13 +383,13 @@ def test_empty_like_reshape_contiguity2_cupy_only(self, dtype, order): if order in ["c", "C"] or ( order in ["k", "K", None] and len(shape) != a.ndim ): - self.assertTrue(b.flags.c_contiguous) + assert b.flags.c_contiguous else: - self.assertTrue(b.flags.f_contiguous) + assert b.flags.f_contiguous testing.assert_array_equal(b, c) @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_orders("CF") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_empty_like_reshape_contiguity3(self, xp, dtype, order): @@ -356,20 +400,20 @@ def test_empty_like_reshape_contiguity3(self, xp, dtype, order): b.fill(0) shape = self.shape if not numpy.isscalar(self.shape) else (self.shape,) if len(shape) == 1: - self.assertTrue(b.flags.c_contiguous) - self.assertTrue(b.flags.f_contiguous) + assert b.flags.c_contiguous + assert b.flags.f_contiguous elif order in ["k", "K", None] and len(shape) == a.ndim: - self.assertFalse(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert not b.flags.f_contiguous elif order in ["f", "F"]: - self.assertFalse(b.flags.c_contiguous) - self.assertTrue(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert b.flags.c_contiguous + assert not b.flags.f_contiguous return b - @testing.for_orders("C") + 
@testing.for_orders("CF") @testing.for_all_dtypes() def test_empty_like_reshape_contiguity3_cupy_only(self, dtype, order): a = testing.shaped_arange((2, 3, 4), cupy, dtype) @@ -379,22 +423,23 @@ def test_empty_like_reshape_contiguity3_cupy_only(self, dtype, order): b.fill(0) shape = self.shape if not numpy.isscalar(self.shape) else (self.shape,) if len(shape) == 1: - self.assertTrue(b.flags.c_contiguous) - self.assertTrue(b.flags.f_contiguous) + assert b.flags.c_contiguous + assert b.flags.f_contiguous elif order in ["k", "K", None] and len(shape) == a.ndim: - self.assertFalse(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert not b.flags.f_contiguous elif order in ["f", "F"]: - self.assertFalse(b.flags.c_contiguous) - self.assertTrue(b.flags.f_contiguous) + assert not b.flags.c_contiguous + assert b.flags.f_contiguous else: - self.assertTrue(b.flags.c_contiguous) - self.assertFalse(b.flags.f_contiguous) + assert b.flags.c_contiguous + assert not b.flags.f_contiguous c = cupy.zeros(self.shape) c.fill(0) testing.assert_array_equal(b, c) + @pytest.mark.skip("order 'K' is not supported") @testing.with_requires("numpy>=1.17.0") @testing.for_all_dtypes() def test_empty_like_K_strides_reshape(self, dtype): @@ -411,11 +456,11 @@ def test_empty_like_K_strides_reshape(self, dtype): bg.fill(0) # make sure NumPy and CuPy strides agree - self.assertEqual(b.strides, bg.strides) + assert b.strides == bg.strides return @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_zeros_like_reshape(self, xp, dtype, order): @@ -432,7 +477,7 @@ def test_zeros_like_reshape_cupy_only(self, dtype, order): testing.assert_array_equal(b, c) @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_ones_like_reshape(self, xp, dtype, 
order): @@ -448,7 +493,7 @@ def test_ones_like_reshape_cupy_only(self, dtype): testing.assert_array_equal(b, c) @testing.with_requires("numpy>=1.17.0") - @testing.for_orders("C") + @testing.for_CF_orders() @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_full_like_reshape(self, xp, dtype, order): From 067a7849835b3e9add5418cd8ae2b81e08f2bd60 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 27 Jun 2024 17:05:21 +0200 Subject: [PATCH 38/49] Bump conda-index from 0.4.0 to 0.5.0 (#1902) --- .github/workflows/conda-package.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml index d11f1a3038c..c67b7487429 100644 --- a/.github/workflows/conda-package.yml +++ b/.github/workflows/conda-package.yml @@ -13,7 +13,7 @@ env: MODULE_NAME: dpnp CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' CONDA_BUILD_VERSION: '24.5.1' - CONDA_INDEX_VERSION: '0.4.0' + CONDA_INDEX_VERSION: '0.5.0' RUN_TESTS_MAX_ATTEMPTS: 2 TEST_ENV_NAME: 'test' TEST_SCOPE: >- From 437f0468fd8fa9afd8e34c305d4df60218e0c6d1 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Fri, 28 Jun 2024 12:37:54 +0200 Subject: [PATCH 39/49] Check type of input in `dpnp.repeat` to raise a proper validation exception if any (#1894) * Check type of input to raise a proper validation exception if any * Update dpnp/dpnp_iface_manipulation.py Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --------- Co-authored-by: vtavana <120411540+vtavana@users.noreply.github.com> --- dpnp/dpnp_iface_manipulation.py | 27 +- tests/test_arraymanipulation.py | 111 -------- tests/test_manipulation.py | 236 ++++++++++++++++-- .../cupy/manipulation_tests/test_tiling.py | 4 - 4 files changed, 237 insertions(+), 141 deletions(-) diff --git a/dpnp/dpnp_iface_manipulation.py b/dpnp/dpnp_iface_manipulation.py index 
056ac790720..bf3c66d7fda 100644 --- a/dpnp/dpnp_iface_manipulation.py +++ b/dpnp/dpnp_iface_manipulation.py @@ -1248,12 +1248,16 @@ def repeat(a, repeats, axis=None): ---------- x : {dpnp.ndarray, usm_ndarray} Input array. - repeat : int or array of int + repeats : {int, tuple, list, range, dpnp.ndarray, usm_ndarray} The number of repetitions for each element. `repeats` is broadcasted to fit the shape of the given axis. - axis : int, optional + If `repeats` is an array, it must have an integer data type. + Otherwise, `repeats` must be a Python integer or sequence of Python + integers (i.e., a tuple, list, or range). + axis : {None, int}, optional The axis along which to repeat values. By default, use the flattened input array, and return a flat output array. + Default: ``None``. Returns ------- @@ -1263,8 +1267,8 @@ def repeat(a, repeats, axis=None): See Also -------- - :obj:`dpnp.tile` : Construct an array by repeating A the number of times - given by reps. + :obj:`dpnp.tile` : Tile an array. + :obj:`dpnp.unique` : Find the unique elements of an array. 
Examples -------- @@ -1286,14 +1290,15 @@ def repeat(a, repeats, axis=None): """ - rep = repeats - if isinstance(repeats, dpnp_array): - rep = dpnp.get_usm_ndarray(repeats) + dpnp.check_supported_arrays_type(a) + if not isinstance(repeats, (int, tuple, list, range)): + repeats = dpnp.get_usm_ndarray(repeats) + if axis is None and a.ndim > 1: - usm_arr = dpnp.get_usm_ndarray(a.flatten()) - else: - usm_arr = dpnp.get_usm_ndarray(a) - usm_arr = dpt.repeat(usm_arr, rep, axis=axis) + a = dpnp.ravel(a) + + usm_arr = dpnp.get_usm_ndarray(a) + usm_arr = dpt.repeat(usm_arr, repeats, axis=axis) return dpnp_array._create_from_usm_ndarray(usm_arr) diff --git a/tests/test_arraymanipulation.py b/tests/test_arraymanipulation.py index 12f14bf4109..a6bbfd0e987 100644 --- a/tests/test_arraymanipulation.py +++ b/tests/test_arraymanipulation.py @@ -1016,114 +1016,3 @@ def test_can_cast(): assert dpnp.can_cast(X, "float32") == numpy.can_cast(X_np, "float32") assert dpnp.can_cast(X, dpnp.int32) == numpy.can_cast(X_np, numpy.int32) assert dpnp.can_cast(X, dpnp.int64) == numpy.can_cast(X_np, numpy.int64) - - -def test_repeat_scalar_sequence_agreement(): - x = dpnp.arange(5, dtype="i4") - expected_res = dpnp.empty(10, dtype="i4") - expected_res[1::2], expected_res[::2] = x, x - - # scalar case - reps = 2 - res = dpnp.repeat(x, reps) - assert dpnp.all(res == expected_res) - - # tuple - reps = (2, 2, 2, 2, 2) - res = dpnp.repeat(x, reps) - assert dpnp.all(res == expected_res) - - -def test_repeat_as_broadcasting(): - reps = 5 - x = dpnp.arange(reps, dtype="i4") - x1 = x[:, dpnp.newaxis] - expected_res = dpnp.broadcast_to(x1, (reps, reps)) - - res = dpnp.repeat(x1, reps, axis=1) - assert dpnp.all(res == expected_res) - - x2 = x[dpnp.newaxis, :] - expected_res = dpnp.broadcast_to(x2, (reps, reps)) - - res = dpnp.repeat(x2, reps, axis=0) - assert dpnp.all(res == expected_res) - - -def test_repeat_axes(): - reps = 2 - x = dpnp.reshape(dpnp.arange(5 * 10, dtype="i4"), (5, 10)) - expected_res = 
dpnp.empty((x.shape[0] * 2, x.shape[1]), dtype=x.dtype) - expected_res[::2, :], expected_res[1::2] = x, x - res = dpnp.repeat(x, reps, axis=0) - assert dpnp.all(res == expected_res) - - expected_res = dpnp.empty((x.shape[0], x.shape[1] * 2), dtype=x.dtype) - expected_res[:, ::2], expected_res[:, 1::2] = x, x - res = dpnp.repeat(x, reps, axis=1) - assert dpnp.all(res == expected_res) - - -def test_repeat_size_0_outputs(): - x = dpnp.ones((3, 0, 5), dtype="i4") - reps = 10 - res = dpnp.repeat(x, reps, axis=0) - assert res.size == 0 - assert res.shape == (30, 0, 5) - - res = dpnp.repeat(x, reps, axis=1) - assert res.size == 0 - assert res.shape == (3, 0, 5) - - res = dpnp.repeat(x, (2, 2, 2), axis=0) - assert res.size == 0 - assert res.shape == (6, 0, 5) - - x = dpnp.ones((3, 2, 5)) - res = dpnp.repeat(x, 0, axis=1) - assert res.size == 0 - assert res.shape == (3, 0, 5) - - x = dpnp.ones((3, 2, 5)) - res = dpnp.repeat(x, (0, 0), axis=1) - assert res.size == 0 - assert res.shape == (3, 0, 5) - - -def test_repeat_strides(): - reps = 2 - x = dpnp.reshape(dpnp.arange(10 * 10, dtype="i4"), (10, 10)) - x1 = x[:, ::-2] - expected_res = dpnp.empty((10, 10), dtype="i4") - expected_res[:, ::2], expected_res[:, 1::2] = x1, x1 - res = dpnp.repeat(x1, reps, axis=1) - assert dpnp.all(res == expected_res) - res = dpnp.repeat(x1, (reps,) * x1.shape[1], axis=1) - assert dpnp.all(res == expected_res) - - x1 = x[::-2, :] - expected_res = dpnp.empty((10, 10), dtype="i4") - expected_res[::2, :], expected_res[1::2, :] = x1, x1 - res = dpnp.repeat(x1, reps, axis=0) - assert dpnp.all(res == expected_res) - res = dpnp.repeat(x1, (reps,) * x1.shape[0], axis=0) - assert dpnp.all(res == expected_res) - - -def test_repeat_casting(): - x = dpnp.arange(5, dtype="i4") - # i4 is cast to i8 - reps = dpnp.ones(5, dtype="i4") - res = dpnp.repeat(x, reps) - assert res.shape == x.shape - assert dpnp.all(res == x) - - -def test_repeat_strided_repeats(): - x = dpnp.arange(5, dtype="i4") - reps = 
dpnp.ones(10, dtype="i8") - reps[::2] = 0 - reps = reps[::-2] - res = dpnp.repeat(x, reps) - assert res.shape == x.shape - assert dpnp.all(res == x) diff --git a/tests/test_manipulation.py b/tests/test_manipulation.py index 9c0869024a5..0178ff9a28b 100644 --- a/tests/test_manipulation.py +++ b/tests/test_manipulation.py @@ -1,6 +1,7 @@ +import dpctl.tensor as dpt import numpy import pytest -from numpy.testing import assert_array_equal +from numpy.testing import assert_array_equal, assert_raises import dpnp @@ -58,20 +59,6 @@ def test_copyto_where_raises(where): dpnp.copyto(a, b, where=where) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") -@pytest.mark.parametrize( - "arr", - [[], [1, 2, 3, 4], [[1, 2], [3, 4]], [[[1], [2]], [[3], [4]]]], - ids=["[]", "[1, 2, 3, 4]", "[[1, 2], [3, 4]]", "[[[1], [2]], [[3], [4]]]"], -) -def test_repeat(arr): - a = numpy.array(arr) - dpnp_a = dpnp.array(arr) - expected = numpy.repeat(a, 2) - result = dpnp.repeat(dpnp_a, 2) - assert_array_equal(expected, result) - - def test_result_type(): X = [dpnp.ones((2), dtype=dpnp.int64), dpnp.int32, "float32"] X_np = [numpy.ones((2), dtype=numpy.int64), numpy.int32, "float32"] @@ -114,6 +101,225 @@ def test_unique(array): assert_array_equal(expected, result) +class TestRepeat: + @pytest.mark.parametrize( + "data", + [[], [1, 2, 3, 4], [[1, 2], [3, 4]], [[[1], [2]], [[3], [4]]]], + ids=[ + "[]", + "[1, 2, 3, 4]", + "[[1, 2], [3, 4]]", + "[[[1], [2]], [[3], [4]]]", + ], + ) + @pytest.mark.parametrize("dtype", get_all_dtypes()) + def test_data(self, data, dtype): + a = numpy.array(data, dtype=dtype) + ia = dpnp.array(a) + + expected = numpy.repeat(a, 2) + result = dpnp.repeat(ia, 2) + assert_array_equal(expected, result) + + @pytest.mark.parametrize( + "repeats", [2, (2, 2, 2, 2, 2)], ids=["scalar", "tuple"] + ) + def test_scalar_sequence_agreement(self, repeats): + a = numpy.arange(5, dtype="i4") + ia = dpnp.array(a) + + expected = numpy.repeat(a, repeats) + result = dpnp.repeat(ia, 
repeats) + assert_array_equal(expected, result) + + @pytest.mark.parametrize("axis", [0, 1]) + def test_broadcasting(self, axis): + reps = 5 + a = numpy.arange(reps, dtype="i4") + if axis == 0: + sh = (reps, 1) + else: + sh = (1, reps) + a = a.reshape(sh) + ia = dpnp.array(a) + + expected = numpy.repeat(a, reps) + result = dpnp.repeat(ia, reps) + assert_array_equal(expected, result) + + @pytest.mark.parametrize("axis", [0, 1]) + def test_axes(self, axis): + reps = 2 + a = numpy.arange(5 * 10, dtype="i4").reshape((5, 10)) + ia = dpnp.array(a) + + expected = numpy.repeat(a, reps, axis=axis) + result = dpnp.repeat(ia, reps, axis=axis) + assert_array_equal(expected, result) + + def test_size_0_outputs(self): + reps = 10 + a = dpnp.ones((3, 0, 5), dtype="i4") + ia = dpnp.array(a) + + expected = numpy.repeat(a, reps, axis=0) + result = dpnp.repeat(ia, reps, axis=0) + assert_array_equal(expected, result) + + expected = numpy.repeat(a, reps, axis=1) + result = dpnp.repeat(ia, reps, axis=1) + assert_array_equal(expected, result) + + reps = (2, 2, 2) + expected = numpy.repeat(a, reps, axis=0) + result = dpnp.repeat(ia, reps, axis=0) + assert_array_equal(expected, result) + + a = numpy.ones((3, 2, 5)) + ia = dpnp.array(a) + + reps = 0 + expected = numpy.repeat(a, reps, axis=1) + result = dpnp.repeat(ia, reps, axis=1) + assert_array_equal(expected, result) + + reps = (0, 0) + expected = numpy.repeat(a, reps, axis=1) + result = dpnp.repeat(ia, reps, axis=1) + assert_array_equal(expected, result) + + def test_strides_0(self): + reps = 2 + a = numpy.arange(10 * 10, dtype="i4").reshape((10, 10)) + ia = dpnp.array(a) + + a = a[::-2, :] + ia = ia[::-2, :] + + expected = numpy.repeat(a, reps, axis=0) + result = dpnp.repeat(ia, reps, axis=0) + assert_array_equal(expected, result) + + expected = numpy.repeat(a, (reps,) * a.shape[0], axis=0) + result = dpnp.repeat(ia, (reps,) * ia.shape[0], axis=0) + assert_array_equal(expected, result) + + def test_strides_1(self): + reps = 2 + a = 
numpy.arange(10 * 10, dtype="i4").reshape((10, 10)) + ia = dpnp.array(a) + + a = a[:, ::-2] + ia = ia[:, ::-2] + + expected = numpy.repeat(a, reps, axis=1) + result = dpnp.repeat(ia, reps, axis=1) + assert_array_equal(expected, result) + + expected = numpy.repeat(a, (reps,) * a.shape[1], axis=1) + result = dpnp.repeat(ia, (reps,) * ia.shape[1], axis=1) + assert_array_equal(expected, result) + + def test_casting(self): + a = numpy.arange(5, dtype="i4") + ia = dpnp.array(a) + + # i4 is cast to i8 + reps = numpy.ones(5, dtype="i4") + ireps = dpnp.array(reps) + + expected = numpy.repeat(a, reps) + result = dpnp.repeat(ia, ireps) + assert_array_equal(expected, result) + + def test_strided_repeats(self): + a = numpy.arange(5, dtype="i4") + ia = dpnp.array(a) + + reps = numpy.ones(10, dtype="i8") + reps[::2] = 0 + ireps = dpnp.array(reps) + + reps = reps[::-2] + ireps = ireps[::-2] + + expected = numpy.repeat(a, reps) + result = dpnp.repeat(ia, ireps) + assert_array_equal(expected, result) + + def test_usm_ndarray_as_input_array(self): + reps = [1, 3, 2, 1, 1, 2] + a = numpy.array([[1, 2, 3, 4, 5, 6]]) + ia = dpt.asarray(a) + + expected = numpy.repeat(a, reps) + result = dpnp.repeat(ia, reps) + assert_array_equal(expected, result) + assert isinstance(result, dpnp.ndarray) + + def test_scalar_as_input_array(self): + assert_raises(TypeError, dpnp.repeat, 3, 2) + + def test_usm_ndarray_as_repeats(self): + a = numpy.array([1, 2, 3, 4, 5, 6]).reshape((2, 3)) + ia = dpnp.asarray(a) + + reps = numpy.array([1, 3, 2]) + ireps = dpt.asarray(reps) + + expected = a.repeat(reps, axis=1) + result = ia.repeat(ireps, axis=1) + assert_array_equal(expected, result) + assert isinstance(result, dpnp.ndarray) + + def test_unsupported_array_as_repeats(self): + assert_raises(TypeError, dpnp.arange(5, dtype="i4"), numpy.array(3)) + + @pytest.mark.parametrize( + "data, dtype", + [ + pytest.param([1, 2**7 - 1, -(2**7)], numpy.int8, id="int8"), + pytest.param([1, 2**15 - 1, -(2**15)], numpy.int16, 
id="int16"), + pytest.param([1, 2**31 - 1, -(2**31)], numpy.int32, id="int32"), + pytest.param([1, 2**63 - 1, -(2**63)], numpy.int64, id="int64"), + ], + ) + def test_maximum_signed_integers(self, data, dtype): + reps = 129 + a = numpy.array(data, dtype=dtype) + ia = dpnp.asarray(a) + + expected = a.repeat(reps) + result = ia.repeat(reps) + assert_array_equal(expected, result) + + @pytest.mark.parametrize( + "data, dtype", + [ + pytest.param( + [1, -(2**7), -(2**7) + 1, 2**7 - 1], numpy.int8, id="int8" + ), + pytest.param( + [1, -(2**15), -(2**15) + 1, 2**15 - 1], numpy.int16, id="int16" + ), + pytest.param( + [1, -(2**31), -(2**31) + 1, 2**31 - 1], numpy.int32, id="int32" + ), + pytest.param( + [1, -(2**63), -(2**63) + 1, 2**63 - 1], numpy.int64, id="int64" + ), + ], + ) + def test_minimum_signed_integers(self, data, dtype): + reps = 129 + a = numpy.array(data, dtype=dtype) + ia = dpnp.asarray(a) + + expected = a.repeat(reps) + result = ia.repeat(reps) + assert_array_equal(expected, result) + + class TestTranspose: @pytest.mark.parametrize("axes", [(0, 1), (1, 0), [0, 1]]) def test_2d_with_axes(self, axes): diff --git a/tests/third_party/cupy/manipulation_tests/test_tiling.py b/tests/third_party/cupy/manipulation_tests/test_tiling.py index eb29036d248..365a01f7e14 100644 --- a/tests/third_party/cupy/manipulation_tests/test_tiling.py +++ b/tests/third_party/cupy/manipulation_tests/test_tiling.py @@ -16,7 +16,6 @@ {"repeats": [1, 2, 3], "axis": 1}, {"repeats": [1, 2, 3], "axis": -2}, ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") class TestRepeat(unittest.TestCase): @testing.numpy_cupy_array_equal() def test_array_repeat(self, xp): @@ -42,7 +41,6 @@ def test_method(self): {"repeats": [2], "axis": None}, {"repeats": [2], "axis": 1}, ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") class TestRepeatListBroadcast(unittest.TestCase): """Test for `repeats` argument using single element list. 
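The new `TestRepeat` cases above all check that `dpnp.repeat` agrees with `numpy.repeat`. The core semantics being exercised — a scalar repeat count agreeing with an equal-valued sequence, `axis=None` flattening the input first, and a `0` entry dropping elements — can be sketched with NumPy alone (used here only as a stand-in for dpnp, which mirrors this behavior):

```python
import numpy as np

x = np.arange(5, dtype="i4")

# A scalar repeat count and an equal-valued sequence produce the same result.
scalar_res = np.repeat(x, 2)
tuple_res = np.repeat(x, (2, 2, 2, 2, 2))
assert np.array_equal(scalar_res, tuple_res)

# With axis=None the input is flattened before repeating,
# matching the dpnp.ravel(a) branch in the patched dpnp.repeat above.
m = np.arange(6, dtype="i4").reshape(2, 3)
flat_res = np.repeat(m, 2)  # same as repeating m.ravel()
assert np.array_equal(flat_res, np.repeat(m.ravel(), 2))

# Per-row repeats along an axis; a 0 entry drops that row entirely.
rows = np.repeat(m, (0, 2), axis=0)  # first row dropped, second row doubled
assert rows.shape == (2, 3)
```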
@@ -62,7 +60,6 @@ def test_array_repeat(self, xp): {"repeats": [1, 2, 3, 4], "axis": None}, {"repeats": [1, 2, 3, 4], "axis": 0}, ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") class TestRepeat1D(unittest.TestCase): @testing.numpy_cupy_array_equal() def test_array_repeat(self, xp): @@ -91,7 +88,6 @@ def test_array_repeat(self, xp): {"repeats": 2, "axis": -4}, {"repeats": 2, "axis": 3}, ) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") class TestRepeatFailure(unittest.TestCase): def test_repeat_failure(self): for xp in (numpy, cupy): From a265663d211cb074625781dee5e4b3ba7328cca0 Mon Sep 17 00:00:00 2001 From: Natalia Polina Date: Fri, 28 Jun 2024 04:44:50 -0700 Subject: [PATCH 40/49] Clean up legacy element-wise implementation from the backend (#1890) * Clean up legacy element-wise implementation from the backend * return legacy copy implementation for partition function * Apply comments * Fix pre-commit * Fix pre-commit * Clean-up MACRO_2ARG_2TYPES_LOGIC_OP. Clean-up /backend/include --------- Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> Co-authored-by: Anton Volkov --- dpnp/backend/CMakeLists.txt | 1 - .../include/dpnp_gen_1arg_1type_tbl.hpp | 15 - .../include/dpnp_gen_1arg_2type_tbl.hpp | 80 -- .../include/dpnp_gen_2arg_1type_tbl.hpp | 105 -- .../include/dpnp_gen_2arg_2type_tbl.hpp | 94 -- .../include/dpnp_gen_2arg_3type_tbl.hpp | 52 - dpnp/backend/include/dpnp_iface.hpp | 613 ---------- dpnp/backend/include/dpnp_iface_fptr.hpp | 95 +- dpnp/backend/kernels/dpnp_krnl_bitwise.cpp | 438 ------- dpnp/backend/kernels/dpnp_krnl_elemwise.cpp | 690 +---------- dpnp/backend/kernels/dpnp_krnl_logic.cpp | 272 ----- .../kernels/dpnp_krnl_mathematical.cpp | 1034 ----------------- dpnp/backend/src/dpnp_fptr.hpp | 1 - dpnp/backend/src/dpnp_iface_fptr.cpp | 1 - dpnp/dpnp_algo/CMakeLists.txt | 1 - dpnp/dpnp_algo/dpnp_algo.pxd | 6 - dpnp/dpnp_algo/dpnp_algo.pyx | 1 - dpnp/dpnp_algo/dpnp_algo_arraycreation.pxi | 78 -- 
dpnp/dpnp_algo/dpnp_algo_sorting.pxi | 2 +- 19 files changed, 16 insertions(+), 3563 deletions(-) delete mode 100644 dpnp/backend/include/dpnp_gen_2arg_1type_tbl.hpp delete mode 100644 dpnp/backend/include/dpnp_gen_2arg_2type_tbl.hpp delete mode 100644 dpnp/backend/kernels/dpnp_krnl_bitwise.cpp delete mode 100644 dpnp/dpnp_algo/dpnp_algo_arraycreation.pxi diff --git a/dpnp/backend/CMakeLists.txt b/dpnp/backend/CMakeLists.txt index f9eb1a35d28..d96320bf0ac 100644 --- a/dpnp/backend/CMakeLists.txt +++ b/dpnp/backend/CMakeLists.txt @@ -25,7 +25,6 @@ set(DPNP_SRC kernels/dpnp_krnl_arraycreation.cpp - kernels/dpnp_krnl_bitwise.cpp kernels/dpnp_krnl_common.cpp kernels/dpnp_krnl_elemwise.cpp kernels/dpnp_krnl_fft.cpp diff --git a/dpnp/backend/include/dpnp_gen_1arg_1type_tbl.hpp b/dpnp/backend/include/dpnp_gen_1arg_1type_tbl.hpp index ea1c477173f..32df8aeda72 100644 --- a/dpnp/backend/include/dpnp_gen_1arg_1type_tbl.hpp +++ b/dpnp/backend/include/dpnp_gen_1arg_1type_tbl.hpp @@ -83,24 +83,9 @@ #endif // _SECTION_DOCUMENTATION_GENERATION_ -MACRO_1ARG_1TYPE_OP(dpnp_conjugate_c, - std::conj(input_elem), - q.submit(kernel_func)) -MACRO_1ARG_1TYPE_OP(dpnp_copy_c, input_elem, q.submit(kernel_func)) MACRO_1ARG_1TYPE_OP(dpnp_erf_c, dispatch_erf_op(input_elem), oneapi::mkl::vm::erf(q, input1_size, input1_data, result)) -MACRO_1ARG_1TYPE_OP(dpnp_negative_c, -input_elem, q.submit(kernel_func)) -MACRO_1ARG_1TYPE_OP( - dpnp_recip_c, - _DataType(1) / input_elem, - q.submit(kernel_func)) // error: no member named 'recip' in namespace 'sycl' -MACRO_1ARG_1TYPE_OP(dpnp_sign_c, - dispatch_sign_op(input_elem), - q.submit(kernel_func)) // no sycl::sign for int and long -MACRO_1ARG_1TYPE_OP(dpnp_square_c, - input_elem *input_elem, - oneapi::mkl::vm::sqr(q, input1_size, input1_data, result)) #undef MACRO_1ARG_1TYPE_OP diff --git a/dpnp/backend/include/dpnp_gen_1arg_2type_tbl.hpp b/dpnp/backend/include/dpnp_gen_1arg_2type_tbl.hpp index 3abc54c7212..a27353866d2 100644 --- 
a/dpnp/backend/include/dpnp_gen_1arg_2type_tbl.hpp +++ b/dpnp/backend/include/dpnp_gen_1arg_2type_tbl.hpp @@ -85,95 +85,15 @@ #endif -MACRO_1ARG_2TYPES_OP(dpnp_acos_c, - sycl::acos(input_elem), - oneapi::mkl::vm::acos(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_acosh_c, - sycl::acosh(input_elem), - oneapi::mkl::vm::acosh(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_asin_c, - sycl::asin(input_elem), - oneapi::mkl::vm::asin(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_asinh_c, - sycl::asinh(input_elem), - oneapi::mkl::vm::asinh(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_atan_c, - sycl::atan(input_elem), - oneapi::mkl::vm::atan(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_atanh_c, - sycl::atanh(input_elem), - oneapi::mkl::vm::atanh(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_cbrt_c, - sycl::cbrt(input_elem), - oneapi::mkl::vm::cbrt(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_ceil_c, - sycl::ceil(input_elem), - oneapi::mkl::vm::ceil(q, input1_size, input1_data, result)) MACRO_1ARG_2TYPES_OP(dpnp_copyto_c, input_elem, q.submit(kernel_func)) -MACRO_1ARG_2TYPES_OP(dpnp_cos_c, - sycl::cos(input_elem), - oneapi::mkl::vm::cos(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_cosh_c, - sycl::cosh(input_elem), - oneapi::mkl::vm::cosh(q, input1_size, input1_data, result)) MACRO_1ARG_2TYPES_OP(dpnp_degrees_c, sycl::degrees(input_elem), q.submit(kernel_func)) -MACRO_1ARG_2TYPES_OP(dpnp_exp2_c, - sycl::exp2(input_elem), - oneapi::mkl::vm::exp2(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_exp_c, - sycl::exp(input_elem), - oneapi::mkl::vm::exp(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_expm1_c, - sycl::expm1(input_elem), - oneapi::mkl::vm::expm1(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_fabs_c, - sycl::fabs(input_elem), - oneapi::mkl::vm::abs(q, 
input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_floor_c, - sycl::floor(input_elem), - oneapi::mkl::vm::floor(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_log10_c, - sycl::log10(input_elem), - oneapi::mkl::vm::log10(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_log1p_c, - sycl::log1p(input_elem), - oneapi::mkl::vm::log1p(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_log2_c, - sycl::log2(input_elem), - oneapi::mkl::vm::log2(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_log_c, - sycl::log(input_elem), - oneapi::mkl::vm::ln(q, input1_size, input1_data, result)) MACRO_1ARG_2TYPES_OP(dpnp_radians_c, sycl::radians(input_elem), q.submit(kernel_func)) -MACRO_1ARG_2TYPES_OP(dpnp_sin_c, - sycl::sin(input_elem), - oneapi::mkl::vm::sin(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_sinh_c, - sycl::sinh(input_elem), - oneapi::mkl::vm::sinh(q, input1_size, input1_data, result)) MACRO_1ARG_2TYPES_OP(dpnp_sqrt_c, sycl::sqrt(input_elem), oneapi::mkl::vm::sqrt(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_tan_c, - sycl::tan(input_elem), - oneapi::mkl::vm::tan(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP(dpnp_tanh_c, - sycl::tanh(input_elem), - oneapi::mkl::vm::tanh(q, input1_size, input1_data, result)) -MACRO_1ARG_2TYPES_OP( - dpnp_trunc_c, - sycl::trunc(input_elem), - oneapi::mkl::vm::trunc(q, input1_size, input1_data, result)) #undef MACRO_1ARG_2TYPES_OP diff --git a/dpnp/backend/include/dpnp_gen_2arg_1type_tbl.hpp b/dpnp/backend/include/dpnp_gen_2arg_1type_tbl.hpp deleted file mode 100644 index 130283e5834..00000000000 --- a/dpnp/backend/include/dpnp_gen_2arg_1type_tbl.hpp +++ /dev/null @@ -1,105 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. 
-// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. 
-//***************************************************************************** - -/* - * This header file contains single argument bitwise functions definitions - * - * Macro `MACRO_2ARG_1TYPE_OP` must be defined before usage - * - * Parameters: - * - public name of the function and kernel name - * - operation used to calculate the result - * - */ - -#ifndef MACRO_2ARG_1TYPE_OP -#error "MACRO_2ARG_1TYPE_OP is not defined" -#endif - -#ifdef _SECTION_DOCUMENTATION_GENERATION_ - -#define MACRO_2ARG_1TYPE_OP(__name__, __operation__) \ - /** @ingroup BACKEND_API */ \ - /** @brief Element wise operation function __name__ */ \ - /** */ \ - /** Function "__name__" executes operator "__operation__" over \ - * corresponding elements of input arrays */ \ - /** */ \ - /** @param[in] q_ref Reference to SYCL queue. */ \ - /** @param[out] result_out Output array. */ \ - /** @param[in] result_size Output array size. */ \ - /** @param[in] result_ndim Number of output array dimensions. \ - */ \ - /** @param[in] result_shape Output array shape. */ \ - /** @param[in] result_strides Output array strides. */ \ - /** @param[in] input1_in Input array 1. */ \ - /** @param[in] input1_size Input array 1 size. */ \ - /** @param[in] input1_ndim Number of input array 1 dimensions. \ - */ \ - /** @param[in] input1_shape Input array 1 shape. */ \ - /** @param[in] input1_strides Input array 1 strides. */ \ - /** @param[in] input2_in Input array 2. */ \ - /** @param[in] input2_size Input array 2 size. */ \ - /** @param[in] input2_ndim Number of input array 2 dimensions. \ - */ \ - /** @param[in] input2_shape Input array 2 shape. */ \ - /** @param[in] input2_strides Input array 2 strides. */ \ - /** @param[in] where Where condition. */ \ - /** @param[in] dep_event_vec_ref Reference to vector of SYCL events. 
\ - */ \ - template \ - DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref); \ - \ - template \ - void __name__( \ - void *result_out, const size_t result_size, const size_t result_ndim, \ - const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where); - -#endif - -MACRO_2ARG_1TYPE_OP(dpnp_bitwise_and_c, input1_elem &input2_elem) -MACRO_2ARG_1TYPE_OP(dpnp_bitwise_or_c, input1_elem | input2_elem) -MACRO_2ARG_1TYPE_OP(dpnp_bitwise_xor_c, input1_elem ^ input2_elem) -MACRO_2ARG_1TYPE_OP(dpnp_left_shift_c, input1_elem << input2_elem) -MACRO_2ARG_1TYPE_OP(dpnp_right_shift_c, input1_elem >> input2_elem) - -#undef MACRO_2ARG_1TYPE_OP diff --git a/dpnp/backend/include/dpnp_gen_2arg_2type_tbl.hpp b/dpnp/backend/include/dpnp_gen_2arg_2type_tbl.hpp deleted file mode 100644 index d84accb0757..00000000000 --- a/dpnp/backend/include/dpnp_gen_2arg_2type_tbl.hpp +++ /dev/null @@ -1,94 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2023-2024, Intel Corporation -// All rights reserved. 
-// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. 
-//***************************************************************************** - -/* - * This header file contains single argument element wise functions definitions - * - * Macro `MACRO_2ARG_2TYPES_LOGIC_OP` must be defined before usage - * - * Parameters: - * - public name of the function and kernel name - * - operation used to calculate the result - * - */ - -#ifndef MACRO_2ARG_2TYPES_LOGIC_OP -#error "MACRO_2ARG_2TYPES_LOGIC_OP is not defined" -#endif - -#ifdef _SECTION_DOCUMENTATION_GENERATION_ - -#define MACRO_2ARG_2TYPES_LOGIC_OP(__name__, __operation__) \ - /** @ingroup BACKEND_API */ \ - /** @brief Per element operation function __name__ */ \ - /** */ \ - /** Function "__name__" executes operator "__operation__" over \ - * corresponding elements of input arrays */ \ - /** */ \ - /** @param[in] q_ref Reference to SYCL queue. */ \ - /** @param[out] result_out Output array. */ \ - /** @param[in] result_size Output array size. */ \ - /** @param[in] result_ndim Number of output array dimensions. \ - */ \ - /** @param[in] result_shape Output array shape. */ \ - /** @param[in] result_strides Output array strides. */ \ - /** @param[in] input1_in Input array 1. */ \ - /** @param[in] input1_size Input array 1 size. */ \ - /** @param[in] input1_ndim Number of input array 1 dimensions. \ - */ \ - /** @param[in] input1_shape Input array 1 shape. */ \ - /** @param[in] input1_strides Input array 1 strides. */ \ - /** @param[in] input2_in Input array 2. */ \ - /** @param[in] input2_size Input array 2 size. */ \ - /** @param[in] input2_ndim Number of input array 2 dimensions. \ - */ \ - /** @param[in] input2_shape Input array 2 shape. */ \ - /** @param[in] input2_strides Input array 2 strides. */ \ - /** @param[in] where Where condition. */ \ - /** @param[in] dep_event_vec_ref Reference to vector of SYCL events. 
\ - */ \ - template \ - DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref); - -#endif - -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_equal_c, input1_elem == input2_elem) -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_greater_c, input1_elem > input2_elem) -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_greater_equal_c, input1_elem >= input2_elem) -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_less_c, input1_elem < input2_elem) -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_less_equal_c, input1_elem <= input2_elem) -MACRO_2ARG_2TYPES_LOGIC_OP(dpnp_not_equal_c, input1_elem != input2_elem) - -#undef MACRO_2ARG_2TYPES_LOGIC_OP diff --git a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp index 7423085d659..dcec3f8192b 100644 --- a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp +++ b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp @@ -103,40 +103,6 @@ #endif -MACRO_2ARG_3TYPES_OP(dpnp_add_c, - input1_elem + input2_elem, - x1 + x2, - MACRO_UNPACK_TYPES(bool, std::int32_t, std::int64_t), - oneapi::mkl::vm::add, - MACRO_UNPACK_TYPES(float, - double, - std::complex, - std::complex)) - -MACRO_2ARG_3TYPES_OP(dpnp_arctan2_c, - sycl::atan2(input1_elem, input2_elem), - sycl::atan2(x1, x2), - MACRO_UNPACK_TYPES(float, double), - oneapi::mkl::vm::atan2, - MACRO_UNPACK_TYPES(float, double)) - -MACRO_2ARG_3TYPES_OP(dpnp_copysign_c, - sycl::copysign(input1_elem, input2_elem), - sycl::copysign(x1, x2), - MACRO_UNPACK_TYPES(float, double), - 
oneapi::mkl::vm::copysign, - MACRO_UNPACK_TYPES(float, double)) - -MACRO_2ARG_3TYPES_OP(dpnp_divide_c, - input1_elem / input2_elem, - x1 / x2, - MACRO_UNPACK_TYPES(bool, std::int32_t, std::int64_t), - oneapi::mkl::vm::div, - MACRO_UNPACK_TYPES(float, - double, - std::complex, - std::complex)) - MACRO_2ARG_3TYPES_OP( dpnp_fmod_c, dispatch_fmod_op(input1_elem, input2_elem), @@ -145,13 +111,6 @@ MACRO_2ARG_3TYPES_OP( oneapi::mkl::vm::fmod, MACRO_UNPACK_TYPES(float, double)) -MACRO_2ARG_3TYPES_OP(dpnp_hypot_c, - sycl::hypot(input1_elem, input2_elem), - sycl::hypot(x1, x2), - MACRO_UNPACK_TYPES(float, double), - oneapi::mkl::vm::hypot, - MACRO_UNPACK_TYPES(float, double)) - MACRO_2ARG_3TYPES_OP(dpnp_maximum_c, sycl::max(input1_elem, input2_elem), nullptr, @@ -181,17 +140,6 @@ MACRO_2ARG_3TYPES_OP(dpnp_multiply_c, std::complex, std::complex)) -MACRO_2ARG_3TYPES_OP(dpnp_power_c, - static_cast<_DataType_output>(std::pow(input1_elem, - input2_elem)), - sycl::pow(x1, x2), - MACRO_UNPACK_TYPES(float, double), - oneapi::mkl::vm::pow, - MACRO_UNPACK_TYPES(float, - double, - std::complex, - std::complex)) - MACRO_2ARG_3TYPES_OP(dpnp_subtract_c, input1_elem - input2_elem, x1 - x2, diff --git a/dpnp/backend/include/dpnp_iface.hpp b/dpnp/backend/include/dpnp_iface.hpp index ccbf6fa8536..324e7a612b1 100644 --- a/dpnp/backend/include/dpnp_iface.hpp +++ b/dpnp/backend/include/dpnp_iface.hpp @@ -202,28 +202,6 @@ template INP_DLLEXPORT void dpnp_arange_c(size_t start, size_t step, void *result1, size_t size); -/** - * @ingroup BACKEND_API - * @brief Copy of the array, cast to a specified type. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array. - * @param [out] result Output array. - * @param [in] size Number of input elements in `array`. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_astype_c(DPCTLSyclQueueRef q_ref, - const void *array, - void *result, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_astype_c(const void *array, void *result, const size_t size); - /** * @ingroup BACKEND_API * @brief Implementation of full function @@ -266,67 +244,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_full_like_c(void *array_in, void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief Matrix multiplication. - * - * Matrix multiplication procedure. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result_out Output array. - * @param [in] result_size Size of output array. - * @param [in] result_ndim Number of output array dimensions. - * @param [in] result_shape Shape of output array. - * @param [in] result_strides Strides of output array. - * @param [in] input1_in First input array. - * @param [in] input1_size Size of first input array. - * @param [in] input1_ndim Number of first input array dimensions. - * @param [in] input1_shape Shape of first input array. - * @param [in] input1_strides Strides of first input array. - * @param [in] input2_in Second input array. - * @param [in] input2_size Size of second input array. - * @param [in] input2_ndim Number of second input array dimensions. - * @param [in] input2_shape Shape of second input array. - * @param [in] input2_strides Strides of second input array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_matmul_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input1_size, - const size_t input1_ndim, - const shape_elem_type *input1_shape, - const shape_elem_type *input1_strides, - const void *input2_in, - const size_t input2_size, - const size_t input2_ndim, - const shape_elem_type *input2_shape, - const shape_elem_type *input2_strides, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_matmul_c(void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input1_size, - const size_t input1_ndim, - const shape_elem_type *input1_shape, - const shape_elem_type *input1_strides, - const void *input2_in, - const size_t input2_size, - const size_t input2_ndim, - const shape_elem_type *input2_shape, - const shape_elem_type *input2_strides); - /** * @ingroup BACKEND_API * @brief Compute the variance along the specified axis, while ignoring NaNs. @@ -388,28 +305,6 @@ INP_DLLEXPORT void dpnp_nonzero_c(const void *array1, const size_t ndim, const size_t j); -/** - * @ingroup BACKEND_API - * @brief absolute function. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] input1_in Input array. - * @param [out] result1 Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_elemwise_absolute_c(DPCTLSyclQueueRef q_ref, - const void *input1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_elemwise_absolute_c(const void *input1_in, void *result1, size_t size); - /** * @ingroup BACKEND_API * @brief Custom implementation of dot function @@ -473,98 +368,6 @@ INP_DLLEXPORT void dpnp_dot_c(void *result_out, const shape_elem_type *input2_shape, const shape_elem_type *input2_strides); -/** - * @ingroup BACKEND_API - * @brief Custom implementation of cross function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result_out Output array. - * @param [in] input1_in First input array. - * @param [in] input1_size Size of first input array. - * @param [in] input1_shape Shape of first input array. - * @param [in] input1_shape_ndim Number of first array dimensions. - * @param [in] input2_in Second input array. - * @param [in] input2_size Shape of second input array. - * @param [in] input2_shape Shape of first input array. - * @param [in] input2_shape_ndim Number of second array dimensions. - * @param [in] where Mask array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_cross_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_cross_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where); - -/** - * @ingroup BACKEND_API - * @brief Custom implementation of cumprod function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array. - * @param [out] result1 Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - * - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_cumprod_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_cumprod_c(void *array1_in, void *result1, size_t size); - -/** - * @ingroup BACKEND_API - * @brief Custom implementation of cumsum function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array. - * @param [out] result1 Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- * - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_cumsum_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_cumsum_c(void *array1_in, void *result1, size_t size); - /** * @ingroup BACKEND_API * @brief The differences between consecutive elements of an array. @@ -910,54 +713,6 @@ INP_DLLEXPORT void dpnp_put_along_axis_c(void *arr_in, size_t size_indices, size_t values_size); -/** - * @ingroup BACKEND_API - * @brief Compute the eigenvalues and right eigenvectors of a square array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input array[size][size] - * @param [out] result1 The eigenvalues, each repeated according to - * its multiplicity - * @param [out] result2 The normalized (unit "length") eigenvectors - * @param [in] size One dimension of square [size][size] array - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_eig_c(DPCTLSyclQueueRef q_ref, - const void *array_in, - void *result1, - void *result2, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_eig_c(const void *array_in, void *result1, void *result2, size_t size); - -/** - * @ingroup BACKEND_API - * @brief Compute the eigenvalues of a square array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input array[size][size] - * @param [out] result1 The eigenvalues, each repeated according to - * its multiplicity - * @param [in] size One dimension of square [size][size] array - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_eigvals_c(DPCTLSyclQueueRef q_ref, - const void *array_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_eigvals_c(const void *array_in, void *result1, size_t size); - /** * @ingroup BACKEND_API * @brief Return a 2-D array with ones on the diagonal and zeros elsewhere. @@ -1056,32 +811,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_sort_c(void *array, void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief math library implementation of cholesky function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] data_size Last element of shape arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_cholesky_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - const size_t size, - const size_t data_size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_cholesky_c(void *array1_in, - void *result1, - const size_t size, - const size_t data_size); - /** * @ingroup BACKEND_API * @brief correlate function @@ -1154,32 +883,6 @@ template INP_DLLEXPORT void dpnp_cov_c(void *array1_in, void *result1, size_t nrows, size_t ncols); -/** - * @ingroup BACKEND_API - * @brief math library implementation of det function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] shape Shape of input array. - * @param [in] ndim Number of elements in shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_det_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_det_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim); - /** * @ingroup BACKEND_API * @brief Construct an array from an index array and a list of arrays to choose @@ -1344,58 +1047,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_initval_c(void *result1, void *value, size_t size); -/** - * @ingroup BACKEND_API - * @brief math library implementation of inv function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array with data. - * @param [out] result1 Output array. - * @param [in] shape Shape of input array. - * @param [in] ndim Number of elements in shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_inv_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_inv_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim); - -/** - * @ingroup BACKEND_API - * @brief math library implementation of matrix_rank function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array with data. - * @param [out] result1 Output array. - * @param [in] shape Shape of input array. - * @param [in] ndim Number of elements in shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_matrix_rank_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_matrix_rank_c(void *array1_in, - void *result1, - shape_elem_type *shape, - size_t ndim); - /** * @ingroup BACKEND_API * @brief math library implementation of max function @@ -1572,33 +1223,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_argmin_c(void *array, void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief math library implementation of around function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] input_in Input array with data. - * @param [out] result_out Output array with indices. - * @param [in] input_size Number of elements in input arrays. - * @param [in] decimals Number of decimal places to round. Support - * only with default value 0. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_around_c(DPCTLSyclQueueRef q_ref, - const void *input_in, - void *result_out, - const size_t input_size, - const int decimals, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_around_c(const void *input_in, - void *result_out, - const size_t input_size, - const int decimals); - /** * @ingroup BACKEND_API * @brief math library implementation of std function @@ -1820,55 +1444,6 @@ INP_DLLEXPORT void dpnp_var_c(void *array, size_t naxis, size_t ddof); -/** - * @ingroup BACKEND_API - * @brief Implementation of invert function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array. - * @param [out] result1 Output array. - * @param [in] size Number of elements in the input array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_invert_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_invert_c(void *array1_in, void *result, size_t size); - -#define MACRO_2ARG_1TYPE_OP(__name__, __operation__) \ - template \ - INP_DLLEXPORT DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref); \ - \ - template \ - INP_DLLEXPORT void __name__( \ - void *result_out, const size_t result_size, const size_t result_ndim, \ - const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where); - -#include - #define MACRO_1ARG_1TYPE_OP(__name__, __operation1__, __operation2__) \ template \ INP_DLLEXPORT DPCTLSyclEventRef __name__( \ @@ -1913,23 +1488,6 @@ INP_DLLEXPORT void dpnp_invert_c(void *array1_in, void *result, size_t size); #include -#define MACRO_2ARG_2TYPES_LOGIC_OP(__name__, __operation__) \ - template \ - INP_DLLEXPORT DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t 
result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref); - -#include - #define MACRO_2ARG_3TYPES_OP(__name__, __operation__, __vec_operation__, \ __vec_types__, __mkl_operation__, __mkl_types__) \ template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_floor_divide_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_floor_divide_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where); - /** * @ingroup BACKEND_API * @brief modf function. @@ -2099,54 +1609,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_ones_like_c(void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief remainder function. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result_out Output array. - * @param [in] input1_in First input array. - * @param [in] input1_size Size of first input array. - * @param [in] input1_shape Shape of first input array. - * @param [in] input1_shape_ndim Number of first array dimensions. 
- * @param [in] input2_in Second input array. - * @param [in] input2_size Size of second input array. - * @param [in] input2_shape Shape of second input array. - * @param [in] input2_shape_ndim Number of second array dimensions. - * @param [in] where Mask array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_remainder_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_remainder_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where); - /** * @ingroup BACKEND_API * @brief repeat elements of an array. @@ -2173,81 +1635,6 @@ INP_DLLEXPORT void dpnp_repeat_c(const void *array_in, const size_t repeats, const size_t size); -/** - * @ingroup BACKEND_API - * @brief transpose function. Permute axes of the input to the output with - * elements permutation. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array. - * @param [in] input_shape Input shape. - * @param [in] result_shape Output shape. - * @param [in] permute_axes Order of axes, by id, as they should be - * presented in the output. - * @param [in] ndim Number of elements in shapes and axes. - * @param [out] result1 Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_elemwise_transpose_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - const shape_elem_type *input_shape, - const shape_elem_type *result_shape, - const shape_elem_type *permute_axes, - size_t ndim, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_elemwise_transpose_c(void *array1_in, - const shape_elem_type *input_shape, - const shape_elem_type *result_shape, - const shape_elem_type *permute_axes, - size_t ndim, - void *result1, - size_t size); - -/** - * @ingroup BACKEND_API - * @brief Custom implementation of trapz function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in First input array. - * @param [in] array2_in Second input array. - * @param [out] result1 Output array. - * @param [in] dx The spacing between sample points. - * @param [in] array1_size Number of elements in first input array. - * @param [in] array2_size Number of elements in second input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- * - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_trapz_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - const void *array2_in, - void *result1, - double dx, - size_t array1_size, - size_t array2_size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_trapz_c(const void *array1_in, - const void *array2_in, - void *result1, - double dx, - size_t array1_size, - size_t array2_size); - /** * @ingroup BACKEND_API * @brief Implementation of vander function diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index 1172bcbe4f5..a39174931fe 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -58,77 +58,42 @@ */ enum class DPNPFuncName : size_t { - DPNP_FN_NONE, /**< Very first element of the enumeration */ - DPNP_FN_ABSOLUTE, /**< Used in numpy.absolute() impl */ - DPNP_FN_ADD, /**< Used in numpy.add() impl */ - DPNP_FN_ALL, /**< Used in numpy.all() impl */ - DPNP_FN_ALLCLOSE, /**< Used in numpy.allclose() impl */ - DPNP_FN_ALLCLOSE_EXT, /**< Used in numpy.allclose() impl, requires extra - parameters */ - DPNP_FN_ANY, /**< Used in numpy.any() impl */ - DPNP_FN_ARANGE, /**< Used in numpy.arange() impl */ - DPNP_FN_ARCCOS, /**< Used in numpy.arccos() impl */ - DPNP_FN_ARCCOSH, /**< Used in numpy.arccosh() impl */ - DPNP_FN_ARCSIN, /**< Used in numpy.arcsin() impl */ - DPNP_FN_ARCSINH, /**< Used in numpy.arcsinh() impl */ - DPNP_FN_ARCTAN, /**< Used in numpy.arctan() impl */ - DPNP_FN_ARCTAN2, /**< Used in numpy.arctan2() impl */ - DPNP_FN_ARCTANH, /**< Used in numpy.arctanh() impl */ - DPNP_FN_ARGMAX, /**< Used in numpy.argmax() impl */ - DPNP_FN_ARGMIN, /**< Used in numpy.argmin() impl */ - DPNP_FN_ARGSORT, /**< Used in numpy.argsort() impl */ - DPNP_FN_AROUND, /**< Used in numpy.around() impl */ - DPNP_FN_ASTYPE, /**< Used in numpy.astype() impl */ - DPNP_FN_BITWISE_AND, /**< Used in numpy.bitwise_and() impl */ - 
DPNP_FN_BITWISE_OR, /**< Used in numpy.bitwise_or() impl */ - DPNP_FN_BITWISE_XOR, /**< Used in numpy.bitwise_xor() impl */ - DPNP_FN_CBRT, /**< Used in numpy.cbrt() impl */ - DPNP_FN_CEIL, /**< Used in numpy.ceil() impl */ - DPNP_FN_CHOLESKY, /**< Used in numpy.linalg.cholesky() impl */ - DPNP_FN_CONJUGATE, /**< Used in numpy.conjugate() impl */ - DPNP_FN_CHOOSE, /**< Used in numpy.choose() impl */ - DPNP_FN_CHOOSE_EXT, /**< Used in numpy.choose() impl, requires extra - parameters */ - DPNP_FN_COPY, /**< Used in numpy.copy() impl */ - DPNP_FN_COPY_EXT, /**< Used in numpy.copy() impl, requires extra parameters - */ - DPNP_FN_COPYSIGN, /**< Used in numpy.copysign() impl */ - DPNP_FN_COPYTO, /**< Used in numpy.copyto() impl */ + DPNP_FN_NONE, /**< Very first element of the enumeration */ + DPNP_FN_ALL, /**< Used in numpy.all() impl */ + DPNP_FN_ALLCLOSE, /**< Used in numpy.allclose() impl */ + DPNP_FN_ALLCLOSE_EXT, /**< Used in numpy.allclose() impl, requires extra + parameters */ + DPNP_FN_ANY, /**< Used in numpy.any() impl */ + DPNP_FN_ARANGE, /**< Used in numpy.arange() impl */ + DPNP_FN_ARGMAX, /**< Used in numpy.argmax() impl */ + DPNP_FN_ARGMIN, /**< Used in numpy.argmin() impl */ + DPNP_FN_ARGSORT, /**< Used in numpy.argsort() impl */ + DPNP_FN_CHOOSE, /**< Used in numpy.choose() impl */ + DPNP_FN_CHOOSE_EXT, /**< Used in numpy.choose() impl, requires extra + parameters */ + DPNP_FN_COPYTO, /**< Used in numpy.copyto() impl */ DPNP_FN_COPYTO_EXT, /**< Used in numpy.copyto() impl, requires extra parameters */ DPNP_FN_CORRELATE, /**< Used in numpy.correlate() impl */ DPNP_FN_CORRELATE_EXT, /**< Used in numpy.correlate() impl, requires extra parameters */ - DPNP_FN_COS, /**< Used in numpy.cos() impl */ - DPNP_FN_COSH, /**< Used in numpy.cosh() impl */ DPNP_FN_COUNT_NONZERO, /**< Used in numpy.count_nonzero() impl */ DPNP_FN_COV, /**< Used in numpy.cov() impl */ - DPNP_FN_CROSS, /**< Used in numpy.cross() impl */ - DPNP_FN_CUMPROD, /**< Used in numpy.cumprod() impl 
*/ - DPNP_FN_CUMSUM, /**< Used in numpy.cumsum() impl */ DPNP_FN_DEGREES, /**< Used in numpy.degrees() impl */ DPNP_FN_DEGREES_EXT, /**< Used in numpy.degrees() impl, requires extra parameters */ - DPNP_FN_DET, /**< Used in numpy.linalg.det() impl */ DPNP_FN_DIAG, /**< Used in numpy.diag() impl */ DPNP_FN_DIAG_INDICES, /**< Used in numpy.diag_indices() impl */ DPNP_FN_DIAGONAL, /**< Used in numpy.diagonal() impl */ - DPNP_FN_DIVIDE, /**< Used in numpy.divide() impl */ DPNP_FN_DOT, /**< Used in numpy.dot() impl */ DPNP_FN_DOT_EXT, /**< Used in numpy.dot() impl, requires extra parameters */ DPNP_FN_EDIFF1D, /**< Used in numpy.ediff1d() impl */ DPNP_FN_EDIFF1D_EXT, /**< Used in numpy.ediff1d() impl, requires extra parameters */ - DPNP_FN_EIG, /**< Used in numpy.linalg.eig() impl */ - DPNP_FN_EIGVALS, /**< Used in numpy.linalg.eigvals() impl */ DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra parameters */ DPNP_FN_EYE, /**< Used in numpy.eye() impl */ - DPNP_FN_EXP, /**< Used in numpy.exp() impl */ - DPNP_FN_EXP2, /**< Used in numpy.exp2() impl */ - DPNP_FN_EXPM1, /**< Used in numpy.expm1() impl */ - DPNP_FN_FABS, /**< Used in numpy.fabs() impl */ DPNP_FN_FFT_FFT, /**< Used in numpy.fft.fft() impl */ DPNP_FN_FFT_FFT_EXT, /**< Used in numpy.fft.fft() impl, requires extra parameters */ @@ -136,30 +101,15 @@ enum class DPNPFuncName : size_t DPNP_FN_FFT_RFFT_EXT, /**< Used in numpy.fft.rfft() impl, requires extra parameters */ DPNP_FN_FILL_DIAGONAL, /**< Used in numpy.fill_diagonal() impl */ - DPNP_FN_FLATTEN, /**< Used in numpy.flatten() impl */ - DPNP_FN_FLOOR, /**< Used in numpy.floor() impl */ - DPNP_FN_FLOOR_DIVIDE, /**< Used in numpy.floor_divide() impl */ - DPNP_FN_FMOD, /**< Used in numpy.fmod() impl */ DPNP_FN_FULL, /**< Used in numpy.full() impl */ DPNP_FN_FULL_LIKE, /**< Used in numpy.full_like() impl */ - DPNP_FN_HYPOT, /**< Used in numpy.hypot() impl */ DPNP_FN_IDENTITY, /**< Used in 
numpy.identity() impl */ DPNP_FN_INITVAL, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ DPNP_FN_INITVAL_EXT, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ - DPNP_FN_INV, /**< Used in numpy.linalg.inv() impl */ DPNP_FN_INVERT, /**< Used in numpy.invert() impl */ - DPNP_FN_KRON, /**< Used in numpy.kron() impl */ - DPNP_FN_LEFT_SHIFT, /**< Used in numpy.left_shift() impl */ - DPNP_FN_LOG, /**< Used in numpy.log() impl */ - DPNP_FN_LOG10, /**< Used in numpy.log10() impl */ - DPNP_FN_LOG2, /**< Used in numpy.log2() impl */ - DPNP_FN_LOG1P, /**< Used in numpy.log1p() impl */ - DPNP_FN_MATMUL, /**< Used in numpy.matmul() impl */ - DPNP_FN_MATRIX_RANK, /**< Used in numpy.linalg.matrix_rank() impl */ DPNP_FN_MAX, /**< Used in numpy.max() impl */ - DPNP_FN_MAXIMUM, /**< Used in numpy.fmax() impl */ DPNP_FN_MAXIMUM_EXT, /**< Used in numpy.fmax() impl , requires extra parameters */ DPNP_FN_MEAN, /**< Used in numpy.mean() impl */ @@ -167,7 +117,6 @@ enum class DPNPFuncName : size_t DPNP_FN_MEDIAN_EXT, /**< Used in numpy.median() impl, requires extra parameters */ DPNP_FN_MIN, /**< Used in numpy.min() impl */ - DPNP_FN_MINIMUM, /**< Used in numpy.fmin() impl */ DPNP_FN_MINIMUM_EXT, /**< Used in numpy.fmax() impl, requires extra parameters */ DPNP_FN_MODF, /**< Used in numpy.modf() impl */ @@ -175,7 +124,6 @@ enum class DPNPFuncName : size_t */ DPNP_FN_MULTIPLY, /**< Used in numpy.multiply() impl */ DPNP_FN_NANVAR, /**< Used in numpy.nanvar() impl */ - DPNP_FN_NEGATIVE, /**< Used in numpy.negative() impl */ DPNP_FN_NONZERO, /**< Used in numpy.nonzero() impl */ DPNP_FN_ONES, /**< Used in numpy.ones() impl */ DPNP_FN_ONES_LIKE, /**< Used in numpy.ones_like() impl */ @@ -183,19 +131,14 @@ enum class DPNPFuncName : size_t DPNP_FN_PARTITION_EXT, /**< Used in numpy.partition() impl, requires extra parameters */ DPNP_FN_PLACE, /**< Used in numpy.place() impl */ - DPNP_FN_POWER, /**< Used in numpy.power() impl */ DPNP_FN_PROD, /**< Used in numpy.prod() 
impl */ DPNP_FN_PTP, /**< Used in numpy.ptp() impl */ DPNP_FN_PUT, /**< Used in numpy.put() impl */ DPNP_FN_PUT_ALONG_AXIS, /**< Used in numpy.put_along_axis() impl */ - DPNP_FN_QR, /**< Used in numpy.linalg.qr() impl */ DPNP_FN_RADIANS, /**< Used in numpy.radians() impl */ DPNP_FN_RADIANS_EXT, /**< Used in numpy.radians() impl, requires extra parameters */ - DPNP_FN_REMAINDER, /**< Used in numpy.remainder() impl */ - DPNP_FN_RECIP, /**< Used in numpy.recip() impl */ DPNP_FN_REPEAT, /**< Used in numpy.repeat() impl */ - DPNP_FN_RIGHT_SHIFT, /**< Used in numpy.right_shift() impl */ DPNP_FN_RNG_BETA, /**< Used in numpy.random.beta() impl */ DPNP_FN_RNG_BETA_EXT, /**< Used in numpy.random.beta() impl, requires extra parameters */ @@ -314,32 +257,22 @@ enum class DPNPFuncName : size_t DPNP_FN_RNG_ZIPF_EXT, /**< Used in numpy.random.zipf() impl, requires extra parameters */ DPNP_FN_SEARCHSORTED, /**< Used in numpy.searchsorted() impl */ - DPNP_FN_SIGN, /**< Used in numpy.sign() impl */ - DPNP_FN_SIN, /**< Used in numpy.sin() impl */ - DPNP_FN_SINH, /**< Used in numpy.sinh() impl */ DPNP_FN_SORT, /**< Used in numpy.sort() impl */ DPNP_FN_SQRT, /**< Used in numpy.sqrt() impl */ DPNP_FN_SQRT_EXT, /**< Used in numpy.sqrt() impl, requires extra parameters */ - DPNP_FN_SQUARE, /**< Used in numpy.square() impl */ DPNP_FN_STD, /**< Used in numpy.std() impl */ - DPNP_FN_SUBTRACT, /**< Used in numpy.subtract() impl */ DPNP_FN_SUBTRACT_EXT, /**< Used in numpy.subtract() impl, requires extra parameters */ DPNP_FN_SUM, /**< Used in numpy.sum() impl */ - DPNP_FN_SVD, /**< Used in numpy.linalg.svd() impl */ DPNP_FN_TAKE, /**< Used in numpy.take() impl */ - DPNP_FN_TAN, /**< Used in numpy.tan() impl */ - DPNP_FN_TANH, /**< Used in numpy.tanh() impl */ DPNP_FN_TRANSPOSE, /**< Used in numpy.transpose() impl */ DPNP_FN_TRACE, /**< Used in numpy.trace() impl */ - DPNP_FN_TRAPZ, /**< Used in numpy.trapz() impl */ DPNP_FN_TRAPZ_EXT, /**< Used in numpy.trapz() impl, requires extra parameters 
*/ DPNP_FN_TRI, /**< Used in numpy.tri() impl */ DPNP_FN_TRIL, /**< Used in numpy.tril() impl */ DPNP_FN_TRIU, /**< Used in numpy.triu() impl */ - DPNP_FN_TRUNC, /**< Used in numpy.trunc() impl */ DPNP_FN_VANDER, /**< Used in numpy.vander() impl */ DPNP_FN_VAR, /**< Used in numpy.var() impl */ DPNP_FN_ZEROS, /**< Used in numpy.zeros() impl */ diff --git a/dpnp/backend/kernels/dpnp_krnl_bitwise.cpp b/dpnp/backend/kernels/dpnp_krnl_bitwise.cpp deleted file mode 100644 index 9db8425f6de..00000000000 --- a/dpnp/backend/kernels/dpnp_krnl_bitwise.cpp +++ /dev/null @@ -1,438 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -#include - -#include "dpnp_fptr.hpp" -#include "dpnp_iface.hpp" -#include "dpnp_iterator.hpp" -#include "dpnp_utils.hpp" -#include "dpnpc_memory_adapter.hpp" -#include "queue_sycl.hpp" - -// dpctl tensor headers -#include "kernels/alignment.hpp" - -using dpctl::tensor::kernels::alignment_utils::is_aligned; -using dpctl::tensor::kernels::alignment_utils::required_alignment; - -template -class dpnp_invert_c_kernel; - -template -DPCTLSyclEventRef dpnp_invert_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - _DataType *input_data = static_cast<_DataType *>(array1_in); - _DataType *result = static_cast<_DataType *>(result1); - - constexpr size_t lws = 64; - constexpr unsigned int vec_sz = 8; - - auto gws_range = - sycl::range<1>(((size + lws * vec_sz - 1) / (lws * vec_sz)) * lws); - auto lws_range = sycl::range<1>(lws); - - auto kernel_parallel_for_func = [=](sycl::nd_item<1> nd_it) { - auto sg = nd_it.get_sub_group(); - const auto max_sg_size = sg.get_max_local_range()[0]; - const size_t start = - vec_sz * (nd_it.get_group(0) * nd_it.get_local_range(0) + - sg.get_group_id()[0] * max_sg_size); - - if 
(is_aligned(input_data) && - is_aligned(result) && - (start + static_cast(vec_sz) * max_sg_size < size)) - { - auto input_multi_ptr = sycl::address_space_cast< - sycl::access::address_space::global_space, - sycl::access::decorated::yes>(&input_data[start]); - auto result_multi_ptr = sycl::address_space_cast< - sycl::access::address_space::global_space, - sycl::access::decorated::yes>(&result[start]); - - sycl::vec<_DataType, vec_sz> x = sg.load(input_multi_ptr); - sycl::vec<_DataType, vec_sz> res_vec; - - if constexpr (std::is_same_v<_DataType, bool>) { -#pragma unroll - for (size_t k = 0; k < vec_sz; ++k) { - res_vec[k] = !(x[k]); - } - } - else { - res_vec = ~x; - } - - sg.store(result_multi_ptr, res_vec); - } - else { - for (size_t k = start + sg.get_local_id()[0]; k < size; - k += max_sg_size) { - if constexpr (std::is_same_v<_DataType, bool>) { - result[k] = !(input_data[k]); - } - else { - result[k] = ~(input_data[k]); - } - } - } - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - sycl::nd_range<1>(gws_range, lws_range), kernel_parallel_for_func); - }; - event = q.submit(kernel_func); - - event_ref = reinterpret_cast(&event); - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_invert_c(void *array1_in, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_invert_c<_DataType>( - q_ref, array1_in, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_invert_default_c)(void *, - void *, - size_t) = dpnp_invert_c<_DataType>; - -static void func_map_init_bitwise_1arg_1type(func_map_t &fmap) -{ - fmap[DPNPFuncName::DPNP_FN_INVERT][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_invert_default_c}; - fmap[DPNPFuncName::DPNP_FN_INVERT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_invert_default_c}; - 
fmap[DPNPFuncName::DPNP_FN_INVERT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_invert_default_c}; - - return; -} - -#define MACRO_2ARG_1TYPE_OP(__name__, __operation__) \ - template \ - class __name__##_kernel; \ - \ - template \ - class __name__##_strides_kernel; \ - \ - template \ - class __name__##_broadcast_kernel; \ - \ - template \ - DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref) \ - { \ - /* avoid warning unused variable*/ \ - (void)result_shape; \ - (void)where; \ - (void)dep_event_vec_ref; \ - \ - DPCTLSyclEventRef event_ref = nullptr; \ - \ - if (!input1_size || !input2_size) { \ - return event_ref; \ - } \ - \ - sycl::queue q = *(reinterpret_cast(q_ref)); \ - \ - _DataType *input1_data = \ - static_cast<_DataType *>(const_cast(input1_in)); \ - _DataType *input2_data = \ - static_cast<_DataType *>(const_cast(input2_in)); \ - _DataType *result = static_cast<_DataType *>(result_out); \ - \ - bool use_broadcasting = !array_equal(input1_shape, input1_ndim, \ - input2_shape, input2_ndim); \ - \ - shape_elem_type *input1_shape_offsets = \ - new shape_elem_type[input1_ndim]; \ - \ - get_shape_offsets_inkernel(input1_shape, input1_ndim, \ - input1_shape_offsets); \ - bool use_strides = !array_equal(input1_strides, input1_ndim, \ - input1_shape_offsets, input1_ndim); \ - delete[] input1_shape_offsets; \ - \ - shape_elem_type *input2_shape_offsets = \ - new shape_elem_type[input2_ndim]; \ - \ - 
get_shape_offsets_inkernel(input2_shape, input2_ndim, \ - input2_shape_offsets); \ - use_strides = \ - use_strides || !array_equal(input2_strides, input2_ndim, \ - input2_shape_offsets, input2_ndim); \ - delete[] input2_shape_offsets; \ - \ - sycl::event event; \ - sycl::range<1> gws(result_size); \ - \ - if (use_broadcasting) { \ - DPNPC_id<_DataType> *input1_it; \ - const size_t input1_it_size_in_bytes = \ - sizeof(DPNPC_id<_DataType>); \ - input1_it = reinterpret_cast *>( \ - dpnp_memory_alloc_c(q_ref, input1_it_size_in_bytes)); \ - new (input1_it) \ - DPNPC_id<_DataType>(q_ref, input1_data, input1_shape, \ - input1_strides, input1_ndim); \ - \ - input1_it->broadcast_to_shape(result_shape, result_ndim); \ - \ - DPNPC_id<_DataType> *input2_it; \ - const size_t input2_it_size_in_bytes = \ - sizeof(DPNPC_id<_DataType>); \ - input2_it = reinterpret_cast *>( \ - dpnp_memory_alloc_c(q_ref, input2_it_size_in_bytes)); \ - new (input2_it) \ - DPNPC_id<_DataType>(q_ref, input2_data, input2_shape, \ - input2_strides, input2_ndim); \ - \ - input2_it->broadcast_to_shape(result_shape, result_ndim); \ - \ - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { \ - const size_t i = global_id[0]; /* for (size_t i = 0; i < \ - result_size; ++i) */ \ - { \ - const _DataType input1_elem = (*input1_it)[i]; \ - const _DataType input2_elem = (*input2_it)[i]; \ - result[i] = __operation__; \ - } \ - }; \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.parallel_for< \ - class __name__##_broadcast_kernel<_DataType>>( \ - gws, kernel_parallel_for_func); \ - }; \ - \ - q.submit(kernel_func).wait(); \ - \ - input1_it->~DPNPC_id(); \ - input2_it->~DPNPC_id(); \ - \ - return event_ref; \ - } \ - else if (use_strides) { \ - if ((result_ndim != input1_ndim) || (result_ndim != input2_ndim)) \ - { \ - throw std::runtime_error( \ - "Result ndim=" + std::to_string(result_ndim) + \ - " mismatches with either input1 ndim=" + \ - std::to_string(input1_ndim) + \ - " or input2 ndim=" + 
std::to_string(input2_ndim)); \ - } \ - \ - /* memory transfer optimization, use USM-host for temporary speeds \ - * up transfer to device */ \ - using usm_host_allocatorT = \ - sycl::usm_allocator; \ - \ - size_t strides_size = 3 * result_ndim; \ - shape_elem_type *dev_strides_data = \ - sycl::malloc_device(strides_size, q); \ - \ - /* create host temporary for packed strides managed by shared \ - * pointer */ \ - auto strides_host_packed = \ - std::vector( \ - strides_size, usm_host_allocatorT(q)); \ - \ - /* packed vector is concatenation of result_strides, \ - * input1_strides and input2_strides */ \ - std::copy(result_strides, result_strides + result_ndim, \ - strides_host_packed.begin()); \ - std::copy(input1_strides, input1_strides + result_ndim, \ - strides_host_packed.begin() + result_ndim); \ - std::copy(input2_strides, input2_strides + result_ndim, \ - strides_host_packed.begin() + 2 * result_ndim); \ - \ - auto copy_strides_ev = q.copy( \ - strides_host_packed.data(), dev_strides_data, \ - strides_host_packed.size()); \ - \ - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { \ - const size_t output_id = \ - global_id[0]; /* for (size_t i = 0; i < result_size; ++i) \ - */ \ - { \ - const shape_elem_type *result_strides_data = \ - &dev_strides_data[0]; \ - const shape_elem_type *input1_strides_data = \ - &dev_strides_data[result_ndim]; \ - const shape_elem_type *input2_strides_data = \ - &dev_strides_data[2 * result_ndim]; \ - \ - size_t input1_id = 0; \ - size_t input2_id = 0; \ - \ - for (size_t i = 0; i < result_ndim; ++i) { \ - const size_t output_xyz_id = \ - get_xyz_id_by_id_inkernel(output_id, \ - result_strides_data, \ - result_ndim, i); \ - input1_id += output_xyz_id * input1_strides_data[i]; \ - input2_id += output_xyz_id * input2_strides_data[i]; \ - } \ - \ - const _DataType input1_elem = \ - (input1_size == 1) ? input1_data[0] \ - : input1_data[input1_id]; \ - const _DataType input2_elem = \ - (input2_size == 1) ? 
input2_data[0] \ - : input2_data[input2_id]; \ - result[output_id] = __operation__; \ - } \ - }; \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.depends_on(copy_strides_ev); \ - cgh.parallel_for>( \ - gws, kernel_parallel_for_func); \ - }; \ - \ - q.submit(kernel_func).wait(); \ - \ - sycl::free(dev_strides_data, q); \ - return event_ref; \ - } \ - else { \ - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { \ - size_t i = global_id[0]; /* for (size_t i = 0; i < \ - result_size; ++i) */ \ - const _DataType input1_elem = \ - (input1_size == 1) ? input1_data[0] : input1_data[i]; \ - const _DataType input2_elem = \ - (input2_size == 1) ? input2_data[0] : input2_data[i]; \ - result[i] = __operation__; \ - }; \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.parallel_for>( \ - gws, kernel_parallel_for_func); \ - }; \ - event = q.submit(kernel_func); \ - } \ - \ - event_ref = reinterpret_cast(&event); \ - return DPCTLEvent_Copy(event_ref); \ - } \ - \ - template \ - void __name__( \ - void *result_out, const size_t result_size, const size_t result_ndim, \ - const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where) \ - { \ - DPCTLSyclQueueRef q_ref = \ - reinterpret_cast(&DPNP_QUEUE); \ - DPCTLEventVectorRef dep_event_vec_ref = nullptr; \ - DPCTLSyclEventRef event_ref = __name__<_DataType>( \ - q_ref, result_out, result_size, result_ndim, result_shape, \ - result_strides, input1_in, input1_size, input1_ndim, input1_shape, \ - input1_strides, input2_in, input2_size, input2_ndim, input2_shape, \ - input2_strides, where, dep_event_vec_ref); \ - DPCTLEvent_WaitAndThrow(event_ref); \ - 
DPCTLEvent_Delete(event_ref); \ - } \ - \ - template \ - void (*__name__##_default)( \ - void *, const size_t, const size_t, const shape_elem_type *, \ - const shape_elem_type *, const void *, const size_t, const size_t, \ - const shape_elem_type *, const shape_elem_type *, const void *, \ - const size_t, const size_t, const shape_elem_type *, \ - const shape_elem_type *, const size_t *) = __name__<_DataType>; - -#include - -static void func_map_init_bitwise_2arg_1type(func_map_t &fmap) -{ - fmap[DPNPFuncName::DPNP_FN_BITWISE_AND][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_bitwise_and_c_default}; - fmap[DPNPFuncName::DPNP_FN_BITWISE_AND][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_bitwise_and_c_default}; - - fmap[DPNPFuncName::DPNP_FN_BITWISE_OR][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_bitwise_or_c_default}; - fmap[DPNPFuncName::DPNP_FN_BITWISE_OR][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_bitwise_or_c_default}; - - fmap[DPNPFuncName::DPNP_FN_BITWISE_XOR][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_bitwise_xor_c_default}; - fmap[DPNPFuncName::DPNP_FN_BITWISE_XOR][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_bitwise_xor_c_default}; - - fmap[DPNPFuncName::DPNP_FN_LEFT_SHIFT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_left_shift_c_default}; - fmap[DPNPFuncName::DPNP_FN_LEFT_SHIFT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_left_shift_c_default}; - - fmap[DPNPFuncName::DPNP_FN_RIGHT_SHIFT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_right_shift_c_default}; - fmap[DPNPFuncName::DPNP_FN_RIGHT_SHIFT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_right_shift_c_default}; - - return; -} - -void func_map_init_bitwise(func_map_t &fmap) -{ - func_map_init_bitwise_1arg_1type(fmap); - func_map_init_bitwise_2arg_1type(fmap); - - return; -} diff --git a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp index 486851516dc..20be65f53ca 100644 --- a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp +++ 
b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp @@ -230,77 +230,6 @@ using dpctl::tensor::kernels::alignment_utils::required_alignment; static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_ARCCOS][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_acos_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOS][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_acos_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_acos_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_acos_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCCOSH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_acosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOSH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_acosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOSH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_acosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCCOSH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_acosh_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCSIN][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_asin_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSIN][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_asin_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSIN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_asin_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSIN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_asin_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCSINH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_asinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSINH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_asinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSINH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_asinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCSINH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_asinh_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCTAN][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_atan_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN][eft_LNG][eft_LNG] = { - eft_DBL, (void 
*)dpnp_atan_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_atan_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_atan_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCTANH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_atanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTANH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_atanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTANH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_atanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTANH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_atanh_c_default}; - - fmap[DPNPFuncName::DPNP_FN_CBRT][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_cbrt_c_default}; - fmap[DPNPFuncName::DPNP_FN_CBRT][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_cbrt_c_default}; - fmap[DPNPFuncName::DPNP_FN_CBRT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cbrt_c_default}; - fmap[DPNPFuncName::DPNP_FN_CBRT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cbrt_c_default}; - - fmap[DPNPFuncName::DPNP_FN_CEIL][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_ceil_c_default}; - fmap[DPNPFuncName::DPNP_FN_CEIL][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_ceil_c_default}; - fmap[DPNPFuncName::DPNP_FN_CEIL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_ceil_c_default}; - fmap[DPNPFuncName::DPNP_FN_CEIL][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_ceil_c_default}; fmap[DPNPFuncName::DPNP_FN_COPYTO][eft_BLN][eft_BLN] = { eft_BLN, (void *)dpnp_copyto_c_default}; @@ -378,24 +307,6 @@ static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_COPYTO_EXT][eft_FLT][eft_FLT] = { eft_FLT, (void *)dpnp_copyto_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COS][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_cos_c_default}; - fmap[DPNPFuncName::DPNP_FN_COS][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_cos_c_default}; - fmap[DPNPFuncName::DPNP_FN_COS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cos_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_COS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cos_c_default}; - - fmap[DPNPFuncName::DPNP_FN_COSH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_cosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_COSH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_cosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_COSH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_COSH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cosh_c_default}; - fmap[DPNPFuncName::DPNP_FN_DEGREES][eft_INT][eft_INT] = { eft_DBL, (void *)dpnp_degrees_c_default}; fmap[DPNPFuncName::DPNP_FN_DEGREES][eft_LNG][eft_LNG] = { @@ -426,87 +337,6 @@ static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_DEGREES_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_degrees_c_ext}; - fmap[DPNPFuncName::DPNP_FN_EXP2][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_exp2_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP2][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_exp2_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP2][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_exp2_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP2][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_exp2_c_default}; - - fmap[DPNPFuncName::DPNP_FN_EXP][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_exp_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_exp_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_exp_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXP][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_exp_c_default}; - - fmap[DPNPFuncName::DPNP_FN_EXPM1][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_expm1_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXPM1][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_expm1_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXPM1][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_expm1_c_default}; - fmap[DPNPFuncName::DPNP_FN_EXPM1][eft_DBL][eft_DBL] = { - eft_DBL, (void 
*)dpnp_expm1_c_default}; - - fmap[DPNPFuncName::DPNP_FN_FABS][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_fabs_c_default}; - fmap[DPNPFuncName::DPNP_FN_FABS][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_fabs_c_default}; - fmap[DPNPFuncName::DPNP_FN_FABS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_fabs_c_default}; - fmap[DPNPFuncName::DPNP_FN_FABS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_fabs_c_default}; - - fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_floor_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_floor_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_floor_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLOOR][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_floor_c_default}; - - fmap[DPNPFuncName::DPNP_FN_LOG10][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_log10_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG10][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_log10_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG10][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_log10_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG10][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_log10_c_default}; - - fmap[DPNPFuncName::DPNP_FN_LOG1P][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_log1p_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG1P][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_log1p_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG1P][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_log1p_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG1P][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_log1p_c_default}; - - fmap[DPNPFuncName::DPNP_FN_LOG2][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_log2_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG2][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_log2_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG2][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_log2_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG2][eft_DBL][eft_DBL] = { - eft_DBL, (void 
*)dpnp_log2_c_default}; - - fmap[DPNPFuncName::DPNP_FN_LOG][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_log_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_log_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_log_c_default}; - fmap[DPNPFuncName::DPNP_FN_LOG][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_log_c_default}; - fmap[DPNPFuncName::DPNP_FN_RADIANS][eft_INT][eft_INT] = { eft_DBL, (void *)dpnp_radians_c_default}; fmap[DPNPFuncName::DPNP_FN_RADIANS][eft_LNG][eft_LNG] = { @@ -537,24 +367,6 @@ static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_RADIANS_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_radians_c_ext}; - fmap[DPNPFuncName::DPNP_FN_SIN][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_sin_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIN][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_sin_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_sin_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_sin_c_default}; - - fmap[DPNPFuncName::DPNP_FN_SINH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_sinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_SINH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_sinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_SINH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_sinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_SINH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_sinh_c_default}; - fmap[DPNPFuncName::DPNP_FN_SQRT][eft_INT][eft_INT] = { eft_DBL, (void *)dpnp_sqrt_c_default}; fmap[DPNPFuncName::DPNP_FN_SQRT][eft_LNG][eft_LNG] = { @@ -570,33 +382,6 @@ static void func_map_init_elemwise_1arg_2type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_SQRT_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_sqrt_c_ext}; - fmap[DPNPFuncName::DPNP_FN_TAN][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_tan_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_TAN][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_tan_c_default}; - fmap[DPNPFuncName::DPNP_FN_TAN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_tan_c_default}; - fmap[DPNPFuncName::DPNP_FN_TAN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_tan_c_default}; - - fmap[DPNPFuncName::DPNP_FN_TANH][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_tanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_TANH][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_tanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_TANH][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_tanh_c_default}; - fmap[DPNPFuncName::DPNP_FN_TANH][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_tanh_c_default}; - - fmap[DPNPFuncName::DPNP_FN_TRUNC][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_trunc_c_default}; - fmap[DPNPFuncName::DPNP_FN_TRUNC][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_trunc_c_default}; - fmap[DPNPFuncName::DPNP_FN_TRUNC][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_trunc_c_default}; - fmap[DPNPFuncName::DPNP_FN_TRUNC][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_trunc_c_default}; - return; } @@ -612,21 +397,6 @@ constexpr T dispatch_erf_op(T elem) } } -template -constexpr T dispatch_sign_op(T elem) -{ - if constexpr (is_any_v) { - if (elem > 0) - return T(1); - if (elem < 0) - return T(-1); - return elem; // elem is 0 - } - else { - return sycl::sign(elem); - } -} - template constexpr auto dispatch_fmod_op(T elem1, T elem2) { @@ -837,45 +607,6 @@ constexpr auto dispatch_fmod_op(T elem1, T elem2) static void func_map_init_elemwise_1arg_1type(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_CONJUGATE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_CONJUGATE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_CONJUGATE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_CONJUGATE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_copy_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_CONJUGATE][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_conjugate_c_default>}; - - fmap[DPNPFuncName::DPNP_FN_COPY][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPY][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPY][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPY][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPY][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPY][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_copy_c_default>}; - - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_copy_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_copy_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_copy_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_copy_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_copy_c_ext}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_C64][eft_C64] = { - eft_C64, (void *)dpnp_copy_c_ext>}; - fmap[DPNPFuncName::DPNP_FN_COPY_EXT][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_copy_c_ext>}; - fmap[DPNPFuncName::DPNP_FN_ERF][eft_INT][eft_INT] = { eft_INT, (void *)dpnp_erf_c_default}; fmap[DPNPFuncName::DPNP_FN_ERF][eft_LNG][eft_LNG] = { @@ -894,55 +625,6 @@ static void func_map_init_elemwise_1arg_1type(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_ERF_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_erf_c_ext}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_LNG][eft_LNG] = { - eft_LNG, 
(void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_copy_c_default}; - fmap[DPNPFuncName::DPNP_FN_FLATTEN][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_copy_c_default>}; - - fmap[DPNPFuncName::DPNP_FN_NEGATIVE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_negative_c_default}; - fmap[DPNPFuncName::DPNP_FN_NEGATIVE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_negative_c_default}; - fmap[DPNPFuncName::DPNP_FN_NEGATIVE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_negative_c_default}; - fmap[DPNPFuncName::DPNP_FN_NEGATIVE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_negative_c_default}; - - fmap[DPNPFuncName::DPNP_FN_RECIP][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_recip_c_default}; - fmap[DPNPFuncName::DPNP_FN_RECIP][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_recip_c_default}; - fmap[DPNPFuncName::DPNP_FN_RECIP][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_recip_c_default}; - fmap[DPNPFuncName::DPNP_FN_RECIP][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_recip_c_default}; - - fmap[DPNPFuncName::DPNP_FN_SIGN][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_sign_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIGN][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_sign_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIGN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_sign_c_default}; - fmap[DPNPFuncName::DPNP_FN_SIGN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_sign_c_default}; - - fmap[DPNPFuncName::DPNP_FN_SQUARE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_square_c_default}; - fmap[DPNPFuncName::DPNP_FN_SQUARE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_square_c_default}; - fmap[DPNPFuncName::DPNP_FN_SQUARE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_square_c_default}; - fmap[DPNPFuncName::DPNP_FN_SQUARE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_square_c_default}; - return; } @@ -1344,47 
+1026,6 @@ static void func_map_init_elemwise_1arg_1type(func_map_t &fmap) #include -template -static constexpr DPNPFuncType get_divide_res_type() -{ - constexpr auto widest_type = populate_func_types(); - constexpr auto shortes_type = (widest_type == FT1) ? FT2 : FT1; - - if constexpr (widest_type == DPNPFuncType::DPNP_FT_CMPLX128 || - widest_type == DPNPFuncType::DPNP_FT_DOUBLE) - { - return widest_type; - } - else if constexpr (widest_type == DPNPFuncType::DPNP_FT_CMPLX64) { - if constexpr (shortes_type == DPNPFuncType::DPNP_FT_DOUBLE) { - return DPNPFuncType::DPNP_FT_CMPLX128; - } - else if constexpr (has_fp64::value && - (shortes_type == DPNPFuncType::DPNP_FT_INT || - shortes_type == DPNPFuncType::DPNP_FT_LONG)) - { - return DPNPFuncType::DPNP_FT_CMPLX128; - } - } - else if constexpr (widest_type == DPNPFuncType::DPNP_FT_FLOAT) { - if constexpr (has_fp64::value && - (shortes_type == DPNPFuncType::DPNP_FT_INT || - shortes_type == DPNPFuncType::DPNP_FT_LONG)) - { - return DPNPFuncType::DPNP_FT_DOUBLE; - } - } - else if constexpr (has_fp64::value) { - return DPNPFuncType::DPNP_FT_DOUBLE; - } - else { - return DPNPFuncType::DPNP_FT_FLOAT; - } - return widest_type; -} - template static void func_map_elemwise_2arg_3type_core(func_map_t &fmap) { @@ -1445,270 +1086,7 @@ static void func_map_elemwise_2arg_3type_short_helper(func_map_t &fmap) static void func_map_init_elemwise_2arg_3type(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_ADD][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_LNG][eft_LNG] = { - eft_LNG, (void 
*)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_add_c_default}; - fmap[DPNPFuncName::DPNP_FN_ADD][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_add_c_default}; - - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_FLT][eft_LNG] = { - eft_DBL, (void 
*)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - fmap[DPNPFuncName::DPNP_FN_ARCTAN2][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_arctan2_c_default}; - - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_DBL][eft_INT] = { - eft_DBL, 
(void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - fmap[DPNPFuncName::DPNP_FN_COPYSIGN][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_copysign_c_default}; - - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_divide_c_default}; - fmap[DPNPFuncName::DPNP_FN_DIVIDE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_divide_c_default}; - - 
fmap[DPNPFuncName::DPNP_FN_FMOD][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - fmap[DPNPFuncName::DPNP_FN_FMOD][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_fmod_c_default}; - - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - fmap[DPNPFuncName::DPNP_FN_HYPOT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_hypot_c_default}; - - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_LNG][eft_DBL] = { - eft_DBL, 
(void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MAXIMUM][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_maximum_c_default}; - - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_FLT][eft_FLT] = { - eft_FLT, (void 
*)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - fmap[DPNPFuncName::DPNP_FN_MINIMUM][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_minimum_c_default}; - + // Used in dpnp_dot_c fmap[DPNPFuncName::DPNP_FN_MULTIPLY][eft_BLN][eft_BLN] = { eft_BLN, (void *)dpnp_multiply_c_default}; fmap[DPNPFuncName::DPNP_FN_MULTIPLY][eft_BLN][eft_INT] = { @@ -1811,72 +1189,6 @@ static void func_map_init_elemwise_2arg_3type(func_map_t &fmap) (void *)dpnp_multiply_c_default< std::complex, std::complex, std::complex>}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_power_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_POWER][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_power_c_default}; - fmap[DPNPFuncName::DPNP_FN_POWER][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_power_c_default}; - - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - 
fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - fmap[DPNPFuncName::DPNP_FN_SUBTRACT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_subtract_c_default}; - func_map_elemwise_2arg_3type_helper(fmap); diff --git a/dpnp/backend/kernels/dpnp_krnl_logic.cpp b/dpnp/backend/kernels/dpnp_krnl_logic.cpp index 0174b47339a..c8197d72889 100644 --- a/dpnp/backend/kernels/dpnp_krnl_logic.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_logic.cpp @@ -405,278 +405,6 @@ DPCTLSyclEventRef (*dpnp_any_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_any_c<_DataType, _ResultType>; -#define MACRO_2ARG_2TYPES_LOGIC_OP(__name__, __operation__) \ - template \ - class __name__##_kernel; \ - \ - template \ - class __name__##_broadcast_kernel; \ - \ - template \ - class __name__##_strides_kernel; \ - \ - template \ - DPCTLSyclEventRef __name__( \ - DPCTLSyclQueueRef q_ref, void *result_out, const size_t result_size, \ - const size_t result_ndim, const shape_elem_type *result_shape, \ - const shape_elem_type *result_strides, const void *input1_in, \ - const size_t input1_size, const size_t input1_ndim, \ - const shape_elem_type *input1_shape, \ - const shape_elem_type *input1_strides, const void *input2_in, \ - const size_t input2_size, const size_t input2_ndim, \ - const shape_elem_type *input2_shape, \ - const shape_elem_type *input2_strides, const size_t *where, \ - const DPCTLEventVectorRef dep_event_vec_ref) \ - { \ - /* avoid warning unused variable*/ \ - (void)where; \ - (void)dep_event_vec_ref; \ - \ - DPCTLSyclEventRef event_ref = nullptr; \ - \ - if (!input1_size || !input2_size) { \ - return event_ref; \ - } \ - \ - sycl::queue q = *(reinterpret_cast(q_ref)); \ - \ - _DataType_input1 *input1_data = \ - static_cast<_DataType_input1 *>(const_cast(input1_in)); \ - _DataType_input2 *input2_data = \ - static_cast<_DataType_input2 *>(const_cast(input2_in)); \ - bool *result = static_cast(result_out); \ - \ - bool use_broadcasting 
= !array_equal(input1_shape, input1_ndim, \ - input2_shape, input2_ndim); \ - \ - shape_elem_type *input1_shape_offsets = \ - new shape_elem_type[input1_ndim]; \ - \ - get_shape_offsets_inkernel(input1_shape, input1_ndim, \ - input1_shape_offsets); \ - bool use_strides = !array_equal(input1_strides, input1_ndim, \ - input1_shape_offsets, input1_ndim); \ - delete[] input1_shape_offsets; \ - \ - shape_elem_type *input2_shape_offsets = \ - new shape_elem_type[input2_ndim]; \ - \ - get_shape_offsets_inkernel(input2_shape, input2_ndim, \ - input2_shape_offsets); \ - use_strides = \ - use_strides || !array_equal(input2_strides, input2_ndim, \ - input2_shape_offsets, input2_ndim); \ - delete[] input2_shape_offsets; \ - \ - sycl::event event; \ - sycl::range<1> gws(result_size); /* used only when use_broadcasting or \ - use_strides is true */ \ - \ - if (use_broadcasting) { \ - DPNPC_id<_DataType_input1> *input1_it; \ - const size_t input1_it_size_in_bytes = \ - sizeof(DPNPC_id<_DataType_input1>); \ - input1_it = reinterpret_cast *>( \ - dpnp_memory_alloc_c(q_ref, input1_it_size_in_bytes)); \ - new (input1_it) \ - DPNPC_id<_DataType_input1>(q_ref, input1_data, input1_shape, \ - input1_strides, input1_ndim); \ - \ - input1_it->broadcast_to_shape(result_shape, result_ndim); \ - \ - DPNPC_id<_DataType_input2> *input2_it; \ - const size_t input2_it_size_in_bytes = \ - sizeof(DPNPC_id<_DataType_input2>); \ - input2_it = reinterpret_cast *>( \ - dpnp_memory_alloc_c(q_ref, input2_it_size_in_bytes)); \ - new (input2_it) \ - DPNPC_id<_DataType_input2>(q_ref, input2_data, input2_shape, \ - input2_strides, input2_ndim); \ - \ - input2_it->broadcast_to_shape(result_shape, result_ndim); \ - \ - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { \ - const size_t i = global_id[0]; /* for (size_t i = 0; i < \ - result_size; ++i) */ \ - { \ - const _DataType_input1 input1_elem = (*input1_it)[i]; \ - const _DataType_input2 input2_elem = (*input2_it)[i]; \ - result[i] = 
__operation__; \ - } \ - }; \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.parallel_for>( \ - gws, kernel_parallel_for_func); \ - }; \ - \ - q.submit(kernel_func).wait(); \ - \ - input1_it->~DPNPC_id(); \ - input2_it->~DPNPC_id(); \ - \ - return event_ref; \ - } \ - else if (use_strides) { \ - if ((result_ndim != input1_ndim) || (result_ndim != input2_ndim)) \ - { \ - throw std::runtime_error( \ - "Result ndim=" + std::to_string(result_ndim) + \ - " mismatches with either input1 ndim=" + \ - std::to_string(input1_ndim) + \ - " or input2 ndim=" + std::to_string(input2_ndim)); \ - } \ - \ - /* memory transfer optimization, use USM-host for temporary speeds \ - * up transfer to device */ \ - using usm_host_allocatorT = \ - sycl::usm_allocator; \ - \ - size_t strides_size = 3 * result_ndim; \ - shape_elem_type *dev_strides_data = \ - sycl::malloc_device(strides_size, q); \ - \ - /* create host temporary for packed strides managed by shared \ - * pointer */ \ - auto strides_host_packed = \ - std::vector( \ - strides_size, usm_host_allocatorT(q)); \ - \ - /* packed vector is concatenation of result_strides, \ - * input1_strides and input2_strides */ \ - std::copy(result_strides, result_strides + result_ndim, \ - strides_host_packed.begin()); \ - std::copy(input1_strides, input1_strides + result_ndim, \ - strides_host_packed.begin() + result_ndim); \ - std::copy(input2_strides, input2_strides + result_ndim, \ - strides_host_packed.begin() + 2 * result_ndim); \ - \ - auto copy_strides_ev = q.copy( \ - strides_host_packed.data(), dev_strides_data, \ - strides_host_packed.size()); \ - \ - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { \ - const size_t output_id = \ - global_id[0]; /* for (size_t i = 0; i < result_size; ++i) \ - */ \ - { \ - const shape_elem_type *result_strides_data = \ - &dev_strides_data[0]; \ - const shape_elem_type *input1_strides_data = \ - &dev_strides_data[result_ndim]; \ - const shape_elem_type *input2_strides_data = \ - 
&dev_strides_data[2 * result_ndim]; \ - \ - size_t input1_id = 0; \ - size_t input2_id = 0; \ - \ - for (size_t i = 0; i < result_ndim; ++i) { \ - const size_t output_xyz_id = \ - get_xyz_id_by_id_inkernel(output_id, \ - result_strides_data, \ - result_ndim, i); \ - input1_id += output_xyz_id * input1_strides_data[i]; \ - input2_id += output_xyz_id * input2_strides_data[i]; \ - } \ - \ - const _DataType_input1 input1_elem = \ - input1_data[input1_id]; \ - const _DataType_input2 input2_elem = \ - input2_data[input2_id]; \ - result[output_id] = __operation__; \ - } \ - }; \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.depends_on(copy_strides_ev); \ - cgh.parallel_for>( \ - gws, kernel_parallel_for_func); \ - }; \ - \ - q.submit(kernel_func).wait(); \ - \ - sycl::free(dev_strides_data, q); \ - return event_ref; \ - } \ - else { \ - constexpr size_t lws = 64; \ - constexpr unsigned int vec_sz = 8; \ - \ - auto gws_range = sycl::range<1>( \ - ((result_size + lws * vec_sz - 1) / (lws * vec_sz)) * lws); \ - auto lws_range = sycl::range<1>(lws); \ - \ - auto kernel_parallel_for_func = [=](sycl::nd_item<1> nd_it) { \ - auto sg = nd_it.get_sub_group(); \ - const auto max_sg_size = sg.get_max_local_range()[0]; \ - const size_t start = \ - vec_sz * (nd_it.get_group(0) * nd_it.get_local_range(0) + \ - sg.get_group_id()[0] * max_sg_size); \ - \ - if (is_aligned(input1_data) && \ - is_aligned(input2_data) && \ - is_aligned(result) && \ - (start + static_cast(vec_sz) * max_sg_size < \ - result_size)) \ - { \ - auto input1_multi_ptr = sycl::address_space_cast< \ - sycl::access::address_space::global_space, \ - sycl::access::decorated::yes>(&input1_data[start]); \ - auto input2_multi_ptr = sycl::address_space_cast< \ - sycl::access::address_space::global_space, \ - sycl::access::decorated::yes>(&input2_data[start]); \ - auto result_multi_ptr = sycl::address_space_cast< \ - sycl::access::address_space::global_space, \ - sycl::access::decorated::yes>(&result[start]); \ - \ - 
sycl::vec<_DataType_input1, vec_sz> x1 = \ - sg.load(input1_multi_ptr); \ - sycl::vec<_DataType_input2, vec_sz> x2 = \ - sg.load(input2_multi_ptr); \ - sycl::vec res_vec; \ - \ - for (size_t k = 0; k < vec_sz; ++k) { \ - const _DataType_input1 input1_elem = x1[k]; \ - const _DataType_input2 input2_elem = x2[k]; \ - res_vec[k] = __operation__; \ - } \ - sg.store(result_multi_ptr, res_vec); \ - } \ - else { \ - for (size_t k = start; k < result_size; ++k) { \ - const _DataType_input1 input1_elem = input1_data[k]; \ - const _DataType_input2 input2_elem = input2_data[k]; \ - result[k] = __operation__; \ - } \ - } \ - }; \ - \ - auto kernel_func = [&](sycl::handler &cgh) { \ - cgh.parallel_for>( \ - sycl::nd_range<1>(gws_range, lws_range), \ - kernel_parallel_for_func); \ - }; \ - event = q.submit(kernel_func); \ - } \ - \ - event_ref = reinterpret_cast(&event); \ - return DPCTLEvent_Copy(event_ref); \ - } \ - \ - template \ - DPCTLSyclEventRef (*__name__##_ext)( \ - DPCTLSyclQueueRef, void *, const size_t, const size_t, \ - const shape_elem_type *, const shape_elem_type *, const void *, \ - const size_t, const size_t, const shape_elem_type *, \ - const shape_elem_type *, const void *, const size_t, const size_t, \ - const shape_elem_type *, const shape_elem_type *, const size_t *, \ - const DPCTLEventVectorRef) = \ - __name__<_DataType_input1, _DataType_input2>; - void func_map_init_logic(func_map_t &fmap) { fmap[DPNPFuncName::DPNP_FN_ALL][eft_BLN][eft_BLN] = { diff --git a/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp b/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp index b485701154c..44cd91854df 100644 --- a/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp @@ -44,379 +44,6 @@ using dpctl::tensor::kernels::alignment_utils::required_alignment; static_assert(__SYCL_COMPILER_VERSION >= __SYCL_COMPILER_VECTOR_ABS_CHANGED, "SYCL DPC++ compiler does not meet minimum version requirement"); -template -class 
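The `MACRO_2ARG_2TYPES_LOGIC_OP` kernel removed above processes aligned data in sub-group vectors of `vec_sz = 8` elements and falls back to a scalar loop for the unaligned tail. A serial sketch of that blocking scheme, with no SYCL and a `<=` comparison standing in for `__operation__` (function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of the vec_sz blocking in the deleted logic kernels:
// whole blocks of vec_sz elements take the "vector" path, the remaining
// n % vec_sz elements take a scalar tail loop.
constexpr std::size_t vec_sz = 8;

inline std::vector<bool> less_equal_blocked(const std::vector<double> &a,
                                            const std::vector<double> &b)
{
    const std::size_t n = a.size();
    std::vector<bool> out(n);
    std::size_t k = 0;
    // "vectorized" path: full blocks of vec_sz elements
    for (; k + vec_sz <= n; k += vec_sz)
        for (std::size_t j = 0; j < vec_sz; ++j)
            out[k + j] = (a[k + j] <= b[k + j]);
    // scalar tail, as in the kernel's unaligned else-branch
    for (; k < n; ++k)
        out[k] = (a[k] <= b[k]);
    return out;
}
```

In the real kernel the "vector" path additionally checks pointer alignment and uses sub-group `load`/`store`; this sketch only shows the block/tail split.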
dpnp_around_c_kernel; - -template -DPCTLSyclEventRef dpnp_around_c(DPCTLSyclQueueRef q_ref, - const void *input_in, - void *result_out, - const size_t input_size, - const int decimals, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - (void)decimals; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!input_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, input_in, input_size); - _DataType *input = input1_ptr.get_ptr(); - _DataType *result = reinterpret_cast<_DataType *>(result_out); - - if constexpr (std::is_same<_DataType, double>::value || - std::is_same<_DataType, float>::value) - { - event = oneapi::mkl::vm::rint(q, input_size, input, result); - } - else { - sycl::range<1> gws(input_size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - size_t i = global_id[0]; - { - result[i] = std::rint(input[i]); - } - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - } - - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_around_c(const void *input_in, - void *result_out, - const size_t input_size, - const int decimals) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_around_c<_DataType>( - q_ref, input_in, result_out, input_size, decimals, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_around_default_c)(const void *, void *, const size_t, const int) = - dpnp_around_c<_DataType>; - -template -class dpnp_elemwise_absolute_c_kernel; - -template -DPCTLSyclEventRef - dpnp_elemwise_absolute_c(DPCTLSyclQueueRef q_ref, - const void *input1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning 
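Note that the deleted `dpnp_around_c` discards its `decimals` argument (`(void)decimals;`) and rounds to the nearest integer via `std::rint` / `oneapi::mkl::vm::rint`. Under the default rounding mode this is round-half-to-even, not the away-from-zero rounding of `std::round`. A small scalar illustration (function names are ours, for contrast only):

```cpp
#include <cassert>
#include <cmath>

// std::rint follows the current rounding mode (round-half-to-even by
// default), matching the removed dpnp_around_c kernel path.
inline double round_like_dpnp_around(double x)
{
    return std::rint(x);
}

// std::round, by contrast, always rounds halfway cases away from zero.
inline double round_away_from_zero(double x)
{
    return std::round(x);
}
```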
unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - _DataType_input *array1 = - static_cast<_DataType_input *>(const_cast(input1_in)); - _DataType_output *result = static_cast<_DataType_output *>(result1); - - if constexpr (is_any_v<_DataType_input, float, double, std::complex, - std::complex>) - { - event = oneapi::mkl::vm::abs(q, size, array1, result); - } - else { - static_assert( - is_any_v<_DataType_input, int32_t, int64_t>, - "Integer types are only expected to pass in 'abs' kernel"); - static_assert(std::is_same_v<_DataType_input, _DataType_output>, - "Result type must match a type of input data"); - - constexpr size_t lws = 64; - constexpr unsigned int vec_sz = 8; - - auto gws_range = - sycl::range<1>(((size + lws * vec_sz - 1) / (lws * vec_sz)) * lws); - auto lws_range = sycl::range<1>(lws); - - auto kernel_parallel_for_func = [=](sycl::nd_item<1> nd_it) { - auto sg = nd_it.get_sub_group(); - const auto max_sg_size = sg.get_max_local_range()[0]; - const size_t start = - vec_sz * (nd_it.get_group(0) * nd_it.get_local_range(0) + - sg.get_group_id()[0] * max_sg_size); - - if (is_aligned(array1) && - is_aligned(result) && - (start + static_cast(vec_sz) * max_sg_size < size)) - { - auto array_multi_ptr = sycl::address_space_cast< - sycl::access::address_space::global_space, - sycl::access::decorated::yes>(&array1[start]); - auto result_multi_ptr = sycl::address_space_cast< - sycl::access::address_space::global_space, - sycl::access::decorated::yes>(&result[start]); - - sycl::vec<_DataType_input, vec_sz> data_vec = - sg.load(array_multi_ptr); - - sycl::vec<_DataType_output, vec_sz> res_vec = - sycl::abs(data_vec); - - sg.store(result_multi_ptr, res_vec); - } - else { - for (size_t k = start + sg.get_local_id()[0]; k < size; - k += max_sg_size) { - result[k] = std::abs(array1[k]); - } - } - }; - - auto kernel_func = 
[&](sycl::handler &cgh) { - cgh.parallel_for>( - sycl::nd_range<1>(gws_range, lws_range), - kernel_parallel_for_func); - }; - event = q.submit(kernel_func); - } - - event_ref = reinterpret_cast(&event); - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_elemwise_absolute_c(const void *input1_in, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_elemwise_absolute_c<_DataType, _DataType>( - q_ref, input1_in, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_elemwise_absolute_default_c)(const void *, void *, size_t) = - dpnp_elemwise_absolute_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_cross_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - (void)input1_size; // avoid warning unused variable - (void)input1_shape; - (void)input1_shape_ndim; - (void)input2_size; - (void)input2_shape; - (void)input2_shape_ndim; - (void)where; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType_input1> input1_ptr(q_ref, input1_in, - input1_size, true); - DPNPC_ptr_adapter<_DataType_input2> input2_ptr(q_ref, input2_in, - input2_size, true); - DPNPC_ptr_adapter<_DataType_output> result_ptr(q_ref, result_out, - input1_size, true, true); - const _DataType_input1 *input1 = input1_ptr.get_ptr(); - const _DataType_input2 *input2 = input2_ptr.get_ptr(); - _DataType_output *result = result_ptr.get_ptr(); - - result[0] = input1[1] * input2[2] - input1[2] * 
input2[1]; - - result[1] = input1[2] * input2[0] - input1[0] * input2[2]; - - result[2] = input1[0] * input2[1] - input1[1] * input2[0]; - - return event_ref; -} - -template -void dpnp_cross_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_cross_c<_DataType_output, _DataType_input1, _DataType_input2>( - q_ref, result_out, input1_in, input1_size, input1_shape, - input1_shape_ndim, input2_in, input2_size, input2_shape, - input2_shape_ndim, where, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_cross_default_c)(void *, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const size_t *) = - dpnp_cross_c<_DataType_output, _DataType_input1, _DataType_input2>; - -template -class dpnp_cumprod_c_kernel; - -template -DPCTLSyclEventRef dpnp_cumprod_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType_input> input1_ptr(q_ref, array1_in, size, true); - DPNPC_ptr_adapter<_DataType_output> result_ptr(q_ref, result1, size, true, - true); - _DataType_input *array1 = input1_ptr.get_ptr(); - _DataType_output *result = result_ptr.get_ptr(); - - _DataType_output cur_res = 1; - - for (size_t i = 0; i < size; ++i) { - cur_res *= array1[i]; - result[i] = cur_res; - } 
- - return event_ref; -} - -template -void dpnp_cumprod_c(void *array1_in, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_cumprod_c<_DataType_input, _DataType_output>( - q_ref, array1_in, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_cumprod_default_c)(void *, void *, size_t) = - dpnp_cumprod_c<_DataType_input, _DataType_output>; - -template -class dpnp_cumsum_c_kernel; - -template -DPCTLSyclEventRef dpnp_cumsum_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType_input> input1_ptr(q_ref, array1_in, size, true); - DPNPC_ptr_adapter<_DataType_output> result_ptr(q_ref, result1, size, true, - true); - _DataType_input *array1 = input1_ptr.get_ptr(); - _DataType_output *result = result_ptr.get_ptr(); - - _DataType_output cur_res = 0; - - for (size_t i = 0; i < size; ++i) { - cur_res += array1[i]; - result[i] = cur_res; - } - - return event_ref; -} - -template -void dpnp_cumsum_c(void *array1_in, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_cumsum_c<_DataType_input, _DataType_output>( - q_ref, array1_in, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_cumsum_default_c)(void *, void *, size_t) = - dpnp_cumsum_c<_DataType_input, _DataType_output>; - template class dpnp_ediff1d_c_kernel; @@ -541,176 +168,6 @@ DPCTLSyclEventRef (*dpnp_ediff1d_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = 
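The deleted `dpnp_cumprod_c` and `dpnp_cumsum_c` kernels are plain serial inclusive scans: each output element is the running product (respectively sum) of all inputs up to and including that index. Equivalent loops, simplified to a concrete element type:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive scans matching the removed serial loops, starting from the
// operation's identity element (1 for product, 0 for sum).
inline std::vector<std::int64_t> cumprod(const std::vector<std::int64_t> &in)
{
    std::vector<std::int64_t> out(in.size());
    std::int64_t cur = 1; // multiplicative identity
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = cur *= in[i];
    return out;
}

inline std::vector<std::int64_t> cumsum(const std::vector<std::int64_t> &in)
{
    std::vector<std::int64_t> out(in.size());
    std::int64_t cur = 0; // additive identity
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = cur += in[i];
    return out;
}
```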
dpnp_ediff1d_c<_DataType_input, _DataType_output>; -template -class dpnp_floor_divide_c_kernel; - -template -DPCTLSyclEventRef - dpnp_floor_divide_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)where; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!input1_size || !input2_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType_input1> input1_ptr(q_ref, input1_in, - input1_size); - DPNPC_ptr_adapter<_DataType_input2> input2_ptr(q_ref, input2_in, - input2_size); - _DataType_input1 *input1_data = input1_ptr.get_ptr(); - _DataType_input2 *input2_data = input2_ptr.get_ptr(); - _DataType_output *result = reinterpret_cast<_DataType_output *>(result_out); - - std::vector result_shape = get_result_shape( - input1_shape, input1_shape_ndim, input2_shape, input2_shape_ndim); - - DPNPC_id<_DataType_input1> *input1_it; - const size_t input1_it_size_in_bytes = sizeof(DPNPC_id<_DataType_input1>); - input1_it = reinterpret_cast *>( - dpnp_memory_alloc_c(q_ref, input1_it_size_in_bytes)); - new (input1_it) DPNPC_id<_DataType_input1>(q_ref, input1_data, input1_shape, - input1_shape_ndim); - - input1_it->broadcast_to_shape(result_shape); - - DPNPC_id<_DataType_input2> *input2_it; - const size_t input2_it_size_in_bytes = sizeof(DPNPC_id<_DataType_input2>); - input2_it = reinterpret_cast *>( - dpnp_memory_alloc_c(q_ref, input2_it_size_in_bytes)); - new (input2_it) DPNPC_id<_DataType_input2>(q_ref, input2_data, input2_shape, - input2_shape_ndim); - - input2_it->broadcast_to_shape(result_shape); - - const size_t result_size = 
input1_it->get_output_size(); - - sycl::range<1> gws(result_size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - const size_t i = - global_id[0]; /* for (size_t i = 0; i < result_size; ++i) */ - const _DataType_output input1_elem = (*input1_it)[i]; - const _DataType_output input2_elem = (*input2_it)[i]; - - double div = (double)input1_elem / (double)input2_elem; - result[i] = static_cast<_DataType_output>(sycl::floor(div)); - }; - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - sycl::event event; - - if (input1_size == input2_size) { - if constexpr ((std::is_same<_DataType_input1, double>::value || - std::is_same<_DataType_input1, float>::value) && - std::is_same<_DataType_input2, _DataType_input1>::value) - { - event = oneapi::mkl::vm::div(q, input1_size, input1_data, - input2_data, result); - event.wait(); - event = oneapi::mkl::vm::floor(q, input1_size, result, result); - } - else { - event = q.submit(kernel_func); - } - } - else { - event = q.submit(kernel_func); - } - - event.wait(); - - input1_it->~DPNPC_id(); - input2_it->~DPNPC_id(); - - sycl::free(input1_it, q); - sycl::free(input2_it, q); - - return event_ref; -} - -template -void dpnp_floor_divide_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_floor_divide_c<_DataType_output, _DataType_input1, - _DataType_input2>( - q_ref, result_out, input1_in, input1_size, input1_shape, - input1_shape_ndim, input2_in, input2_size, input2_shape, - input2_shape_ndim, where, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - 
DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_floor_divide_default_c)(void *, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const size_t *) = - dpnp_floor_divide_c<_DataType_output, _DataType_input1, _DataType_input2>; - -template -DPCTLSyclEventRef (*dpnp_floor_divide_ext_c)(DPCTLSyclQueueRef, - void *, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const size_t *, - const DPCTLEventVectorRef) = - dpnp_floor_divide_c<_DataType_output, _DataType_input1, _DataType_input2>; - template class dpnp_modf_c_kernel; @@ -796,363 +253,8 @@ DPCTLSyclEventRef (*dpnp_modf_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_modf_c<_DataType_input, _DataType_output>; -template -class dpnp_remainder_c_kernel; - -template -DPCTLSyclEventRef dpnp_remainder_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)where; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!input1_size || !input2_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType_input1> input1_ptr(q_ref, input1_in, - input1_size); - DPNPC_ptr_adapter<_DataType_input2> input2_ptr(q_ref, input2_in, - input2_size); - _DataType_input1 *input1_data = input1_ptr.get_ptr(); - _DataType_input2 *input2_data = input2_ptr.get_ptr(); - _DataType_output *result = reinterpret_cast<_DataType_output *>(result_out); - - std::vector result_shape = get_result_shape( - 
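The removed `dpnp_floor_divide_c` computes `floor(a / b)` in double precision (`sycl::floor` of the quotient), which matches NumPy's `floor_divide` and differs from C++'s integer `/` operator — which truncates toward zero — whenever the operands have opposite signs. A scalar equivalent of the kernel body:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar equivalent of the removed kernel body:
//   double div = (double)a / (double)b;  result = floor(div);
inline std::int64_t floor_divide(std::int64_t a, std::int64_t b)
{
    return static_cast<std::int64_t>(
        std::floor(static_cast<double>(a) / static_cast<double>(b)));
}
```

For example, `floor_divide(-7, 2)` yields `-4`, while C++'s `-7 / 2` truncates to `-3`.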
input1_shape, input1_shape_ndim, input2_shape, input2_shape_ndim); - - DPNPC_id<_DataType_input1> *input1_it; - const size_t input1_it_size_in_bytes = sizeof(DPNPC_id<_DataType_input1>); - input1_it = reinterpret_cast *>( - dpnp_memory_alloc_c(q_ref, input1_it_size_in_bytes)); - new (input1_it) DPNPC_id<_DataType_input1>(q_ref, input1_data, input1_shape, - input1_shape_ndim); - - input1_it->broadcast_to_shape(result_shape); - - DPNPC_id<_DataType_input2> *input2_it; - const size_t input2_it_size_in_bytes = sizeof(DPNPC_id<_DataType_input2>); - input2_it = reinterpret_cast *>( - dpnp_memory_alloc_c(q_ref, input2_it_size_in_bytes)); - new (input2_it) DPNPC_id<_DataType_input2>(q_ref, input2_data, input2_shape, - input2_shape_ndim); - - input2_it->broadcast_to_shape(result_shape); - - const size_t result_size = input1_it->get_output_size(); - - sycl::range<1> gws(result_size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - const size_t i = global_id[0]; - const _DataType_output input1_elem = (*input1_it)[i]; - const _DataType_output input2_elem = (*input2_it)[i]; - double fmod_res = sycl::fmod((double)input1_elem, (double)input2_elem); - double add = fmod_res + input2_elem; - result[i] = sycl::fmod(add, (double)input2_elem); - }; - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - sycl::event event; - - if (input1_size == input2_size) { - if constexpr ((std::is_same<_DataType_input1, double>::value || - std::is_same<_DataType_input1, float>::value) && - std::is_same<_DataType_input2, _DataType_input1>::value) - { - event = oneapi::mkl::vm::fmod(q, input1_size, input1_data, - input2_data, result); - event.wait(); - event = oneapi::mkl::vm::add(q, input1_size, result, input2_data, - result); - event.wait(); - event = oneapi::mkl::vm::fmod(q, input1_size, result, input2_data, - result); - } - else { - event = q.submit(kernel_func); - } - } - else { - event = q.submit(kernel_func); - } - - 
event.wait(); - - input1_it->~DPNPC_id(); - input2_it->~DPNPC_id(); - - return event_ref; -} - -template -void dpnp_remainder_c(void *result_out, - const void *input1_in, - const size_t input1_size, - const shape_elem_type *input1_shape, - const size_t input1_shape_ndim, - const void *input2_in, - const size_t input2_size, - const shape_elem_type *input2_shape, - const size_t input2_shape_ndim, - const size_t *where) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_remainder_c<_DataType_output, _DataType_input1, _DataType_input2>( - q_ref, result_out, input1_in, input1_size, input1_shape, - input1_shape_ndim, input2_in, input2_size, input2_shape, - input2_shape_ndim, where, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_remainder_default_c)(void *, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const void *, - const size_t, - const shape_elem_type *, - const size_t, - const size_t *) = - dpnp_remainder_c<_DataType_output, _DataType_input1, _DataType_input2>; - -template -class dpnp_trapz_c_kernel; - -template -DPCTLSyclEventRef dpnp_trapz_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - const void *array2_in, - void *result1, - double dx, - size_t array1_size, - size_t array2_size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if ((array1_in == nullptr) || (array2_in == nullptr && array2_size > 1)) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType_input1> input1_ptr(q_ref, array1_in, - array1_size); - DPNPC_ptr_adapter<_DataType_input2> input2_ptr(q_ref, array2_in, - array2_size); - _DataType_input1 *array1 = input1_ptr.get_ptr(); - _DataType_input2 *array2 = 
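The remainder kernel removed above uses the identity `fmod(fmod(a, b) + b, b)`: plain C `fmod` takes the sign of the dividend, and the extra add-and-fmod folds the result onto the sign of the divisor, which is the NumPy/Python convention. A small sketch of that formula (hypothetical helper name):

```python
import math

def remainder_ref(a, b):
    """Reference for the removed dpnp_remainder_c kernel:
    fmod(fmod(a, b) + b, b) yields a remainder with the sign of
    the divisor (NumPy semantics), unlike plain C fmod."""
    fmod_res = math.fmod(a, b)      # sign follows the dividend a
    return math.fmod(fmod_res + b, b)  # fold onto the sign of b
```

For example, `math.fmod(-7, 3)` is `-1.0`, but `remainder_ref(-7, 3)` is `2.0`, matching `numpy.remainder`.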
input2_ptr.get_ptr(); - _DataType_output *result = reinterpret_cast<_DataType_output *>(result1); - - if (array1_size < 2) { - const _DataType_output init_val = 0; - q.memcpy(result, &init_val, sizeof(_DataType_output)) - .wait(); // result[0] = 0; - - return event_ref; - } - - if (array1_size == array2_size) { - size_t cur_res_size = array1_size - 2; - - _DataType_output *cur_res = reinterpret_cast<_DataType_output *>( - sycl::malloc_shared((cur_res_size) * sizeof(_DataType_output), q)); - - sycl::range<1> gws(cur_res_size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - size_t i = global_id[0]; - { - cur_res[i] = array1[i + 1] * (array2[i + 2] - array2[i]); - } - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - - event.wait(); - - shape_elem_type _shape = cur_res_size; - dpnp_sum_c<_DataType_output, _DataType_output>(result, cur_res, &_shape, - 1, NULL, 0, NULL, NULL); - - sycl::free(cur_res, q); - - result[0] += array1[0] * (array2[1] - array2[0]) + - array1[array1_size - 1] * - (array2[array2_size - 1] - array2[array2_size - 2]); - - result[0] *= 0.5; - } - else { - shape_elem_type _shape = array1_size; - dpnp_sum_c<_DataType_output, _DataType_input1>(result, array1, &_shape, - 1, NULL, 0, NULL, NULL); - - result[0] -= (array1[0] + array1[array1_size - 1]) * 0.5; - result[0] *= dx; - } - return event_ref; -} - -template -void dpnp_trapz_c(const void *array1_in, - const void *array2_in, - void *result1, - double dx, - size_t array1_size, - size_t array2_size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_trapz_c<_DataType_input1, _DataType_input2, _DataType_output>( - q_ref, array1_in, array2_in, result1, dx, array1_size, array2_size, - dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template 
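The `dpnp_trapz_c` kernel removed above implements the trapezoidal rule in two forms: with sample positions it sums interior terms `y[i+1] * (x[i+2] - x[i])`, adds the two endpoint terms, and halves the total; with uniform spacing it uses `dx * (sum(y) - (y[0] + y[-1]) / 2)`. A hedged Python sketch of both branches (hypothetical helper name):

```python
def trapz_ref(y, x=None, dx=1.0):
    """Reference for the removed dpnp_trapz_c kernel (trapezoidal rule).
    With sample positions x it matches the interior-sum formulation of
    the kernel; with uniform spacing dx it subtracts half the endpoints."""
    n = len(y)
    if n < 2:
        return 0.0  # kernel writes result[0] = 0 for short inputs
    if x is not None:
        # interior terms: cur_res[i] = y[i+1] * (x[i+2] - x[i])
        s = sum(y[i + 1] * (x[i + 2] - x[i]) for i in range(n - 2))
        # endpoint corrections, then scale by 0.5
        s += y[0] * (x[1] - x[0]) + y[-1] * (x[-1] - x[-2])
        return 0.5 * s
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))
```

Both branches reduce to the usual `sum(0.5 * (y[i] + y[i+1]) * (x[i+1] - x[i]))` after regrouping terms.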
-void (*dpnp_trapz_default_c)(const void *, - const void *, - void *, - double, - size_t, - size_t) = - dpnp_trapz_c<_DataType_input1, _DataType_input2, _DataType_output>; - -template -DPCTLSyclEventRef (*dpnp_trapz_ext_c)(DPCTLSyclQueueRef, - const void *, - const void *, - void *, - double, - size_t, - size_t, - const DPCTLEventVectorRef) = - dpnp_trapz_c<_DataType_input1, _DataType_input2, _DataType_output>; - void func_map_init_mathematical(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_ABSOLUTE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_elemwise_absolute_default_c}; - fmap[DPNPFuncName::DPNP_FN_ABSOLUTE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_elemwise_absolute_default_c}; - fmap[DPNPFuncName::DPNP_FN_ABSOLUTE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_elemwise_absolute_default_c}; - fmap[DPNPFuncName::DPNP_FN_ABSOLUTE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_elemwise_absolute_default_c}; - - fmap[DPNPFuncName::DPNP_FN_AROUND][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_around_default_c}; - fmap[DPNPFuncName::DPNP_FN_AROUND][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_around_default_c}; - fmap[DPNPFuncName::DPNP_FN_AROUND][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_around_default_c}; - fmap[DPNPFuncName::DPNP_FN_AROUND][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_around_default_c}; - - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_LNG][eft_FLT] = { 
- eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_cross_default_c}; - fmap[DPNPFuncName::DPNP_FN_CROSS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cross_default_c}; - - fmap[DPNPFuncName::DPNP_FN_CUMPROD][eft_INT][eft_INT] = { - eft_LNG, (void *)dpnp_cumprod_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMPROD][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_cumprod_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMPROD][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cumprod_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMPROD][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cumprod_default_c}; - - fmap[DPNPFuncName::DPNP_FN_CUMSUM][eft_INT][eft_INT] = { - eft_LNG, (void *)dpnp_cumsum_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMSUM][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_cumsum_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMSUM][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_cumsum_default_c}; - fmap[DPNPFuncName::DPNP_FN_CUMSUM][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_cumsum_default_c}; fmap[DPNPFuncName::DPNP_FN_EDIFF1D][eft_INT][eft_INT] = { eft_LNG, (void *)dpnp_ediff1d_default_c}; @@ -1172,43 +274,6 @@ void func_map_init_mathematical(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_EDIFF1D_EXT][eft_DBL][eft_DBL] 
= { eft_DBL, (void *)dpnp_ediff1d_ext_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_INT][eft_INT] = { - eft_INT, - (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_INT][eft_LNG] = { - eft_LNG, - (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_LNG][eft_INT] = { - eft_LNG, - (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_LNG][eft_LNG] = { - eft_LNG, - (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_FLOOR_DIVIDE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_floor_divide_default_c}; - fmap[DPNPFuncName::DPNP_FN_MODF][eft_INT][eft_INT] = { eft_DBL, (void *)dpnp_modf_default_c}; 
fmap[DPNPFuncName::DPNP_FN_MODF][eft_LNG][eft_LNG] = { @@ -1227,104 +292,5 @@ void func_map_init_mathematical(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_MODF_EXT][eft_DBL][eft_DBL] = { eft_DBL, (void *)dpnp_modf_ext_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - fmap[DPNPFuncName::DPNP_FN_REMAINDER][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_remainder_default_c}; - - 
fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_INT][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_INT][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_INT][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; 
- fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_LNG][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_LNG][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_LNG][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_FLT][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_FLT][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_DBL][eft_FLT] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - fmap[DPNPFuncName::DPNP_FN_TRAPZ_EXT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_trapz_ext_c}; - return; } diff --git a/dpnp/backend/src/dpnp_fptr.hpp b/dpnp/backend/src/dpnp_fptr.hpp index 20fc5305e9a..2a9c42eb172 100644 --- a/dpnp/backend/src/dpnp_fptr.hpp +++ b/dpnp/backend/src/dpnp_fptr.hpp @@ -326,7 +326,6 @@ static constexpr DPNPFuncType get_floating_res_type() * FPTR interface initialization functions */ void func_map_init_arraycreation(func_map_t &fmap); -void func_map_init_bitwise(func_map_t &fmap); void func_map_init_elemwise(func_map_t &fmap); void func_map_init_fft_func(func_map_t &fmap); void func_map_init_indexing_func(func_map_t &fmap); diff --git a/dpnp/backend/src/dpnp_iface_fptr.cpp b/dpnp/backend/src/dpnp_iface_fptr.cpp index 460896bfa2d..f8214212728 100644 --- a/dpnp/backend/src/dpnp_iface_fptr.cpp +++ b/dpnp/backend/src/dpnp_iface_fptr.cpp @@ -167,7 +167,6 @@ 
static func_map_t func_map_init() func_map_t fmap; func_map_init_arraycreation(fmap); - func_map_init_bitwise(fmap); func_map_init_elemwise(fmap); func_map_init_fft_func(fmap); func_map_init_indexing_func(fmap); diff --git a/dpnp/dpnp_algo/CMakeLists.txt b/dpnp/dpnp_algo/CMakeLists.txt index 2c3a49c6be4..1aea452c5d9 100644 --- a/dpnp/dpnp_algo/CMakeLists.txt +++ b/dpnp/dpnp_algo/CMakeLists.txt @@ -3,7 +3,6 @@ set(dpnp_algo_pyx_deps ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_statistics.pxi ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_trigonometric.pxi ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_sorting.pxi - ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_arraycreation.pxi ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_mathematical.pxi ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_indexing.pxi ${CMAKE_CURRENT_SOURCE_DIR}/dpnp_algo_logic.pxi diff --git a/dpnp/dpnp_algo/dpnp_algo.pxd b/dpnp/dpnp_algo/dpnp_algo.pxd index 4e91151697c..37663bee834 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pxd +++ b/dpnp/dpnp_algo/dpnp_algo.pxd @@ -35,7 +35,6 @@ cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncName": # need this na cdef enum DPNPFuncName "DPNPFuncName": DPNP_FN_ALLCLOSE_EXT DPNP_FN_CHOOSE_EXT - DPNP_FN_COPY_EXT DPNP_FN_CORRELATE_EXT DPNP_FN_DEGREES_EXT DPNP_FN_EDIFF1D_EXT @@ -172,11 +171,6 @@ cpdef dpnp_descriptor dpnp_isclose(dpnp_descriptor input1, dpnp_descriptor input double rtol=*, double atol=*, cpp_bool equal_nan=*) -""" -Array creation routines -""" -cpdef dpnp_descriptor dpnp_copy(dpnp_descriptor x1) - """ Mathematical functions """ diff --git a/dpnp/dpnp_algo/dpnp_algo.pyx b/dpnp/dpnp_algo/dpnp_algo.pyx index c8d99c56912..4c560d50e0b 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pyx +++ b/dpnp/dpnp_algo/dpnp_algo.pyx @@ -58,7 +58,6 @@ __all__ = [ ] -include "dpnp_algo_arraycreation.pxi" include "dpnp_algo_indexing.pxi" include "dpnp_algo_logic.pxi" include "dpnp_algo_mathematical.pxi" diff --git a/dpnp/dpnp_algo/dpnp_algo_arraycreation.pxi b/dpnp/dpnp_algo/dpnp_algo_arraycreation.pxi deleted file mode 100644 
index bd86a461848..00000000000 --- a/dpnp/dpnp_algo/dpnp_algo_arraycreation.pxi +++ /dev/null @@ -1,78 +0,0 @@ -# cython: language_level=3 -# cython: linetrace=True -# -*- coding: utf-8 -*- -# ***************************************************************************** -# Copyright (c) 2016-2024, Intel Corporation -# All rights reserved. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions are met: -# - Redistributions of source code must retain the above copyright notice, -# this list of conditions and the following disclaimer. -# - Redistributions in binary form must reproduce the above copyright notice, -# this list of conditions and the following disclaimer in the documentation -# and/or other materials provided with the distribution. -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -# THE POSSIBILITY OF SUCH DAMAGE. -# ***************************************************************************** - -"""Module Backend (array creation part) - -This module contains interface functions between C backend layer -and the rest of the library - -""" - -# NO IMPORTs here. 
All imports must be placed into main "dpnp_algo.pyx" file - -__all__ += [ - "dpnp_copy", -] - - -ctypedef c_dpctl.DPCTLSyclEventRef(*custom_1in_1out_func_ptr_t)(c_dpctl.DPCTLSyclQueueRef, - void *, - void * , - const int , - shape_elem_type * , - shape_elem_type * , - const size_t, - const size_t, - const c_dpctl.DPCTLEventVectorRef) -ctypedef c_dpctl.DPCTLSyclEventRef(*ftpr_custom_vander_1in_1out_t)(c_dpctl.DPCTLSyclQueueRef, - void * , void * , size_t, size_t, int, - const c_dpctl.DPCTLEventVectorRef) except + -ctypedef c_dpctl.DPCTLSyclEventRef(*custom_arraycreation_1in_1out_func_ptr_t)(c_dpctl.DPCTLSyclQueueRef, - void *, - const size_t, - const size_t, - const shape_elem_type*, - const shape_elem_type*, - void *, - const size_t, - const size_t, - const shape_elem_type*, - const shape_elem_type*, - const shape_elem_type *, - const size_t, - const c_dpctl.DPCTLEventVectorRef) -ctypedef c_dpctl.DPCTLSyclEventRef(*custom_indexing_1out_func_ptr_t)(c_dpctl.DPCTLSyclQueueRef, - void * , - const size_t , - const size_t , - const int, - const c_dpctl.DPCTLEventVectorRef) except + - - -cpdef utils.dpnp_descriptor dpnp_copy(utils.dpnp_descriptor x1): - return call_fptr_1in_1out_strides(DPNP_FN_COPY_EXT, x1) diff --git a/dpnp/dpnp_algo/dpnp_algo_sorting.pxi b/dpnp/dpnp_algo/dpnp_algo_sorting.pxi index 4947fa9e41d..5da472a246b 100644 --- a/dpnp/dpnp_algo/dpnp_algo_sorting.pxi +++ b/dpnp/dpnp_algo/dpnp_algo_sorting.pxi @@ -58,7 +58,7 @@ cpdef utils.dpnp_descriptor dpnp_partition(utils.dpnp_descriptor arr, int kth, a cdef DPNPFuncData kernel_data = get_dpnp_function_ptr(DPNP_FN_PARTITION_EXT, param1_type, param1_type) - cdef utils.dpnp_descriptor arr2 = dpnp_copy(arr) + cdef utils.dpnp_descriptor arr2 = dpnp.get_dpnp_descriptor(arr.get_pyobj().copy(), copy_when_nondefault_queue=False) arr_obj = arr.get_array() From 805d50218142833206af2d7441a845942c01440a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun, 30 Jun 
2024 12:36:40 +0200 Subject: [PATCH 41/49] Bump github/codeql-action from 3.25.10 to 3.25.11 (#1904) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.25.10 to 3.25.11. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/23acc5c183826b7a8a97bce3cecc52db901f8251...b611370bb5703a7efb587f9d136a52ea24c5c38c) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/openssf-scorecard.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml index 803f20d284b..9658c7e3b2f 100644 --- a/.github/workflows/openssf-scorecard.yml +++ b/.github/workflows/openssf-scorecard.yml @@ -68,6 +68,6 @@ jobs: # Upload the results to GitHub's code scanning dashboard. 
- name: "Upload to code-scanning" - uses: github/codeql-action/upload-sarif@23acc5c183826b7a8a97bce3cecc52db901f8251 # v3.25.10 + uses: github/codeql-action/upload-sarif@b611370bb5703a7efb587f9d136a52ea24c5c38c # v3.25.11 with: sarif_file: results.sarif From 1a4f8a4b8ff9223425f13b83af5728b7ba56d396 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Tue, 2 Jul 2024 16:50:35 +0200 Subject: [PATCH 42/49] Adopt dpnp interface to asynchronous dpctl execution (Part #1) (#1897) * Update manipulation functions * Update functions from the array creation container * Update dpnp array methods * Implement backward compatible solution * dpnp.meshgrid has to follow CFD and prohibit input arrays allocating on different SYCL queues * updated linspace, logspace and geomspace functions * Updated elementwise functions and astype * Updated counting and histogram functions * Switched back to use dppy/label/dev for coverage GH action * Removed dpnp_container.linspace since unused * Return dpnp ndarray for linspace, logspace and geomspace internal functions --- .github/workflows/generate_coverage.yaml | 2 +- dpnp/dpnp_algo/dpnp_arraycreation.py | 114 ++++++++++---------- dpnp/dpnp_algo/dpnp_elementwise_common.py | 125 +++++++++++++--------- dpnp/dpnp_array.py | 3 + dpnp/dpnp_container.py | 42 ++------ dpnp/dpnp_iface.py | 19 +++- dpnp/dpnp_iface_arraycreation.py | 77 +++++++++---- dpnp/dpnp_iface_counting.py | 7 +- dpnp/dpnp_iface_histograms.py | 68 ++++++++---- dpnp/dpnp_iface_manipulation.py | 64 +++++++---- tests/test_sycl_queue.py | 13 +-- 11 files changed, 318 insertions(+), 216 deletions(-) diff --git a/.github/workflows/generate_coverage.yaml b/.github/workflows/generate_coverage.yaml index 1fa71fb479d..5a0480235a7 100644 --- a/.github/workflows/generate_coverage.yaml +++ b/.github/workflows/generate_coverage.yaml @@ -21,7 +21,7 @@ jobs: env: python-ver: '3.10' - CHANNELS: '-c dppy/label/coverage -c intel -c conda-forge --override-channels' 
+ CHANNELS: '-c dppy/label/dev -c intel -c conda-forge --override-channels' # Install the latest oneAPI compiler to work around an issue INSTALL_ONE_API: 'yes' diff --git a/dpnp/dpnp_algo/dpnp_arraycreation.py b/dpnp/dpnp_algo/dpnp_arraycreation.py index 83cd9da4acf..b493efac993 100644 --- a/dpnp/dpnp_algo/dpnp_arraycreation.py +++ b/dpnp/dpnp_algo/dpnp_arraycreation.py @@ -1,12 +1,13 @@ import math import operator +import dpctl.tensor as dpt import dpctl.utils as dpu import numpy import dpnp -import dpnp.dpnp_container as dpnp_container import dpnp.dpnp_utils as utils +from dpnp.dpnp_array import dpnp_array __all__ = [ "dpnp_geomspace", @@ -16,6 +17,12 @@ ] +def _as_usm_ndarray(a, usm_type, sycl_queue): + if isinstance(a, dpnp_array): + return a.get_array() + return dpt.asarray(a, usm_type=usm_type, sycl_queue=sycl_queue) + + def dpnp_geomspace( start, stop, @@ -40,14 +47,8 @@ def dpnp_geomspace( else: _usm_type = usm_type - if not dpnp.is_supported_array_type(start): - start = dpnp.asarray( - start, usm_type=_usm_type, sycl_queue=sycl_queue_normalized - ) - if not dpnp.is_supported_array_type(stop): - stop = dpnp.asarray( - stop, usm_type=_usm_type, sycl_queue=sycl_queue_normalized - ) + start = _as_usm_ndarray(start, _usm_type, sycl_queue_normalized) + stop = _as_usm_ndarray(stop, _usm_type, sycl_queue_normalized) dt = numpy.result_type(start, stop, float(num)) dt = utils.map_dtype_to_device(dt, sycl_queue_normalized.sycl_device) @@ -57,8 +58,8 @@ def dpnp_geomspace( if dpnp.any(start == 0) or dpnp.any(stop == 0): raise ValueError("Geometric sequence cannot include zero") - out_sign = dpnp.ones( - dpnp.broadcast_arrays(start, stop)[0].shape, + out_sign = dpt.ones( + dpt.broadcast_arrays(start, stop)[0].shape, dtype=dt, usm_type=_usm_type, sycl_queue=sycl_queue_normalized, @@ -72,15 +73,15 @@ def dpnp_geomspace( stop[all_imag] = stop[all_imag].imag out_sign[all_imag] = 1j - both_negative = (dpnp.sign(start) == -1) & (dpnp.sign(stop) == -1) + both_negative = 
(dpt.sign(start) == -1) & (dpt.sign(stop) == -1) if dpnp.any(both_negative): - dpnp.negative(start[both_negative], out=start[both_negative]) - dpnp.negative(stop[both_negative], out=stop[both_negative]) - dpnp.negative(out_sign[both_negative], out=out_sign[both_negative]) + dpt.negative(start[both_negative], out=start[both_negative]) + dpt.negative(stop[both_negative], out=stop[both_negative]) + dpt.negative(out_sign[both_negative], out=out_sign[both_negative]) - log_start = dpnp.log10(start) - log_stop = dpnp.log10(stop) - result = dpnp_logspace( + log_start = dpt.log10(start) + log_stop = dpt.log10(stop) + res = dpnp_logspace( log_start, log_stop, num=num, @@ -89,19 +90,20 @@ def dpnp_geomspace( dtype=dtype, usm_type=_usm_type, sycl_queue=sycl_queue_normalized, - ) + ).get_array() if num > 0: - result[0] = start + res[0] = start if num > 1 and endpoint: - result[-1] = stop + res[-1] = stop - result = out_sign * result + res = out_sign * res if axis != 0: - result = dpnp.moveaxis(result, 0, axis) + res = dpt.moveaxis(res, 0, axis) - return result.astype(dtype, copy=False) + res = dpt.astype(res, dtype, copy=False) + return dpnp_array._create_from_usm_ndarray(res) def dpnp_linspace( @@ -129,14 +131,11 @@ def dpnp_linspace( else: _usm_type = usm_type - if not hasattr(start, "dtype") and not dpnp.isscalar(start): - start = dpnp.asarray( - start, usm_type=_usm_type, sycl_queue=sycl_queue_normalized - ) - if not hasattr(stop, "dtype") and not dpnp.isscalar(stop): - stop = dpnp.asarray( - stop, usm_type=_usm_type, sycl_queue=sycl_queue_normalized - ) + if not dpnp.isscalar(start): + start = _as_usm_ndarray(start, _usm_type, sycl_queue_normalized) + + if not dpnp.isscalar(stop): + stop = _as_usm_ndarray(stop, _usm_type, sycl_queue_normalized) dt = numpy.result_type(start, stop, float(num)) dt = utils.map_dtype_to_device(dt, sycl_queue_normalized.sycl_device) @@ -155,7 +154,7 @@ def dpnp_linspace( if dpnp.isscalar(start) and dpnp.isscalar(stop): # Call linspace() function 
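The `dpnp_geomspace` diff above reduces the geometric sequence to a logspace over `log10(start)..log10(stop)` and then pins the exact endpoints back into the result. A simplified scalar sketch of that approach (hypothetical helper name; the sign handling for negative and complex inputs from the diff is omitted, and `num >= 2` is assumed):

```python
import math

def geomspace_ref(start, stop, num, endpoint=True):
    """Sketch of the dpnp_geomspace strategy in the diff: logspace over
    the log10 of the endpoints, then pin res[0] and res[-1] exactly.
    Assumes positive scalar start/stop and num >= 2."""
    div = (num - 1) if endpoint else num
    step = (math.log10(stop) - math.log10(start)) / div
    res = [10.0 ** (math.log10(start) + i * step) for i in range(num)]
    res[0] = start              # res[0] = start, as in the diff
    if num > 1 and endpoint:
        res[-1] = stop          # res[-1] = stop
    return res
```

Pinning the endpoints matters because the round trip through `log10`/`pow` can perturb them by a few ULPs.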
for scalars. - res = dpnp_container.linspace( + usm_res = dpt.linspace( start, stop, num, @@ -167,17 +166,17 @@ def dpnp_linspace( if retstep is True and step_nan is False: step = (stop - start) / step_num else: - _start = dpnp.asarray( + usm_start = dpt.asarray( start, dtype=dt, usm_type=_usm_type, sycl_queue=sycl_queue_normalized, ) - _stop = dpnp.asarray( + usm_stop = dpt.asarray( stop, dtype=dt, usm_type=_usm_type, sycl_queue=sycl_queue_normalized ) - res = dpnp_container.arange( + usm_res = dpt.arange( 0, stop=num, step=1, @@ -187,28 +186,29 @@ def dpnp_linspace( ) if step_nan is False: - step = (_stop - _start) / step_num - res = res.reshape((-1,) + (1,) * step.ndim) - res = res * step + _start + step = (usm_stop - usm_start) / step_num + usm_res = dpt.reshape(usm_res, (-1,) + (1,) * step.ndim, copy=False) + usm_res = usm_res * step + usm_res += usm_start if endpoint and num > 1: - res[-1] = dpnp_container.full(step.shape, _stop) + usm_res[-1] = dpt.full(step.shape, usm_stop) if axis != 0: - res = dpnp.moveaxis(res, 0, axis) + usm_res = dpt.moveaxis(usm_res, 0, axis) if numpy.issubdtype(dtype, dpnp.integer): - dpnp.floor(res, out=res) + dpt.floor(usm_res, out=usm_res) - res = res.astype(dtype, copy=False) + res = dpt.astype(usm_res, dtype, copy=False) + res = dpnp_array._create_from_usm_ndarray(res) if retstep is True: if dpnp.isscalar(step): - step = dpnp.asarray( + step = dpt.asarray( step, usm_type=res.usm_type, sycl_queue=res.sycl_queue ) - return (res, step) - + return res, dpnp_array._create_from_usm_ndarray(step) return res @@ -239,12 +239,15 @@ def dpnp_logspace( usm_type = "device" if usm_type_alloc is None else usm_type_alloc else: usm_type = usm_type - start = dpnp.asarray(start, usm_type=usm_type, sycl_queue=sycl_queue) - stop = dpnp.asarray(stop, usm_type=usm_type, sycl_queue=sycl_queue) - base = dpnp.asarray(base, usm_type=usm_type, sycl_queue=sycl_queue) - [start, stop, base] = dpnp.broadcast_arrays(start, stop, base) - base = 
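For non-scalar inputs the updated `dpnp_linspace` builds the result as `arange(num) * step + start` and, when `endpoint` is set, overwrites the last element with the exact stop value. A scalar sketch of that arange-based scheme (hypothetical helper name):

```python
def linspace_ref(start, stop, num, endpoint=True, retstep=False):
    """Sketch of the arange-based linspace in the diff:
    res = arange(num) * step + start, with the endpoint pinned exactly."""
    div = (num - 1) if endpoint else num
    step = (stop - start) / div if div > 0 else float("nan")
    res = [start + i * step for i in range(num)] if div > 0 else [start] * num
    if endpoint and num > 1:
        res[-1] = stop  # usm_res[-1] = dpt.full(step.shape, usm_stop)
    return (res, step) if retstep else res
```

As in the diff, `retstep=True` returns the computed step alongside the samples.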
dpnp.expand_dims(base, axis=axis) + start = _as_usm_ndarray(start, usm_type, sycl_queue) + stop = _as_usm_ndarray(stop, usm_type, sycl_queue) + base = _as_usm_ndarray(base, usm_type, sycl_queue) + + [start, stop, base] = dpt.broadcast_arrays(start, stop, base) + base = dpt.expand_dims(base, axis=axis) + + # assume res as not a tuple, because retstep is False res = dpnp_linspace( start, stop, @@ -254,11 +257,12 @@ def dpnp_logspace( sycl_queue=sycl_queue, endpoint=endpoint, axis=axis, - ) + ).get_array() - if dtype is None: - return dpnp.power(base, res) - return dpnp.power(base, res).astype(dtype, copy=False) + dpt.pow(base, res, out=res) + if dtype is not None: + res = dpt.astype(res, dtype, copy=False) + return dpnp_array._create_from_usm_ndarray(res) class dpnp_nd_grid: diff --git a/dpnp/dpnp_algo/dpnp_elementwise_common.py b/dpnp/dpnp_algo/dpnp_elementwise_common.py index 374981a6303..b13ea56bc32 100644 --- a/dpnp/dpnp_algo/dpnp_elementwise_common.py +++ b/dpnp/dpnp_algo/dpnp_elementwise_common.py @@ -24,6 +24,7 @@ # THE POSSIBILITY OF SUCH DAMAGE. # ***************************************************************************** +import dpctl.tensor as dpt import numpy from dpctl.tensor._elementwise_common import ( BinaryElementwiseFunc, @@ -161,24 +162,27 @@ def __call__( f"Requested function={self.name_} only takes `out` or `dtype`" "as an argument, but both were provided." 
) + + if order is None: + order = "K" + elif order in "afkcAFKC": + order = order.upper() else: - if order is None: - order = "K" - elif order in "afkcAFKC": - order = order.upper() - else: - raise ValueError( - "order must be one of 'C', 'F', 'A', or 'K' " - f"(got '{order}')" - ) - if dtype is not None: - x = dpnp.astype(x, dtype=dtype, copy=False) - x_usm = dpnp.get_usm_ndarray(x) - out_usm = None if out is None else dpnp.get_usm_ndarray(out) - res_usm = super().__call__(x_usm, out=out_usm, order=order) - if out is not None and isinstance(out, dpnp_array): - return out - return dpnp_array._create_from_usm_ndarray(res_usm) + raise ValueError( + "order must be one of 'C', 'F', 'A', or 'K' " f"(got '{order}')" + ) + + x_usm = dpnp.get_usm_ndarray(x) + if dtype is not None: + x_usm = dpt.astype(x_usm, dtype, copy=False) + + out_usm = None if out is None else dpnp.get_usm_ndarray(out) + res_usm = super().__call__(x_usm, out=out_usm, order=order) + + dpnp.synchronize_array_data(res_usm) + if out is not None and isinstance(out, dpnp_array): + return out + return dpnp_array._create_from_usm_ndarray(res_usm) class DPNPBinaryFunc(BinaryElementwiseFunc): @@ -311,35 +315,47 @@ def __call__( f"Requested function={self.name_} only takes `out` or `dtype`" "as an argument, but both were provided." 
) + + if order is None: + order = "K" + elif order in "afkcAFKC": + order = order.upper() else: - if order is None: - order = "K" - elif order in "afkcAFKC": - order = order.upper() - else: - raise ValueError( - "order must be one of 'C', 'F', 'A', or 'K' " - f"(got '{order}')" + raise ValueError( + f"order must be one of 'C', 'F', 'A', or 'K' (got '{order}')" + ) + + x1_usm = dpnp.get_usm_ndarray_or_scalar(x1) + x2_usm = dpnp.get_usm_ndarray_or_scalar(x2) + + if dtype is not None: + if dpnp.isscalar(x1): + x1_usm = dpt.asarray( + x1, + dtype=dtype, + sycl_queue=x2.sycl_queue, + usm_type=x2.usm_type, ) - if dtype is not None: - if dpnp.isscalar(x1): - x1 = dpnp.asarray(x1, dtype=dtype) - x2 = dpnp.astype(x2, dtype=dtype, copy=False) - elif dpnp.isscalar(x2): - x1 = dpnp.astype(x1, dtype=dtype, copy=False) - x2 = dpnp.asarray(x2, dtype=dtype) - else: - x1 = dpnp.astype(x1, dtype=dtype, copy=False) - x2 = dpnp.astype(x2, dtype=dtype, copy=False) - - x1_usm = dpnp.get_usm_ndarray_or_scalar(x1) - x2_usm = dpnp.get_usm_ndarray_or_scalar(x2) + x2_usm = dpt.astype(x2_usm, dtype, copy=False) + elif dpnp.isscalar(x2): + x1_usm = dpt.astype(x1_usm, dtype, copy=False) + x2_usm = dpt.asarray( + x2, + dtype=dtype, + sycl_queue=x1.sycl_queue, + usm_type=x1.usm_type, + ) + else: + x1_usm = dpt.astype(x1_usm, dtype, copy=False) + x2_usm = dpt.astype(x2_usm, dtype, copy=False) - out_usm = None if out is None else dpnp.get_usm_ndarray(out) - res_usm = super().__call__(x1_usm, x2_usm, out=out_usm, order=order) - if out is not None and isinstance(out, dpnp_array): - return out - return dpnp_array._create_from_usm_ndarray(res_usm) + out_usm = None if out is None else dpnp.get_usm_ndarray(out) + res_usm = super().__call__(x1_usm, x2_usm, out=out_usm, order=order) + + dpnp.synchronize_array_data(res_usm) + if out is not None and isinstance(out, dpnp_array): + return out + return dpnp_array._create_from_usm_ndarray(res_usm) def outer( self, @@ -463,7 +479,7 @@ def __init__( def 
__call__(self, x, deg=False): res = super().__call__(x) if deg is True: - res = res * (180 / dpnp.pi) + res *= 180 / dpnp.pi return res @@ -513,14 +529,21 @@ def __init__( def __call__(self, x, decimals=0, out=None, dtype=None): if decimals != 0: - if dpnp.issubdtype(x.dtype, dpnp.integer) and dtype is None: - dtype = x.dtype - res = dpnp.true_divide( - dpnp.rint(x * 10**decimals, out=out), 10**decimals, out=out - ) + x_usm = dpnp.get_usm_ndarray(x) + if dpnp.issubdtype(x_usm.dtype, dpnp.integer) and dtype is None: + dtype = x_usm.dtype + + out_usm = None if out is None else dpnp.get_usm_ndarray(out) + x_usm = dpt.round(x_usm * 10**decimals, out=out_usm) + res_usm = dpt.divide(x_usm, 10**decimals, out=out_usm) + if dtype is not None: - res = res.astype(dtype) - return res + res_usm = dpt.astype(res_usm, dtype, copy=False) + + dpnp.synchronize_array_data(res_usm) + if out is not None and isinstance(out, dpnp_array): + return out + return dpnp_array._create_from_usm_ndarray(res_usm) else: return super().__call__(x, out=out, dtype=dtype) diff --git a/dpnp/dpnp_array.py b/dpnp/dpnp_array.py index fd2d06f7428..d9936872a89 100644 --- a/dpnp/dpnp_array.py +++ b/dpnp/dpnp_array.py @@ -258,6 +258,8 @@ def __getitem__(self, key): res = self.__new__(dpnp_array) res._array_obj = item + if self._array_obj.usm_data is not res._array_obj.usm_data: + dpnp.synchronize_array_data(self) return res def __gt__(self, other): @@ -454,6 +456,7 @@ def __setitem__(self, key, val): val = val.get_array() self._array_obj.__setitem__(key, val) + dpnp.synchronize_array_data(self) # '__setstate__', # '__sizeof__', diff --git a/dpnp/dpnp_container.py b/dpnp/dpnp_container.py index 5322df3324b..8f70e015393 100644 --- a/dpnp/dpnp_container.py +++ b/dpnp/dpnp_container.py @@ -47,7 +47,6 @@ "empty", "eye", "full", - "linspace", "ones", "tril", "triu", @@ -81,6 +80,7 @@ def arange( sycl_queue=sycl_queue_normalized, ) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, 
buffer=array_obj) @@ -133,6 +133,7 @@ def asarray( if array_obj is x1_obj and isinstance(x1, dpnp_array): return x1 + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order=order) @@ -142,6 +143,7 @@ def copy(x1, /, *, order="K"): order = "K" array_obj = dpt.copy(dpnp.get_usm_ndarray(x1), order=order) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order="K") @@ -203,6 +205,7 @@ def eye( usm_type=usm_type, sycl_queue=sycl_queue_normalized, ) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order=order) @@ -237,40 +240,10 @@ def full( usm_type=usm_type, sycl_queue=sycl_queue_normalized, ) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order=order) -def linspace( - start, - stop, - /, - num, - *, - dtype=None, - device=None, - usm_type="device", - sycl_queue=None, - endpoint=True, -): - """Validate input parameters before passing them into `dpctl.tensor` module""" - dpu.validate_usm_type(usm_type, allow_none=False) - sycl_queue_normalized = dpnp.get_normalized_queue_device( - sycl_queue=sycl_queue, device=device - ) - - """Creates `dpnp_array` with evenly spaced numbers of specified interval.""" - array_obj = dpt.linspace( - start, - stop, - num, - dtype=dtype, - usm_type=usm_type, - sycl_queue=sycl_queue_normalized, - endpoint=endpoint, - ) - return dpnp_array(array_obj.shape, buffer=array_obj) - - def ones( shape, *, @@ -296,18 +269,21 @@ def ones( usm_type=usm_type, sycl_queue=sycl_queue_normalized, ) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order=order) def tril(x1, /, *, k=0): """Creates `dpnp_array` as lower triangular part of an input array.""" array_obj = dpt.tril(dpnp.get_usm_ndarray(x1), k=k) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order="K") def triu(x1, /, *, k=0): 
"""Creates `dpnp_array` as upper triangular part of an input array.""" array_obj = dpt.triu(dpnp.get_usm_ndarray(x1), k=k) + dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order="K") @@ -336,4 +312,6 @@ def zeros( usm_type=usm_type, sycl_queue=sycl_queue_normalized, ) + # TODO: uncomment once dpctl implements asynchronous call + # dpnp.synchronize_array_data(array_obj) return dpnp_array(array_obj.shape, buffer=array_obj, order=order) diff --git a/dpnp/dpnp_iface.py b/dpnp/dpnp_iface.py index 49e7b41c01c..b3103869e8d 100644 --- a/dpnp/dpnp_iface.py +++ b/dpnp/dpnp_iface.py @@ -42,6 +42,7 @@ import dpctl import dpctl.tensor as dpt +import dpctl.utils as dpu import numpy from dpctl.tensor._device import normalize_queue_device @@ -69,6 +70,7 @@ "get_usm_ndarray_or_scalar", "is_supported_array_or_scalar", "is_supported_array_type", + "synchronize_array_data", ] from dpnp import float64, isscalar @@ -238,10 +240,10 @@ def astype(x1, dtype, order="K", casting="unsafe", copy=True, device=None): x1_obj, dtype, order=order, casting=casting, copy=copy, device=device ) - # return x1 if dpctl returns a zero copy of x1_obj + dpnp.synchronize_array_data(x1) if array_obj is x1_obj and isinstance(x1, dpnp_array): + # return x1 if dpctl returns a zero copy of x1_obj return x1 - return dpnp_array._create_from_usm_ndarray(array_obj) @@ -699,3 +701,16 @@ def is_supported_array_type(a): """ return isinstance(a, (dpnp_array, dpt.usm_ndarray)) + + +def synchronize_array_data(a): + """ + The dpctl interface was reworked to make asynchronous execution. + That function makes a synchronization call to ensure array data is valid + before exit from dpnp interface function. 
+ + """ + + if hasattr(dpu, "SequentialOrderManager"): + check_supported_arrays_type(a) + dpu.SequentialOrderManager[a.sycl_queue].wait() diff --git a/dpnp/dpnp_iface_arraycreation.py b/dpnp/dpnp_iface_arraycreation.py index 5cf63ea0fca..6698f3f782e 100644 --- a/dpnp/dpnp_iface_arraycreation.py +++ b/dpnp/dpnp_iface_arraycreation.py @@ -40,6 +40,7 @@ import operator +import dpctl.tensor as dpt import numpy import dpnp @@ -51,6 +52,10 @@ dpnp_logspace, dpnp_nd_grid, ) +from .dpnp_array import dpnp_array + +# pylint: disable=no-name-in-module +from .dpnp_utils import get_usm_allocations, map_dtype_to_device __all__ = [ "arange", @@ -2183,7 +2188,7 @@ def geomspace( """ - return dpnp_geomspace( + res = dpnp_geomspace( start, stop, num, @@ -2195,6 +2200,9 @@ def geomspace( axis=axis, ) + dpnp.synchronize_array_data(res) + return res + def identity( n, @@ -2402,7 +2410,7 @@ def linspace( """ - return dpnp_linspace( + res = dpnp_linspace( start, stop, num, @@ -2415,6 +2423,12 @@ def linspace( axis=axis, ) + if isinstance(res, tuple): # (result, step) is returning + dpnp.synchronize_array_data(res[0]) + else: + dpnp.synchronize_array_data(res) + return res + def loadtxt( fname, @@ -2629,7 +2643,7 @@ def logspace( """ - return dpnp_logspace( + res = dpnp_logspace( start, stop, num=num, @@ -2642,6 +2656,9 @@ def logspace( axis=axis, ) + dpnp.synchronize_array_data(res) + return res + # pylint: disable=redefined-outer-name def meshgrid(*xi, copy=True, sparse=False, indexing="xy"): @@ -2720,21 +2737,30 @@ def meshgrid(*xi, copy=True, sparse=False, indexing="xy"): "Unrecognized indexing keyword value, expecting 'xy' or 'ij'." 
) + if ndim < 1: + return [] + s0 = (1,) * ndim output = [ - dpnp.reshape(x, s0[:i] + (-1,) + s0[i + 1 :]) for i, x in enumerate(xi) + dpt.reshape(dpnp.get_usm_ndarray(x), s0[:i] + (-1,) + s0[i + 1 :]) + for i, x in enumerate(xi) ] + # input arrays must be allocated on the same queue + _, _ = get_usm_allocations(output) + if indexing == "xy" and ndim > 1: - output[0] = output[0].reshape((1, -1) + s0[2:]) - output[1] = output[1].reshape((-1, 1) + s0[2:]) + output[0] = dpt.reshape(output[0], (1, -1) + s0[2:]) + output[1] = dpt.reshape(output[1], (-1, 1) + s0[2:]) if not sparse: - output = dpnp.broadcast_arrays(*output) + output = dpt.broadcast_arrays(*output) if copy: - output = [x.copy() for x in output] + output = [dpt.copy(x) for x in output] + dpnp.synchronize_array_data(output[0]) + output = [dpnp_array._create_from_usm_ndarray(x) for x in output] return output @@ -3261,7 +3287,10 @@ def tri( _dtype = dpnp.default_float_type() if dtype in (dpnp.float, None) else dtype - m = dpnp.ones( + if usm_type is None: + usm_type = "device" + + m = dpt.ones( (N, M), dtype=_dtype, device=device, @@ -3469,28 +3498,34 @@ def vander( [125, 25, 5, 1]]), Device(level_zero:gpu:0), 'host') """ - x = dpnp.asarray(x, device=device, usm_type=usm_type, sycl_queue=sycl_queue) + if dpnp.is_supported_array_type(x): + x = dpnp.get_usm_ndarray(x) + usm_x = dpt.asarray( + x, device=device, usm_type=usm_type, sycl_queue=sycl_queue + ) + + x_sycl_queue = usm_x.sycl_queue + x_usm_type = usm_x.usm_type if N is not None and not isinstance(N, int): raise TypeError(f"An integer is required, but got {type(N)}") - if x.ndim != 1: + if usm_x.ndim != 1: raise ValueError("`x` must be a one-dimensional array or sequence.") if N is None: - N = x.size + N = usm_x.size + + _dtype = numpy.promote_types(usm_x.dtype, int) + _dtype = map_dtype_to_device(_dtype, x_sycl_queue.sycl_device) + m = dpnp.empty_like(usm_x, shape=(usm_x.size, N), dtype=_dtype) - _dtype = int if x.dtype == bool else x.dtype - m = empty( 
- (x.size, N), - dtype=_dtype, - usm_type=x.usm_type, - sycl_queue=x.sycl_queue, - ) tmp = m[:, ::-1] if not increasing else m dpnp.power( - x.reshape(-1, 1), - dpnp.arange(N, dtype=_dtype, sycl_queue=x.sycl_queue), + dpt.reshape(usm_x, (-1, 1)), + dpt.arange( + N, dtype=_dtype, usm_type=x_usm_type, sycl_queue=x_sycl_queue + ), out=tmp, ) return m diff --git a/dpnp/dpnp_iface_counting.py b/dpnp/dpnp_iface_counting.py index 8a90601ce8f..515cad08a06 100644 --- a/dpnp/dpnp_iface_counting.py +++ b/dpnp/dpnp_iface_counting.py @@ -37,6 +37,8 @@ """ +import dpctl.tensor as dpt + import dpnp __all__ = ["count_nonzero"] @@ -87,5 +89,6 @@ def count_nonzero(a, axis=None, *, keepdims=False): # TODO: might be improved by implementing an extension # with `count_nonzero` kernel - a = dpnp.astype(a, dpnp.bool, copy=False) - return a.sum(axis=axis, dtype=dpnp.intp, keepdims=keepdims) + usm_a = dpnp.get_usm_ndarray(a) + usm_a = dpt.astype(usm_a, dpnp.bool, copy=False) + return dpnp.sum(usm_a, axis=axis, dtype=dpnp.intp, keepdims=keepdims) diff --git a/dpnp/dpnp_iface_histograms.py b/dpnp/dpnp_iface_histograms.py index 1a1b4daf740..24c8b6aaf78 100644 --- a/dpnp/dpnp_iface_histograms.py +++ b/dpnp/dpnp_iface_histograms.py @@ -40,11 +40,17 @@ import operator import warnings +import dpctl.tensor as dpt import dpctl.utils as dpu import numpy import dpnp +from .dpnp_algo.dpnp_arraycreation import ( + dpnp_linspace, +) +from .dpnp_array import dpnp_array + __all__ = [ "digitize", "histogram", @@ -60,7 +66,7 @@ def _ravel_check_a_and_weights(a, weights): """Check input `a` and `weights` arrays, and ravel both.""" # ensure that `a` array has supported type - dpnp.check_supported_arrays_type(a) + a = dpnp.get_usm_ndarray(a) usm_type = a.usm_type # ensure that the array is a "subtractable" dtype @@ -71,11 +77,11 @@ def _ravel_check_a_and_weights(a, weights): RuntimeWarning, stacklevel=3, ) - a = a.astype(numpy.uint8) + a = dpt.astype(a, numpy.uint8) if weights is not None: # check that 
`weights` array has supported type - dpnp.check_supported_arrays_type(weights) + weights = dpnp.get_usm_ndarray(weights) usm_type = dpu.get_coerced_usm_type([usm_type, weights.usm_type]) # check that arrays have the same allocation queue @@ -86,8 +92,9 @@ def _ravel_check_a_and_weights(a, weights): if weights.shape != a.shape: raise ValueError("weights should have the same shape as a.") - weights = weights.ravel() - a = a.ravel() + weights = dpt.reshape(weights, -1) + + a = dpt.reshape(a, -1) return a, weights, usm_type @@ -113,7 +120,7 @@ def _get_outer_edges(a, range): first_edge, last_edge = 0, 1 else: - first_edge, last_edge = a.min(), a.max() + first_edge, last_edge = dpt.min(a), dpt.max(a) if not (dpnp.isfinite(first_edge) and dpnp.isfinite(last_edge)): raise ValueError( f"autodetected range of [{first_edge}, {last_edge}] " @@ -157,9 +164,9 @@ def _get_bin_edges(a, bins, range, usm_type): "a and bins must be allocated on the same SYCL queue" ) - bin_edges = bins + bin_edges = dpnp.get_usm_ndarray(bins) else: - bin_edges = dpnp.asarray( + bin_edges = dpt.asarray( bins, sycl_queue=sycl_queue, usm_type=usm_type ) @@ -183,7 +190,7 @@ def _get_bin_edges(a, bins, range, usm_type): ) # bin edges must be computed - bin_edges = dpnp.linspace( + bin_edges = dpnp_linspace( first_edge, last_edge, n_equal_bins + 1, @@ -191,7 +198,7 @@ def _get_bin_edges(a, bins, range, usm_type): dtype=bin_type, sycl_queue=sycl_queue, usm_type=usm_type, - ) + ).get_array() return bin_edges, (first_edge, last_edge, n_equal_bins) return bin_edges, None @@ -204,8 +211,11 @@ def _search_sorted_inclusive(a, v): """ - return dpnp.concatenate( - (a.searchsorted(v[:-1], "left"), a.searchsorted(v[-1:], "right")) + return dpt.concat( + ( + dpt.searchsorted(a, v[:-1], side="left"), + dpt.searchsorted(a, v[-1:], side="right"), + ) ) @@ -297,8 +307,14 @@ def digitize(x, bins, right=False): # Use dpnp.searchsorted directly if bins are increasing return dpnp.searchsorted(bins, x, side=side) + usm_x = 
dpnp.get_usm_ndarray(x) + usm_bins = dpnp.get_usm_ndarray(bins) + # Reverse bins and adjust indices if bins are decreasing - return bins.size - dpnp.searchsorted(bins[::-1], x, side=side) + usm_res = usm_bins.size - dpt.searchsorted(usm_bins[::-1], usm_x, side=side) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def histogram(a, bins=10, range=None, density=None, weights=None): @@ -412,26 +428,36 @@ def histogram(a, bins=10, range=None, density=None, weights=None): else: # Compute via cumulative histogram if weights is None: - sa = dpnp.sort(a) + sa = dpt.sort(a) cum_n = _search_sorted_inclusive(sa, bin_edges) else: - zero = dpnp.zeros( + zero = dpt.zeros( 1, dtype=ntype, sycl_queue=a.sycl_queue, usm_type=usm_type ) - sorting_index = dpnp.argsort(a) + sorting_index = dpt.argsort(a) sa = a[sorting_index] sw = weights[sorting_index] - cw = dpnp.concatenate((zero, sw.cumsum(dtype=ntype))) + cw = dpt.concat((zero, dpt.cumulative_sum(sw, dtype=ntype))) bin_index = _search_sorted_inclusive(sa, bin_edges) cum_n = cw[bin_index] n = dpnp.diff(cum_n) + # convert bin_edges to dpnp.ndarray + bin_edges = dpnp_array._create_from_usm_ndarray(bin_edges) + if density: # pylint: disable=possibly-used-before-assignment - db = dpnp.diff(bin_edges).astype(dpnp.default_float_type()) - return n / db / n.sum(), bin_edges + db = dpnp.diff(bin_edges) + db = dpt.astype(db.get_array(), dpnp.default_float_type()) + + usm_n = n.get_array() + hist = usm_n / db / dpt.sum(usm_n) + dpnp.synchronize_array_data(hist) + return dpnp_array._create_from_usm_ndarray(hist), bin_edges + + dpnp.synchronize_array_data(n) return n, bin_edges @@ -517,4 +543,6 @@ def histogram_bin_edges(a, bins=10, range=None, weights=None): a, weights, usm_type = _ravel_check_a_and_weights(a, weights) bin_edges, _ = _get_bin_edges(a, bins, range, usm_type) - return bin_edges + + dpnp.synchronize_array_data(bin_edges) + return dpnp_array._create_from_usm_ndarray(bin_edges) diff 
--git a/dpnp/dpnp_iface_manipulation.py b/dpnp/dpnp_iface_manipulation.py index bf3c66d7fda..a4b7352d4e6 100644 --- a/dpnp/dpnp_iface_manipulation.py +++ b/dpnp/dpnp_iface_manipulation.py @@ -668,12 +668,15 @@ def concatenate( usm_arrays = [dpnp.get_usm_ndarray(x) for x in arrays] usm_res = dpt.concat(usm_arrays, axis=axis) + res = dpnp_array._create_from_usm_ndarray(usm_res) if dtype is not None: res = res.astype(dtype, casting=casting, copy=False) elif out is not None: dpnp.copyto(out, res, casting=casting) return out + + dpnp.synchronize_array_data(res) return res @@ -907,10 +910,11 @@ def expand_dims(a, axis): """ - usm_array = dpnp.get_usm_ndarray(a) - return dpnp_array._create_from_usm_ndarray( - dpt.expand_dims(usm_array, axis=axis) - ) + usm_a = dpnp.get_usm_ndarray(a) + usm_res = dpt.expand_dims(usm_a, axis=axis) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def flip(m, axis=None): @@ -1298,8 +1302,10 @@ def repeat(a, repeats, axis=None): a = dpnp.ravel(a) usm_arr = dpnp.get_usm_ndarray(a) - usm_arr = dpt.repeat(usm_arr, repeats, axis=axis) - return dpnp_array._create_from_usm_ndarray(usm_arr) + usm_res = dpt.repeat(usm_arr, repeats, axis=axis) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def reshape(a, /, newshape, order="C", copy=None): @@ -1374,9 +1380,11 @@ def reshape(a, /, newshape, order="C", copy=None): elif order not in "cfCF": raise ValueError(f"order must be one of 'C' or 'F' (got {order})") - usm_arr = dpnp.get_usm_ndarray(a) - usm_arr = dpt.reshape(usm_arr, shape=newshape, order=order, copy=copy) - return dpnp_array._create_from_usm_ndarray(usm_arr) + usm_a = dpnp.get_usm_ndarray(a) + usm_res = dpt.reshape(usm_a, shape=newshape, order=order, copy=copy) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def result_type(*arrays_and_dtypes): @@ -1483,10 +1491,12 @@ def roll(x, shift, axis=None): """ if 
axis is None: return roll(x.reshape(-1), shift, 0).reshape(x.shape) - usm_array = dpnp.get_usm_ndarray(x) - return dpnp_array._create_from_usm_ndarray( - dpt.roll(usm_array, shift=shift, axis=axis) - ) + + usm_x = dpnp.get_usm_ndarray(x) + usm_res = dpt.roll(usm_x, shift=shift, axis=axis) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def rollaxis(x, axis, start=0): @@ -1633,10 +1643,11 @@ def squeeze(a, /, axis=None): """ - usm_array = dpnp.get_usm_ndarray(a) - return dpnp_array._create_from_usm_ndarray( - dpt.squeeze(usm_array, axis=axis) - ) + usm_a = dpnp.get_usm_ndarray(a) + usm_res = dpt.squeeze(usm_a, axis=axis) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def stack(arrays, /, *, axis=0, out=None, dtype=None, casting="same_kind"): @@ -1714,12 +1725,15 @@ def stack(arrays, /, *, axis=0, out=None, dtype=None, casting="same_kind"): usm_arrays = [dpnp.get_usm_ndarray(x) for x in arrays] usm_res = dpt.stack(usm_arrays, axis=axis) + res = dpnp_array._create_from_usm_ndarray(usm_res) if dtype is not None: res = res.astype(dtype, casting=casting, copy=False) elif out is not None: dpnp.copyto(out, res, casting=casting) return out + + dpnp.synchronize_array_data(res) return res @@ -1772,10 +1786,11 @@ def swapaxes(a, axis1, axis2): """ - usm_array = dpnp.get_usm_ndarray(a) - return dpnp_array._create_from_usm_ndarray( - dpt.swapaxes(usm_array, axis1=axis1, axis2=axis2) - ) + usm_a = dpnp.get_usm_ndarray(a) + usm_res = dpt.swapaxes(usm_a, axis1=axis1, axis2=axis2) + + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) # pylint: disable=invalid-name @@ -1853,8 +1868,11 @@ def tile(A, reps): """ - usm_array = dpnp.get_usm_ndarray(A) - return dpnp_array._create_from_usm_ndarray(dpt.tile(usm_array, reps)) + usm_a = dpnp.get_usm_ndarray(A) + usm_res = dpt.tile(usm_a, reps) + + dpnp.synchronize_array_data(usm_res) + return 
dpnp_array._create_from_usm_ndarray(usm_res) def transpose(a, axes=None): diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index 378ecaf9b19..f7c70320dbf 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -373,18 +373,13 @@ def test_array_creation_load_txt(device): @pytest.mark.parametrize( - "device_x", - valid_devices, - ids=[device.filter_string for device in valid_devices], -) -@pytest.mark.parametrize( - "device_y", + "device", valid_devices, ids=[device.filter_string for device in valid_devices], ) -def test_meshgrid(device_x, device_y): - x = dpnp.arange(100, device=device_x) - y = dpnp.arange(100, device=device_y) +def test_meshgrid(device): + x = dpnp.arange(100, device=device) + y = dpnp.arange(100, device=device) z = dpnp.meshgrid(x, y) assert_sycl_queue_equal(z[0].sycl_queue, x.sycl_queue) assert_sycl_queue_equal(z[1].sycl_queue, y.sycl_queue) From 2fff1f12fea976836b7a0a66fad4fbf760a578f2 Mon Sep 17 00:00:00 2001 From: Natalia Polina Date: Wed, 3 Jul 2024 12:29:39 -0700 Subject: [PATCH 43/49] Clean up legacy array creation and manipulation implementation from the backend (#1903) * Clean up legacy element-wise implementation from the backend * return legacy copy implementation for partition function * Apply comments * Fix pre-commit * Fix pre-commit * Clean up legacy array creation implementation from the backend * Clean-up MACRO_2ARG_2TYPES_LOGIC_OP. 
Clean-up /backend/include * Removed backend/examples for removed functions * address comments * address comments --------- Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com> Co-authored-by: Anton Volkov --- dpnp/backend/CMakeLists.txt | 1 - dpnp/backend/examples/example11.cpp | 85 -- dpnp/backend/examples/example3.cpp | 79 -- dpnp/backend/examples/example7.cpp | 77 -- dpnp/backend/examples/example_bs.cpp | 282 ----- .../examples/example_experimental_iface.cpp | 63 - .../include/dpnp_gen_2arg_3type_tbl.hpp | 10 - dpnp/backend/include/dpnp_iface.hpp | 392 ------ dpnp/backend/include/dpnp_iface_fptr.hpp | 42 +- .../kernels/dpnp_krnl_arraycreation.cpp | 1128 +---------------- dpnp/backend/kernels/dpnp_krnl_elemwise.cpp | 22 - dpnp/backend/kernels/dpnp_krnl_indexing.cpp | 34 - .../kernels/dpnp_krnl_manipulation.cpp | 235 ---- dpnp/backend/src/dpnp_fptr.hpp | 1 - dpnp/backend/src/dpnp_iface_fptr.cpp | 40 - dpnp/backend/src/queue_sycl.cpp | 30 - dpnp/dpnp_algo/dpnp_algo.pxd | 1 - dpnp/dpnp_algo/dpnp_algo_mathematical.pxi | 42 - dpnp/dpnp_iface_mathematical.py | 31 - 19 files changed, 9 insertions(+), 2586 deletions(-) delete mode 100644 dpnp/backend/examples/example11.cpp delete mode 100644 dpnp/backend/examples/example3.cpp delete mode 100644 dpnp/backend/examples/example7.cpp delete mode 100644 dpnp/backend/examples/example_bs.cpp delete mode 100644 dpnp/backend/examples/example_experimental_iface.cpp delete mode 100644 dpnp/backend/kernels/dpnp_krnl_manipulation.cpp diff --git a/dpnp/backend/CMakeLists.txt b/dpnp/backend/CMakeLists.txt index d96320bf0ac..7ed57fd929d 100644 --- a/dpnp/backend/CMakeLists.txt +++ b/dpnp/backend/CMakeLists.txt @@ -30,7 +30,6 @@ set(DPNP_SRC kernels/dpnp_krnl_fft.cpp kernels/dpnp_krnl_indexing.cpp kernels/dpnp_krnl_logic.cpp - kernels/dpnp_krnl_manipulation.cpp kernels/dpnp_krnl_mathematical.cpp kernels/dpnp_krnl_random.cpp kernels/dpnp_krnl_reduction.cpp diff --git a/dpnp/backend/examples/example11.cpp 
b/dpnp/backend/examples/example11.cpp deleted file mode 100644 index 3a16991bae6..00000000000 --- a/dpnp/backend/examples/example11.cpp +++ /dev/null @@ -1,85 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -/** - * Example 11. - * - * This example shows simple usage of the DPNP C++ Backend library RNG shuffle - * function for one and ndim arrays. 
- * - * Possible compile line: - * g++ -g dpnp/backend/examples/example11.cpp -Idpnp -Idpnp/backend/include - * -Ldpnp -Wl,-rpath='$ORIGIN'/dpnp -ldpnp_backend_c -o example11 - * - */ - -#include - -#include - -template -void print_dpnp_array(T *arr, size_t size) -{ - std::cout << std::endl; - for (size_t i = 0; i < size; ++i) { - std::cout << arr[i] << ", "; - } - std::cout << std::endl; -} - -int main(int, char **) -{ - // Two cases: - // 1) array size = 100, ndim = 1, high_dim_size = 10 (aka ndarray with shape - // (100,) ) 2) array size = 100, ndim = 2, high_dim_size = 20 (e.g. ndarray - // with shape (20, 5) and len(array) = 20 ) - const size_t ndim_cases = 2; - const size_t itemsize = sizeof(double); - const size_t ndim[ndim_cases] = {1, 2}; - const size_t high_dim_size[ndim_cases] = {100, 20}; - const size_t size = 100; - const size_t seed = 1234; - - // DPNPC dpnp_rng_shuffle_c - // DPNPC interface - double *array_1 = - reinterpret_cast(dpnp_memory_alloc_c(size * sizeof(double))); - for (size_t i = 0; i < ndim_cases; i++) { - std::cout << "\nREPRODUCE: DPNPC dpnp_rng_shuffle_c:"; - std::cout << "\nDIMS: " << ndim[i] << std::endl; - // init array 0, 1, 2, 3, 4, 5, 6, .... - dpnp_arange_c(0, 1, array_1, size); - // print before shuffle - std::cout << "\nINPUT array:"; - print_dpnp_array(array_1, size); - dpnp_rng_srand_c(seed); - dpnp_rng_shuffle_c(array_1, itemsize, ndim[i], high_dim_size[i], - size); - // print shuffle result - std::cout << "\nSHUFFLE INPUT array:"; - print_dpnp_array(array_1, size); - } - dpnp_memory_free_c(array_1); -} diff --git a/dpnp/backend/examples/example3.cpp b/dpnp/backend/examples/example3.cpp deleted file mode 100644 index 2d516dc0b8d..00000000000 --- a/dpnp/backend/examples/example3.cpp +++ /dev/null @@ -1,79 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. 
-// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -/** - * Example 3. - * - * This example shows simple usage of the DPNP C++ Backend library - * to calculate cos of input vector elements - * - * Possible compile line: - * . 
/opt/intel/oneapi/setvars.sh - * g++ -g dpnp/backend/examples/example3.cpp -Idpnp -Idpnp/backend/include - * -Ldpnp -Wl,-rpath='$ORIGIN'/dpnp -ldpnp_backend_c -o example3 - * - */ - -#include <iostream> - -#include "dpnp_iface.hpp" - -int main(int, char **) -{ - const size_t size = 256; - - std::cout << "SYCL queue is CPU: " << dpnp_queue_is_cpu_c() << std::endl; - - int *array1 = (int *)dpnp_memory_alloc_c(size * sizeof(int)); - double *result = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - - for (size_t i = 0; i < 10; ++i) { - array1[i] = i; - result[i] = 0; - std::cout << ", " << array1[i]; - } - std::cout << std::endl; - - const long ndim = 1; - shape_elem_type *shape = reinterpret_cast<shape_elem_type *>( - dpnp_memory_alloc_c(ndim * sizeof(shape_elem_type))); - shape[0] = size; - shape_elem_type *strides = reinterpret_cast<shape_elem_type *>( - dpnp_memory_alloc_c(ndim * sizeof(shape_elem_type))); - strides[0] = 1; - - dpnp_cos_c(result, size, ndim, shape, strides, array1, size, - ndim, shape, strides, NULL); - - for (size_t i = 0; i < 10; ++i) { - std::cout << ", " << result[i]; - } - std::cout << std::endl; - - dpnp_memory_free_c(result); - dpnp_memory_free_c(array1); - - return 0; -} diff --git a/dpnp/backend/examples/example7.cpp b/dpnp/backend/examples/example7.cpp deleted file mode 100644 index 49c12c5dd51..00000000000 --- a/dpnp/backend/examples/example7.cpp +++ /dev/null @@ -1,77 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer.
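The deleted `example3.cpp` computes an elementwise cosine of an integer input buffer into a double output buffer via `dpnp_cos_c`. The underlying computation, sketched in Python (the helper name is hypothetical):

```python
import math

def elementwise_cos(values):
    """Elementwise cosine over an input sequence, as example3 computed."""
    return [math.cos(v) for v in values]

result = elementwise_cos(range(10))  # mirrors the 0..9 init loop in example3
```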
-// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -/** - * Example 7. - * - * This example shows simple usage of the DPNP C++ Backend library - * to calculate eigenvalues and eigenvectors of a symmetric matrix - * - * Possible compile line: - * . 
/opt/intel/oneapi/setvars.sh - * g++ -g dpnp/backend/examples/example7.cpp -Idpnp -Idpnp/backend/include - * -Ldpnp -Wl,-rpath='$ORIGIN'/dpnp -ldpnp_backend_c -o example7 - * - */ - -#include - -#include "dpnp_iface.hpp" - -int main(int, char **) -{ - const size_t size = 2; - size_t len = size * size; - - float *array = (float *)dpnp_memory_alloc_c(len * sizeof(float)); - float *result1 = (float *)dpnp_memory_alloc_c(size * sizeof(float)); - float *result2 = (float *)dpnp_memory_alloc_c(len * sizeof(float)); - - /* init input diagonal array like: - 1, 0, 0, - 0, 2, 0, - 0, 0, 3 - */ - for (size_t i = 0; i < len; ++i) { - array[i] = 0; - } - for (size_t i = 0; i < size; ++i) { - array[size * i + i] = i + 1; - } - - dpnp_eig_c(array, result1, result2, size); - - std::cout << "eigen values" << std::endl; - for (size_t i = 0; i < size; ++i) { - std::cout << result1[i] << ", "; - } - std::cout << std::endl; - - dpnp_memory_free_c(result2); - dpnp_memory_free_c(result1); - dpnp_memory_free_c(array); - - return 0; -} diff --git a/dpnp/backend/examples/example_bs.cpp b/dpnp/backend/examples/example_bs.cpp deleted file mode 100644 index c20c6d27e29..00000000000 --- a/dpnp/backend/examples/example_bs.cpp +++ /dev/null @@ -1,282 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. 
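The deleted `example7.cpp` calls `dpnp_eig_c` on the 2x2 diagonal matrix diag(1, 2), whose eigenvalues are just its diagonal entries. For any symmetric 2x2 matrix [[a, b], [b, d]] the eigenvalues have a closed form (roots of x^2 - (a+d)x + (ad - b^2)); the sketch below illustrates the math only, not the backend implementation:

```python
import math

def sym2x2_eigvals(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], ascending."""
    mean = (a + d) / 2.0
    # Distance of each eigenvalue from the mean: sqrt(((a-d)/2)^2 + b^2)
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

small, large = sym2x2_eigvals(1.0, 0.0, 2.0)  # the diag(1, 2) case above
```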
-// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -/** - * Example BS. - * - * This example shows simple usage of the DPNP C++ Backend library - * to calculate black scholes algorithm like in Python version - * - * Possible compile line: - * . 
/opt/intel/oneapi/setvars.sh - * g++ -g dpnp/backend/examples/example_bs.cpp -Idpnp -Idpnp/backend/include - * -Ldpnp -Wl,-rpath='$ORIGIN'/dpnp -ldpnp_backend_c -o example_bs - */ - -#include -#include - -#include "dpnp_iface.hpp" - -void black_scholes(double *price, - double *strike, - double *t, - const double rate, - const double vol, - double *call, - double *put, - const size_t size) -{ - const size_t ndim = 1; - const size_t scalar_size = 1; - - double *mr = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - mr[0] = -rate; - - double *vol_vol_two = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - vol_vol_two[0] = vol * vol * 2; - - double *quarter = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - quarter[0] = 0.25; - - double *one = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - one[0] = 1.; - - double *half = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - half[0] = 0.5; - - double *P = price; - double *S = strike; - double *T = t; - - double *p_div_s = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // p_div_s = P / S - dpnp_divide_c(p_div_s, P, size, &size, ndim, S, - size, &size, ndim, NULL); - double *a = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - dpnp_log_c(p_div_s, a, size); // a = np.log(p_div_s) - dpnp_memory_free_c(p_div_s); - - double *b = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // b = T * mr - dpnp_multiply_c( - b, T, size, &size, ndim, mr, scalar_size, &scalar_size, ndim, NULL); - dpnp_memory_free_c(mr); - double *z = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // z = T * vol_vol_twos - dpnp_multiply_c(z, T, size, &size, ndim, - vol_vol_two, scalar_size, - &scalar_size, ndim, NULL); - dpnp_memory_free_c(vol_vol_two); - - double *c = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // c = quarters * z - dpnp_multiply_c(c, quarter, scalar_size, - &scalar_size, ndim, z, size, &size, - ndim, NULL); - dpnp_memory_free_c(quarter); - - double *sqrt_z = (double 
*)dpnp_memory_alloc_c(size * sizeof(double)); - dpnp_sqrt_c(z, sqrt_z, size); // sqrt_z = np.sqrt(z) - dpnp_memory_free_c(z); - double *y = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // y = ones / np.sqrt(z) - dpnp_divide_c(y, one, scalar_size, &scalar_size, - ndim, sqrt_z, size, &size, ndim, - NULL); - dpnp_memory_free_c(sqrt_z); - dpnp_memory_free_c(one); - - double *a_sub_b = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // a_sub_b = a - b - dpnp_subtract_c(a_sub_b, a, size, &size, ndim, b, - size, &size, ndim, NULL); - dpnp_memory_free_c(a); - double *a_sub_b_add_c = - (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // a_sub_b_add_c = a_sub_b + c - dpnp_add_c(a_sub_b_add_c, a_sub_b, size, &size, - ndim, c, size, &size, ndim, NULL); - double *w1 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // w1 = a_sub_b_add_c * y - dpnp_multiply_c(w1, a_sub_b_add_c, size, &size, - ndim, y, size, &size, ndim, NULL); - dpnp_memory_free_c(a_sub_b_add_c); - - double *a_sub_b_sub_c = - (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // a_sub_b_sub_c = a_sub_b - c - dpnp_subtract_c(a_sub_b_sub_c, a_sub_b, size, &size, - ndim, c, size, &size, ndim, NULL); - dpnp_memory_free_c(a_sub_b); - dpnp_memory_free_c(c); - double *w2 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // w2 = a_sub_b_sub_c * y - dpnp_multiply_c(w2, a_sub_b_sub_c, size, &size, - ndim, y, size, &size, ndim, NULL); - dpnp_memory_free_c(a_sub_b_sub_c); - dpnp_memory_free_c(y); - - double *erf_w1 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - dpnp_erf_c(w1, erf_w1, size); // erf_w1 = np.erf(w1) - dpnp_memory_free_c(w1); - double *halfs_mul_erf_w1 = - (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // halfs_mul_erf_w1 = half * erf_w1 - dpnp_multiply_c(halfs_mul_erf_w1, half, scalar_size, - &scalar_size, ndim, erf_w1, size, - &size, ndim, NULL); - dpnp_memory_free_c(erf_w1); - double *d1 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); 
- // d1 = half + halfs_mul_erf_w1 - dpnp_add_c(d1, half, scalar_size, &scalar_size, - ndim, halfs_mul_erf_w1, size, &size, - ndim, NULL); - dpnp_memory_free_c(halfs_mul_erf_w1); - - double *erf_w2 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - dpnp_erf_c(w2, erf_w2, size); // erf_w2 = np.erf(w2) - dpnp_memory_free_c(w2); - double *halfs_mul_erf_w2 = - (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // halfs_mul_erf_w2 = half * erf_w2 - dpnp_multiply_c(halfs_mul_erf_w2, half, scalar_size, - &scalar_size, ndim, erf_w2, size, - &size, ndim, NULL); - dpnp_memory_free_c(erf_w2); - double *d2 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // d2 = half + halfs_mul_erf_w2 - dpnp_add_c(d2, half, scalar_size, &scalar_size, - ndim, halfs_mul_erf_w2, size, &size, - ndim, NULL); - dpnp_memory_free_c(halfs_mul_erf_w2); - dpnp_memory_free_c(half); - - double *exp_b = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - dpnp_exp_c(b, exp_b, size); // exp_b = np.exp(b) - double *Se = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // Se = exp_b * S - dpnp_multiply_c(Se, exp_b, size, &size, ndim, S, - size, &size, ndim, NULL); - dpnp_memory_free_c(exp_b); - dpnp_memory_free_c(b); - - double *P_mul_d1 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // P_mul_d1 = P * d1 - dpnp_multiply_c(P_mul_d1, P, size, &size, ndim, d1, - size, &size, ndim, NULL); - dpnp_memory_free_c(d1); - double *Se_mul_d2 = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // Se_mul_d2 = Se * d2 - dpnp_multiply_c(Se_mul_d2, Se, size, &size, ndim, - d2, size, &size, ndim, NULL); - dpnp_memory_free_c(d2); - double *r = (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // r = P_mul_d1 - Se_mul_d2 - dpnp_subtract_c(r, P_mul_d1, size, &size, ndim, - Se_mul_d2, size, &size, ndim, NULL); - dpnp_memory_free_c(Se_mul_d2); - dpnp_memory_free_c(P_mul_d1); - - dpnp_copyto_c(call, r, size); // call[:] = r - double *r_sub_P = (double *)dpnp_memory_alloc_c(size * 
sizeof(double)); - // r_sub_P = r - P - dpnp_subtract_c(r_sub_P, r, size, &size, ndim, P, - size, &size, ndim, NULL); - dpnp_memory_free_c(r); - double *r_sub_P_add_Se = - (double *)dpnp_memory_alloc_c(size * sizeof(double)); - // r_sub_P_add_Se = r_sub_P + Se - dpnp_add_c(r_sub_P_add_Se, r_sub_P, size, &size, - ndim, Se, size, &size, ndim, NULL); - dpnp_memory_free_c(r_sub_P); - dpnp_memory_free_c(Se); - dpnp_copyto_c(put, r_sub_P_add_Se, - size); // put[:] = r_sub_P_add_Se - dpnp_memory_free_c(r_sub_P_add_Se); -} - -int main(int, char **) -{ - const size_t SIZE = 256; - - const size_t SEED = 7777777; - const long PL = 10, PH = 50; - const long SL = 10, SH = 50; - const long TL = 1, TH = 2; - const double RISK_FREE = 0.1; - const double VOLATILITY = 0.2; - - std::cout << "SYCL queue is CPU: " << dpnp_queue_is_cpu_c() << std::endl; - - double *price = (double *)dpnp_memory_alloc_c(SIZE * sizeof(double)); - double *strike = (double *)dpnp_memory_alloc_c(SIZE * sizeof(double)); - double *t = (double *)dpnp_memory_alloc_c(SIZE * sizeof(double)); - - dpnp_rng_srand_c(SEED); // np.random.seed(SEED) - dpnp_rng_uniform_c(price, PL, PH, - SIZE); // np.random.uniform(PL, PH, SIZE) - dpnp_rng_uniform_c(strike, SL, SH, - SIZE); // np.random.uniform(SL, SH, SIZE) - dpnp_rng_uniform_c(t, TL, TH, - SIZE); // np.random.uniform(TL, TH, SIZE) - - double *zero = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - zero[0] = 0.; - - double *mone = (double *)dpnp_memory_alloc_c(1 * sizeof(double)); - mone[0] = -1.; - - double *call = (double *)dpnp_memory_alloc_c(SIZE * sizeof(double)); - double *put = (double *)dpnp_memory_alloc_c(SIZE * sizeof(double)); - - dpnp_full_c(zero, call, SIZE); // np.full(SIZE, 0., dtype=DTYPE) - dpnp_full_c(mone, put, SIZE); // np.full(SIZE, -1., dtype=DTYPE) - - dpnp_memory_free_c(mone); - dpnp_memory_free_c(zero); - - black_scholes(price, strike, t, RISK_FREE, VOLATILITY, call, put, SIZE); - - std::cout << "call: "; - for (size_t i = 0; i < 10; ++i) { - 
std::cout << call[i] << ", "; - } - std::cout << "..." << std::endl; - std::cout << "put: "; - for (size_t i = 0; i < 10; ++i) { - std::cout << put[i] << ", "; - } - std::cout << "..." << std::endl; - - dpnp_memory_free_c(put); - dpnp_memory_free_c(call); - - dpnp_memory_free_c(t); - dpnp_memory_free_c(strike); - dpnp_memory_free_c(price); - - return 0; -} diff --git a/dpnp/backend/examples/example_experimental_iface.cpp b/dpnp/backend/examples/example_experimental_iface.cpp deleted file mode 100644 index 4454a34b9a4..00000000000 --- a/dpnp/backend/examples/example_experimental_iface.cpp +++ /dev/null @@ -1,63 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. 
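The deleted `example_bs.cpp` chains dpnp primitives (`dpnp_log_c`, `dpnp_multiply_c`, `dpnp_erf_c`, ...) to evaluate the Black-Scholes call/put formulas, building d1 and d2 as 0.5 + 0.5*erf(w). A direct scalar transcription of that dataflow in Python (a sketch of the arithmetic, not of the backend API):

```python
import math

def black_scholes(price, strike, t, rate, vol):
    """Scalar Black-Scholes call/put, following example_bs.cpp step by step."""
    a = math.log(price / strike)      # a = log(P / S)
    b = t * -rate                     # b = T * mr
    z = t * vol * vol * 2.0           # z = T * vol_vol_two
    c = 0.25 * z                      # c = quarter * z
    y = 1.0 / math.sqrt(z)            # y = one / sqrt(z)
    w1 = (a - b + c) * y
    w2 = (a - b - c) * y
    d1 = 0.5 + 0.5 * math.erf(w1)
    d2 = 0.5 + 0.5 * math.erf(w2)
    se = math.exp(b) * strike         # discounted strike, Se = exp(b) * S
    call = price * d1 - se * d2
    put = call - price + se
    return call, put

call, put = black_scholes(30.0, 25.0, 1.5, 0.1, 0.2)
```

Note that `put = call - price + se` is exactly put-call parity, so the sketch can be checked against call - put = P - S*exp(-rT) without reference values.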
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. -//***************************************************************************** - -/** - * Example of experimental interface. - * - * This example shows how to get a runtime pointer from DPNP C++ Backend library - * - * Possible compile line: - * . /opt/intel/oneapi/setvars.sh - * g++ -g dpnp/backend/examples/example_experimental_iface.cpp -Idpnp - * -Idpnp/backend/include -Ldpnp -Wl,-rpath='$ORIGIN'/dpnp -ldpnp_backend_c -o - * example_experimental_iface - */ - -#include - -#include -// TODO #include - -int main(int, char **) -{ - void *result = get_backend_function_name("dpnp_dot", "float"); - std::cout << "Result Dot() function pointer (by old interface): " << result - << std::endl; - - DPNPFuncData_t dpnp_dot_f = get_dpnp_function_ptr( - DPNPFuncName::DPNP_FN_DOT, DPNPFuncType::DPNP_FT_LONG); - std::cout << "Result Dot() function pointer: " << dpnp_dot_f.ptr - << " with return datatype " << (size_t)dpnp_dot_f.return_type - << std::endl; - - DPNPFuncData_t dpnp_add_f = get_dpnp_function_ptr( - DPNPFuncName::DPNP_FN_ADD, DPNPFuncType::DPNP_FT_FLOAT, - DPNPFuncType::DPNP_FT_INT); - std::cout << "Result Add() function pointer: " << dpnp_add_f.ptr - << " with return datatype " << (size_t)dpnp_add_f.return_type - << std::endl; - - return 0; -} diff --git a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp index dcec3f8192b..e5a2c924653 100644 --- 
a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp +++ b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp @@ -140,14 +140,4 @@ MACRO_2ARG_3TYPES_OP(dpnp_multiply_c, std::complex, std::complex)) -MACRO_2ARG_3TYPES_OP(dpnp_subtract_c, - input1_elem - input2_elem, - x1 - x2, - MACRO_UNPACK_TYPES(bool, std::int32_t, std::int64_t), - oneapi::mkl::vm::sub, - MACRO_UNPACK_TYPES(float, - double, - std::complex, - std::complex)) - #undef MACRO_2ARG_3TYPES_OP diff --git a/dpnp/backend/include/dpnp_iface.hpp b/dpnp/backend/include/dpnp_iface.hpp index 324e7a612b1..0fc5595041c 100644 --- a/dpnp/backend/include/dpnp_iface.hpp +++ b/dpnp/backend/include/dpnp_iface.hpp @@ -176,74 +176,6 @@ template INP_DLLEXPORT void dpnp_any_c(const void *array, void *result, const size_t size); -/** - * @ingroup BACKEND_API - * @brief Array initialization - * - * Input array, step based, initialization procedure. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] start Start of initialization sequence - * @param [in] step Step for initialization sequence - * @param [out] result1 Output array. - * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_arange_c(DPCTLSyclQueueRef q_ref, - size_t start, - size_t step, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_arange_c(size_t start, size_t step, void *result1, size_t size); - -/** - * @ingroup BACKEND_API - * @brief Implementation of full function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input one-element array. - * @param [out] result Output array. - * @param [in] size Number of elements in the output array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_full_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_full_c(void *array_in, void *result, const size_t size); - -/** - * @ingroup BACKEND_API - * @brief Implementation of full_like function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input one-element array. - * @param [out] result Output array. - * @param [in] size Number of elements in the output array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_full_like_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_full_like_c(void *array_in, void *result, size_t size); - /** * @ingroup BACKEND_API * @brief Compute the variance along the specified axis, while ignoring NaNs. @@ -591,56 +523,6 @@ INP_DLLEXPORT void dpnp_prod_c(void *result_out, const void *initial, const long *where); -/** - * @ingroup BACKEND_API - * @brief Range of values (maximum - minimum) along an axis. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result_out Output array. - * @param [in] result_size Size of output array. - * @param [in] result_ndim Number of output array dimensions. - * @param [in] result_shape Shape of output array. - * @param [in] result_strides Strides of output array. - * @param [in] input_in First input array. - * @param [in] input_size Size of first input array. - * @param [in] input_ndim Number of first input array dimensions. - * @param [in] input_shape Shape of first input array. - * @param [in] input_strides Strides of first input array. - * @param [in] axis Axis. - * @param [in] naxis Number of elements in axis. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_ptp_c(DPCTLSyclQueueRef q_ref, - void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input_in, - const size_t input_size, - const size_t input_ndim, - const shape_elem_type *input_shape, - const shape_elem_type *input_strides, - const shape_elem_type *axis, - const size_t naxis, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_ptp_c(void *result_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input_in, - const size_t input_size, - const size_t input_ndim, - const shape_elem_type *input_shape, - const shape_elem_type *input_strides, - const shape_elem_type *axis, - const size_t naxis); - /** * @ingroup BACKEND_API * @brief Replaces specified elements of an array with given values. @@ -715,29 +597,7 @@ INP_DLLEXPORT void dpnp_put_along_axis_c(void *arr_in, /** * @ingroup BACKEND_API - * @brief Return a 2-D array with ones on the diagonal and zeros elsewhere. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result The eigenvalues, each repeated according to - * its multiplicity - * @param [in] k Index of the diagonal - * @param [in] shape Shape of result - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_eye_c(DPCTLSyclQueueRef q_ref, - void *result, - int k, - const shape_elem_type *res_shape, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_eye_c(void *result, int k, const shape_elem_type *res_shape); -/** - * @ingroup BACKEND_API * @brief math library implementation of argsort function * * @param [in] q_ref Reference to SYCL queue. 
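The removed `dpnp_ptp_c` computes the range of values (maximum minus minimum) along an axis, and the removed `dpnp_eye_c` builds a 2-D array with ones on the k-th diagonal and zeros elsewhere. Both semantics reduce to a few lines; a pure-Python sketch of the 1-D/nested-list case (helper names are illustrative):

```python
def ptp(values):
    """Range of values: max - min, as numpy.ptp defines it."""
    return max(values) - min(values)

def eye(n, m, k=0):
    """n x m matrix with ones on the k-th diagonal (j - i == k)."""
    return [[1.0 if j - i == k else 0.0 for j in range(m)] for i in range(n)]
```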
@@ -916,60 +776,6 @@ INP_DLLEXPORT void dpnp_choose_c(void *result1, size_t choices_size, size_t choice_size); -/** - * @ingroup BACKEND_API - * @brief Extract a diagonal or construct a diagonal array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] k Diagonal in question. - * @param [in] shape Shape of input array. - * @param [in] res_shape Shape of result array. - * @param [in] ndim Number of elements in shape of input array. - * @param [in] res_ndim Number of elements in shape of result array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_diag_c(DPCTLSyclQueueRef q_ref, - void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_diag_c(void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim); - -/** - * @ingroup BACKEND_API - * @brief Return the indices to access the main diagonal of an array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result1 Output array. - * @param [in] size Size of array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_diag_indices_c(DPCTLSyclQueueRef q_ref, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_diag_indices_c(void *result1, size_t size); - /** * @ingroup BACKEND_API * @brief math library implementation of diagonal function @@ -1006,26 +812,6 @@ INP_DLLEXPORT void dpnp_diagonal_c(void *array1_in, shape_elem_type *res_shape, const size_t res_ndim); -/** - * @ingroup BACKEND_API - * @brief Implementation of identity function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result1 Output array. - * @param [in] n Number of rows (and columns) in n x n - * output. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_identity_c(DPCTLSyclQueueRef q_ref, - void *result1, - const size_t n, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_identity_c(void *result1, const size_t n); - /** * @ingroup BACKEND_API * @brief implementation of creating filled with value array function @@ -1287,128 +1073,6 @@ INP_DLLEXPORT void dpnp_take_c(void *array, void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief math library implementation of trace function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] shape Shape of input array. - * @param [in] ndim Number of elements in array.shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
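The removed `dpnp_diag_indices_c` returns the index arrays addressing the main diagonal of an n x n array, and the removed `dpnp_identity_c` fills an n x n identity matrix. Sketched in Python for reference (illustrative helpers, not dpnp API):

```python
def diag_indices(n):
    """Index arrays (rows, cols) of the main diagonal of an n x n array."""
    idx = list(range(n))
    return idx, idx[:]

def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```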
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_trace_c(DPCTLSyclQueueRef q_ref, - const void *array, - void *result, - const shape_elem_type *shape, - const size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_trace_c(const void *array, - void *result, - const shape_elem_type *shape, - const size_t ndim); - -/** - * @ingroup BACKEND_API - * @brief An array with ones at and below the given diagonal and zeros - * elsewhere. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [out] result Output array. - * @param [in] N Number of rows in the array. - * @param [in] M Number of columns in the array. - * @param [in] k The sub-diagonal at and below which the - * array is filled. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_tri_c(DPCTLSyclQueueRef q_ref, - void *result, - const size_t N, - const size_t M, - const int k, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void - dpnp_tri_c(void *result, const size_t N, const size_t M, const int k); - -/** - * @ingroup BACKEND_API - * @brief Lower triangle of an array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] k Diagonal above which to zero elements. - * @param [in] shape Shape of input array. - * @param [in] res_shape Shape of result array. - * @param [in] ndim Number of elements in array.shape. - * @param [in] res_ndim Number of elements in res_shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_tril_c(DPCTLSyclQueueRef q_ref, - void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_tril_c(void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim); - -/** - * @ingroup BACKEND_API - * @brief Upper triangle of an array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [out] result Output array. - * @param [in] k Diagonal above which to zero elements. - * @param [in] shape Shape of input array. - * @param [in] res_shape Shape of result array. - * @param [in] ndim Number of elements in array.shape. - * @param [in] res_ndim Number of elements in res_shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_triu_c(DPCTLSyclQueueRef q_ref, - void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_triu_c(void *array, - void *result, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim); - /** * @ingroup BACKEND_API * @brief math library implementation of var function @@ -1609,62 +1273,6 @@ INP_DLLEXPORT DPCTLSyclEventRef template INP_DLLEXPORT void dpnp_ones_like_c(void *result, size_t size); -/** - * @ingroup BACKEND_API - * @brief repeat elements of an array. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input array. - * @param [out] result Output array. - * @param [in] repeats The number of repetitions for each element. 
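The removed `dpnp_tril_c`/`dpnp_triu_c` pair zero out elements above (respectively below) the k-th diagonal; the masking rule `j - i <= k` vs. `j - i >= k` is the whole algorithm. A pure-Python sketch of both (nested-list stand-ins for the strided backend arrays):

```python
def tril(matrix, k=0):
    """Keep elements on or below the k-th diagonal (j - i <= k)."""
    return [[v if j - i <= k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def triu(matrix, k=0):
    """Keep elements on or above the k-th diagonal (j - i >= k)."""
    return [[v if j - i >= k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```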
- * @param [in] size Number of elements in input arrays. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_repeat_c(DPCTLSyclQueueRef q_ref, - const void *array_in, - void *result, - const size_t repeats, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_repeat_c(const void *array_in, - void *result, - const size_t repeats, - const size_t size); - -/** - * @ingroup BACKEND_API - * @brief Implementation of vander function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array_in Input array. - * @param [out] result Output array. - * @param [in] size_in Number of elements in the input array. - * @param [in] N Number of columns in the output. - * @param [in] increasing Order of the powers of the columns. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - * - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_vander_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - void *result1, - const size_t size_in, - const size_t N, - const int increasing, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_vander_c(const void *array1_in, - void *result1, - const size_t size_in, - const size_t N, - const int increasing); - /** * @ingroup BACKEND_API * @brief Implementation of zeros function diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index a39174931fe..d62e5998583 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -64,7 +64,6 @@ enum class DPNPFuncName : size_t DPNP_FN_ALLCLOSE_EXT, /**< Used in numpy.allclose() impl, requires extra parameters */ DPNP_FN_ANY, /**< Used in numpy.any() impl */ - DPNP_FN_ARANGE, /**< Used in numpy.arange() impl */ DPNP_FN_ARGMAX, /**< Used in numpy.argmax() impl */ DPNP_FN_ARGMIN, /**< Used in numpy.argmin() impl */ DPNP_FN_ARGSORT, /**< 
Used in numpy.argsort() impl */ @@ -82,8 +81,6 @@ enum class DPNPFuncName : size_t DPNP_FN_DEGREES, /**< Used in numpy.degrees() impl */ DPNP_FN_DEGREES_EXT, /**< Used in numpy.degrees() impl, requires extra parameters */ - DPNP_FN_DIAG, /**< Used in numpy.diag() impl */ - DPNP_FN_DIAG_INDICES, /**< Used in numpy.diag_indices() impl */ DPNP_FN_DIAGONAL, /**< Used in numpy.diagonal() impl */ DPNP_FN_DOT, /**< Used in numpy.dot() impl */ DPNP_FN_DOT_EXT, /**< Used in numpy.dot() impl, requires extra parameters */ @@ -93,7 +90,6 @@ enum class DPNPFuncName : size_t DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra parameters */ - DPNP_FN_EYE, /**< Used in numpy.eye() impl */ DPNP_FN_FFT_FFT, /**< Used in numpy.fft.fft() impl */ DPNP_FN_FFT_FFT_EXT, /**< Used in numpy.fft.fft() impl, requires extra parameters */ @@ -101,14 +97,10 @@ enum class DPNPFuncName : size_t DPNP_FN_FFT_RFFT_EXT, /**< Used in numpy.fft.rfft() impl, requires extra parameters */ DPNP_FN_FILL_DIAGONAL, /**< Used in numpy.fill_diagonal() impl */ - DPNP_FN_FULL, /**< Used in numpy.full() impl */ - DPNP_FN_FULL_LIKE, /**< Used in numpy.full_like() impl */ - DPNP_FN_IDENTITY, /**< Used in numpy.identity() impl */ DPNP_FN_INITVAL, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ DPNP_FN_INITVAL_EXT, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ - DPNP_FN_INVERT, /**< Used in numpy.invert() impl */ DPNP_FN_MAX, /**< Used in numpy.max() impl */ DPNP_FN_MAXIMUM_EXT, /**< Used in numpy.fmax() impl , requires extra parameters */ @@ -132,13 +124,11 @@ enum class DPNPFuncName : size_t parameters */ DPNP_FN_PLACE, /**< Used in numpy.place() impl */ DPNP_FN_PROD, /**< Used in numpy.prod() impl */ - DPNP_FN_PTP, /**< Used in numpy.ptp() impl */ DPNP_FN_PUT, /**< Used in numpy.put() impl */ DPNP_FN_PUT_ALONG_AXIS, /**< Used in numpy.put_along_axis() impl */ DPNP_FN_RADIANS, /**< Used in numpy.radians() impl */ 
DPNP_FN_RADIANS_EXT, /**< Used in numpy.radians() impl, requires extra parameters */ - DPNP_FN_REPEAT, /**< Used in numpy.repeat() impl */ DPNP_FN_RNG_BETA, /**< Used in numpy.random.beta() impl */ DPNP_FN_RNG_BETA_EXT, /**< Used in numpy.random.beta() impl, requires extra parameters */ @@ -262,22 +252,12 @@ enum class DPNPFuncName : size_t DPNP_FN_SQRT_EXT, /**< Used in numpy.sqrt() impl, requires extra parameters */ DPNP_FN_STD, /**< Used in numpy.std() impl */ - DPNP_FN_SUBTRACT_EXT, /**< Used in numpy.subtract() impl, requires extra - parameters */ - DPNP_FN_SUM, /**< Used in numpy.sum() impl */ - DPNP_FN_TAKE, /**< Used in numpy.take() impl */ - DPNP_FN_TRANSPOSE, /**< Used in numpy.transpose() impl */ - DPNP_FN_TRACE, /**< Used in numpy.trace() impl */ - DPNP_FN_TRAPZ_EXT, /**< Used in numpy.trapz() impl, requires extra - parameters */ - DPNP_FN_TRI, /**< Used in numpy.tri() impl */ - DPNP_FN_TRIL, /**< Used in numpy.tril() impl */ - DPNP_FN_TRIU, /**< Used in numpy.triu() impl */ - DPNP_FN_VANDER, /**< Used in numpy.vander() impl */ - DPNP_FN_VAR, /**< Used in numpy.var() impl */ - DPNP_FN_ZEROS, /**< Used in numpy.zeros() impl */ - DPNP_FN_ZEROS_LIKE, /**< Used in numpy.zeros_like() impl */ - DPNP_FN_LAST, /**< The latest element of the enumeration */ + DPNP_FN_SUM, /**< Used in numpy.sum() impl */ + DPNP_FN_TAKE, /**< Used in numpy.take() impl */ + DPNP_FN_VAR, /**< Used in numpy.var() impl */ + DPNP_FN_ZEROS, /**< Used in numpy.zeros() impl */ + DPNP_FN_ZEROS_LIKE, /**< Used in numpy.zeros_like() impl */ + DPNP_FN_LAST, /**< The latest element of the enumeration */ }; /** @@ -381,14 +361,4 @@ void *get_dpnp_function_ptr1( DPNPFuncType first_type, DPNPFuncType second_type = DPNPFuncType::DPNP_FT_NONE); -/** - * DEPRECATED. - * Experimental interface. DO NOT USE IT! 
- * - * parameter @ref type_name will be converted into var_args or char *[] with - * extra length parameter - */ -INP_DLLEXPORT -void *get_backend_function_name(const char *func_name, const char *type_name); - #endif // BACKEND_IFACE_FPTR_H diff --git a/dpnp/backend/kernels/dpnp_krnl_arraycreation.cpp b/dpnp/backend/kernels/dpnp_krnl_arraycreation.cpp index 175eb3d7698..ebcffa944c0 100644 --- a/dpnp/backend/kernels/dpnp_krnl_arraycreation.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_arraycreation.cpp @@ -31,355 +31,6 @@ #include "dpnpc_memory_adapter.hpp" #include "queue_sycl.hpp" -template -class dpnp_arange_c_kernel; - -template -DPCTLSyclEventRef dpnp_arange_c(DPCTLSyclQueueRef q_ref, - size_t start, - size_t step, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // parameter `size` used instead `stop` to avoid dependency on array length - // calculation algorithm - // TODO: floating point (and negatives) types from `start` and `step` - - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - validate_type_for_device<_DataType>(q); - - _DataType *result = reinterpret_cast<_DataType *>(result1); - - sycl::range<1> gws(size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - size_t i = global_id[0]; - - result[i] = start + i * step; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_arange_c(size_t start, size_t step, void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_arange_c<_DataType>( - q_ref, start, step, result1, size, 
dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_arange_default_c)(size_t, size_t, void *, size_t) = - dpnp_arange_c<_DataType>; - -// Explicit instantiation of the function, since dpnp_arange_c() is used by -// other template functions, but implicit instantiation is not applied anymore. -template DPCTLSyclEventRef dpnp_arange_c(DPCTLSyclQueueRef, - size_t, - size_t, - void *, - size_t, - const DPCTLEventVectorRef); - -template DPCTLSyclEventRef dpnp_arange_c(DPCTLSyclQueueRef, - size_t, - size_t, - void *, - size_t, - const DPCTLEventVectorRef); - -template DPCTLSyclEventRef dpnp_arange_c(DPCTLSyclQueueRef, - size_t, - size_t, - void *, - size_t, - const DPCTLEventVectorRef); - -template DPCTLSyclEventRef dpnp_arange_c(DPCTLSyclQueueRef, - size_t, - size_t, - void *, - size_t, - const DPCTLEventVectorRef); - -template -DPCTLSyclEventRef dpnp_diag_c(DPCTLSyclQueueRef q_ref, - void *v_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)res_ndim; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - const size_t input1_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - const size_t result_size = std::accumulate( - res_shape, res_shape + res_ndim, 1, std::multiplies()); - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, v_in, input1_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, result_size, true, - true); - _DataType *v = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - size_t init0 = std::max(0, -k); - size_t init1 = std::max(0, k); - - if (ndim == 1) { - for (size_t i = 0; i < static_cast(shape[0]); ++i) { - size_t ind = (init0 + i) * res_shape[1] + 
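The removed `dpnp_arange_c` kernel above fills each element with `start + i * step`. A minimal host-side sketch of that fill, with no SYCL dependency (the helper name `arange_fill` is hypothetical, not part of dpnp):

```cpp
#include <cstddef>
#include <vector>

// Host-side sketch of the removed dpnp_arange_c fill: element i is
// start + i * step, using the same integer parameters as the kernel.
std::vector<size_t> arange_fill(size_t start, size_t step, size_t size)
{
    std::vector<size_t> result(size);
    for (size_t i = 0; i < size; ++i) {
        result[i] = start + i * step; // same expression as the SYCL kernel body
    }
    return result;
}
```

As the kernel's own TODO noted, this integer-only formulation does not cover floating-point or negative `start`/`step` values.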
init1 + i; - result[ind] = v[i]; - } - } - else { - for (size_t i = 0; i < static_cast(res_shape[0]); ++i) { - size_t ind = (init0 + i) * shape[1] + init1 + i; - result[i] = v[ind]; - } - } - return event_ref; -} - -template -void dpnp_diag_c(void *v_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_diag_c<_DataType>(q_ref, v_in, result1, k, shape, res_shape, ndim, - res_ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_diag_default_c)(void *, - void *, - const int, - shape_elem_type *, - shape_elem_type *, - const size_t, - const size_t) = dpnp_diag_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_eye_c(DPCTLSyclQueueRef q_ref, - void *result1, - int k, - const shape_elem_type *res_shape, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (result1 == nullptr) { - return event_ref; - } - - if (res_shape == nullptr) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - size_t result_size = res_shape[0] * res_shape[1]; - - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, result_size, true, - true); - _DataType *result = result_ptr.get_ptr(); - - int diag_val_; - diag_val_ = std::min((int)res_shape[0], (int)res_shape[1]); - diag_val_ = std::min(diag_val_, ((int)res_shape[0] + k)); - diag_val_ = std::min(diag_val_, ((int)res_shape[1] - k)); - - size_t diag_val = (diag_val_ < 0) ? 0 : (size_t)diag_val_; - - for (size_t i = 0; i < result_size; ++i) { - result[i] = 0; - for (size_t j = 0; j < diag_val; ++j) { - size_t ind = (k >= 0) ? 
(j * res_shape[1] + j + k) - : (j - k) * res_shape[1] + j; - if (i == ind) { - result[i] = 1; - break; - } - } - } - - return event_ref; -} - -template -void dpnp_eye_c(void *result1, int k, const shape_elem_type *res_shape) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_eye_c<_DataType>(q_ref, result1, k, res_shape, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_eye_default_c)(void *, - int, - const shape_elem_type *) = dpnp_eye_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_full_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - return dpnp_initval_c<_DataType>(q_ref, result, array_in, size, - dep_event_vec_ref); -} - -template -void dpnp_full_c(void *array_in, void *result, const size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_full_c<_DataType>( - q_ref, array_in, result, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_full_default_c)(void *, - void *, - const size_t) = dpnp_full_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_full_like_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - return dpnp_full_c<_DataType>(q_ref, array_in, result, size, - dep_event_vec_ref); -} - -template -void dpnp_full_like_c(void *array_in, void *result, const size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_full_like_c<_DataType>( - q_ref, array_in, result, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - 
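The removed `dpnp_eye_c` above clamps the diagonal length three times before writing ones. That clamp can be isolated as a small pure function (the name `eye_diag_len` is a hypothetical helper introduced here for illustration):

```cpp
#include <algorithm>

// Sketch of the clamped diagonal length computed by the removed dpnp_eye_c:
// the number of ones placed on diagonal k of a (rows x cols) matrix.
int eye_diag_len(int rows, int cols, int k)
{
    int d = std::min(rows, cols);
    d = std::min(d, rows + k); // diagonals below the main one are shortened by |k|
    d = std::min(d, cols - k); // diagonals above the main one likewise
    return std::max(d, 0);     // fully out-of-range diagonals hold no ones
}
```

For a 3x3 matrix this yields 3 ones on the main diagonal, 2 on the first super- or sub-diagonal, and 0 once `|k|` reaches 3.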
DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_full_like_default_c)(void *, - void *, - const size_t) = dpnp_full_like_c<_DataType>; - -template -class dpnp_identity_c_kernel; - -template -DPCTLSyclEventRef dpnp_identity_c(DPCTLSyclQueueRef q_ref, - void *result1, - const size_t n, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (n == 0) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - validate_type_for_device<_DataType>(q); - - _DataType *result = static_cast<_DataType *>(result1); - - sycl::range<2> gws(n, n); - auto kernel_parallel_for_func = [=](sycl::id<2> global_id) { - size_t i = global_id[0]; - size_t j = global_id[1]; - result[i * n + j] = i == j; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_identity_c(void *result1, const size_t n) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_identity_c<_DataType>(q_ref, result1, n, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_identity_default_c)(void *, - const size_t) = dpnp_identity_c<_DataType>; - template class dpnp_ones_c_kernel; @@ -442,632 +93,6 @@ void dpnp_ones_like_c(void *result, size_t size) template void (*dpnp_ones_like_default_c)(void *, size_t) = dpnp_ones_like_c<_DataType>; -template -DPCTLSyclEventRef dpnp_ptp_c(DPCTLSyclQueueRef q_ref, - void *result1_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input_size, - const 
size_t input_ndim, - const shape_elem_type *input_shape, - const shape_elem_type *input_strides, - const shape_elem_type *axis, - const size_t naxis, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)result_strides; - (void)input_strides; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - DPCTLSyclEventRef e1_ref = nullptr; - DPCTLSyclEventRef e2_ref = nullptr; - DPCTLSyclEventRef e3_ref = nullptr; - - if ((input1_in == nullptr) || (result1_out == nullptr)) { - return event_ref; - } - - if (input_ndim < 1) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, input1_in, input_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1_out, result_size, - false, true); - _DataType *arr = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - _DataType *min_arr = reinterpret_cast<_DataType *>( - sycl::malloc_shared(result_size * sizeof(_DataType), q)); - _DataType *max_arr = reinterpret_cast<_DataType *>( - sycl::malloc_shared(result_size * sizeof(_DataType), q)); - - e1_ref = dpnp_min_c<_DataType>(q_ref, arr, min_arr, result_size, - input_shape, input_ndim, axis, naxis, NULL); - e2_ref = dpnp_max_c<_DataType>(q_ref, arr, max_arr, result_size, - input_shape, input_ndim, axis, naxis, NULL); - - shape_elem_type *_strides = reinterpret_cast( - sycl::malloc_shared(result_ndim * sizeof(shape_elem_type), q)); - get_shape_offsets_inkernel(result_shape, result_ndim, _strides); - - e3_ref = dpnp_subtract_c<_DataType, _DataType, _DataType>( - q_ref, result, result_size, result_ndim, result_shape, result_strides, - max_arr, result_size, result_ndim, result_shape, _strides, min_arr, - result_size, result_ndim, result_shape, _strides, NULL, NULL); - - DPCTLEvent_Wait(e1_ref); - DPCTLEvent_Wait(e2_ref); - DPCTLEvent_Wait(e3_ref); - DPCTLEvent_Delete(e1_ref); - 
DPCTLEvent_Delete(e2_ref); - DPCTLEvent_Delete(e3_ref); - - sycl::free(min_arr, q); - sycl::free(max_arr, q); - sycl::free(_strides, q); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_ptp_c(void *result1_out, - const size_t result_size, - const size_t result_ndim, - const shape_elem_type *result_shape, - const shape_elem_type *result_strides, - const void *input1_in, - const size_t input_size, - const size_t input_ndim, - const shape_elem_type *input_shape, - const shape_elem_type *input_strides, - const shape_elem_type *axis, - const size_t naxis) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_ptp_c<_DataType>( - q_ref, result1_out, result_size, result_ndim, result_shape, - result_strides, input1_in, input_size, input_ndim, input_shape, - input_strides, axis, naxis, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_ptp_default_c)(void *, - const size_t, - const size_t, - const shape_elem_type *, - const shape_elem_type *, - const void *, - const size_t, - const size_t, - const shape_elem_type *, - const shape_elem_type *, - const shape_elem_type *, - const size_t) = dpnp_ptp_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_vander_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - void *result1, - const size_t size_in, - const size_t N, - const int increasing, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - DPCTLSyclEventRef event_ref = nullptr; - - if ((array1_in == nullptr) || (result1 == nullptr)) - return event_ref; - - if (!size_in || !N) - return event_ref; - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType_input>(q); - validate_type_for_device<_DataType_output>(q); - - DPNPC_ptr_adapter<_DataType_input> input1_ptr(q_ref, array1_in, size_in, - true); - DPNPC_ptr_adapter<_DataType_output> result_ptr(q_ref, result1, size_in * N, - 
true, true); - const _DataType_input *array_in = input1_ptr.get_ptr(); - _DataType_output *result = result_ptr.get_ptr(); - - if (N == 1) { - return dpnp_ones_c<_DataType_output>(q_ref, result, size_in, - dep_event_vec_ref); - } - - if (increasing) { - for (size_t i = 0; i < size_in; ++i) { - result[i * N] = 1; - } - for (size_t i = 1; i < N; ++i) { - for (size_t j = 0; j < size_in; ++j) { - result[j * N + i] = result[j * N + i - 1] * array_in[j]; - } - } - } - else { - for (size_t i = 0; i < size_in; ++i) { - result[i * N + N - 1] = 1; - } - for (size_t i = N - 2; i > 0; --i) { - for (size_t j = 0; j < size_in; ++j) { - result[j * N + i] = result[j * N + i + 1] * array_in[j]; - } - } - - for (size_t i = 0; i < size_in; ++i) { - result[i * N] = result[i * N + 1] * array_in[i]; - } - } - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_vander_c(const void *array1_in, - void *result1, - const size_t size_in, - const size_t N, - const int increasing) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_vander_c<_DataType_input, _DataType_output>( - q_ref, array1_in, result1, size_in, N, increasing, - dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_vander_default_c)(const void *, - void *, - const size_t, - const size_t, - const int) = - dpnp_vander_c<_DataType_input, _DataType_output>; - -template -class dpnp_trace_c_kernel; - -template -DPCTLSyclEventRef dpnp_trace_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - void *result_in, - const shape_elem_type *shape_, - const size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!array1_in || !result_in || !shape_ || !ndim) { - return event_ref; - } - - const size_t last_dim = shape_[ndim - 1]; - const size_t size = 
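The increasing branch of the removed `dpnp_vander_c` above seeds the first column with ones and builds each later column by cumulative multiplication. A host-side sketch of that branch, assuming `double` data and row-major storage as in the kernel (`vander_increasing` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the removed dpnp_vander_c fill for increasing order:
// column i of row j holds x[j]^i, built by cumulative multiplication
// so no pow() call is needed.
std::vector<double> vander_increasing(const std::vector<double> &x, size_t N)
{
    const size_t size_in = x.size();
    std::vector<double> result(size_in * N);
    for (size_t j = 0; j < size_in; ++j)
        result[j * N] = 1.0; // first column is x^0
    for (size_t i = 1; i < N; ++i)
        for (size_t j = 0; j < size_in; ++j)
            result[j * N + i] = result[j * N + i - 1] * x[j];
    return result;
}
```

The decreasing branch in the kernel mirrors this, seeding the last column and multiplying right to left.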
std::accumulate(shape_, shape_ + (ndim - 1), 1, - std::multiplies()); - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - validate_type_for_device<_ResultType>(q); - - const _DataType *input = static_cast(array1_in); - _ResultType *result = static_cast<_ResultType *>(result_in); - - sycl::range<1> gws(size); - auto kernel_parallel_for_func = [=](auto index) { - size_t i = index[0]; - _ResultType acc = _ResultType(0); - - for (size_t j = 0; j < last_dim; ++j) { - acc += input[i * last_dim + j]; - } - - result[i] = acc; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - auto event = q.submit(kernel_func); - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_trace_c(const void *array1_in, - void *result_in, - const shape_elem_type *shape_, - const size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_trace_c<_DataType, _ResultType>( - q_ref, array1_in, result_in, shape_, ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_trace_default_c)(const void *, - void *, - const shape_elem_type *, - const size_t) = - dpnp_trace_c<_DataType, _ResultType>; - -template -class dpnp_tri_c_kernel; - -template -DPCTLSyclEventRef dpnp_tri_c(DPCTLSyclQueueRef q_ref, - void *result1, - const size_t N, - const size_t M, - const int k, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - sycl::event event; - - if (!result1 || !N || !M) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - _DataType *result = static_cast<_DataType *>(result1); - 
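The removed `dpnp_trace_c` kernel above is, despite its name, a plain reduction: it flattens the input to `(size, last_dim)` and sums along the last axis. A host-side sketch under that reading (`sum_last_dim` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the removed dpnp_trace_c reduction: for input flattened to
// (size, last_dim), each output element accumulates one row. The kernel
// itself performs no diagonal extraction.
std::vector<double> sum_last_dim(const std::vector<double> &input,
                                 size_t size, size_t last_dim)
{
    std::vector<double> result(size, 0.0);
    for (size_t i = 0; i < size; ++i)
        for (size_t j = 0; j < last_dim; ++j)
            result[i] += input[i * last_dim + j];
    return result;
}
```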
- size_t idx = N * M; - sycl::range<1> gws(idx); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - size_t ind = global_id[0]; - size_t i = ind / M; - size_t j = ind % M; - - int val = i + k + 1; - size_t diag_idx_ = (val > 0) ? (size_t)val : 0; - size_t diag_idx = (M < diag_idx_) ? M : diag_idx_; - - if (j < diag_idx) { - result[ind] = 1; - } - else { - result[ind] = 0; - } - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_tri_c(void *result1, const size_t N, const size_t M, const int k) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_tri_c<_DataType>(q_ref, result1, N, M, k, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_tri_default_c)(void *, const size_t, const size_t, const int) = - dpnp_tri_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_tril_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if ((array_in == nullptr) || (result1 == nullptr)) { - return event_ref; - } - - if ((shape == nullptr) || (res_shape == nullptr)) { - return event_ref; - } - - if ((ndim == 0) || (res_ndim == 0)) { - return event_ref; - } - - const size_t res_size = std::accumulate(res_shape, res_shape + res_ndim, 1, - std::multiplies()); - if (res_size == 0) { - return event_ref; - } - - const size_t input_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (input_size == 0) { - return 
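The removed `dpnp_tri_c` kernel above decides each element with a clamped column bound: element `(i, j)` is 1 exactly when `j <= i + k`. That per-element predicate can be sketched on its own (the name `tri_element` is hypothetical):

```cpp
#include <cstddef>

// Sketch of the per-element predicate in the removed dpnp_tri_c kernel:
// element (i, j) of an N x M matrix is 1 when j <= i + k, expressed via the
// same clamped bound the kernel computes from the flattened index.
int tri_element(size_t i, size_t j, size_t M, int k)
{
    int val = static_cast<int>(i) + k + 1;
    size_t bound = (val > 0) ? static_cast<size_t>(val) : 0; // clamp below at 0
    if (bound > M)
        bound = M;                                           // clamp above at M
    return (j < bound) ? 1 : 0;
}
```

The removed `dpnp_tril_c`/`dpnp_triu_c` below apply the same diagonal comparison, but copy input elements (or zero) instead of writing a constant mask.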
event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array_in, input_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, res_size, true, - true); - _DataType *array_m = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - int *ids = new int[res_ndim]; - - if (ndim == 1) { - for (size_t i = 0; i < res_size; ++i) { - size_t n = res_size; - size_t val = i; - for (size_t j = 0; j < res_ndim; ++j) { - n /= res_shape[j]; - size_t p = val / n; - ids[j] = p; - if (p != 0) { - val = val - p * n; - } - } - - int diag_idx_ = - (ids[res_ndim - 2] + k > -1) ? (ids[res_ndim - 2] + k) : -1; - int values = res_shape[res_ndim - 1]; - int diag_idx = (values < diag_idx_) ? values : diag_idx_; - - if (ids[res_ndim - 1] <= diag_idx) { - result[i] = array_m[ids[res_ndim - 1]]; - } - else { - result[i] = 0; - } - } - } - else { - for (size_t i = 0; i < res_size; ++i) { - size_t n = res_size; - size_t val = i; - for (size_t j = 0; j < res_ndim; ++j) { - n /= res_shape[j]; - size_t p = val / n; - ids[j] = p; - if (p != 0) { - val = val - p * n; - } - } - - int diag_idx_ = - (ids[res_ndim - 2] + k > -1) ? (ids[res_ndim - 2] + k) : -1; - int values = res_shape[res_ndim - 1]; - int diag_idx = (values < diag_idx_) ? 
values : diag_idx_; - - if (ids[res_ndim - 1] <= diag_idx) { - result[i] = array_m[i]; - } - else { - result[i] = 0; - } - } - } - - delete[] ids; - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_tril_c(void *array_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_tril_c<_DataType>(q_ref, array_in, result1, k, shape, res_shape, - ndim, res_ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_tril_default_c)(void *, - void *, - const int, - shape_elem_type *, - shape_elem_type *, - const size_t, - const size_t) = dpnp_tril_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_triu_c(DPCTLSyclQueueRef q_ref, - void *array_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if ((array_in == nullptr) || (result1 == nullptr)) { - return event_ref; - } - - if ((shape == nullptr) || (res_shape == nullptr)) { - return event_ref; - } - - if ((ndim == 0) || (res_ndim == 0)) { - return event_ref; - } - - const size_t res_size = std::accumulate(res_shape, res_shape + res_ndim, 1, - std::multiplies()); - if (res_size == 0) { - return event_ref; - } - - const size_t input_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (input_size == 0) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - validate_type_for_device<_DataType>(q); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array_in, input_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, res_size, 
true, - true); - _DataType *array_m = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - int *ids = new int[res_ndim]; - - if (ndim == 1) { - for (size_t i = 0; i < res_size; ++i) { - size_t n = res_size; - size_t val = i; - for (size_t j = 0; j < res_ndim; ++j) { - n /= res_shape[j]; - size_t p = val / n; - ids[j] = p; - if (p != 0) { - val = val - p * n; - } - } - - int diag_idx_ = - (ids[res_ndim - 2] + k > -1) ? (ids[res_ndim - 2] + k) : -1; - int values = res_shape[res_ndim - 1]; - int diag_idx = (values < diag_idx_) ? values : diag_idx_; - - if (ids[res_ndim - 1] >= diag_idx) { - result[i] = array_m[ids[res_ndim - 1]]; - } - else { - result[i] = 0; - } - } - } - else { - for (size_t i = 0; i < res_size; ++i) { - size_t n = res_size; - size_t val = i; - for (size_t j = 0; j < res_ndim; ++j) { - n /= res_shape[j]; - size_t p = val / n; - ids[j] = p; - if (p != 0) { - val = val - p * n; - } - } - - int diag_idx_ = - (ids[res_ndim - 2] + k > -1) ? (ids[res_ndim - 2] + k) : -1; - int values = res_shape[res_ndim - 1]; - int diag_idx = (values < diag_idx_) ? 
values : diag_idx_; - - if (ids[res_ndim - 1] >= diag_idx) { - result[i] = array_m[i]; - } - else { - result[i] = 0; - } - } - } - - delete[] ids; - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_triu_c(void *array_in, - void *result1, - const int k, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t ndim, - const size_t res_ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_triu_c<_DataType>(q_ref, array_in, result1, k, shape, res_shape, - ndim, res_ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_triu_default_c)(void *, - void *, - const int, - shape_elem_type *, - shape_elem_type *, - const size_t, - const size_t) = dpnp_triu_c<_DataType>; - template DPCTLSyclEventRef dpnp_zeros_c(DPCTLSyclQueueRef q_ref, void *result, @@ -1130,72 +155,7 @@ void (*dpnp_zeros_like_default_c)(void *, void func_map_init_arraycreation(func_map_t &fmap) { - fmap[DPNPFuncName::DPNP_FN_ARANGE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_arange_default_c}; - fmap[DPNPFuncName::DPNP_FN_ARANGE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_arange_default_c}; - fmap[DPNPFuncName::DPNP_FN_ARANGE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_arange_default_c}; - fmap[DPNPFuncName::DPNP_FN_ARANGE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_arange_default_c}; - - fmap[DPNPFuncName::DPNP_FN_DIAG][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_diag_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_diag_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_diag_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_diag_default_c}; - - fmap[DPNPFuncName::DPNP_FN_EYE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_eye_default_c}; - 
fmap[DPNPFuncName::DPNP_FN_EYE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_eye_default_c}; - fmap[DPNPFuncName::DPNP_FN_EYE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_eye_default_c}; - fmap[DPNPFuncName::DPNP_FN_EYE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_eye_default_c}; - - fmap[DPNPFuncName::DPNP_FN_FULL][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_full_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_full_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_full_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_full_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_full_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_full_default_c>}; - - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_full_like_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_full_like_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_full_like_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_full_like_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_full_like_default_c}; - fmap[DPNPFuncName::DPNP_FN_FULL_LIKE][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_full_like_default_c>}; - - fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_identity_default_c}; - fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_identity_default_c}; - fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_identity_default_c}; - fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_identity_default_c}; - 
fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_BLN][eft_BLN] = { - eft_BLN, (void *)dpnp_identity_default_c}; - fmap[DPNPFuncName::DPNP_FN_IDENTITY][eft_C128][eft_C128] = { - eft_C128, (void *)dpnp_identity_default_c>}; - + // Used in dpnp_rng_geometric_c fmap[DPNPFuncName::DPNP_FN_ONES][eft_INT][eft_INT] = { eft_INT, (void *)dpnp_ones_default_c}; fmap[DPNPFuncName::DPNP_FN_ONES][eft_LNG][eft_LNG] = { @@ -1222,90 +182,8 @@ void func_map_init_arraycreation(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_ONES_LIKE][eft_C128][eft_C128] = { eft_C128, (void *)dpnp_ones_like_default_c>}; - fmap[DPNPFuncName::DPNP_FN_PTP][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_ptp_default_c}; - fmap[DPNPFuncName::DPNP_FN_PTP][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_ptp_default_c}; - fmap[DPNPFuncName::DPNP_FN_PTP][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_ptp_default_c}; - fmap[DPNPFuncName::DPNP_FN_PTP][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_ptp_default_c}; - - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_INT][eft_INT] = { - eft_LNG, (void *)dpnp_vander_default_c}; - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_vander_default_c}; - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_FLT][eft_FLT] = { - eft_DBL, (void *)dpnp_vander_default_c}; - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_vander_default_c}; - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_BLN][eft_BLN] = { - eft_LNG, (void *)dpnp_vander_default_c}; - fmap[DPNPFuncName::DPNP_FN_VANDER][eft_C128][eft_C128] = { - eft_C128, - (void *) - dpnp_vander_default_c, std::complex>}; - - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_LNG][eft_INT] = { - eft_INT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_FLT][eft_INT] = { - eft_INT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_DBL][eft_INT] = { - eft_INT, (void 
*)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_INT][eft_LNG] = { - eft_LNG, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_FLT][eft_LNG] = { - eft_LNG, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_DBL][eft_LNG] = { - eft_LNG, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_INT][eft_FLT] = { - eft_FLT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_LNG][eft_FLT] = { - eft_FLT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_DBL][eft_FLT] = { - eft_FLT, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_INT][eft_DBL] = { - eft_DBL, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_LNG][eft_DBL] = { - eft_DBL, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_FLT][eft_DBL] = { - eft_DBL, (void *)dpnp_trace_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRACE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_trace_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TRI][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_tri_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRI][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_tri_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRI][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_tri_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRI][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_tri_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TRIL][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_tril_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIL][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_tril_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_tril_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIL][eft_DBL][eft_DBL] = { - eft_DBL, (void 
*)dpnp_tril_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TRIU][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_triu_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIU][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_triu_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIU][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_triu_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRIU][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_triu_default_c}; - + // Used in dpnp_rng_binomial_c, dpnp_rng_gamma_c, dpnp_rng_hypergeometric_c + // dpnp_rng_laplace_c, dpnp_rng_multinomial_c, dpnp_rng_weibull_c fmap[DPNPFuncName::DPNP_FN_ZEROS][eft_INT][eft_INT] = { eft_INT, (void *)dpnp_zeros_default_c}; fmap[DPNPFuncName::DPNP_FN_ZEROS][eft_LNG][eft_LNG] = { diff --git a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp index 20be65f53ca..e3797bd22e6 100644 --- a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp @@ -1026,19 +1026,6 @@ static void func_map_init_elemwise_1arg_1type(func_map_t &fmap) #include -template -static void func_map_elemwise_2arg_3type_core(func_map_t &fmap) -{ - // dpnp_subtract_c_ext is implicitly used by dpnp_ptp_c - ((fmap[DPNPFuncName::DPNP_FN_SUBTRACT_EXT][FT1][FTs] = - {populate_func_types(), - (void *)dpnp_subtract_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, - func_type_map_t::find_type>}), - ...); -} - template static void func_map_elemwise_2arg_3type_short_core(func_map_t &fmap) { @@ -1072,12 +1059,6 @@ static void func_map_elemwise_2arg_3type_short_core(func_map_t &fmap) ...); } -template -static void func_map_elemwise_2arg_3type_helper(func_map_t &fmap) -{ - ((func_map_elemwise_2arg_3type_core(fmap)), ...); -} - template static void func_map_elemwise_2arg_3type_short_helper(func_map_t &fmap) { @@ -1189,9 +1170,6 @@ static void func_map_init_elemwise_2arg_3type(func_map_t &fmap) (void *)dpnp_multiply_c_default< std::complex, std::complex, std::complex>}; - 
func_map_elemwise_2arg_3type_helper(fmap); - func_map_elemwise_2arg_3type_short_helper(fmap); diff --git a/dpnp/backend/kernels/dpnp_krnl_indexing.cpp b/dpnp/backend/kernels/dpnp_krnl_indexing.cpp index dcbf6ca906c..523acd447c6 100644 --- a/dpnp/backend/kernels/dpnp_krnl_indexing.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_indexing.cpp @@ -125,31 +125,6 @@ DPCTLSyclEventRef (*dpnp_choose_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_choose_c<_DataType1, _DataType2>; -template -DPCTLSyclEventRef - dpnp_diag_indices_c(DPCTLSyclQueueRef q_ref, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - return dpnp_arange_c<_DataType>(q_ref, 0, 1, result1, size, - dep_event_vec_ref); -} - -template -void dpnp_diag_indices_c(void *result1, size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_diag_indices_c<_DataType>(q_ref, result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_diag_indices_default_c)(void *, - size_t) = dpnp_diag_indices_c<_DataType>; - template DPCTLSyclEventRef dpnp_diagonal_c(DPCTLSyclQueueRef q_ref, void *array1_in, @@ -873,15 +848,6 @@ void func_map_init_indexing_func(func_map_t &fmap) fmap[DPNPFuncName::DPNP_FN_CHOOSE_EXT][eft_LNG][eft_DBL] = { eft_DBL, (void *)dpnp_choose_ext_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG_INDICES][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_diag_indices_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG_INDICES][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_diag_indices_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG_INDICES][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_diag_indices_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAG_INDICES][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_diag_indices_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_INT][eft_INT] = { eft_INT, (void *)dpnp_diagonal_default_c}; 
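The `fmap` registrations removed and retained in the hunks above all follow one backend pattern: a table keyed by function name and the element types of the operands, holding a result type (which may be promoted, as in the `DPNP_FN_VANDER` entries where `eft_INT` inputs yield an `eft_LNG` result) together with a type-erased kernel pointer. A minimal Python sketch of that dispatch scheme, with hypothetical names that are not part of the dpnp API, is:

```python
# Sketch of the backend's fmap dispatch: (func, in_type, in_type) maps
# to (result_type, kernel). All names here are illustrative only.

def vander_kernel_long(x, n):
    # Vandermonde rows computed with integer ("long") arithmetic.
    return [[int(v) ** p for p in range(n)] for v in x]

def vander_kernel_double(x, n):
    # Vandermonde rows computed with float ("double") arithmetic.
    return [[float(v) ** p for p in range(n)] for v in x]

fmap = {
    # INT inputs are promoted to a LNG result, as in the real table.
    ("VANDER", "INT", "INT"): ("LNG", vander_kernel_long),
    ("VANDER", "DBL", "DBL"): ("DBL", vander_kernel_double),
}

result_type, kernel = fmap[("VANDER", "INT", "INT")]
print(result_type, kernel([1, 2, 3], 3))
# → LNG [[1, 1, 1], [1, 2, 4], [1, 3, 9]]
```

The real table stores `(DPNPFuncType, void *)` pairs and the caller casts the pointer back to the templated kernel signature; the dictionary above captures only the lookup-and-promote shape of that design.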
fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_LNG][eft_LNG] = { diff --git a/dpnp/backend/kernels/dpnp_krnl_manipulation.cpp b/dpnp/backend/kernels/dpnp_krnl_manipulation.cpp deleted file mode 100644 index aaaa5a179dd..00000000000 --- a/dpnp/backend/kernels/dpnp_krnl_manipulation.cpp +++ /dev/null @@ -1,235 +0,0 @@ -//***************************************************************************** -// Copyright (c) 2016-2024, Intel Corporation -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are met: -// - Redistributions of source code must retain the above copyright notice, -// this list of conditions and the following disclaimer. -// - Redistributions in binary form must reproduce the above copyright notice, -// this list of conditions and the following disclaimer in the documentation -// and/or other materials provided with the distribution. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -// THE POSSIBILITY OF SUCH DAMAGE. 
-//***************************************************************************** - -#include -#include -#include - -#include - -#include "dpnp_fptr.hpp" -#include "dpnp_utils.hpp" -#include "dpnpc_memory_adapter.hpp" -#include "queue_sycl.hpp" - -template -class dpnp_repeat_c_kernel; - -template -DPCTLSyclEventRef dpnp_repeat_c(DPCTLSyclQueueRef q_ref, - const void *array1_in, - void *result1, - const size_t repeats, - const size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!array1_in || !result1) { - return event_ref; - } - - if (!size || !repeats) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, size); - const _DataType *array_in = input1_ptr.get_ptr(); - _DataType *result = reinterpret_cast<_DataType *>(result1); - - sycl::range<2> gws(size, repeats); - auto kernel_parallel_for_func = [=](sycl::id<2> global_id) { - size_t idx1 = global_id[0]; - size_t idx2 = global_id[1]; - result[(idx1 * repeats) + idx2] = array_in[idx1]; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - event = q.submit(kernel_func); - - event_ref = reinterpret_cast(&event); - - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_repeat_c(const void *array1_in, - void *result1, - const size_t repeats, - const size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_repeat_c<_DataType>( - q_ref, array1_in, result1, repeats, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_repeat_default_c)(const void *, - void *, - const size_t, - const size_t) = dpnp_repeat_c<_DataType>; - -template -class dpnp_elemwise_transpose_c_kernel; - -template 
-DPCTLSyclEventRef - dpnp_elemwise_transpose_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - const shape_elem_type *input_shape, - const shape_elem_type *result_shape, - const shape_elem_type *permute_axes, - size_t ndim, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - sycl::event event; - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, size); - _DataType *array1 = input1_ptr.get_ptr(); - _DataType *result = reinterpret_cast<_DataType *>(result1); - - shape_elem_type *input_offset_shape = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - get_shape_offsets_inkernel(input_shape, ndim, input_offset_shape); - - shape_elem_type *temp_result_offset_shape = - reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - get_shape_offsets_inkernel(result_shape, ndim, temp_result_offset_shape); - - shape_elem_type *result_offset_shape = reinterpret_cast( - sycl::malloc_shared(ndim * sizeof(shape_elem_type), q)); - for (size_t axis = 0; axis < ndim; ++axis) { - result_offset_shape[permute_axes[axis]] = - temp_result_offset_shape[axis]; - } - - sycl::range<1> gws(size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - const size_t idx = global_id[0]; - - size_t output_index = 0; - size_t reminder = idx; - for (size_t axis = 0; axis < ndim; ++axis) { - /* reconstruct [x][y][z] from given linear idx */ - size_t xyz_id = reminder / input_offset_shape[axis]; - reminder = reminder % input_offset_shape[axis]; - - /* calculate destination index based on reconstructed [x][y][z] */ - output_index += (xyz_id * result_offset_shape[axis]); - } - - result[output_index] = array1[idx]; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); 
- }; - - event = q.submit(kernel_func); - - event.wait(); - - sycl::free(input_offset_shape, q); - sycl::free(temp_result_offset_shape, q); - sycl::free(result_offset_shape, q); - - return event_ref; -} - -template -void dpnp_elemwise_transpose_c(void *array1_in, - const shape_elem_type *input_shape, - const shape_elem_type *result_shape, - const shape_elem_type *permute_axes, - size_t ndim, - void *result1, - size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_elemwise_transpose_c<_DataType>( - q_ref, array1_in, input_shape, result_shape, permute_axes, ndim, - result1, size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_elemwise_transpose_default_c)(void *, - const shape_elem_type *, - const shape_elem_type *, - const shape_elem_type *, - size_t, - void *, - size_t) = - dpnp_elemwise_transpose_c<_DataType>; - -void func_map_init_manipulation(func_map_t &fmap) -{ - fmap[DPNPFuncName::DPNP_FN_REPEAT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_repeat_default_c}; - fmap[DPNPFuncName::DPNP_FN_REPEAT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_repeat_default_c}; - fmap[DPNPFuncName::DPNP_FN_REPEAT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_repeat_default_c}; - fmap[DPNPFuncName::DPNP_FN_REPEAT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_repeat_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TRANSPOSE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_elemwise_transpose_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRANSPOSE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_elemwise_transpose_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRANSPOSE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_elemwise_transpose_default_c}; - fmap[DPNPFuncName::DPNP_FN_TRANSPOSE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_elemwise_transpose_default_c}; - return; -} diff --git a/dpnp/backend/src/dpnp_fptr.hpp 
b/dpnp/backend/src/dpnp_fptr.hpp index 2a9c42eb172..73d627812a5 100644 --- a/dpnp/backend/src/dpnp_fptr.hpp +++ b/dpnp/backend/src/dpnp_fptr.hpp @@ -331,7 +331,6 @@ void func_map_init_fft_func(func_map_t &fmap); void func_map_init_indexing_func(func_map_t &fmap); void func_map_init_linalg(func_map_t &fmap); void func_map_init_logic(func_map_t &fmap); -void func_map_init_manipulation(func_map_t &fmap); void func_map_init_mathematical(func_map_t &fmap); void func_map_init_random(func_map_t &fmap); void func_map_init_reduction(func_map_t &fmap); diff --git a/dpnp/backend/src/dpnp_iface_fptr.cpp b/dpnp/backend/src/dpnp_iface_fptr.cpp index f8214212728..f80c5b35863 100644 --- a/dpnp/backend/src/dpnp_iface_fptr.cpp +++ b/dpnp/backend/src/dpnp_iface_fptr.cpp @@ -96,45 +96,6 @@ void (*dpnp_dot_default_c)(void *, const shape_elem_type *) = dpnp_dot_c<_DataType_output, _DataType_input1, _DataType_input2>; -void *get_backend_function_name(const char *func_name, const char *type_name) -{ - /** Implement it in this way to allow easier play with it */ - const char *supported_func_name = "dpnp_dot"; - const char *supported_type1_name = "double"; - const char *supported_type2_name = "float"; - const char *supported_type3_name = "long"; - const char *supported_type4_name = "int"; - - /** of coerce it will be converted into std::map later */ - if (!strncmp(func_name, supported_func_name, strlen(supported_func_name))) { - if (!strncmp(type_name, supported_type1_name, - strlen(supported_type1_name))) { - return reinterpret_cast( - dpnp_dot_default_c); - } - else if (!strncmp(type_name, supported_type2_name, - strlen(supported_type2_name))) - { - return reinterpret_cast( - dpnp_dot_default_c); - } - else if (!strncmp(type_name, supported_type3_name, - strlen(supported_type3_name))) - { - return reinterpret_cast( - dpnp_dot_default_c); - } - else if (!strncmp(type_name, supported_type4_name, - strlen(supported_type4_name))) - { - return reinterpret_cast( - dpnp_dot_default_c); - } - } - 
- throw std::runtime_error("DPNP Error: Unsupported function call"); -} - /** * This operator is needed for compatibility with Cython 0.29 which has a bug in * Enum handling @@ -172,7 +133,6 @@ static func_map_t func_map_init() func_map_init_indexing_func(fmap); func_map_init_linalg(fmap); func_map_init_logic(fmap); - func_map_init_manipulation(fmap); func_map_init_mathematical(fmap); func_map_init_random(fmap); func_map_init_reduction(fmap); diff --git a/dpnp/backend/src/queue_sycl.cpp b/dpnp/backend/src/queue_sycl.cpp index 5e6df29d21d..786752facd6 100644 --- a/dpnp/backend/src/queue_sycl.cpp +++ b/dpnp/backend/src/queue_sycl.cpp @@ -80,36 +80,6 @@ } #endif -#if defined(DPNPC_TOUCH_KERNEL_TO_LINK) -/** - * Function push the SYCL kernels to be linked (final stage of the compilation) - * for the current queue - * - * TODO it is not the best idea to just a call some kernel. Needs better - * solution. - */ -static long dpnp_kernels_link() -{ - /* must use memory pre-allocated at the current queue */ - long *value_ptr = - reinterpret_cast(dpnp_memory_alloc_c(1 * sizeof(long))); - long *result_ptr = - reinterpret_cast(dpnp_memory_alloc_c(1 * sizeof(long))); - long result = 1; - - *value_ptr = 2; - - dpnp_square_c(value_ptr, result_ptr, 1); - - result = *result_ptr; - - dpnp_memory_free_c(result_ptr); - dpnp_memory_free_c(value_ptr); - - return result; -} -#endif - size_t dpnp_queue_is_cpu_c() { const auto &be = backend_sycl::get(); diff --git a/dpnp/dpnp_algo/dpnp_algo.pxd b/dpnp/dpnp_algo/dpnp_algo.pxd index 37663bee834..0c8bd1134a7 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pxd +++ b/dpnp/dpnp_algo/dpnp_algo.pxd @@ -84,7 +84,6 @@ cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncName": # need this na DPNP_FN_RNG_WALD_EXT DPNP_FN_RNG_WEIBULL_EXT DPNP_FN_RNG_ZIPF_EXT - DPNP_FN_TRAPZ_EXT cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncType": # need this namespace for Enum import cdef enum DPNPFuncType "DPNPFuncType": diff --git 
a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi index fca1e6dc303..28b89ce60a1 100644 --- a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi +++ b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi @@ -40,16 +40,12 @@ __all__ += [ "dpnp_fmax", "dpnp_fmin", "dpnp_modf", - "dpnp_trapz", ] ctypedef c_dpctl.DPCTLSyclEventRef(*fptr_1in_2out_t)(c_dpctl.DPCTLSyclQueueRef, void * , void * , void * , size_t, const c_dpctl.DPCTLEventVectorRef) -ctypedef c_dpctl.DPCTLSyclEventRef(*ftpr_custom_trapz_2in_1out_with_2size_t)(c_dpctl.DPCTLSyclQueueRef, - void *, void * , void * , double, size_t, size_t, - const c_dpctl.DPCTLEventVectorRef) cpdef utils.dpnp_descriptor dpnp_ediff1d(utils.dpnp_descriptor x1): @@ -166,41 +162,3 @@ cpdef tuple dpnp_modf(utils.dpnp_descriptor x1): c_dpctl.DPCTLEvent_Delete(event_ref) return (result1.get_pyobj(), result2.get_pyobj()) - - -cpdef utils.dpnp_descriptor dpnp_trapz(utils.dpnp_descriptor y1, utils.dpnp_descriptor x1, double dx): - - cdef DPNPFuncType param1_type = dpnp_dtype_to_DPNPFuncType(y1.dtype) - cdef DPNPFuncType param2_type = dpnp_dtype_to_DPNPFuncType(x1.dtype) - cdef DPNPFuncData kernel_data = get_dpnp_function_ptr(DPNP_FN_TRAPZ_EXT, param1_type, param2_type) - - result_sycl_device, result_usm_type, result_sycl_queue = utils.get_common_usm_allocation(y1, x1) - - # create result array with type given by FPTR data - cdef shape_type_c result_shape = (1,) - cdef utils.dpnp_descriptor result = utils.create_output_descriptor(result_shape, - kernel_data.return_type, - None, - device=result_sycl_device, - usm_type=result_usm_type, - sycl_queue=result_sycl_queue) - - result_sycl_queue = result.get_array().sycl_queue - - cdef c_dpctl.SyclQueue q = result_sycl_queue - cdef c_dpctl.DPCTLSyclQueueRef q_ref = q.get_queue_ref() - - cdef ftpr_custom_trapz_2in_1out_with_2size_t func = kernel_data.ptr - cdef c_dpctl.DPCTLSyclEventRef event_ref = func(q_ref, - y1.get_data(), - x1.get_data(), - result.get_data(), - dx, - y1.size, 
- x1.size, - NULL) # dep_events_ref - - with nogil: c_dpctl.DPCTLEvent_WaitAndThrow(event_ref) - c_dpctl.DPCTLEvent_Delete(event_ref) - - return result diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index 1fe7839f596..1caf1359be3 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -64,7 +64,6 @@ dpnp_fmax, dpnp_fmin, dpnp_modf, - dpnp_trapz, ) from .dpnp_algo.dpnp_elementwise_common import ( DPNPAngle, @@ -3287,36 +3286,6 @@ def trapz(y1, x1=None, dx=1.0, axis=-1): """ - y_desc = dpnp.get_dpnp_descriptor(y1, copy_when_nondefault_queue=False) - if y_desc: - if y_desc.ndim > 1: - pass - else: - y_obj = y_desc.get_array() - if x1 is None: - x_obj = dpnp.empty( - y_desc.shape, - dtype=y_desc.dtype, - device=y_obj.sycl_device, - usm_type=y_obj.usm_type, - sycl_queue=y_obj.sycl_queue, - ) - else: - x_obj = x1 - - x_desc = dpnp.get_dpnp_descriptor( - x_obj, copy_when_nondefault_queue=False - ) - # TODO: change to "not x_desc" - if x_desc: - pass - elif y_desc.size != x_desc.size: - pass - elif y_desc.shape != x_desc.shape: - pass - else: - return dpnp_trapz(y_desc, x_desc, dx).get_pyobj() - return call_origin(numpy.trapz, y1, x1, dx, axis) From 03b585b09145525e21edb688436e14cf9a19d484 Mon Sep 17 00:00:00 2001 From: Natalia Polina Date: Thu, 4 Jul 2024 04:04:20 -0700 Subject: [PATCH 44/49] Clean up legacy indexing implementation from the backend (#1908) * Clean up legacy indexing implementation from the backend * fix pre-commit --- dpnp/backend/include/dpnp_iface.hpp | 224 ------ dpnp/backend/include/dpnp_iface_fptr.hpp | 49 +- dpnp/backend/kernels/dpnp_krnl_indexing.cpp | 767 -------------------- 3 files changed, 21 insertions(+), 1019 deletions(-) diff --git a/dpnp/backend/include/dpnp_iface.hpp b/dpnp/backend/include/dpnp_iface.hpp index 0fc5595041c..4efea15a38b 100644 --- a/dpnp/backend/include/dpnp_iface.hpp +++ b/dpnp/backend/include/dpnp_iface.hpp @@ -205,38 +205,6 @@ INP_DLLEXPORT void 
dpnp_nanvar_c(void *array, const size_t result_size, size_t size); -/** - * @ingroup BACKEND_API - * @brief Return the indices of the elements that are non-zero. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1 Input array. - * @param [out] result1 Output array. - * @param [in] result_size Output array size. - * @param [in] shape Shape of input array. - * @param [in] ndim Number of elements in shape. - * @param [in] j Number input array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_nonzero_c(DPCTLSyclQueueRef q_ref, - const void *array1, - void *result1, - const size_t result_size, - const shape_elem_type *shape, - const size_t ndim, - const size_t j, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_nonzero_c(const void *array1, - void *result1, - const size_t result_size, - const shape_elem_type *shape, - const size_t ndim, - const size_t j); - /** * @ingroup BACKEND_API * @brief Custom implementation of dot function @@ -448,35 +416,6 @@ INP_DLLEXPORT void dpnp_partition_c(void *array, const shape_elem_type *shape, const size_t ndim); -/** - * @ingroup BACKEND_API - * @brief Place of array elements - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] arr Input array. - * @param [in] mask Mask array. - * @param [in] vals Vals array. - * @param [in] arr_size Number of input elements in `arr`. - * @param [in] vals_size Number of input elements in `vals`. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_place_c(DPCTLSyclQueueRef q_ref, - void *arr, - long *mask, - void *vals, - const size_t arr_size, - const size_t vals_size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_place_c(void *arr, - long *mask, - void *vals, - const size_t arr_size, - const size_t vals_size); - /** * @ingroup BACKEND_API * @brief Compute Product of input array elements. @@ -523,78 +462,6 @@ INP_DLLEXPORT void dpnp_prod_c(void *result_out, const void *initial, const long *where); -/** - * @ingroup BACKEND_API - * @brief Replaces specified elements of an array with given values. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array. - * @param [in] ind Target indices, interpreted as integers. - * @param [in] v Values to place in array at target indices. - * @param [in] size Number of input elements in `array`. - * @param [in] size_ind Number of input elements in `ind`. - * @param [in] size_v Number of input elements in `v`. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_put_c(DPCTLSyclQueueRef q_ref, - void *array, - void *ind, - void *v, - const size_t size, - const size_t size_ind, - const size_t size_v, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_put_c(void *array, - void *ind, - void *v, - const size_t size, - const size_t size_ind, - const size_t size_v); - -/** - * @ingroup BACKEND_API - * @brief Put values into the destination array by matching 1d index and data - * slices. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] arr_in Input array. - * @param [in] indices_in Indices to change along each 1d slice of - * arr. - * @param [in] values_in Values to insert at those indices. - * @param [in] axis The axis to take 1d slices along. - * @param [in] shape Shape of input array. 
- * @param [in] ndim Number of input array dimensions. - * @param [in] size_indices Size of indices. - * @param [in] values_size Size of values. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_put_along_axis_c(DPCTLSyclQueueRef q_ref, - void *arr_in, - long *indices_in, - void *values_in, - size_t axis, - const shape_elem_type *shape, - size_t ndim, - size_t size_indices, - size_t values_size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_put_along_axis_c(void *arr_in, - long *indices_in, - void *values_in, - size_t axis, - const shape_elem_type *shape, - size_t ndim, - size_t size_indices, - size_t values_size); - /** * @ingroup BACKEND_API @@ -776,42 +643,6 @@ INP_DLLEXPORT void dpnp_choose_c(void *result1, size_t choices_size, size_t choice_size); -/** - * @ingroup BACKEND_API - * @brief math library implementation of diagonal function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array with data. - * @param [in] input1_size Input1 data size. - * @param [out] result1 Output array. - * @param [in] offset Offset of the diagonal from the main - * diagonal. - * @param [in] shape Shape of input array. - * @param [in] res_shape Shape of output array. - * @param [in] res_ndim Number of elements in shape. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. 
- */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_diagonal_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - const size_t input1_size, - void *result1, - const size_t offset, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_diagonal_c(void *array1_in, - const size_t input1_size, - void *result1, - const size_t offset, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t res_ndim); - /** * @ingroup BACKEND_API * @brief implementation of creating filled with value array function @@ -1044,35 +875,6 @@ INP_DLLEXPORT void dpnp_std_c(void *array, size_t naxis, size_t ddof); -/** - * @ingroup BACKEND_API - * @brief math library implementation of take function - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array Input array with data. - * @param [in] array1_size Input array size. - * @param [in] indices Input array with indices. - * @param [out] result Output array. - * @param [in] size Number of elements in the input array. - * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_take_c(DPCTLSyclQueueRef q_ref, - void *array, - const size_t array1_size, - void *indices, - void *result, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_take_c(void *array, - const size_t array1_size, - void *indices, - void *result, - size_t size); - /** * @ingroup BACKEND_API * @brief math library implementation of var function @@ -1183,32 +985,6 @@ INP_DLLEXPORT void dpnp_var_c(void *array, #include -/** - * @ingroup BACKEND_API - * @brief fill_diagonal function. - * - * @param [in] q_ref Reference to SYCL queue. - * @param [in] array1_in Input array. - * @param [in] val Value to write on the diagonal. - * @param [in] shape Input shape. - * @param [in] ndim Number of elements in shape. 
- * @param [in] dep_event_vec_ref Reference to vector of SYCL events. - */ -template -INP_DLLEXPORT DPCTLSyclEventRef - dpnp_fill_diagonal_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *val, - shape_elem_type *shape, - const size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref); - -template -INP_DLLEXPORT void dpnp_fill_diagonal_c(void *array1_in, - void *val, - shape_elem_type *shape, - const size_t ndim); - /** * @ingroup BACKEND_API * @brief modf function. diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index d62e5998583..aaaf90c27bb 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -81,22 +81,20 @@ enum class DPNPFuncName : size_t DPNP_FN_DEGREES, /**< Used in numpy.degrees() impl */ DPNP_FN_DEGREES_EXT, /**< Used in numpy.degrees() impl, requires extra parameters */ - DPNP_FN_DIAGONAL, /**< Used in numpy.diagonal() impl */ DPNP_FN_DOT, /**< Used in numpy.dot() impl */ DPNP_FN_DOT_EXT, /**< Used in numpy.dot() impl, requires extra parameters */ DPNP_FN_EDIFF1D, /**< Used in numpy.ediff1d() impl */ - DPNP_FN_EDIFF1D_EXT, /**< Used in numpy.ediff1d() impl, requires extra - parameters */ - DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ - DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra - parameters */ - DPNP_FN_FFT_FFT, /**< Used in numpy.fft.fft() impl */ - DPNP_FN_FFT_FFT_EXT, /**< Used in numpy.fft.fft() impl, requires extra - parameters */ - DPNP_FN_FFT_RFFT, /**< Used in numpy.fft.rfft() impl */ - DPNP_FN_FFT_RFFT_EXT, /**< Used in numpy.fft.rfft() impl, requires extra - parameters */ - DPNP_FN_FILL_DIAGONAL, /**< Used in numpy.fill_diagonal() impl */ + DPNP_FN_EDIFF1D_EXT, /**< Used in numpy.ediff1d() impl, requires extra + parameters */ + DPNP_FN_ERF, /**< Used in scipy.special.erf impl */ + DPNP_FN_ERF_EXT, /**< Used in scipy.special.erf impl, requires extra + parameters */ + DPNP_FN_FFT_FFT, /**< Used in 
numpy.fft.fft() impl */ + DPNP_FN_FFT_FFT_EXT, /**< Used in numpy.fft.fft() impl, requires extra + parameters */ + DPNP_FN_FFT_RFFT, /**< Used in numpy.fft.rfft() impl */ + DPNP_FN_FFT_RFFT_EXT, /**< Used in numpy.fft.rfft() impl, requires extra + parameters */ DPNP_FN_INITVAL, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ DPNP_FN_INITVAL_EXT, /**< Used in numpy ones, ones_like, zeros, zeros_like @@ -116,23 +114,19 @@ enum class DPNPFuncName : size_t */ DPNP_FN_MULTIPLY, /**< Used in numpy.multiply() impl */ DPNP_FN_NANVAR, /**< Used in numpy.nanvar() impl */ - DPNP_FN_NONZERO, /**< Used in numpy.nonzero() impl */ DPNP_FN_ONES, /**< Used in numpy.ones() impl */ DPNP_FN_ONES_LIKE, /**< Used in numpy.ones_like() impl */ DPNP_FN_PARTITION, /**< Used in numpy.partition() impl */ - DPNP_FN_PARTITION_EXT, /**< Used in numpy.partition() impl, requires extra - parameters */ - DPNP_FN_PLACE, /**< Used in numpy.place() impl */ - DPNP_FN_PROD, /**< Used in numpy.prod() impl */ - DPNP_FN_PUT, /**< Used in numpy.put() impl */ - DPNP_FN_PUT_ALONG_AXIS, /**< Used in numpy.put_along_axis() impl */ - DPNP_FN_RADIANS, /**< Used in numpy.radians() impl */ - DPNP_FN_RADIANS_EXT, /**< Used in numpy.radians() impl, requires extra - parameters */ - DPNP_FN_RNG_BETA, /**< Used in numpy.random.beta() impl */ - DPNP_FN_RNG_BETA_EXT, /**< Used in numpy.random.beta() impl, requires extra - parameters */ - DPNP_FN_RNG_BINOMIAL, /**< Used in numpy.random.binomial() impl */ + DPNP_FN_PARTITION_EXT, /**< Used in numpy.partition() impl, requires extra + parameters */ + DPNP_FN_PROD, /**< Used in numpy.prod() impl */ + DPNP_FN_RADIANS, /**< Used in numpy.radians() impl */ + DPNP_FN_RADIANS_EXT, /**< Used in numpy.radians() impl, requires extra + parameters */ + DPNP_FN_RNG_BETA, /**< Used in numpy.random.beta() impl */ + DPNP_FN_RNG_BETA_EXT, /**< Used in numpy.random.beta() impl, requires extra + parameters */ + DPNP_FN_RNG_BINOMIAL, /**< Used in numpy.random.binomial() impl */ 
DPNP_FN_RNG_BINOMIAL_EXT, /**< Used in numpy.random.binomial() impl, requires extra parameters */ DPNP_FN_RNG_CHISQUARE, /**< Used in numpy.random.chisquare() impl */ @@ -253,7 +247,6 @@ enum class DPNPFuncName : size_t */ DPNP_FN_STD, /**< Used in numpy.std() impl */ DPNP_FN_SUM, /**< Used in numpy.sum() impl */ - DPNP_FN_TAKE, /**< Used in numpy.take() impl */ DPNP_FN_VAR, /**< Used in numpy.var() impl */ DPNP_FN_ZEROS, /**< Used in numpy.zeros() impl */ DPNP_FN_ZEROS_LIKE, /**< Used in numpy.zeros_like() impl */ diff --git a/dpnp/backend/kernels/dpnp_krnl_indexing.cpp b/dpnp/backend/kernels/dpnp_krnl_indexing.cpp index 523acd447c6..5400da81758 100644 --- a/dpnp/backend/kernels/dpnp_krnl_indexing.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_indexing.cpp @@ -125,693 +125,6 @@ DPCTLSyclEventRef (*dpnp_choose_ext_c)(DPCTLSyclQueueRef, const DPCTLEventVectorRef) = dpnp_choose_c<_DataType1, _DataType2>; -template -DPCTLSyclEventRef dpnp_diagonal_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - const size_t input1_size, - void *result1, - const size_t offset, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t res_ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - const size_t res_size = std::accumulate(res_shape, res_shape + res_ndim, 1, - std::multiplies()); - if (!(res_size && input1_size)) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, array1_in, input1_size, - true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, result1, res_size, true, - true); - _DataType *array_1 = input1_ptr.get_ptr(); - _DataType *result = result_ptr.get_ptr(); - - const size_t res_shape_ndim_sub_1 = - static_cast(res_shape[res_ndim - 1]); - - if (res_ndim <= 1) { - for (size_t i = 0; i < res_shape_ndim_sub_1; ++i) { - result[i] = array_1[i * shape[res_ndim] + i + offset]; - } - } - else 
{ - std::map> xyz; - for (size_t i = 0; i < static_cast(res_shape[0]); i++) { - xyz[i] = {i}; - } - - size_t index = 1; - while (index < res_ndim - 1) { - size_t shape_element = res_shape[index]; - std::map> new_shape_array; - size_t ind = 0; - for (size_t i = 0; i < shape_element; i++) { - for (size_t j = 0; j < xyz.size(); j++) { - std::vector new_shape; - std::vector list_ind = xyz[j]; - for (size_t k = 0; k < list_ind.size(); k++) { - new_shape.push_back(list_ind.at(k)); - } - new_shape.push_back(i); - new_shape_array[ind] = new_shape; - ind += 1; - } - } - size_t len_new_shape_array = new_shape_array.size() * (index + 1); - - for (size_t k = 0; k < len_new_shape_array; k++) { - xyz[k] = new_shape_array[k]; - } - index += 1; - } - - for (size_t i = 0; i < res_shape_ndim_sub_1; i++) { - for (size_t j = 0; j < xyz.size(); j++) { - std::vector ind_list = xyz[j]; - if (ind_list.size() == 0) { - continue; - } - else { - std::vector ind_input_{i, i + offset}; - ind_input_.insert(ind_input_.end(), ind_list.begin(), - ind_list.end()); - - std::vector ind_output_ = ind_list; - ind_output_.push_back(i); - - const size_t ind_output_size = ind_output_.size(); - size_t ind_output = 0; - size_t n = 1; - for (size_t k = 0; k < ind_output_size; k++) { - size_t ind = ind_output_size - 1 - k; - ind_output += n * ind_output_[ind]; - n *= res_shape[ind]; - } - - const size_t ind_input_size = ind_input_.size(); - size_t ind_input = 0; - size_t m = 1; - for (size_t k = 0; k < ind_input_size; k++) { - size_t ind = ind_input_size - 1 - k; - ind_input += m * ind_input_[ind]; - m *= shape[ind]; - } - - result[ind_output] = array_1[ind_input]; - } - } - } - } - - return event_ref; -} - -template -void dpnp_diagonal_c(void *array1_in, - const size_t input1_size, - void *result1, - const size_t offset, - shape_elem_type *shape, - shape_elem_type *res_shape, - const size_t res_ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = 
nullptr; - DPCTLSyclEventRef event_ref = dpnp_diagonal_c<_DataType>( - q_ref, array1_in, input1_size, result1, offset, shape, res_shape, - res_ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_diagonal_default_c)(void *, - const size_t, - void *, - const size_t, - shape_elem_type *, - shape_elem_type *, - const size_t) = dpnp_diagonal_c<_DataType>; - -template -DPCTLSyclEventRef - dpnp_fill_diagonal_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *val_in, - shape_elem_type *shape, - const size_t ndim, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - const size_t result_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - if (!(result_size && array1_in)) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, array1_in, result_size, true, - true); - DPNPC_ptr_adapter<_DataType> val_ptr(q_ref, val_in, 1, true); - _DataType *array_1 = result_ptr.get_ptr(); - _DataType *val_arr = val_ptr.get_ptr(); - - shape_elem_type min_shape = shape[0]; - for (size_t i = 0; i < ndim; ++i) { - if (shape[i] < min_shape) { - min_shape = shape[i]; - } - } - - _DataType val = val_arr[0]; - - for (size_t i = 0; i < static_cast(min_shape); ++i) { - size_t ind = 0; - size_t n = 1; - for (size_t k = 0; k < ndim; k++) { - size_t ind_ = ndim - 1 - k; - ind += n * i; - n *= shape[ind_]; - } - array_1[ind] = val; - } - - return event_ref; -} - -template -void dpnp_fill_diagonal_c(void *array1_in, - void *val_in, - shape_elem_type *shape, - const size_t ndim) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_fill_diagonal_c<_DataType>( - q_ref, array1_in, val_in, shape, ndim, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void 
(*dpnp_fill_diagonal_default_c)(void *, - void *, - shape_elem_type *, - const size_t) = - dpnp_fill_diagonal_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_nonzero_c(DPCTLSyclQueueRef q_ref, - const void *in_array1, - void *result1, - const size_t result_size, - const shape_elem_type *shape, - const size_t ndim, - const size_t j, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if ((in_array1 == nullptr) || (result1 == nullptr)) { - return event_ref; - } - - if (ndim == 0) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - const size_t input1_size = std::accumulate( - shape, shape + ndim, 1, std::multiplies()); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, in_array1, input1_size, - true); - DPNPC_ptr_adapter result_ptr(q_ref, result1, result_size, true, true); - const _DataType *arr = input1_ptr.get_ptr(); - long *result = result_ptr.get_ptr(); - - size_t idx = 0; - size_t *ids = new size_t[ndim]; - - for (size_t i = 0; i < input1_size; ++i) { - if (arr[i] != 0) { - size_t ind1 = input1_size; - size_t ind2 = i; - - for (size_t k = 0; k < ndim; ++k) { - ind1 = ind1 / shape[k]; - ids[k] = ind2 / ind1; - ind2 = ind2 % ind1; - } - - result[idx] = ids[j]; - idx += 1; - } - } - delete[] ids; - - return event_ref; -} - -template -void dpnp_nonzero_c(const void *in_array1, - void *result1, - const size_t result_size, - const shape_elem_type *shape, - const size_t ndim, - const size_t j) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_nonzero_c<_DataType>(q_ref, in_array1, result1, result_size, shape, - ndim, j, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_nonzero_default_c)(const void *, - void *, - const size_t, - const shape_elem_type *, - const size_t, 
- const size_t) = dpnp_nonzero_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_place_c(DPCTLSyclQueueRef q_ref, - void *arr_in, - long *mask_in, - void *vals_in, - const size_t arr_size, - const size_t vals_size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if (!arr_size) { - return event_ref; - } - - if (!vals_size) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - - DPNPC_ptr_adapter<_DataType> input1_ptr(q_ref, vals_in, vals_size, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, arr_in, arr_size, true, - true); - _DataType *vals = input1_ptr.get_ptr(); - _DataType *arr = result_ptr.get_ptr(); - - DPNPC_ptr_adapter mask_ptr(q_ref, mask_in, arr_size, true); - long *mask = mask_ptr.get_ptr(); - - size_t counter = 0; - for (size_t i = 0; i < arr_size; ++i) { - if (mask[i]) { - arr[i] = vals[counter % vals_size]; - counter += 1; - } - } - - return event_ref; -} - -template -void dpnp_place_c(void *arr_in, - long *mask_in, - void *vals_in, - const size_t arr_size, - const size_t vals_size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_place_c<_DataType>(q_ref, arr_in, mask_in, vals_in, arr_size, - vals_size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_place_default_c)(void *, - long *, - void *, - const size_t, - const size_t) = dpnp_place_c<_DataType>; - -template -DPCTLSyclEventRef dpnp_put_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - void *ind_in, - void *v_in, - const size_t size, - const size_t size_ind, - const size_t size_v, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - - if ((array1_in == nullptr) || (ind_in == nullptr) || 
(v_in == nullptr)) { - return event_ref; - } - - if (size_v == 0) { - return event_ref; - } - - sycl::queue q = *(reinterpret_cast(q_ref)); - DPNPC_ptr_adapter input1_ptr(q_ref, ind_in, size_ind, true); - DPNPC_ptr_adapter<_DataType> input2_ptr(q_ref, v_in, size_v, true); - DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, array1_in, size, true, true); - size_t *ind = input1_ptr.get_ptr(); - _DataType *v = input2_ptr.get_ptr(); - _DataType *array_1 = result_ptr.get_ptr(); - - for (size_t i = 0; i < size; ++i) { - for (size_t j = 0; j < size_ind; ++j) { - if (i == ind[j] || (i == (size + ind[j]))) { - array_1[i] = v[j % size_v]; - } - } - } - - return event_ref; -} - -template -void dpnp_put_c(void *array1_in, - void *ind_in, - void *v_in, - const size_t size, - const size_t size_ind, - const size_t size_v) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = - dpnp_put_c<_DataType, _IndecesType, _ValueType>( - q_ref, array1_in, ind_in, v_in, size, size_ind, size_v, - dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_put_default_c)(void *, - void *, - void *, - const size_t, - const size_t, - const size_t) = - dpnp_put_c<_DataType, _IndecesType, _ValueType>; - -template -DPCTLSyclEventRef - dpnp_put_along_axis_c(DPCTLSyclQueueRef q_ref, - void *arr_in, - long *indices_in, - void *values_in, - size_t axis, - const shape_elem_type *shape, - size_t ndim, - size_t size_indices, - size_t values_size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - sycl::queue q = *(reinterpret_cast(q_ref)); - - const size_t size_arr = std::accumulate(shape, shape + ndim, 1, - std::multiplies()); - - DPNPC_ptr_adapter input1_ptr(q_ref, indices_in, size_indices, true); - DPNPC_ptr_adapter<_DataType> input2_ptr(q_ref, values_in, values_size, - true); - 
DPNPC_ptr_adapter<_DataType> result_ptr(q_ref, arr_in, size_arr, true, - true); - size_t *indices = input1_ptr.get_ptr(); - _DataType *values = input2_ptr.get_ptr(); - _DataType *arr = result_ptr.get_ptr(); - - if (axis != (ndim - 1)) { - std::vector res_shape; - for (size_t i = 0; i < ndim; i++) { - if (axis != i) { - res_shape.push_back(shape[i]); - } - } - size_t res_ndim = res_shape.size(); - - size_t prod = 1; - for (size_t i = 0; i < res_ndim; ++i) { - if (res_shape[i] != 0) { - prod *= res_shape[i]; - } - } - - size_t *ind_array = new size_t[prod]; - bool *bool_ind_array = new bool[prod]; - for (size_t i = 0; i < prod; ++i) { - bool_ind_array[i] = true; - } - - size_t *arr_shape_offsets = new size_t[ndim]; - size_t acc = 1; - for (size_t i = ndim - 1; i > 0; --i) { - arr_shape_offsets[i] = acc; - acc *= shape[i]; - } - arr_shape_offsets[0] = acc; - - size_t *output_shape_offsets = new size_t[res_ndim]; - acc = 1; - if (res_ndim > 0) { - for (size_t i = res_ndim - 1; i > 0; --i) { - output_shape_offsets[i] = acc; - acc *= res_shape[i]; - } - } - output_shape_offsets[0] = acc; - - size_t size_result = 1; - for (size_t i = 0; i < res_ndim; ++i) { - size_result *= res_shape[i]; - } - - // init result array - size_t *xyz = new size_t[res_ndim]; - for (size_t result_idx = 0; result_idx < size_result; ++result_idx) { - size_t remainder = result_idx; - for (size_t i = 0; i < res_ndim; ++i) { - xyz[i] = remainder / output_shape_offsets[i]; - remainder = remainder - xyz[i] * output_shape_offsets[i]; - } - - // FIXME: computed and unused. 
Commented out per compiler warning - // size_t source_axis[ndim]; - // size_t result_axis_idx = 0; - // for (size_t idx = 0; idx < ndim; ++idx) { - // bool found = false; - // if (axis == idx) { - // found = true; - // } - // if (found) { - // source_axis[idx] = 0; - // } - // else { - // source_axis[idx] = xyz[result_axis_idx]; - // result_axis_idx++; - // } - // } - - // size_t source_idx = 0; - // for (size_t i = 0; i < static_cast(ndim); ++i) - // { - // source_idx += arr_shape_offsets[i] * source_axis[i]; - // } - } - - for (size_t source_idx = 0; source_idx < size_arr; ++source_idx) { - // reconstruct x,y,z from linear source_idx - size_t remainder = source_idx; - for (size_t i = 0; i < ndim; ++i) { - xyz[i] = remainder / arr_shape_offsets[i]; - remainder = remainder - xyz[i] * arr_shape_offsets[i]; - } - - // extract result axis - std::vector result_axis; - for (size_t idx = 0; idx < ndim; ++idx) { - // try to find current idx in axis array - if (axis != idx) { - result_axis.push_back(xyz[idx]); - } - } - - // Construct result offset - size_t result_offset = 0; - for (size_t i = 0; i < res_ndim; ++i) { - result_offset += output_shape_offsets[i] * result_axis[i]; - } - - if (bool_ind_array[result_offset]) { - ind_array[result_offset] = 0; - bool_ind_array[result_offset] = false; - } - else { - ind_array[result_offset] += 1; - } - - if ((ind_array[result_offset] % size_indices) == - indices[result_offset % size_indices]) - { - arr[source_idx] = values[source_idx % values_size]; - } - } - - delete[] ind_array; - delete[] bool_ind_array; - delete[] arr_shape_offsets; - delete[] output_shape_offsets; - delete[] xyz; - } - else { - for (size_t i = 0; i < size_arr; ++i) { - size_t ind = - size_indices * (i / size_indices) + indices[i % size_indices]; - arr[ind] = values[i % values_size]; - } - } - return event_ref; -} - -template -void dpnp_put_along_axis_c(void *arr_in, - long *indices_in, - void *values_in, - size_t axis, - const shape_elem_type *shape, - size_t 
ndim, - size_t size_indices, - size_t values_size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_put_along_axis_c<_DataType>( - q_ref, arr_in, indices_in, values_in, axis, shape, ndim, size_indices, - values_size, dep_event_vec_ref); - DPCTLEvent_WaitAndThrow(event_ref); -} - -template -void (*dpnp_put_along_axis_default_c)(void *, - long *, - void *, - size_t, - const shape_elem_type *, - size_t, - size_t, - size_t) = - dpnp_put_along_axis_c<_DataType>; - -template -class dpnp_take_c_kernel; - -template -DPCTLSyclEventRef dpnp_take_c(DPCTLSyclQueueRef q_ref, - void *array1_in, - const size_t array1_size, - void *indices1, - void *result1, - size_t size, - const DPCTLEventVectorRef dep_event_vec_ref) -{ - // avoid warning unused variable - (void)array1_size; - (void)dep_event_vec_ref; - - DPCTLSyclEventRef event_ref = nullptr; - sycl::queue q = *(reinterpret_cast(q_ref)); - - _DataType *array_1 = reinterpret_cast<_DataType *>(array1_in); - _IndecesType *indices = reinterpret_cast<_IndecesType *>(indices1); - _DataType *result = reinterpret_cast<_DataType *>(result1); - - sycl::range<1> gws(size); - auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { - const size_t idx = global_id[0]; - result[idx] = array_1[indices[idx]]; - }; - - auto kernel_func = [&](sycl::handler &cgh) { - cgh.parallel_for>( - gws, kernel_parallel_for_func); - }; - - sycl::event event = q.submit(kernel_func); - - event_ref = reinterpret_cast(&event); - return DPCTLEvent_Copy(event_ref); -} - -template -void dpnp_take_c(void *array1_in, - const size_t array1_size, - void *indices1, - void *result1, - size_t size) -{ - DPCTLSyclQueueRef q_ref = reinterpret_cast(&DPNP_QUEUE); - DPCTLEventVectorRef dep_event_vec_ref = nullptr; - DPCTLSyclEventRef event_ref = dpnp_take_c<_DataType, _IndecesType>( - q_ref, array1_in, array1_size, indices1, result1, size, - dep_event_vec_ref); - 
DPCTLEvent_WaitAndThrow(event_ref); - DPCTLEvent_Delete(event_ref); -} - -template -void (*dpnp_take_default_c)(void *, const size_t, void *, void *, size_t) = - dpnp_take_c<_DataType, _IndecesType>; - -template -DPCTLSyclEventRef (*dpnp_take_ext_c)(DPCTLSyclQueueRef, - void *, - const size_t, - void *, - void *, - size_t, - const DPCTLEventVectorRef) = - dpnp_take_c<_DataType, _IndecesType>; - void func_map_init_indexing_func(func_map_t &fmap) { fmap[DPNPFuncName::DPNP_FN_CHOOSE][eft_INT][eft_INT] = { @@ -847,85 +160,5 @@ void func_map_init_indexing_func(func_map_t &fmap) eft_FLT, (void *)dpnp_choose_ext_c}; fmap[DPNPFuncName::DPNP_FN_CHOOSE_EXT][eft_LNG][eft_DBL] = { eft_DBL, (void *)dpnp_choose_ext_c}; - - fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_DIAGONAL][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_diagonal_default_c}; - - fmap[DPNPFuncName::DPNP_FN_FILL_DIAGONAL][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_fill_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_FILL_DIAGONAL][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_fill_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_FILL_DIAGONAL][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_fill_diagonal_default_c}; - fmap[DPNPFuncName::DPNP_FN_FILL_DIAGONAL][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_fill_diagonal_default_c}; - - fmap[DPNPFuncName::DPNP_FN_NONZERO][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_nonzero_default_c}; - fmap[DPNPFuncName::DPNP_FN_NONZERO][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_nonzero_default_c}; - fmap[DPNPFuncName::DPNP_FN_NONZERO][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_nonzero_default_c}; - fmap[DPNPFuncName::DPNP_FN_NONZERO][eft_DBL][eft_DBL] = { - eft_DBL, (void 
*)dpnp_nonzero_default_c}; - - fmap[DPNPFuncName::DPNP_FN_PLACE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_place_default_c}; - fmap[DPNPFuncName::DPNP_FN_PLACE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_place_default_c}; - fmap[DPNPFuncName::DPNP_FN_PLACE][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_place_default_c}; - fmap[DPNPFuncName::DPNP_FN_PLACE][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_place_default_c}; - - fmap[DPNPFuncName::DPNP_FN_PUT][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_put_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_put_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_put_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_put_default_c}; - - fmap[DPNPFuncName::DPNP_FN_PUT_ALONG_AXIS][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_put_along_axis_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT_ALONG_AXIS][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_put_along_axis_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT_ALONG_AXIS][eft_FLT][eft_FLT] = { - eft_FLT, (void *)dpnp_put_along_axis_default_c}; - fmap[DPNPFuncName::DPNP_FN_PUT_ALONG_AXIS][eft_DBL][eft_DBL] = { - eft_DBL, (void *)dpnp_put_along_axis_default_c}; - - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_BLN][eft_INT] = { - eft_BLN, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_INT][eft_INT] = { - eft_INT, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_LNG][eft_INT] = { - eft_LNG, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_FLT][eft_INT] = { - eft_FLT, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_DBL][eft_INT] = { - eft_DBL, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_C128][eft_INT] = { - eft_C128, (void *)dpnp_take_default_c, int32_t>}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_BLN][eft_LNG] = { - eft_BLN, (void *)dpnp_take_default_c}; - 
fmap[DPNPFuncName::DPNP_FN_TAKE][eft_INT][eft_LNG] = { - eft_INT, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_LNG][eft_LNG] = { - eft_LNG, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_FLT][eft_LNG] = { - eft_FLT, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_DBL][eft_LNG] = { - eft_DBL, (void *)dpnp_take_default_c}; - fmap[DPNPFuncName::DPNP_FN_TAKE][eft_C128][eft_LNG] = { - eft_C128, (void *)dpnp_take_default_c, int64_t>}; - return; } From 740b08bef1b936e1c450db609fe60b6f679891f5 Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 4 Jul 2024 14:19:38 +0200 Subject: [PATCH 45/49] Update `dpnp.extract` implementation to get rid of limitations for input arguments (#1906) * Remove limitations from dpnp.extract implementation * Add more tests * Tune rtol and atol for a histogram test, since might fail on Windows * Fix a typo in description * Add test to cover condition as list --- doc/reference/sorting.rst | 2 +- dpnp/dpnp_iface_indexing.py | 92 +++++-- tests/skipped_tests.tbl | 5 - tests/skipped_tests_gpu.tbl | 5 - tests/test_histogram.py | 2 +- tests/test_indexing.py | 240 ++++++++++++------ tests/test_sycl_queue.py | 1 + tests/test_usm_type.py | 2 + .../cupy/indexing_tests/test_indexing.py | 47 +++- 9 files changed, 276 insertions(+), 120 deletions(-) diff --git a/doc/reference/sorting.rst b/doc/reference/sorting.rst index d0a966c6731..ead79b1098a 100644 --- a/doc/reference/sorting.rst +++ b/doc/reference/sorting.rst @@ -31,10 +31,10 @@ Searching dpnp.nanargmax dpnp.argmin dpnp.nanargmin + dpnp.argwhere dpnp.nonzero dpnp.flatnonzero dpnp.where - dpnp.argwhere dpnp.searchsorted dpnp.extract diff --git a/dpnp/dpnp_iface_indexing.py b/dpnp/dpnp_iface_indexing.py index 0a1c8529c42..20a046c82c1 100644 --- a/dpnp/dpnp_iface_indexing.py +++ b/dpnp/dpnp_iface_indexing.py @@ -490,42 +490,86 @@ def diagonal(a, offset=0, axis1=0, axis2=1): ) -def 
extract(condition, x): +def extract(condition, a): """ Return the elements of an array that satisfy some condition. + This is equivalent to + ``dpnp.compress(dpnp.ravel(condition), dpnp.ravel(a))``. If `condition` + is boolean, :obj:`dpnp.extract` is equivalent to ``a[condition]``. + + Note that :obj:`dpnp.place` does the exact opposite of :obj:`dpnp.extract`. + For full documentation refer to :obj:`numpy.extract`. + Parameters + ---------- + condition : {array_like, scalar} + An array whose non-zero or ``True`` entries indicate the elements of `a` + to extract. + a : {dpnp_array, usm_ndarray} + Input array of the same size as `condition`. + Returns ------- out : dpnp.ndarray - Rank 1 array of values from `x` where `condition` is True. + Rank 1 array of values from `a` where `condition` is ``True``. + + See Also + -------- + :obj:`dpnp.take` : Take elements from an array along an axis. + :obj:`dpnp.put` : Replaces specified elements of an array with given values. + :obj:`dpnp.copyto` : Copies values from one array to another, broadcasting + as necessary. + :obj:`dpnp.compress` : Return selected slices of an array along a given axis. + :obj:`dpnp.place` : Change elements of an array based on conditional and + input values. + + Examples + -------- + >>> import dpnp as np + >>> a = np.arange(12).reshape((3, 4)) + >>> a + array([[ 0, 1, 2, 3], + [ 4, 5, 6, 7], + [ 8, 9, 10, 11]]) + >>> condition = np.mod(a, 3) == 0 + >>> condition + array([[ True, False, False, True], + [False, False, True, False], + [False, True, False, False]]) + >>> np.extract(condition, a) + array([0, 3, 6, 9]) + + If `condition` is boolean: + + >>> a[condition] + array([0, 3, 6, 9]) - Limitations - ----------- - Parameters `condition` and `x` are supported either as - :class:`dpnp.ndarray` or :class:`dpctl.tensor.usm_ndarray`. - Parameter `x` must be the same shape as `condition`. - Otherwise the function will be executed sequentially on CPU. 
""" - if dpnp.is_supported_array_type(condition) and dpnp.is_supported_array_type( - x - ): - if condition.shape != x.shape: - pass - else: - dpt_condition = ( - condition.get_array() - if isinstance(condition, dpnp_array) - else condition - ) - dpt_array = x.get_array() if isinstance(x, dpnp_array) else x - return dpnp_array._create_from_usm_ndarray( - dpt.extract(dpt_condition, dpt_array) - ) + usm_a = dpnp.get_usm_ndarray(a) + if not dpnp.is_supported_array_type(condition): + usm_cond = dpt.asarray( + condition, usm_type=a.usm_type, sycl_queue=a.sycl_queue + ) + else: + usm_cond = dpnp.get_usm_ndarray(condition) + + if usm_cond.size != usm_a.size: + usm_a = dpt.reshape(usm_a, -1) + usm_cond = dpt.reshape(usm_cond, -1) + + usm_res = dpt.take(usm_a, dpt.nonzero(usm_cond)[0]) + else: + if usm_cond.shape != usm_a.shape: + usm_a = dpt.reshape(usm_a, -1) + usm_cond = dpt.reshape(usm_cond, -1) + + usm_res = dpt.extract(usm_cond, usm_a) - return call_origin(numpy.extract, condition, x) + dpnp.synchronize_array_data(usm_res) + return dpnp_array._create_from_usm_ndarray(usm_res) def fill_diagonal(a, val, wrap=False): diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl index 37285be810f..199566295a3 100644 --- a/tests/skipped_tests.tbl +++ b/tests/skipped_tests.tbl @@ -124,11 +124,6 @@ tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_i tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_invalid_index tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_invalid_order -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_empty_1dim -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_empty_1dim_no_axis -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_no_axis 
-tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_no_bool tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select_1D_choicelist tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select_choicelist_condlist_broadcast diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index 55fd91b0def..26b52190539 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -174,11 +174,6 @@ tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_i tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_invalid_index tests/third_party/cupy/indexing_tests/test_generate.py::TestUnravelIndex::test_invalid_order -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_empty_1dim -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_empty_1dim_no_axis -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_no_axis -tests/third_party/cupy/indexing_tests/test_indexing.py::TestIndexing::test_compress_no_bool tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select_1D_choicelist tests/third_party/cupy/indexing_tests/test_indexing.py::TestSelect::test_select_choicelist_condlist_broadcast diff --git a/tests/test_histogram.py b/tests/test_histogram.py index da58a4ac2f8..0e6e33fd99c 100644 --- a/tests/test_histogram.py +++ b/tests/test_histogram.py @@ -182,7 +182,7 @@ def test_density(self, dtype): result_hist, result_edges = dpnp.histogram(iv, density=True) if numpy.issubdtype(dtype, numpy.inexact): - tol = numpy.finfo(dtype).resolution + tol = 4 * numpy.finfo(dtype).resolution 
assert_allclose(result_hist, expected_hist, rtol=tol, atol=tol) assert_allclose(result_edges, expected_edges, rtol=tol, atol=tol) else: diff --git a/tests/test_indexing.py b/tests/test_indexing.py index f001f994dbd..8b54bc482ce 100644 --- a/tests/test_indexing.py +++ b/tests/test_indexing.py @@ -7,6 +7,7 @@ assert_array_equal, assert_equal, assert_raises, + assert_raises_regex, ) import dpnp @@ -29,6 +30,169 @@ def wrapped(a, axis, **kwargs): return wrapped +class TestDiagonal: + @pytest.mark.parametrize("dtype", get_all_dtypes(no_bool=True)) + @pytest.mark.parametrize("offset", [-3, -1, 0, 1, 3]) + @pytest.mark.parametrize( + "shape", + [(2, 2), (3, 3), (2, 5), (3, 2, 2), (2, 2, 2, 2), (2, 2, 2, 3)], + ids=[ + "(2,2)", + "(3,3)", + "(2,5)", + "(3,2,2)", + "(2,2,2,2)", + "(2,2,2,3)", + ], + ) + def test_diagonal_offset(self, shape, dtype, offset): + a = numpy.arange(numpy.prod(shape), dtype=dtype).reshape(shape) + a_dp = dpnp.array(a) + expected = numpy.diagonal(a, offset) + result = dpnp.diagonal(a_dp, offset) + assert_array_equal(expected, result) + + @pytest.mark.parametrize("dtype", get_all_dtypes(no_bool=True)) + @pytest.mark.parametrize( + "shape, axis_pairs", + [ + ((3, 4), [(0, 1), (1, 0)]), + ((3, 4, 5), [(0, 1), (1, 2), (0, 2)]), + ((4, 3, 5, 2), [(0, 1), (1, 2), (2, 3), (0, 3)]), + ], + ) + def test_diagonal_axes(self, shape, axis_pairs, dtype): + a = numpy.arange(numpy.prod(shape), dtype=dtype).reshape(shape) + a_dp = dpnp.array(a) + for axis1, axis2 in axis_pairs: + expected = numpy.diagonal(a, axis1=axis1, axis2=axis2) + result = dpnp.diagonal(a_dp, axis1=axis1, axis2=axis2) + assert_array_equal(expected, result) + + def test_diagonal_errors(self): + a = dpnp.arange(12).reshape(3, 4) + + # unsupported type + a_np = dpnp.asnumpy(a) + assert_raises(TypeError, dpnp.diagonal, a_np) + + # a.ndim < 2 + a_ndim_1 = a.flatten() + assert_raises(ValueError, dpnp.diagonal, a_ndim_1) + + # unsupported type `offset` + assert_raises(TypeError, dpnp.diagonal, a, 
offset=1.0) + assert_raises(TypeError, dpnp.diagonal, a, offset=[0]) + + # axes are out of bounds + assert_raises(numpy.AxisError, a.diagonal, axis1=0, axis2=5) + assert_raises(numpy.AxisError, a.diagonal, axis1=5, axis2=0) + assert_raises(numpy.AxisError, a.diagonal, axis1=5, axis2=5) + + # same axes + assert_raises(ValueError, a.diagonal, axis1=1, axis2=1) + assert_raises(ValueError, a.diagonal, axis1=1, axis2=-1) + + +class TestExtins: + @pytest.mark.parametrize("a_dt", get_all_dtypes(no_none=True)) + @pytest.mark.parametrize("cond_dt", get_all_dtypes(no_none=True)) + def test_extract_diff_dtypes(self, a_dt, cond_dt): + a = numpy.array([-2, -1, 0, 1, 2, 3], dtype=a_dt) + cond = numpy.array([1, -1, 2, 0, -2, 3], dtype=cond_dt) + ia, icond = dpnp.array(a), dpnp.array(cond) + + result = dpnp.extract(icond, ia) + expected = numpy.extract(cond, a) + assert_array_equal(result, expected) + + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True)) + def test_extract(self, dt): + a = numpy.array([1, 3, 2, 1, 2, 3, 3], dtype=dt) + ia = dpnp.array(a) + + result = dpnp.extract(ia > 1, ia) + expected = numpy.extract(a > 1, a) + assert_array_equal(result, expected) + + @pytest.mark.parametrize("a_dt", get_all_dtypes(no_none=True)) + def test_extract_list_cond(self, a_dt): + a = numpy.array([-2, -1, 0, 1, 2, 3], dtype=a_dt) + cond = [1, -1, 2, 0, -2, 3] + ia = dpnp.array(a) + + result = dpnp.extract(cond, ia) + expected = numpy.extract(cond, a) + assert_array_equal(result, expected) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True)) + def test_place(self, dt): + a = numpy.array([1, 4, 3, 2, 5, 8, 7], dtype=dt) + ia = dpnp.array(a) + + dpnp.place(ia, [0, 1, 0, 1, 0, 1, 0], [2, 4, 6]) + numpy.place(a, [0, 1, 0, 1, 0, 1, 0], [2, 4, 6]) + assert_array_equal(ia, a) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + def test_place_broadcast_vals(self): + a = numpy.array([1, 4, 3, 2, 5, 8, 7]) + ia = 
dpnp.array(a) + + dpnp.place(ia, [1, 0, 1, 0, 1, 0, 1], [8, 9]) + numpy.place(a, [1, 0, 1, 0, 1, 0, 1], [8, 9]) + assert_array_equal(ia, a) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + def test_place_empty_vals(self): + a = numpy.array([1, 4, 3, 2, 5, 8, 7]) + mask = numpy.zeros(7) + ia, imask = dpnp.array(a), dpnp.array(mask) + vals = [] + + dpnp.place(ia, imask, vals) + numpy.place(a, mask, vals) + assert_array_equal(ia, a) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_place_insert_from_empty_vals(self, xp): + a = xp.array([1, 4, 3, 2, 5, 8, 7]) + assert_raises_regex( + ValueError, + "Cannot insert from an empty array", + lambda: xp.place(a, [0, 0, 0, 0, 0, 1, 0], []), + ) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + @pytest.mark.parametrize("xp", [numpy, dpnp]) + def test_place_wrong_array_type(self, xp): + assert_raises(TypeError, xp.place, [1, 2, 3], [True, False], [0, 1]) + + @pytest.mark.usefixtures("allow_fall_back_on_numpy") + @pytest.mark.parametrize("dt", get_all_dtypes(no_none=True)) + def test_both(self, dt): + a = numpy.random.rand(10).astype(dt) + mask = a > 0.5 + ia, imask = dpnp.array(a), dpnp.array(mask) + + result = dpnp.extract(imask, ia) + expected = numpy.extract(mask, a) + assert_array_equal(result, expected) + + ic = dpnp.extract(imask, ia) + c = numpy.extract(mask, a) + assert_array_equal(ic, c) + + dpnp.place(ia, imask, 0) + dpnp.place(ia, imask, ic) + + numpy.place(a, mask, 0) + numpy.place(a, mask, c) + assert_array_equal(ia, a) + + class TestIndexing: def test_ellipsis_index(self): a = dpnp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) @@ -477,82 +641,6 @@ def test_choose(): assert_array_equal(expected, result) -class TestDiagonal: - @pytest.mark.parametrize("dtype", get_all_dtypes(no_bool=True)) - @pytest.mark.parametrize("offset", [-3, -1, 0, 1, 3]) - @pytest.mark.parametrize( - "shape", - [(2, 2), (3, 3), (2, 5), (3, 2, 2), (2, 2, 2, 2), (2, 
2, 2, 3)], - ids=[ - "(2,2)", - "(3,3)", - "(2,5)", - "(3,2,2)", - "(2,2,2,2)", - "(2,2,2,3)", - ], - ) - def test_diagonal_offset(self, shape, dtype, offset): - a = numpy.arange(numpy.prod(shape), dtype=dtype).reshape(shape) - a_dp = dpnp.array(a) - expected = numpy.diagonal(a, offset) - result = dpnp.diagonal(a_dp, offset) - assert_array_equal(expected, result) - - @pytest.mark.parametrize("dtype", get_all_dtypes(no_bool=True)) - @pytest.mark.parametrize( - "shape, axis_pairs", - [ - ((3, 4), [(0, 1), (1, 0)]), - ((3, 4, 5), [(0, 1), (1, 2), (0, 2)]), - ((4, 3, 5, 2), [(0, 1), (1, 2), (2, 3), (0, 3)]), - ], - ) - def test_diagonal_axes(self, shape, axis_pairs, dtype): - a = numpy.arange(numpy.prod(shape), dtype=dtype).reshape(shape) - a_dp = dpnp.array(a) - for axis1, axis2 in axis_pairs: - expected = numpy.diagonal(a, axis1=axis1, axis2=axis2) - result = dpnp.diagonal(a_dp, axis1=axis1, axis2=axis2) - assert_array_equal(expected, result) - - def test_diagonal_errors(self): - a = dpnp.arange(12).reshape(3, 4) - - # unsupported type - a_np = dpnp.asnumpy(a) - assert_raises(TypeError, dpnp.diagonal, a_np) - - # a.ndim < 2 - a_ndim_1 = a.flatten() - assert_raises(ValueError, dpnp.diagonal, a_ndim_1) - - # unsupported type `offset` - assert_raises(TypeError, dpnp.diagonal, a, offset=1.0) - assert_raises(TypeError, dpnp.diagonal, a, offset=[0]) - - # axes are out of bounds - assert_raises(numpy.AxisError, a.diagonal, axis1=0, axis2=5) - assert_raises(numpy.AxisError, a.diagonal, axis1=5, axis2=0) - assert_raises(numpy.AxisError, a.diagonal, axis1=5, axis2=5) - - # same axes - assert_raises(ValueError, a.diagonal, axis1=1, axis2=1) - assert_raises(ValueError, a.diagonal, axis1=1, axis2=-1) - - -@pytest.mark.parametrize("arr_dtype", get_all_dtypes()) -@pytest.mark.parametrize("cond_dtype", get_all_dtypes()) -def test_extract_1d(arr_dtype, cond_dtype): - a = numpy.array([-2, -1, 0, 1, 2, 3], dtype=arr_dtype) - ia = dpnp.array(a) - cond = numpy.array([1, -1, 2, 0, -2, 3], 
dtype=cond_dtype) - icond = dpnp.array(cond) - expected = numpy.extract(cond, a) - result = dpnp.extract(icond, ia) - assert_array_equal(expected, result) - - @pytest.mark.parametrize("val", [-1, 0, 1], ids=["-1", "0", "1"]) @pytest.mark.parametrize( "array", diff --git a/tests/test_sycl_queue.py b/tests/test_sycl_queue.py index f7c70320dbf..1ea5592ecc4 100644 --- a/tests/test_sycl_queue.py +++ b/tests/test_sycl_queue.py @@ -647,6 +647,7 @@ def test_reduce_hypot(device): pytest.param("dot", [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]), pytest.param("dot", [3, 4, 5], [1, 2, 3]), pytest.param("dot", [3 + 2j, 4 + 1j, 5], [1, 2 + 3j, 3]), + pytest.param("extract", [False, True, True, False], [0, 1, 2, 3]), pytest.param( "floor_divide", [1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5] ), diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index 8d43bccd75a..d38acc4a657 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -637,6 +637,8 @@ def test_1in_1out(func, data, usm_type): pytest.param("dot", [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]), pytest.param("dot", [3, 4, 5], [1, 2, 3]), pytest.param("dot", [3 + 2j, 4 + 1j, 5], [1, 2 + 3j, 3]), + # TODO: uncomment once resolved in gh-1723 by dpctl + # pytest.param("extract", [False, True, True, False], [0, 1, 2, 3]), pytest.param("fmax", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), pytest.param("fmin", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), pytest.param("fmod", [5, 3], [2, 2.0]), diff --git a/tests/third_party/cupy/indexing_tests/test_indexing.py b/tests/third_party/cupy/indexing_tests/test_indexing.py index 7d05eedd2c3..6696bc47087 100644 --- a/tests/third_party/cupy/indexing_tests/test_indexing.py +++ b/tests/third_party/cupy/indexing_tests/test_indexing.py @@ -4,6 +4,7 @@ import pytest import dpnp as cupy +from tests.helper import has_support_aspect64 from tests.third_party.cupy import testing @@ -35,8 +36,10 @@ def test_take_no_axis(self, xp): return a.take(b) # see cupy#3017 + # mark slow as NumPy could go OOM on the Windows CI 
+ @testing.slow @testing.for_int_dtypes(no_bool=True) - @testing.numpy_cupy_array_equal() + @testing.numpy_cupy_array_equal(type_check=has_support_aspect64()) def test_take_index_range_overflow(self, xp, dtype): # Skip for too large dimensions if numpy.dtype(dtype) in (numpy.int64, numpy.uint64): @@ -46,7 +49,7 @@ def test_take_index_range_overflow(self, xp, dtype): if dtype in (numpy.int32, numpy.uint32): pytest.skip() iinfo = numpy.iinfo(dtype) - a = xp.broadcast_to(xp.ones(1, dtype=dtype), (iinfo.max + 1,)) + a = xp.broadcast_to(xp.ones(1), (iinfo.max + 1,)) b = xp.array([0], dtype=dtype) return a.take(b) @@ -62,18 +65,21 @@ def test_take_along_axis_none_axis(self, xp): b = testing.shaped_random((30,), xp, dtype="int64", scale=24) return xp.take_along_axis(a, b, axis=None) + @pytest.mark.skip("compress() is not implemented yet") @testing.numpy_cupy_array_equal() def test_compress(self, xp): a = testing.shaped_arange((3, 4, 5), xp) b = xp.array([True, False, True]) return xp.compress(b, a, axis=1) + @pytest.mark.skip("compress() is not implemented yet") @testing.numpy_cupy_array_equal() def test_compress_no_axis(self, xp): a = testing.shaped_arange((3, 4, 5), xp) b = xp.array([True, False, True]) return xp.compress(b, a) + @pytest.mark.skip("compress() is not implemented yet") @testing.for_int_dtypes() @testing.numpy_cupy_array_equal() def test_compress_no_bool(self, xp, dtype): @@ -81,18 +87,34 @@ def test_compress_no_bool(self, xp, dtype): b = testing.shaped_arange((3,), xp, dtype) return xp.compress(b, a, axis=1) + @pytest.mark.skip("compress() is not implemented yet") + @testing.numpy_cupy_array_equal() + def test_compress_overrun_false(self, xp): + a = testing.shaped_arange((3,), xp) + b = xp.array([True, False, True, False, False, False]) + return xp.compress(b, a) + + @pytest.mark.skip("compress() is not implemented yet") @testing.numpy_cupy_array_equal() def test_compress_empty_1dim(self, xp): a = testing.shaped_arange((3, 4, 5), xp) b = xp.array([]) 
return xp.compress(b, a, axis=1) + @pytest.mark.skip("compress() is not implemented yet") @testing.numpy_cupy_array_equal() def test_compress_empty_1dim_no_axis(self, xp): a = testing.shaped_arange((3, 4, 5), xp) b = xp.array([]) return xp.compress(b, a) + @pytest.mark.skip("compress() is not implemented yet") + @testing.numpy_cupy_array_equal() + def test_compress_0dim(self, xp): + a = xp.array(3) + b = xp.array([True]) + return xp.compress(b, a) + @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_diagonal(self, xp, dtype): @@ -162,28 +184,24 @@ def test_extract_no_bool(self, xp, dtype): b = xp.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=dtype) return xp.extract(b, a) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.numpy_cupy_array_equal() def test_extract_shape_mismatch(self, xp): a = testing.shaped_arange((2, 3), xp) b = xp.array([[True, False], [True, False], [True, False]]) return xp.extract(b, a) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.numpy_cupy_array_equal() def test_extract_size_mismatch(self, xp): a = testing.shaped_arange((3, 3), xp) b = xp.array([[True, False, True], [False, True, False]]) return xp.extract(b, a) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.numpy_cupy_array_equal() def test_extract_size_mismatch2(self, xp): a = testing.shaped_arange((3, 3), xp) b = xp.array([[True, False, True, False], [False, True, False, True]]) return xp.extract(b, a) - @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.numpy_cupy_array_equal() def test_extract_empty_1dim(self, xp): a = testing.shaped_arange((3, 3), xp) @@ -191,7 +209,6 @@ def test_extract_empty_1dim(self, xp): return xp.extract(b, a) -@pytest.mark.usefixtures("allow_fall_back_on_numpy") class TestChoose(unittest.TestCase): @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() @@ -200,13 +217,15 @@ def test_choose(self, xp, dtype): c = testing.shaped_arange((3, 4), xp, dtype) return a.choose(c) + 
@pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_choose_broadcast(self, xp, dtype): a = xp.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]]) - c = xp.array([-10, 10], dtype=dtype) + c = xp.array([-10, 10]).astype(dtype) return a.choose(c) + @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_choose_broadcast2(self, xp, dtype): @@ -214,6 +233,7 @@ def test_choose_broadcast2(self, xp, dtype): c = testing.shaped_arange((3, 5, 2), xp, dtype) return a.choose(c) + @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_choose_wrap(self, xp, dtype): @@ -221,6 +241,7 @@ def test_choose_wrap(self, xp, dtype): c = testing.shaped_arange((3, 4), xp, dtype) return a.choose(c, mode="wrap") + @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() @testing.numpy_cupy_array_equal() def test_choose_clip(self, xp, dtype): @@ -228,6 +249,7 @@ def test_choose_clip(self, xp, dtype): c = testing.shaped_arange((3, 4), xp, dtype) return a.choose(c, mode="clip") + @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.with_requires("numpy>=1.19") def test_unknown_clip(self): for xp in (numpy, cupy): @@ -236,12 +258,14 @@ def test_unknown_clip(self): with pytest.raises(ValueError): a.choose(c, mode="unknown") + @pytest.mark.usefixtures("allow_fall_back_on_numpy") def test_raise(self): a = cupy.array([2]) c = cupy.array([[0, 1]]) with self.assertRaises(ValueError): a.choose(c) + @pytest.mark.usefixtures("allow_fall_back_on_numpy") @testing.for_all_dtypes() def test_choose_broadcast_fail(self, dtype): for xp in (numpy, cupy): @@ -370,3 +394,10 @@ def test_select_default_scalar(self, dtype): choicelist = [a, b] with pytest.raises(TypeError): cupy.select(condlist, choicelist, [dtype(2)]) + + @pytest.mark.skip("as_strided() is not implemented yet") + 
@testing.numpy_cupy_array_equal() + def test_indexing_overflows(self, xp): + a = xp.arange(2, dtype=xp.int32) + a = xp.lib.stride_tricks.as_strided(a, shape=(2, 2**32), strides=(4, 0)) + return a[xp.array([1]), xp.array([1])] From 05e1bb6366d09b2c230d4433fb1a2f1a627e582d Mon Sep 17 00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Thu, 4 Jul 2024 18:14:42 +0200 Subject: [PATCH 46/49] Rework implementation of `dpnp.fmax` and `dpnp.fmin` functions (#1905) * Implement dpnp.fmax and dpnp.fmin functions * Updated existing tests and added new ones * Removed unused code from cython backend * Removed a reference to original descriptor --- doc/reference/ufunc.rst | 2 + dpnp/backend/extensions/ufunc/CMakeLists.txt | 2 + .../ufunc/elementwise_functions/common.cpp | 4 + .../ufunc/elementwise_functions/fmax.cpp | 137 +++++++ .../ufunc/elementwise_functions/fmax.hpp | 35 ++ .../ufunc/elementwise_functions/fmin.cpp | 137 +++++++ .../ufunc/elementwise_functions/fmin.hpp | 35 ++ dpnp/backend/extensions/vm/CMakeLists.txt | 2 + dpnp/backend/extensions/vm/fmax.cpp | 161 ++++++++ dpnp/backend/extensions/vm/fmax.hpp | 35 ++ dpnp/backend/extensions/vm/fmin.cpp | 161 ++++++++ dpnp/backend/extensions/vm/fmin.hpp | 35 ++ dpnp/backend/extensions/vm/vm_py.cpp | 4 + .../include/dpnp_gen_2arg_3type_tbl.hpp | 22 -- dpnp/backend/include/dpnp_iface_fptr.hpp | 4 - dpnp/backend/kernels/dpnp_krnl_elemwise.cpp | 42 -- .../kernels/elementwise_functions/fmax.hpp | 83 ++++ .../kernels/elementwise_functions/fmin.hpp | 83 ++++ .../kernels/elementwise_functions/fmod.hpp | 3 +- dpnp/dpnp_algo/dpnp_algo.pxd | 11 - dpnp/dpnp_algo/dpnp_algo.pyx | 96 ----- dpnp/dpnp_algo/dpnp_algo_mathematical.pxi | 18 - dpnp/dpnp_iface.py | 8 +- dpnp/dpnp_iface_mathematical.py | 362 ++++++++---------- dpnp/dpnp_utils/dpnp_algo_utils.pxd | 8 - dpnp/dpnp_utils/dpnp_algo_utils.pyx | 78 +--- tests/skipped_tests.tbl | 2 - tests/skipped_tests_gpu.tbl | 2 - tests/test_mathematical.py | 75 ++++ 
tests/test_usm_type.py | 8 +- 30 files changed, 1154 insertions(+), 501 deletions(-) create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmax.cpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmax.hpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmin.cpp create mode 100644 dpnp/backend/extensions/ufunc/elementwise_functions/fmin.hpp create mode 100644 dpnp/backend/extensions/vm/fmax.cpp create mode 100644 dpnp/backend/extensions/vm/fmax.hpp create mode 100644 dpnp/backend/extensions/vm/fmin.cpp create mode 100644 dpnp/backend/extensions/vm/fmin.hpp create mode 100644 dpnp/backend/kernels/elementwise_functions/fmax.hpp create mode 100644 dpnp/backend/kernels/elementwise_functions/fmin.hpp diff --git a/doc/reference/ufunc.rst b/doc/reference/ufunc.rst index 2dffca15e88..a5b64852bd4 100644 --- a/doc/reference/ufunc.rst +++ b/doc/reference/ufunc.rst @@ -105,10 +105,12 @@ Comparison functions dpnp.less_equal dpnp.not_equal dpnp.equal + dpnp.logical_and dpnp.logical_or dpnp.logical_xor dpnp.logical_not + dpnp.maximum dpnp.minimum dpnp.fmax diff --git a/dpnp/backend/extensions/ufunc/CMakeLists.txt b/dpnp/backend/extensions/ufunc/CMakeLists.txt index 1d140b06658..077710cb55c 100644 --- a/dpnp/backend/extensions/ufunc/CMakeLists.txt +++ b/dpnp/backend/extensions/ufunc/CMakeLists.txt @@ -26,6 +26,8 @@ set(_elementwise_sources ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/common.cpp ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fabs.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fmax.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fmin.cpp ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/fmod.cpp ) diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp index b915f9a299a..e4af134f46d 100644 --- a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp +++ 
b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp @@ -26,6 +26,8 @@ #include #include "fabs.hpp" +#include "fmax.hpp" +#include "fmin.hpp" #include "fmod.hpp" namespace py = pybind11; @@ -38,6 +40,8 @@ namespace dpnp::extensions::ufunc void init_elementwise_functions(py::module_ m) { init_fabs(m); + init_fmax(m); + init_fmin(m); init_fmod(m); } } // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.cpp new file mode 100644 index 00000000000..64f68d146be --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.cpp @@ -0,0 +1,137 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include + +#include "dpctl4pybind11.hpp" + +#include "fmax.hpp" +#include "kernels/elementwise_functions/fmax.hpp" +#include "populate.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "kernels/elementwise_functions/maximum.hpp" +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace max_ns = dpctl::tensor::kernels::maximum; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +namespace impl +{ +// Supports the same types table as for maximum function in dpctl +template +using OutputType = max_ns::MaximumOutputType; + +using dpnp::kernels::fmax::FmaxFunctor; + +template +using ContigFunctor = + ew_cmn_ns::BinaryContigFunctor, + vec_sz, + n_vecs, + enable_sg_loadstore>; + +template +using StridedFunctor = + ew_cmn_ns::BinaryStridedFunctor>; + +using 
ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static binary_contig_impl_fn_ptr_t fmax_contig_dispatch_table[td_ns::num_types] + [td_ns::num_types]; +static int fmax_output_typeid_table[td_ns::num_types][td_ns::num_types]; +static binary_strided_impl_fn_ptr_t + fmax_strided_dispatch_table[td_ns::num_types][td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(fmax); +} // namespace impl + +void init_fmax(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + { + impl::populate_fmax_dispatch_tables(); + using impl::fmax_contig_dispatch_table; + using impl::fmax_output_typeid_table; + using impl::fmax_strided_dispatch_table; + + auto fmax_pyapi = [&](const arrayT &src1, const arrayT &src2, + const arrayT &dst, sycl::queue &exec_q, + const event_vecT &depends = {}) { + return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, fmax_output_typeid_table, + fmax_contig_dispatch_table, fmax_strided_dispatch_table, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmax", fmax_pyapi, "", py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("sycl_queue"), + py::arg("depends") = py::list()); + + auto fmax_result_type_pyapi = [&](const py::dtype &dtype1, + const py::dtype &dtype2) { + return py_int::py_binary_ufunc_result_type( + dtype1, dtype2, fmax_output_typeid_table); + }; + m.def("_fmax_result_type", fmax_result_type_pyapi); + } +} +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.hpp new file 
mode 100644 index 00000000000..70d0baac314 --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmax.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +void init_fmax(py::module_ m); +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.cpp new file mode 100644 index 00000000000..0972ffde922 --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.cpp @@ -0,0 +1,137 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include + +#include "dpctl4pybind11.hpp" + +#include "fmin.hpp" +#include "kernels/elementwise_functions/fmin.hpp" +#include "populate.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "kernels/elementwise_functions/minimum.hpp" +#include "utils/type_dispatch.hpp" + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace min_ns = dpctl::tensor::kernels::minimum; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; + +using ew_cmn_ns::unary_contig_impl_fn_ptr_t; +using ew_cmn_ns::unary_strided_impl_fn_ptr_t; + +namespace impl +{ +// Supports the same types table as for minimum function in dpctl +template +using OutputType = min_ns::MinimumOutputType; + +using dpnp::kernels::fmin::FminFunctor; + +template +using ContigFunctor = + ew_cmn_ns::BinaryContigFunctor, + vec_sz, + n_vecs, + enable_sg_loadstore>; + +template +using StridedFunctor = + ew_cmn_ns::BinaryStridedFunctor>; + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static binary_contig_impl_fn_ptr_t fmin_contig_dispatch_table[td_ns::num_types] + [td_ns::num_types]; +static int fmin_output_typeid_table[td_ns::num_types][td_ns::num_types]; +static binary_strided_impl_fn_ptr_t + fmin_strided_dispatch_table[td_ns::num_types][td_ns::num_types]; + 
+MACRO_POPULATE_DISPATCH_TABLES(fmin); +} // namespace impl + +void init_fmin(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector; + { + impl::populate_fmin_dispatch_tables(); + using impl::fmin_contig_dispatch_table; + using impl::fmin_output_typeid_table; + using impl::fmin_strided_dispatch_table; + + auto fmin_pyapi = [&](const arrayT &src1, const arrayT &src2, + const arrayT &dst, sycl::queue &exec_q, + const event_vecT &depends = {}) { + return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, fmin_output_typeid_table, + fmin_contig_dispatch_table, fmin_strided_dispatch_table, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmin", fmin_pyapi, "", py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("sycl_queue"), + py::arg("depends") = py::list()); + + auto fmin_result_type_pyapi = [&](const py::dtype &dtype1, + const py::dtype &dtype2) { + return py_int::py_binary_ufunc_result_type( + dtype1, dtype2, fmin_output_typeid_table); + }; + m.def("_fmin_result_type", fmin_result_type_pyapi); + } +} +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.hpp b/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.hpp new file mode 100644 index 00000000000..9c2ca9baab3 --- /dev/null +++ b/dpnp/backend/extensions/ufunc/elementwise_functions/fmin.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include <pybind11/pybind11.h> + +namespace py = pybind11; + +namespace dpnp::extensions::ufunc +{ +void init_fmin(py::module_ m); +} // namespace dpnp::extensions::ufunc diff --git a/dpnp/backend/extensions/vm/CMakeLists.txt b/dpnp/backend/extensions/vm/CMakeLists.txt index 0a7646cfc57..159ca57993c 100644 --- a/dpnp/backend/extensions/vm/CMakeLists.txt +++ b/dpnp/backend/extensions/vm/CMakeLists.txt @@ -43,6 +43,8 @@ set(_elementwise_sources ${CMAKE_CURRENT_SOURCE_DIR}/exp2.cpp ${CMAKE_CURRENT_SOURCE_DIR}/expm1.cpp ${CMAKE_CURRENT_SOURCE_DIR}/floor.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/fmax.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/fmin.cpp ${CMAKE_CURRENT_SOURCE_DIR}/fmod.cpp ${CMAKE_CURRENT_SOURCE_DIR}/hypot.cpp ${CMAKE_CURRENT_SOURCE_DIR}/ln.cpp diff --git a/dpnp/backend/extensions/vm/fmax.cpp b/dpnp/backend/extensions/vm/fmax.cpp new file mode 100644 index 00000000000..b711516f679 --- /dev/null +++ b/dpnp/backend/extensions/vm/fmax.cpp @@ -0,0 +1,161 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#include <oneapi/mkl.hpp> +#include <sycl/sycl.hpp> + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "fmax.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::fmax function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. + */ +template <typename T> +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::BinaryTypeMapResultEntry<T, double, T, double, double>, + td_ns::BinaryTypeMapResultEntry<T, float, T, float, float>, + td_ns::DefaultResultEntry<void>>::result_type; +}; + +template <typename T1, typename T2> +static sycl::event fmax_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + py::ssize_t a_offset, + const char *in_b, + py::ssize_t b_offset, + char *out_y, + py::ssize_t out_offset, + const std::vector<sycl::event> &depends) +{ + tu_ns::validate_type_for_device<T1>(exec_q); + tu_ns::validate_type_for_device<T2>(exec_q); + + if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) { + throw std::runtime_error("Array offsets have to be equal to 0"); + } + + std::int64_t n = static_cast<std::int64_t>(in_n); + const T1 *a = reinterpret_cast<const T1 *>(in_a); + const T2 *b = reinterpret_cast<const T2 *>(in_b); + + using resTy = typename OutputType<T1>::value_type; + resTy *y = reinterpret_cast<resTy *>(out_y); + + return mkl_vm::fmax(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing 1st input vector of size n + b, // pointer `b` containing 2nd input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types][td_ns::num_types]; +static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types] + [td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(fmax); +} // namespace impl + +void init_fmax(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector<sycl::event>; + + impl::populate_dispatch_tables(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto fmax_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { +
return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{}, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmax", fmax_pyapi, + "Call `fmax` function from OneMKL VM library to perform element " + "by element computation of the maximum of vector `src1` and " + "vector `src2`, storing the result in vector `dst`", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("depends") = py::list()); + + auto fmax_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst) { + return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst, + output_typeid_vector, + contig_dispatch_vector); + }; + m.def("_mkl_fmax_to_call", fmax_need_to_call_pyapi, + "Check input arguments to answer if `fmax` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/fmax.hpp b/dpnp/backend/extensions/vm/fmax.hpp new file mode 100644 index 00000000000..13d8ccad9ff --- /dev/null +++ b/dpnp/backend/extensions/vm/fmax.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2023-2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include <pybind11/pybind11.h> + +namespace py = pybind11; + +namespace dpnp::extensions::vm +{ +void init_fmax(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/fmin.cpp b/dpnp/backend/extensions/vm/fmin.cpp new file mode 100644 index 00000000000..3b288216c92 --- /dev/null +++ b/dpnp/backend/extensions/vm/fmin.cpp @@ -0,0 +1,161 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#include <oneapi/mkl.hpp> +#include <sycl/sycl.hpp> + +#include "dpctl4pybind11.hpp" + +#include "common.hpp" +#include "fmin.hpp" + +// include a local copy of elementwise common header from dpctl tensor: +// dpctl/tensor/libtensor/source/elementwise_functions/elementwise_functions.hpp +// TODO: replace by including dpctl header once available +#include "../elementwise_functions/elementwise_functions.hpp" + +// dpctl tensor headers +#include "kernels/elementwise_functions/common.hpp" +#include "utils/type_dispatch.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::extensions::vm +{ +namespace ew_cmn_ns = dpctl::tensor::kernels::elementwise_common; +namespace py = pybind11; +namespace py_int = dpnp::extensions::py_internal; +namespace td_ns = dpctl::tensor::type_dispatch; +namespace tu_ns = dpctl::tensor::type_utils; + +namespace impl +{ +// OneMKL namespace with VM functions +namespace mkl_vm = oneapi::mkl::vm; + +/** + * @brief A factory to define pairs of supported types for which + * MKL VM library provides support in oneapi::mkl::vm::fmin function. + * + * @tparam T Type of input vectors `a` and `b` and of result vector `y`. + */ +template <typename T> +struct OutputType +{ + using value_type = typename std::disjunction< + td_ns::BinaryTypeMapResultEntry<T, double, T, double, double>, + td_ns::BinaryTypeMapResultEntry<T, float, T, float, float>, + td_ns::DefaultResultEntry<void>>::result_type; +}; + +template <typename T1, typename T2> +static sycl::event fmin_contig_impl(sycl::queue &exec_q, + std::size_t in_n, + const char *in_a, + py::ssize_t a_offset, + const char *in_b, + py::ssize_t b_offset, + char *out_y, + py::ssize_t out_offset, + const std::vector<sycl::event> &depends) +{ + tu_ns::validate_type_for_device<T1>(exec_q); + tu_ns::validate_type_for_device<T2>(exec_q); + + if ((a_offset != 0) || (b_offset != 0) || (out_offset != 0)) { + throw std::runtime_error("Array offsets have to be equal to 0"); + } + + std::int64_t n = static_cast<std::int64_t>(in_n); + const T1 *a = reinterpret_cast<const T1 *>(in_a); + const T2 *b = reinterpret_cast<const T2 *>(in_b); + + using resTy = typename OutputType<T1>::value_type; + resTy *y = reinterpret_cast<resTy *>(out_y); + + return mkl_vm::fmin(exec_q, + n, // number of elements to be calculated + a, // pointer `a` containing 1st input vector of size n + b, // pointer `b` containing 2nd input vector of size n + y, // pointer `y` to the output vector of size n + depends); +} + +using ew_cmn_ns::binary_contig_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t; +using ew_cmn_ns::binary_strided_impl_fn_ptr_t; + +static int output_typeid_vector[td_ns::num_types][td_ns::num_types]; +static binary_contig_impl_fn_ptr_t contig_dispatch_vector[td_ns::num_types] + [td_ns::num_types]; + +MACRO_POPULATE_DISPATCH_TABLES(fmin); +} // namespace impl + +void init_fmin(py::module_ m) +{ + using arrayT = dpctl::tensor::usm_ndarray; + using event_vecT = std::vector<sycl::event>; + + impl::populate_dispatch_tables(); + using impl::contig_dispatch_vector; + using impl::output_typeid_vector; + + auto fmin_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst, + const event_vecT &depends = {}) { +
return py_int::py_binary_ufunc( + src1, src2, dst, exec_q, depends, output_typeid_vector, + contig_dispatch_vector, + // no support of strided implementation in OneMKL + td_ns::NullPtrTable<impl::binary_strided_impl_fn_ptr_t>{}, + // no support of C-contig row with broadcasting in OneMKL + td_ns::NullPtrTable< + impl:: + binary_contig_matrix_contig_row_broadcast_impl_fn_ptr_t>{}, + td_ns::NullPtrTable< + impl:: + binary_contig_row_contig_matrix_broadcast_impl_fn_ptr_t>{}); + }; + m.def("_fmin", fmin_pyapi, + "Call `fmin` function from OneMKL VM library to perform element " + "by element computation of the minimum of vector `src1` and " + "vector `src2`, storing the result in vector `dst`", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst"), py::arg("depends") = py::list()); + + auto fmin_need_to_call_pyapi = [&](sycl::queue &exec_q, const arrayT &src1, + const arrayT &src2, const arrayT &dst) { + return py_internal::need_to_call_binary_ufunc(exec_q, src1, src2, dst, + output_typeid_vector, + contig_dispatch_vector); + }; + m.def("_mkl_fmin_to_call", fmin_need_to_call_pyapi, + "Check input arguments to answer if `fmin` function from " + "OneMKL VM library can be used", + py::arg("sycl_queue"), py::arg("src1"), py::arg("src2"), + py::arg("dst")); +} +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/fmin.hpp b/dpnp/backend/extensions/vm/fmin.hpp new file mode 100644 index 00000000000..d1eefe5eccb --- /dev/null +++ b/dpnp/backend/extensions/vm/fmin.hpp @@ -0,0 +1,35 @@ +//***************************************************************************** +// Copyright (c) 2023-2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer.
+// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. 
+//***************************************************************************** + +#pragma once + +#include <pybind11/pybind11.h> + +namespace py = pybind11; + +namespace dpnp::extensions::vm +{ +void init_fmin(py::module_ m); +} // namespace dpnp::extensions::vm diff --git a/dpnp/backend/extensions/vm/vm_py.cpp b/dpnp/backend/extensions/vm/vm_py.cpp index b78ae51ddc3..4491703957a 100644 --- a/dpnp/backend/extensions/vm/vm_py.cpp +++ b/dpnp/backend/extensions/vm/vm_py.cpp @@ -46,6 +46,8 @@ #include "exp2.hpp" #include "expm1.hpp" #include "floor.hpp" +#include "fmax.hpp" +#include "fmin.hpp" #include "fmod.hpp" #include "hypot.hpp" #include "ln.hpp" @@ -87,6 +89,8 @@ PYBIND11_MODULE(_vm_impl, m) vm_ns::init_exp2(m); vm_ns::init_expm1(m); vm_ns::init_floor(m); + vm_ns::init_fmax(m); + vm_ns::init_fmin(m); vm_ns::init_fmod(m); vm_ns::init_hypot(m); vm_ns::init_ln(m); diff --git a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp index e5a2c924653..11aed0ebac2 100644 --- a/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp +++ b/dpnp/backend/include/dpnp_gen_2arg_3type_tbl.hpp @@ -103,28 +103,6 @@ #endif -MACRO_2ARG_3TYPES_OP( - dpnp_fmod_c, - dispatch_fmod_op(input1_elem, input2_elem), - dispatch_fmod_op(x1, x2), - MACRO_UNPACK_TYPES(std::int32_t, std::int64_t, float, double), - oneapi::mkl::vm::fmod, - MACRO_UNPACK_TYPES(float, double)) - -MACRO_2ARG_3TYPES_OP(dpnp_maximum_c, - sycl::max(input1_elem, input2_elem), - nullptr, - std::false_type, - oneapi::mkl::vm::fmax, - MACRO_UNPACK_TYPES(float, double)) - -MACRO_2ARG_3TYPES_OP(dpnp_minimum_c, - sycl::min(input1_elem, input2_elem), - nullptr, - std::false_type, - oneapi::mkl::vm::fmin, - MACRO_UNPACK_TYPES(float, double)) - // "multiply" needs to be standalone kernel (not autogenerated) due to complex // algorithm. This is not an element wise.
pytest // "tests/third_party/cupy/creation_tests/test_ranges.py::TestMgrid::test_mgrid3" diff --git a/dpnp/backend/include/dpnp_iface_fptr.hpp b/dpnp/backend/include/dpnp_iface_fptr.hpp index aaaf90c27bb..9f9b7a89143 100644 --- a/dpnp/backend/include/dpnp_iface_fptr.hpp +++ b/dpnp/backend/include/dpnp_iface_fptr.hpp @@ -100,15 +100,11 @@ enum class DPNPFuncName : size_t DPNP_FN_INITVAL_EXT, /**< Used in numpy ones, ones_like, zeros, zeros_like impls */ DPNP_FN_MAX, /**< Used in numpy.max() impl */ - DPNP_FN_MAXIMUM_EXT, /**< Used in numpy.fmax() impl , requires extra - parameters */ DPNP_FN_MEAN, /**< Used in numpy.mean() impl */ DPNP_FN_MEDIAN, /**< Used in numpy.median() impl */ DPNP_FN_MEDIAN_EXT, /**< Used in numpy.median() impl, requires extra parameters */ DPNP_FN_MIN, /**< Used in numpy.min() impl */ - DPNP_FN_MINIMUM_EXT, /**< Used in numpy.fmax() impl, requires extra - parameters */ DPNP_FN_MODF, /**< Used in numpy.modf() impl */ DPNP_FN_MODF_EXT, /**< Used in numpy.modf() impl, requires extra parameters */ diff --git a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp index e3797bd22e6..75413cc5e60 100644 --- a/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_elemwise.cpp @@ -1026,45 +1026,6 @@ static void func_map_init_elemwise_1arg_1type(func_map_t &fmap) #include -template -static void func_map_elemwise_2arg_3type_short_core(func_map_t &fmap) -{ - ((fmap[DPNPFuncName::DPNP_FN_MAXIMUM_EXT][FT1][FTs] = - {get_floating_res_type(), - (void *)dpnp_maximum_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, - func_type_map_t::find_type>, - get_floating_res_type(), - (void *)dpnp_maximum_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, - func_type_map_t::find_type>}), - ...); - ((fmap[DPNPFuncName::DPNP_FN_MINIMUM_EXT][FT1][FTs] = - {get_floating_res_type(), - (void *)dpnp_minimum_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, 
- func_type_map_t::find_type>, - get_floating_res_type(), - (void *)dpnp_minimum_c_ext< - func_type_map_t::find_type()>, - func_type_map_t::find_type, - func_type_map_t::find_type>}), - ...); -} - -template -static void func_map_elemwise_2arg_3type_short_helper(func_map_t &fmap) -{ - ((func_map_elemwise_2arg_3type_short_core(fmap)), ...); -} - static void func_map_init_elemwise_2arg_3type(func_map_t &fmap) { // Used in dpnp_dot_c @@ -1170,9 +1131,6 @@ static void func_map_init_elemwise_2arg_3type(func_map_t &fmap) (void *)dpnp_multiply_c_default< std::complex, std::complex, std::complex>}; - func_map_elemwise_2arg_3type_short_helper(fmap); - return; } diff --git a/dpnp/backend/kernels/elementwise_functions/fmax.hpp b/dpnp/backend/kernels/elementwise_functions/fmax.hpp new file mode 100644 index 00000000000..6b0ebb81ec6 --- /dev/null +++ b/dpnp/backend/kernels/elementwise_functions/fmax.hpp @@ -0,0 +1,83 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include <sycl/sycl.hpp> + +// dpctl tensor headers +#include "utils/math_utils.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::kernels::fmax +{ +namespace mu_ns = dpctl::tensor::math_utils; +namespace tu_ns = dpctl::tensor::type_utils; + +template <typename argT1, typename argT2, typename resT> +struct FmaxFunctor +{ + using supports_sg_loadstore = std::negation< + std::disjunction<tu_ns::is_complex<argT1>, tu_ns::is_complex<argT2>>>; + using supports_vec = + std::conjunction<std::is_same<argT1, argT2>, + std::disjunction<std::is_floating_point<argT1>, + std::is_same<argT1, sycl::half>>>; + + resT operator()(const argT1 &in1, const argT2 &in2) const + { + if constexpr (std::is_integral_v<argT1> && std::is_integral_v<argT2>) { + return in1 >= in2 ? in1 : in2; + } + else if constexpr (tu_ns::is_complex<argT1>::value && + tu_ns::is_complex<argT2>::value) + { + static_assert(std::is_same_v<argT1, argT2>); + + using realT = typename argT1::value_type; + const realT in2r = std::real(in2); + const realT in2i = std::imag(in2); + + if (sycl::isnan(in2r) || sycl::isnan(in2i) || + mu_ns::greater_equal_complex<argT1>(in1, in2)) + { + return in1; + } + return in2; + } + else { + return sycl::fmax(in1, in2); + } + } + + template <int vec_sz> + sycl::vec<resT, vec_sz> + operator()(const sycl::vec<argT1, vec_sz> &in1, + const sycl::vec<argT2, vec_sz> &in2) const + { + return sycl::fmax(in1, in2); + } +}; +} // namespace dpnp::kernels::fmax diff --git a/dpnp/backend/kernels/elementwise_functions/fmin.hpp b/dpnp/backend/kernels/elementwise_functions/fmin.hpp new file mode 100644 index 00000000000..30e4af8884f --- /dev/null +++ b/dpnp/backend/kernels/elementwise_functions/fmin.hpp @@ -0,0 +1,83 @@ +//***************************************************************************** +// Copyright (c) 2024, Intel Corporation +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are met: +// - Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimer. +// - Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimer in the documentation +// and/or other materials provided with the distribution. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +// ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +// THE POSSIBILITY OF SUCH DAMAGE. +//***************************************************************************** + +#pragma once + +#include <sycl/sycl.hpp> + +// dpctl tensor headers +#include "utils/math_utils.hpp" +#include "utils/type_utils.hpp" + +namespace dpnp::kernels::fmin +{ +namespace mu_ns = dpctl::tensor::math_utils; +namespace tu_ns = dpctl::tensor::type_utils; + +template <typename argT1, typename argT2, typename resT> +struct FminFunctor +{ + using supports_sg_loadstore = std::negation< + std::disjunction<tu_ns::is_complex<argT1>, tu_ns::is_complex<argT2>>>; + using supports_vec = + std::conjunction<std::is_same<argT1, argT2>, + std::disjunction<std::is_floating_point<argT1>, + std::is_same<argT1, sycl::half>>>; + + resT operator()(const argT1 &in1, const argT2 &in2) const + { + if constexpr (std::is_integral_v<argT1> && std::is_integral_v<argT2>) { + return in1 <= in2 ? in1 : in2; + } + else if constexpr (tu_ns::is_complex<argT1>::value && + tu_ns::is_complex<argT2>::value) + { + static_assert(std::is_same_v<argT1, argT2>); + + using realT = typename argT1::value_type; + const realT in2r = std::real(in2); + const realT in2i = std::imag(in2); + + if (sycl::isnan(in2r) || sycl::isnan(in2i) || + mu_ns::less_equal_complex<argT1>(in1, in2)) + { + return in1; + } + return in2; + } + else { + return sycl::fmin(in1, in2); + } + } + + template <int vec_sz> + sycl::vec<resT, vec_sz> + operator()(const sycl::vec<argT1, vec_sz> &in1, + const sycl::vec<argT2, vec_sz> &in2) const + { + return sycl::fmin(in1, in2); + } +}; +} // namespace dpnp::kernels::fmin diff --git a/dpnp/backend/kernels/elementwise_functions/fmod.hpp b/dpnp/backend/kernels/elementwise_functions/fmod.hpp index e97b257cb06..bf60bd09564 100644 --- a/dpnp/backend/kernels/elementwise_functions/fmod.hpp +++ b/dpnp/backend/kernels/elementwise_functions/fmod.hpp @@ -38,8 +38,7 @@ struct FmodFunctor resT operator()(const argT1 &in1, const argT2 &in2) const { - if constexpr (std::is_integral<argT1>::value && - std::is_integral<argT2>::value) { + if constexpr (std::is_integral_v<argT1> && std::is_integral_v<argT2>) { if (in2 == argT2(0)) { return resT(0); } diff --git a/dpnp/dpnp_algo/dpnp_algo.pxd b/dpnp/dpnp_algo/dpnp_algo.pxd index 0c8bd1134a7..3b5b2383226 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pxd +++ b/dpnp/dpnp_algo/dpnp_algo.pxd @@ -41,9 +41,7 @@ cdef extern from "dpnp_iface_fptr.hpp" namespace "DPNPFuncName": # need this na DPNP_FN_ERF_EXT DPNP_FN_FFT_FFT_EXT DPNP_FN_FFT_RFFT_EXT - DPNP_FN_MAXIMUM_EXT DPNP_FN_MEDIAN_EXT - DPNP_FN_MINIMUM_EXT DPNP_FN_MODF_EXT DPNP_FN_PARTITION_EXT DPNP_FN_RADIANS_EXT @@ -170,15 +168,6 @@ cpdef dpnp_descriptor dpnp_isclose(dpnp_descriptor input1, dpnp_descriptor input double rtol=*, double atol=*, cpp_bool equal_nan=*) -""" -Mathematical functions -""" -cpdef dpnp_descriptor dpnp_fmax(dpnp_descriptor x1_obj, dpnp_descriptor x2_obj, object dtype=*, - dpnp_descriptor out=*, object where=*) -cpdef dpnp_descriptor dpnp_fmin(dpnp_descriptor
x2_obj, object dtype=*, - dpnp_descriptor out=*, object where=*) - - """ Trigonometric functions """ diff --git a/dpnp/dpnp_algo/dpnp_algo.pyx b/dpnp/dpnp_algo/dpnp_algo.pyx index 4c560d50e0b..d304f1d32d3 100644 --- a/dpnp/dpnp_algo/dpnp_algo.pyx +++ b/dpnp/dpnp_algo/dpnp_algo.pyx @@ -219,99 +219,3 @@ cdef utils.dpnp_descriptor call_fptr_1in_1out_strides(DPNPFuncName fptr_name, c_dpctl.DPCTLEvent_Delete(event_ref) return result - - -cdef utils.dpnp_descriptor call_fptr_2in_1out_strides(DPNPFuncName fptr_name, - utils.dpnp_descriptor x1_obj, - utils.dpnp_descriptor x2_obj, - object dtype=None, - utils.dpnp_descriptor out=None, - object where=True, - func_name=None): - - # Convert type (x1_obj.dtype) to C enum DPNPFuncType - cdef DPNPFuncType x1_c_type = dpnp_dtype_to_DPNPFuncType(x1_obj.dtype) - cdef DPNPFuncType x2_c_type = dpnp_dtype_to_DPNPFuncType(x2_obj.dtype) - - # get the FPTR data structure - cdef DPNPFuncData kernel_data = get_dpnp_function_ptr(fptr_name, x1_c_type, x2_c_type) - - result_sycl_device, result_usm_type, result_sycl_queue = utils.get_common_usm_allocation(x1_obj, x2_obj) - - # get FPTR function and return type - cdef (DPNPFuncType, void *) ret_type_and_func = utils.get_ret_type_and_func(kernel_data, - result_sycl_device.has_aspect_fp64) - cdef DPNPFuncType return_type = ret_type_and_func[0] - cdef fptr_2in_1out_strides_t func = < fptr_2in_1out_strides_t > ret_type_and_func[1] - - # Create result array - cdef shape_type_c x1_shape = x1_obj.shape - - cdef shape_type_c x1_strides = utils.strides_to_vector(x1_obj.strides, x1_shape) - cdef shape_type_c x2_shape = x2_obj.shape - cdef shape_type_c x2_strides = utils.strides_to_vector(x2_obj.strides, x2_shape) - - cdef shape_type_c result_shape = utils.get_common_shape(x1_shape, x2_shape) - cdef utils.dpnp_descriptor result - - # check 'out' parameter data - if out is not None: - if out.shape != result_shape: - utils.checker_throw_value_error(func_name, 'out.shape', out.shape, result_shape) - - 
utils.get_common_usm_allocation(x1_obj, out) # check USM allocation is common - - if out is None or out.is_array_overlapped(x1_obj) or out.is_array_overlapped(x2_obj) or not out.match_ctype(return_type): - """ - Create result array with type given by FPTR data. - If 'out' array has another dtype than expected or overlaps a memory from any input array, - we have to create a temporary array and to copy data from the temporary into 'out' array, - once the computation is completed. - Otherwise simultaneously access to the same memory may cause a race condition issue - which will result into undefined behaviour. - """ - is_result_memory_allocated = True - result = utils.create_output_descriptor(result_shape, - return_type, - None, - device=result_sycl_device, - usm_type=result_usm_type, - sycl_queue=result_sycl_queue) - else: - is_result_memory_allocated = False - result = out - - cdef shape_type_c result_strides = utils.strides_to_vector(result.strides, result_shape) - - result_obj = result.get_array() - - cdef c_dpctl.SyclQueue q = < c_dpctl.SyclQueue > result_obj.sycl_queue - cdef c_dpctl.DPCTLSyclQueueRef q_ref = q.get_queue_ref() - - """ Call FPTR function """ - cdef c_dpctl.DPCTLSyclEventRef event_ref = func(q_ref, - result.get_data(), - result.size, - result.ndim, - result_shape.data(), - result_strides.data(), - x1_obj.get_data(), - x1_obj.size, - x1_obj.ndim, - x1_shape.data(), - x1_strides.data(), - x2_obj.get_data(), - x2_obj.size, - x2_obj.ndim, - x2_shape.data(), - x2_strides.data(), - NULL, - NULL) # dep_events_ref) - - with nogil: c_dpctl.DPCTLEvent_WaitAndThrow(event_ref) - c_dpctl.DPCTLEvent_Delete(event_ref) - - if out is not None and is_result_memory_allocated: - return out.get_result_desc(result) - - return result.get_result_desc() diff --git a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi index 28b89ce60a1..84b004856bd 100644 --- a/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi +++ 
b/dpnp/dpnp_algo/dpnp_algo_mathematical.pxi @@ -37,8 +37,6 @@ and the rest of the library __all__ += [ "dpnp_ediff1d", - "dpnp_fmax", - "dpnp_fmin", "dpnp_modf", ] @@ -104,22 +102,6 @@ cpdef utils.dpnp_descriptor dpnp_ediff1d(utils.dpnp_descriptor x1): return result -cpdef utils.dpnp_descriptor dpnp_fmax(utils.dpnp_descriptor x1_obj, - utils.dpnp_descriptor x2_obj, - object dtype=None, - utils.dpnp_descriptor out=None, - object where=True): - return call_fptr_2in_1out_strides(DPNP_FN_MAXIMUM_EXT, x1_obj, x2_obj, dtype, out, where) - - -cpdef utils.dpnp_descriptor dpnp_fmin(utils.dpnp_descriptor x1_obj, - utils.dpnp_descriptor x2_obj, - object dtype=None, - utils.dpnp_descriptor out=None, - object where=True): - return call_fptr_2in_1out_strides(DPNP_FN_MINIMUM_EXT, x1_obj, x2_obj, dtype, out, where) - - cpdef tuple dpnp_modf(utils.dpnp_descriptor x1): """ Convert string type names (array.dtype) to C enum DPNPFuncType """ cdef DPNPFuncType param1_type = dpnp_dtype_to_DPNPFuncType(x1.dtype) diff --git a/dpnp/dpnp_iface.py b/dpnp/dpnp_iface.py index b3103869e8d..3402f7d23a8 100644 --- a/dpnp/dpnp_iface.py +++ b/dpnp/dpnp_iface.py @@ -438,11 +438,6 @@ def get_dpnp_descriptor( if use_origin_backend(): return False - # It's required to keep track of input object if a non-strided copy is - # going to be created. Thus there will be an extra descriptor allocated - # to refer on original input. - orig_desc = None - # If input object is a scalar, it means it was allocated on host memory. # We need to copy it to USM memory according to compute follows data. 
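The removed `call_fptr_2in_1out_strides` helper computed into a temporary whenever writing directly into `out` was unsafe (wrong result dtype, or memory overlapping an input array) and copied the data back into `out` once the kernel finished. A minimal pure-Python sketch of that temporary-and-copy-back pattern (the names here are illustrative, not the dpnp API):

```python
def apply_binary(op, x1, x2, out=None):
    """Apply ``op`` element-wise, computing into a temporary when
    writing straight into ``out`` would be unsafe (illustrative sketch).
    """
    # With strided or broadcast views, writing into a buffer that
    # overlaps an input can clobber elements before they are read,
    # so a temporary is used whenever the buffers may alias.
    unsafe = out is not None and (out is x1 or out is x2)
    result = [None] * len(x1) if (out is None or unsafe) else out
    for i in range(len(x1)):
        result[i] = op(x1[i], x2[i])
    if unsafe:
        out[:] = result  # copy the temporary back into the requested out
        return out
    return result
```

For example, `apply_binary(max, a, b, out=a)` still produces correct results because the computation happens in a scratch list first and is copied back at the end.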
if isscalar(ext_obj): @@ -473,7 +468,6 @@ def get_dpnp_descriptor( ext_obj_offset = 0 if ext_obj.strides != shape_offsets or ext_obj_offset != 0: - orig_desc = dpnp_descriptor(ext_obj) ext_obj = array(ext_obj, order="C") # while dpnp functions are based on DPNP_QUEUE @@ -490,7 +484,7 @@ def get_dpnp_descriptor( if not queue_is_default: ext_obj = array(ext_obj, sycl_queue=default_queue) - dpnp_desc = dpnp_descriptor(ext_obj, orig_desc) + dpnp_desc = dpnp_descriptor(ext_obj) if dpnp_desc.is_valid: # pylint: disable=using-constant-test return dpnp_desc diff --git a/dpnp/dpnp_iface_mathematical.py b/dpnp/dpnp_iface_mathematical.py index 1caf1359be3..51d7f2ceddc 100644 --- a/dpnp/dpnp_iface_mathematical.py +++ b/dpnp/dpnp_iface_mathematical.py @@ -61,8 +61,6 @@ from .backend.extensions.sycl_ext import _sycl_ext_impl from .dpnp_algo import ( dpnp_ediff1d, - dpnp_fmax, - dpnp_fmin, dpnp_modf, ) from .dpnp_algo.dpnp_elementwise_common import ( @@ -1537,232 +1535,174 @@ def ediff1d(x1, to_end=None, to_begin=None): ) -def fmax(x1, x2, /, out=None, *, where=True, dtype=None, subok=True, **kwargs): - """ - Element-wise maximum of array elements. +_FMAX_DOCSTRING = """ +Compares two input arrays `x1` and `x2` and returns a new array containing the +element-wise maxima. - For full documentation refer to :obj:`numpy.fmax`. +If one of the elements being compared is a NaN, then the non-nan element is +returned. If both elements are NaNs then the first is returned. The latter +distinction is important for complex NaNs, which are defined as at least one of +the real or imaginary parts being a NaN. The net effect is that NaNs are +ignored when possible. - Returns - ------- - out : dpnp.ndarray - The maximum of `x1` and `x2`, element-wise, ignoring NaNs. +For full documentation refer to :obj:`numpy.fmax`. 
- Limitations - ----------- - Parameters `x1` and `x2` are supported as either scalar, - :class:`dpnp.ndarray` or :class:`dpctl.tensor.usm_ndarray`, but both `x1` - and `x2` can not be scalars at the same time. - Parameters `where`, `dtype` and `subok` are supported with their default - values. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by real-valued data types. +Parameters +---------- +x1 : {dpnp.ndarray, usm_ndarray, scalar} + First input array, expected to have numeric data type. + Both inputs `x1` and `x2` cannot be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} + Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` cannot be scalars at the same time. +out : {None, dpnp.ndarray, usm_ndarray}, optional + Output array to populate. + Array must have the correct shape and the expected data type. + Default: ``None``. +order : {"C", "F", "A", "K"}, optional + Memory layout of the newly created output array, if parameter `out` is ``None``. + Default: ``"K"``. - See Also - -------- - :obj:`dpnp.maximum` : Element-wise maximum of array elements, propagates - NaNs. - :obj:`dpnp.fmin` : Element-wise minimum of array elements, ignores NaNs. - :obj:`dpnp.max` : The maximum value of an array along a given axis, - propagates NaNs.. - :obj:`dpnp.nanmax` : The maximum value of an array along a given axis, - ignores NaNs. - :obj:`dpnp.minimum` : Element-wise minimum of array elements, propagates - NaNs. - :obj:`dpnp.fmod` : Calculate the element-wise remainder of division. +Returns +------- +out : dpnp.ndarray + An array containing the element-wise maxima. The data type of + the returned array is determined by the Type Promotion Rules.
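`fmax` broadcasts `x1` and `x2` to a common shape before comparing elements; the shape rule is the standard NumPy one, and it is the same logic as the `get_common_shape` helper removed later in this patch. A short sketch of how the common shape can be derived:

```python
def broadcast_shape(shape1, shape2):
    """Compute the common broadcast shape of two shapes, or raise."""
    # Right-align the shorter shape by padding it with leading 1s,
    # e.g. (8, 1, 6, 1) and (7, 1, 5) -> (8, 1, 6, 1) and (1, 7, 1, 5).
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                f"{shape1} {shape2}"
            )
    return tuple(result)
```

For instance, shapes `(8, 1, 6, 1)` and `(7, 1, 5)` combine to `(8, 7, 6, 5)`, matching the worked example in the removed Cython helper.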
- Examples - -------- - >>> import dpnp as np - >>> x1 = np.array([2, 3, 4]) - >>> x2 = np.array([1, 5, 2]) - >>> np.fmax(x1, x2) - array([2, 5, 4]) - - >>> x1 = np.eye(2) - >>> x2 = np.array([0.5, 2]) - >>> np.fmax(x1, x2) # broadcasting - array([[1. , 2. ], - [0.5, 2. ]]) - - >>> x1 = np.array([np.nan, 0, np.nan]) - >>> x2 = np.array([0, np.nan, np.nan]) - >>> np.fmax(x1, x2) - array([ 0., 0., nan]) +Limitations +----------- +Parameters `where` and `subok` are supported with their default values. +Keyword argument `kwargs` is currently unsupported. +Otherwise a ``NotImplementedError`` exception will be raised. - """ +See Also +-------- +:obj:`dpnp.fmin` : Element-wise minimum of two arrays, ignores NaNs. +:obj:`dpnp.maximum` : Element-wise maximum of two arrays, propagates NaNs. +:obj:`dpnp.max` : The maximum value of an array along a given axis, propagates NaNs. +:obj:`dpnp.nanmax` : The maximum value of an array along a given axis, ignores NaNs. +:obj:`dpnp.minimum` : Element-wise minimum of two arrays, propagates NaNs. +:obj:`dpnp.min` : The minimum value of an array along a given axis, propagates NaNs. +:obj:`dpnp.nanmin` : The minimum value of an array along a given axis, ignores NaNs. - if kwargs: - pass - elif where is not True: - pass - elif dtype is not None: - pass - elif subok is not True: - pass - elif dpnp.isscalar(x1) and dpnp.isscalar(x2): - # at least either x1 or x2 has to be an array - pass - else: - # get USM type and queue to copy scalar from the host memory - # into a USM allocation - usm_type, queue = ( - get_usm_allocations([x1, x2]) - if dpnp.isscalar(x1) or dpnp.isscalar(x2) - else (None, None) - ) +Notes +----- +The fmax is equivalent to ``dpnp.where(x1 >= x2, x1, x2)`` when neither +`x1` nor `x2` are NaNs, but it is faster and does proper broadcasting.
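The equivalence stated in the fmax Notes above is easy to verify; NumPy is used in this sketch since dpnp mirrors its `fmax` semantics (it is not run against dpnp itself):

```python
import numpy as np

# Without NaNs, fmax agrees with where(x1 >= x2, x1, x2).
a = np.array([2.0, 3.0, 4.0])
b = np.array([1.0, 5.0, 2.0])
assert np.array_equal(np.fmax(a, b), np.where(a >= b, a, b))

# With NaNs the two differ: fmax ignores the NaN operand,
# while maximum propagates it.
x1 = np.array([2.0, np.nan, 1.0])
x2 = np.array([1.0, 3.0, np.nan])
print(np.fmax(x1, x2))     # NaNs ignored where possible
print(np.maximum(x1, x2))  # any NaN operand makes the result NaN
```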
- x1_desc = dpnp.get_dpnp_descriptor( - x1, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - x2_desc = dpnp.get_dpnp_descriptor( - x2, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - if x1_desc and x2_desc: - if out is not None: - if not dpnp.is_supported_array_type(out): - raise TypeError( - "return array must be of supported array type" - ) - out_desc = ( - dpnp.get_dpnp_descriptor( - out, copy_when_nondefault_queue=False - ) - or None - ) - else: - out_desc = None +Examples +-------- +>>> import dpnp as np +>>> x1 = np.array([2, 3, 4]) +>>> x2 = np.array([1, 5, 2]) +>>> np.fmax(x1, x2) +array([2, 5, 4]) - return dpnp_fmax( - x1_desc, x2_desc, dtype=dtype, out=out_desc, where=where - ).get_pyobj() +>>> x1 = np.eye(2) +>>> x2 = np.array([0.5, 2]) +>>> np.fmax(x1, x2) +array([[1. , 2. ], + [0.5, 2. ]]) - return call_origin( - numpy.fmax, x1, x2, dtype=dtype, out=out, where=where, **kwargs - ) +>>> x1 = np.array([np.nan, 0, np.nan]) +>>> x2 = np.array([0, np.nan, np.nan]) +>>> np.fmax(x1, x2) +array([ 0., 0., nan]) +""" +fmax = DPNPBinaryFunc( + "fmax", + ufi._fmax_result_type, + ufi._fmax, + _FMAX_DOCSTRING, + mkl_fn_to_call=vmi._mkl_fmax_to_call, + mkl_impl_fn=vmi._fmax, +) -def fmin(x1, x2, /, out=None, *, where=True, dtype=None, subok=True, **kwargs): - """ - Element-wise minimum of array elements. - For full documentation refer to :obj:`numpy.fmin`. +_FMIN_DOCSTRING = """ +Compares two input arrays `x1` and `x2` and returns a new array containing the +element-wise minima. - Returns - ------- - out : dpnp.ndarray - The minimum of `x1` and `x2`, element-wise, ignoring NaNs. +If one of the elements being compared is a NaN, then the non-nan element is +returned. If both elements are NaNs then the first is returned. 
The latter +distinction is important for complex NaNs, which are defined as at least one of +the real or imaginary parts being a NaN. The net effect is that NaNs are +ignored when possible. - Limitations - ----------- - Parameters `x1` and `x2` are supported as either scalar, - :class:`dpnp.ndarray` or :class:`dpctl.tensor.usm_ndarray`, but both `x1` - and `x2` can not be scalars at the same time. - Parameters `where`, `dtype` and `subok` are supported with their default - values. - Keyword argument `kwargs` is currently unsupported. - Otherwise the function will be executed sequentially on CPU. - Input array data types are limited by real-valued data types. +For full documentation refer to :obj:`numpy.fmin`. - See Also - -------- - :obj:`dpnp.minimum` : Element-wise minimum of array elements, propagates - NaNs. - :obj:`dpnp.fmax` : Element-wise maximum of array elements, ignores NaNs. - :obj:`dpnp.min` : The minimum value of an array along a given axis, - propagates NaNs. - :obj:`dpnp.nanmin` : The minimum value of an array along a given axis, - ignores NaNs. - :obj:`dpnp.maximum` : Element-wise maximum of array elements, propagates - NaNs. - :obj:`dpnp.fmod` : Calculate the element-wise remainder of division. +Parameters +---------- +x1 : {dpnp.ndarray, usm_ndarray, scalar} + First input array, expected to have numeric data type. + Both inputs `x1` and `x2` cannot be scalars at the same time. +x2 : {dpnp.ndarray, usm_ndarray, scalar} + Second input array, also expected to have numeric data type. + Both inputs `x1` and `x2` cannot be scalars at the same time. +out : {None, dpnp.ndarray, usm_ndarray}, optional + Output array to populate. + Array must have the correct shape and the expected data type. + Default: ``None``. +order : {"C", "F", "A", "K"}, optional + Memory layout of the newly created output array, if parameter `out` is ``None``. + Default: ``"K"``.
- Examples - -------- - >>> import dpnp as np - >>> x1 = np.array([2, 3, 4]) - >>> x2 = np.array([1, 5, 2]) - >>> np.fmin(x1, x2) - array([1, 3, 2]) - - >>> x1 = np.eye(2) - >>> x2 = np.array([0.5, 2]) - >>> np.fmin(x1, x2) # broadcasting - array([[0.5, 0. ], - [0. , 1. ]] - - >>> x1 = np.array([np.nan, 0, np.nan]) - >>> x2 = np.array([0, np.nan, np.nan]) - >>> np.fmin(x1, x2) - array([ 0., 0., nan]) +Returns +------- +out : dpnp.ndarray + An array containing the element-wise minima. The data type of + the returned array is determined by the Type Promotion Rules. - """ +Limitations +----------- +Parameters `where` and `subok` are supported with their default values. +Keyword argument `kwargs` is currently unsupported. +Otherwise a ``NotImplementedError`` exception will be raised. - if kwargs: - pass - elif where is not True: - pass - elif dtype is not None: - pass - elif subok is not True: - pass - elif dpnp.isscalar(x1) and dpnp.isscalar(x2): - # at least either x1 or x2 has to be an array - pass - else: - # get USM type and queue to copy scalar from the host memory into - # a USM allocation - usm_type, queue = ( - get_usm_allocations([x1, x2]) - if dpnp.isscalar(x1) or dpnp.isscalar(x2) - else (None, None) - ) +See Also +-------- +:obj:`dpnp.fmax` : Element-wise maximum of two arrays, ignores NaNs. +:obj:`dpnp.minimum` : Element-wise minimum of two arrays, propagates NaNs. +:obj:`dpnp.min` : The minimum value of an array along a given axis, propagates NaNs. +:obj:`dpnp.nanmin` : The minimum value of an array along a given axis, ignores NaNs. +:obj:`dpnp.maximum` : Element-wise maximum of two arrays, propagates NaNs. +:obj:`dpnp.max` : The maximum value of an array along a given axis, propagates NaNs. +:obj:`dpnp.nanmax` : The maximum value of an array along a given axis, ignores NaNs.
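Mirroring the fmin docstring above: a NaN is only returned where both operands are NaN, whereas `minimum` propagates any NaN it sees. A quick NumPy check of that behavior (dpnp follows the same rules):

```python
import numpy as np

a = np.array([np.nan, 0.0, np.nan])
b = np.array([0.0, np.nan, np.nan])

print(np.fmin(a, b))     # NaN survives only where both inputs are NaN
print(np.minimum(a, b))  # any NaN operand makes the result NaN
```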
- x1_desc = dpnp.get_dpnp_descriptor( - x1, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - x2_desc = dpnp.get_dpnp_descriptor( - x2, - copy_when_strides=False, - copy_when_nondefault_queue=False, - alloc_usm_type=usm_type, - alloc_queue=queue, - ) - if x1_desc and x2_desc: - if out is not None: - if not dpnp.is_supported_array_type(out): - raise TypeError( - "return array must be of supported array type" - ) - out_desc = ( - dpnp.get_dpnp_descriptor( - out, copy_when_nondefault_queue=False - ) - or None - ) - else: - out_desc = None +Notes +----- +The fmin is equivalent to ``dpnp.where(x1 <= x2, x1, x2)`` when neither +`x1` nor `x2` are NaNs, but it is faster and does proper broadcasting. - return dpnp_fmin( - x1_desc, x2_desc, dtype=dtype, out=out_desc, where=where - ).get_pyobj() +Examples +-------- +>>> import dpnp as np +>>> x1 = np.array([2, 3, 4]) +>>> x2 = np.array([1, 5, 2]) +>>> np.fmin(x1, x2) +array([1, 3, 2]) - return call_origin( - numpy.fmin, x1, x2, dtype=dtype, out=out, where=where, **kwargs - ) +>>> x1 = np.eye(2) +>>> x2 = np.array([0.5, 2]) +>>> np.fmin(x1, x2) +array([[0.5, 0. ], + [0. , 1. ]]) + +>>> x1 = np.array([np.nan, 0, np.nan]) +>>> x2 = np.array([0, np.nan, np.nan]) +>>> np.fmin(x1, x2) +array([ 0., 0., nan]) +""" + +fmin = DPNPBinaryFunc( + "fmin", + ufi._fmin_result_type, + ufi._fmin, + _FMIN_DOCSTRING, + mkl_fn_to_call=vmi._mkl_fmin_to_call, + mkl_impl_fn=vmi._fmin, +) _FMOD_DOCSTRING = """ @@ -2100,6 +2040,11 @@ def gradient(f, *varargs, axis=None, edge_order=1): Compares two input arrays `x1` and `x2` and returns a new array containing the element-wise maxima. +If one of the elements being compared is a NaN, then that element is returned. +If both elements are NaNs then the first is returned. The latter distinction is +important for complex NaNs, which are defined as at least one of the real or +imaginary parts being a NaN. 
The net effect is that NaNs are propagated. + For full documentation refer to :obj:`numpy.maximum`. Parameters @@ -2175,6 +2120,11 @@ def gradient(f, *varargs, axis=None, edge_order=1): Compares two input arrays `x1` and `x2` and returns a new array containing the element-wise minima. +If one of the elements being compared is a NaN, then that element is returned. +If both elements are NaNs then the first is returned. The latter distinction is +important for complex NaNs, which are defined as at least one of the real or +imaginary parts being a NaN. The net effect is that NaNs are propagated. + For full documentation refer to :obj:`numpy.minimum`. Parameters diff --git a/dpnp/dpnp_utils/dpnp_algo_utils.pxd b/dpnp/dpnp_utils/dpnp_algo_utils.pxd index 4d4272ac9fb..23714b5218c 100644 --- a/dpnp/dpnp_utils/dpnp_algo_utils.pxd +++ b/dpnp/dpnp_utils/dpnp_algo_utils.pxd @@ -91,19 +91,11 @@ cdef class dpnp_descriptor: cdef public: # TODO remove "public" as python accessible attribute object origin_pyobj - dpnp_descriptor origin_desc dict descriptor Py_ssize_t dpnp_descriptor_data_size cpp_bool dpnp_descriptor_is_scalar cdef void * get_data(self) - cdef cpp_bool match_ctype(self, DPNPFuncType ctype) - - -cdef shape_type_c get_common_shape(shape_type_c input1_shape, shape_type_c input2_shape) except * -""" -Calculate common shape from input shapes -""" cdef dpnp_descriptor create_output_descriptor(shape_type_c output_shape, DPNPFuncType c_type, diff --git a/dpnp/dpnp_utils/dpnp_algo_utils.pyx b/dpnp/dpnp_utils/dpnp_algo_utils.pyx index 1e3a793d868..ad9a2f10ff4 100644 --- a/dpnp/dpnp_utils/dpnp_algo_utils.pyx +++ b/dpnp/dpnp_utils/dpnp_algo_utils.pyx @@ -33,8 +33,6 @@ This module contains different helpers and utilities """ import dpctl -import dpctl.tensor._copy_utils as dpt_cu -import dpctl.tensor._tensor_impl as dpt_ti import dpctl.utils as dpu import numpy @@ -381,32 +379,6 @@ cpdef long _get_linear_index(key, tuple shape, int ndim): return li -cdef shape_type_c 
get_common_shape(shape_type_c input1_shape, shape_type_c input2_shape) except *: - cdef shape_type_c input1_shape_orig = input1_shape - cdef shape_type_c input2_shape_orig = input2_shape - cdef shape_type_c result_shape - - # ex (8, 1, 6, 1) and (7, 1, 5) -> (8, 1, 6, 1) and (1, 7, 1, 5) - cdef size_t max_shape_size = max(input1_shape.size(), input2_shape.size()) - input1_shape.insert(input1_shape.begin(), max_shape_size - input1_shape.size(), 1) - input2_shape.insert(input2_shape.begin(), max_shape_size - input2_shape.size(), 1) - - # ex result (8, 7, 6, 5) - for it in range(max_shape_size): - if input1_shape[it] == input2_shape[it]: - result_shape.push_back(input1_shape[it]) - elif input1_shape[it] == 1: - result_shape.push_back(input2_shape[it]) - elif input2_shape[it] == 1: - result_shape.push_back(input1_shape[it]) - else: - err_msg = f"{ERROR_PREFIX} in function get_common_shape(): " - err_msg += f"operands could not be broadcast together with shapes {input1_shape_orig} {input2_shape_orig}" - raise ValueError(err_msg) - - return result_shape - - cdef dpnp_descriptor create_output_descriptor(shape_type_c output_shape, DPNPFuncType c_type, dpnp_descriptor requested_out, @@ -572,10 +544,9 @@ cdef (DPNPFuncType, void *) get_ret_type_and_func(DPNPFuncData kernel_data, cdef class dpnp_descriptor: - def __init__(self, obj, dpnp_descriptor orig_desc=None): + def __init__(self, obj): """ Initialize variables """ self.origin_pyobj = None - self.origin_desc = None self.descriptor = None self.dpnp_descriptor_data_size = 0 self.dpnp_descriptor_is_scalar = True @@ -594,10 +565,6 @@ cdef class dpnp_descriptor: self.origin_pyobj = obj - """ Keep track of a descriptor with original data """ - if orig_desc is not None and orig_desc.is_valid: - self.origin_desc = orig_desc - """ array size calculation """ cdef Py_ssize_t shape_it = 0 self.dpnp_descriptor_data_size = 1 @@ -657,14 +624,6 @@ cdef class dpnp_descriptor: def is_scalar(self): return self.dpnp_descriptor_is_scalar - 
@property - def is_temporary(self): - """ - Non-none descriptor of original data means the current descriptor - holds a temporary allocated data. - """ - return self.origin_desc is not None - @property def data(self): if self.is_valid: @@ -696,15 +655,6 @@ cdef class dpnp_descriptor: return interface_dict - def _copy_array_from(self, other_desc): - """ - Fill array data with usm_ndarray of the same shape from other DPNP descriptor - """ - if not isinstance(other_desc, dpnp_descriptor): - raise TypeError("expected dpnp_descriptor, got {}".format(type(other_desc))) - - dpt_cu._copy_same_shape(self.get_array(), other_desc.get_array()) - def get_pyobj(self): return self.origin_pyobj @@ -718,29 +668,6 @@ cdef class dpnp_descriptor: "expected either dpctl.tensor.usm_ndarray or dpnp.dpnp_array.dpnp_array, got {}" "".format(type(self.origin_pyobj))) - def get_result_desc(self, result_desc=None): - """ - Copy the result data into an original array - """ - if self.is_temporary: - # Original descriptor is not None, so copy the array data into it and return - from_desc = self if result_desc is None else result_desc - self.origin_desc._copy_array_from(from_desc) - return self.origin_desc - elif result_desc is not None: - # A temporary result descriptor was allocated, needs to copy data back into 'out' descriptor - self._copy_array_from(result_desc) - return self - - def is_array_overlapped(self, other_desc): - """ - Check if usm_ndarray overlaps an array from other DPNP descriptor - """ - if not isinstance(other_desc, dpnp_descriptor): - raise TypeError("expected dpnp_descriptor, got {}".format(type(other_desc))) - - return dpt_ti._array_overlap(self.get_array(), other_desc.get_array()) - cdef void * get_data(self): cdef Py_ssize_t item_size = 0 cdef Py_ssize_t elem_offset = 0 @@ -755,9 +682,6 @@ cdef class dpnp_descriptor: return < void * > val - cdef cpp_bool match_ctype(self, DPNPFuncType ctype): - return self.dtype == dpnp_DPNPFuncType_to_dtype(< size_t > ctype) - def 
__bool__(self): return self.is_valid diff --git a/tests/skipped_tests.tbl b/tests/skipped_tests.tbl index 199566295a3..944a4bd122d 100644 --- a/tests/skipped_tests.tbl +++ b/tests/skipped_tests.tbl @@ -222,8 +222,6 @@ tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_ldexp tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_nextafter_combination tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_nextafter_float -tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fmax_nan -tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fmin_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_negative tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_for_old_numpy diff --git a/tests/skipped_tests_gpu.tbl b/tests/skipped_tests_gpu.tbl index 26b52190539..61f981c2b9c 100644 --- a/tests/skipped_tests_gpu.tbl +++ b/tests/skipped_tests_gpu.tbl @@ -273,8 +273,6 @@ tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_ldexp tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_nextafter_combination tests/third_party/cupy/math_tests/test_floating.py::TestFloating::test_nextafter_float -tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fmax_nan -tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_fmin_nan tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_negative tests/third_party/cupy/math_tests/test_misc.py::TestMisc::test_nan_to_num_for_old_numpy diff --git a/tests/test_mathematical.py b/tests/test_mathematical.py index ae2c73748b5..54bc03d0a3f 100644 --- a/tests/test_mathematical.py +++ b/tests/test_mathematical.py @@ -24,6 +24,7 @@ get_float_complex_dtypes, get_float_dtypes, get_integer_dtypes, + has_support_aspect16, 
has_support_aspect64, ) from .test_umath import ( @@ -1953,6 +1954,80 @@ def test_invalid_out(self, out): assert_raises(TypeError, numpy.divide, a.asnumpy(), 2, out) +class TestFmaxFmin: + @pytest.mark.skipif(not has_support_aspect16(), reason="no fp16 support") + @pytest.mark.parametrize("func", ["fmax", "fmin"]) + def test_half(self, func): + a = numpy.array([0, 1, 2, 4, 2], dtype=numpy.float16) + b = numpy.array([-2, 5, 1, 4, 3], dtype=numpy.float16) + c = numpy.array([0, -1, -numpy.inf, numpy.nan, 6], dtype=numpy.float16) + ia, ib, ic = dpnp.array(a), dpnp.array(b), dpnp.array(c) + + result = getattr(dpnp, func)(ia, ib) + expected = getattr(numpy, func)(a, b) + assert_equal(result, expected) + + result = getattr(dpnp, func)(ib, ic) + expected = getattr(numpy, func)(b, c) + assert_equal(result, expected) + + @pytest.mark.parametrize("func", ["fmax", "fmin"]) + @pytest.mark.parametrize("dtype", get_float_dtypes()) + def test_float_nans(self, func, dtype): + a = numpy.array([0, numpy.nan, numpy.nan], dtype=dtype) + b = numpy.array([numpy.nan, 0, numpy.nan], dtype=dtype) + ia, ib = dpnp.array(a), dpnp.array(b) + + result = getattr(dpnp, func)(ia, ib) + expected = getattr(numpy, func)(a, b) + assert_equal(result, expected) + + @pytest.mark.parametrize("func", ["fmax", "fmin"]) + @pytest.mark.parametrize("dtype", get_complex_dtypes()) + @pytest.mark.parametrize( + "nan_val", + [ + complex(numpy.nan, 0), + complex(0, numpy.nan), + complex(numpy.nan, numpy.nan), + ], + ids=["nan+0j", "nanj", "nan+nanj"], + ) + def test_complex_nans(self, func, dtype, nan_val): + a = numpy.array([0, nan_val, nan_val], dtype=dtype) + b = numpy.array([nan_val, 0, nan_val], dtype=dtype) + ia, ib = dpnp.array(a), dpnp.array(b) + + result = getattr(dpnp, func)(ia, ib) + expected = getattr(numpy, func)(a, b) + assert_equal(result, expected) + + @pytest.mark.parametrize("func", ["fmax", "fmin"]) + @pytest.mark.parametrize("dtype", get_float_dtypes(no_float16=False)) + def test_precision(self, 
func, dtype): + dtmin = numpy.finfo(dtype).min + dtmax = numpy.finfo(dtype).max + d1 = dtype(0.1) + d1_next = numpy.nextafter(d1, numpy.inf) + + test_cases = [ + # v1 v2 + (dtmin, -numpy.inf), + (dtmax, -numpy.inf), + (d1, d1_next), + (dtmax, numpy.nan), + ] + + for v1, v2 in test_cases: + a = numpy.array([v1]) + b = numpy.array([v2]) + ia, ib = dpnp.array(a), dpnp.array(b) + + result = getattr(dpnp, func)(ia, ib) + expected = getattr(numpy, func)(a, b) + assert_allclose(result, expected) + + class TestFloorDivide: @pytest.mark.usefixtures("suppress_divide_numpy_warnings") @pytest.mark.parametrize( diff --git a/tests/test_usm_type.py b/tests/test_usm_type.py index d38acc4a657..44311813b18 100644 --- a/tests/test_usm_type.py +++ b/tests/test_usm_type.py @@ -639,8 +639,8 @@ def test_1in_1out(func, data, usm_type): pytest.param("dot", [3 + 2j, 4 + 1j, 5], [1, 2 + 3j, 3]), # TODO: uncomment once resolved in gh-1723 by dpctl # pytest.param("extract", [False, True, True, False], [0, 1, 2, 3]), - pytest.param("fmax", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), - pytest.param("fmin", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), + pytest.param("fmax", [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]), + pytest.param("fmin", [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]), pytest.param("fmod", [5, 3], [2, 2.0]), pytest.param( "gradient", [1, 2, 4, 7, 11, 16], [0.0, 1.0, 1.5, 3.5, 4.0, 6.0] @@ -651,8 +651,8 @@ def test_1in_1out(func, data, usm_type): pytest.param("inner", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), pytest.param("kron", [3.0, 4.0, 5.0], [1.0, 2.0]), pytest.param("logaddexp", [[-1, 2, 5, 9]], [[4, -3, 2, -8]]), - pytest.param("maximum", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), - pytest.param("minimum", [[0.0, 1.0, 2.0]], [[3.0, 4.0, 5.0]]), + pytest.param("maximum", [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]), + pytest.param("minimum", [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]), pytest.param("searchsorted", [11, 12, 13, 14, 15], [-10, 20, 12, 13]), pytest.param( "tensordot", From e6cf9d7192cf5ac1c5421b12ed7130963edffaa5 Mon Sep 17 
00:00:00 2001 From: Anton <100830759+antonwolfy@users.noreply.github.com> Date: Sat, 6 Jul 2024 21:46:15 +0200 Subject: [PATCH 47/49] Resolve compilation issues with new DPC++ 2025.0 compiler (#1907) * CL/sycl.hpp is deprecated, use sycl/sycl.hpp * Explicitly include complex header * Use explicit type casting in a function from sycl namespace * Use proper sycl namespace * Use multi_ptr instead of raw pointer in sycl::modf * Applied pre-commit hook for clang-format --- dpnp/backend/examples/example10.cpp | 2 +- dpnp/backend/extensions/lapack/geqrf.hpp | 2 +- dpnp/backend/extensions/lapack/gesv.hpp | 2 +- dpnp/backend/extensions/lapack/gesvd.hpp | 2 +- dpnp/backend/extensions/lapack/getrf.hpp | 2 +- dpnp/backend/extensions/lapack/getri.hpp | 2 +- dpnp/backend/extensions/lapack/getrs.hpp | 2 +- dpnp/backend/extensions/lapack/heevd.hpp | 2 +- dpnp/backend/extensions/lapack/orgqr.hpp | 2 +- dpnp/backend/extensions/lapack/potrf.hpp | 2 +- dpnp/backend/extensions/lapack/syevd.hpp | 2 +- dpnp/backend/extensions/lapack/ungqr.hpp | 2 +- dpnp/backend/extensions/sycl_ext/sum_mean.hpp | 2 +- dpnp/backend/kernels/dpnp_krnl_mathematical.cpp | 16 ++++++++++------ dpnp/backend/src/dpnp_fptr.hpp | 2 +- dpnp/backend/src/dpnp_utils.hpp | 3 ++- dpnp/backend/src/verbose.hpp | 2 +- 17 files changed, 27 insertions(+), 22 deletions(-) diff --git a/dpnp/backend/examples/example10.cpp b/dpnp/backend/examples/example10.cpp index b09ea9b335d..6607bbfd7ab 100644 --- a/dpnp/backend/examples/example10.cpp +++ b/dpnp/backend/examples/example10.cpp @@ -35,8 +35,8 @@ #include #include -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/geqrf.hpp b/dpnp/backend/extensions/lapack/geqrf.hpp index 4ab65286b29..2ef15ba7a89 100644 --- a/dpnp/backend/extensions/lapack/geqrf.hpp +++ b/dpnp/backend/extensions/lapack/geqrf.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/gesv.hpp 
b/dpnp/backend/extensions/lapack/gesv.hpp index 12486fae787..057d839e941 100644 --- a/dpnp/backend/extensions/lapack/gesv.hpp +++ b/dpnp/backend/extensions/lapack/gesv.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/gesvd.hpp b/dpnp/backend/extensions/lapack/gesvd.hpp index 17ebd0edbe7..891a041c89b 100644 --- a/dpnp/backend/extensions/lapack/gesvd.hpp +++ b/dpnp/backend/extensions/lapack/gesvd.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/getrf.hpp b/dpnp/backend/extensions/lapack/getrf.hpp index fee9b209426..cd96f73bb50 100644 --- a/dpnp/backend/extensions/lapack/getrf.hpp +++ b/dpnp/backend/extensions/lapack/getrf.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/getri.hpp b/dpnp/backend/extensions/lapack/getri.hpp index 75e9b16d4ef..870a2936252 100644 --- a/dpnp/backend/extensions/lapack/getri.hpp +++ b/dpnp/backend/extensions/lapack/getri.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/getrs.hpp b/dpnp/backend/extensions/lapack/getrs.hpp index ca78ed8b80d..551c607c1e1 100644 --- a/dpnp/backend/extensions/lapack/getrs.hpp +++ b/dpnp/backend/extensions/lapack/getrs.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/heevd.hpp b/dpnp/backend/extensions/lapack/heevd.hpp index 7b3bfc05d87..3eae78bde24 100644 --- a/dpnp/backend/extensions/lapack/heevd.hpp +++ b/dpnp/backend/extensions/lapack/heevd.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/orgqr.hpp b/dpnp/backend/extensions/lapack/orgqr.hpp index 9cc4f530d03..83b9cdebe62 100644 --- a/dpnp/backend/extensions/lapack/orgqr.hpp +++ b/dpnp/backend/extensions/lapack/orgqr.hpp @@ -25,8 +25,8 @@ #pragma once 
-#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/potrf.hpp b/dpnp/backend/extensions/lapack/potrf.hpp index f0850b3fd98..c377820e1d1 100644 --- a/dpnp/backend/extensions/lapack/potrf.hpp +++ b/dpnp/backend/extensions/lapack/potrf.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/syevd.hpp b/dpnp/backend/extensions/lapack/syevd.hpp index 9dfaba08ae1..1b6750487fd 100644 --- a/dpnp/backend/extensions/lapack/syevd.hpp +++ b/dpnp/backend/extensions/lapack/syevd.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/lapack/ungqr.hpp b/dpnp/backend/extensions/lapack/ungqr.hpp index 1a9b68e94f9..06729e82eee 100644 --- a/dpnp/backend/extensions/lapack/ungqr.hpp +++ b/dpnp/backend/extensions/lapack/ungqr.hpp @@ -25,8 +25,8 @@ #pragma once -#include #include +#include #include diff --git a/dpnp/backend/extensions/sycl_ext/sum_mean.hpp b/dpnp/backend/extensions/sycl_ext/sum_mean.hpp index 5333456b0c7..fe935752b03 100644 --- a/dpnp/backend/extensions/sycl_ext/sum_mean.hpp +++ b/dpnp/backend/extensions/sycl_ext/sum_mean.hpp @@ -26,7 +26,7 @@ #pragma once #include "dispatcher_utils.hpp" -#include +#include #include #include "utils/memory_overlap.hpp" diff --git a/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp b/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp index 44cd91854df..7e358a8a710 100644 --- a/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp +++ b/dpnp/backend/kernels/dpnp_krnl_mathematical.cpp @@ -89,10 +89,10 @@ DPCTLSyclEventRef dpnp_ediff1d_c(DPCTLSyclQueueRef q_ref, _DataType_input *input1_data = input1_ptr.get_ptr(); _DataType_output *result = result_ptr.get_ptr(); - cl::sycl::event event; - cl::sycl::range<1> gws(result_size); + sycl::event event; + sycl::range<1> gws(result_size); - auto kernel_parallel_for_func = [=](cl::sycl::id<1> global_id) { + auto kernel_parallel_for_func = [=](sycl::id<1> global_id) { 
         size_t output_id = global_id[0];
         /*for (size_t i = 0; i < result_size; ++i)*/ {
@@ -101,7 +101,7 @@ DPCTLSyclEventRef dpnp_ediff1d_c(DPCTLSyclQueueRef q_ref,
             result[output_id] = next_elem - curr_elem;
         }
     };
-    auto kernel_func = [&](cl::sycl::handler &cgh) {
+    auto kernel_func = [&](sycl::handler &cgh) {
         cgh.parallel_for<
             class dpnp_ediff1d_c_kernel<_DataType_input, _DataType_output>>(
             gws, kernel_parallel_for_func);
@@ -205,8 +205,12 @@ DPCTLSyclEventRef dpnp_modf_c(DPCTLSyclQueueRef q_ref,
     auto kernel_parallel_for_func = [=](sycl::id<1> global_id) {
         size_t i = global_id[0];
         /*for (size_t i = 0; i < size; ++i)*/ {
-            _DataType_input input_elem1 = array1[i];
-            result2[i] = sycl::modf(double(input_elem1), &result1[i]);
+            double input_elem1 = static_cast<double>(array1[i]);
+            auto res_multi_ptr = sycl::address_space_cast<
+                sycl::access::address_space::global_space,
+                sycl::access::decorated::yes>(&result1[i]);
+
+            result2[i] = sycl::modf(input_elem1, res_multi_ptr);
         }
     };
diff --git a/dpnp/backend/src/dpnp_fptr.hpp b/dpnp/backend/src/dpnp_fptr.hpp
index 73d627812a5..5e07b11542d 100644
--- a/dpnp/backend/src/dpnp_fptr.hpp
+++ b/dpnp/backend/src/dpnp_fptr.hpp
@@ -35,7 +35,7 @@
 #include
 #include
-#include <CL/sycl.hpp>
+#include <sycl/sycl.hpp>
 #include
diff --git a/dpnp/backend/src/dpnp_utils.hpp b/dpnp/backend/src/dpnp_utils.hpp
index 88e993a0a20..89b8a733153 100644
--- a/dpnp/backend/src/dpnp_utils.hpp
+++ b/dpnp/backend/src/dpnp_utils.hpp
@@ -29,10 +29,11 @@
 #include
 #include
+#include
 #include
 #include
-#include <CL/sycl.hpp>
+#include <sycl/sycl.hpp>
 #include
diff --git a/dpnp/backend/src/verbose.hpp b/dpnp/backend/src/verbose.hpp
index ae67dbe56fa..20a106ced3e 100644
--- a/dpnp/backend/src/verbose.hpp
+++ b/dpnp/backend/src/verbose.hpp
@@ -27,7 +27,7 @@
 #ifndef VERBOSE_H // Cython compatibility
 #define VERBOSE_H
-#include <CL/sycl.hpp>
+#include <sycl/sycl.hpp>
 
 bool is_verbose_mode();
 void set_barrier_event(sycl::queue queue, std::vector<sycl::event> &depends);

From 637b4c56fc9639f3e73cc1c384f6c44ef7360c21 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]"
 <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sat, 6 Jul 2024 22:47:30 +0200
Subject: [PATCH 48/49] Bump actions/upload-artifact from 4.3.3 to 4.3.4 (#1911)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/65462800fd760344b1a7b4382951275a0abb4808...0b2256b8c012f0828dc542b3febcab082c67f72b)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 .github/workflows/conda-package.yml     | 4 ++--
 .github/workflows/openssf-scorecard.yml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml
index c67b7487429..29ba04a40b6 100644
--- a/.github/workflows/conda-package.yml
+++ b/.github/workflows/conda-package.yml
@@ -141,13 +141,13 @@ jobs:
         run: conda build --no-test --python ${{ matrix.python }} --numpy 1.24 ${{ env.CHANNELS }} conda-recipe
 
       - name: Upload artifact
-        uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3
+        uses: actions/upload-artifact@0b2256b8c012f0828dc542b3febcab082c67f72b # v4.3.4
         with:
           name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Python ${{ matrix.python }}
           path: ${{ env.CONDA_BLD }}${{ env.PACKAGE_NAME }}-*.tar.bz2
 
       - name: Upload wheels artifact
-        uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3
+        uses: actions/upload-artifact@0b2256b8c012f0828dc542b3febcab082c67f72b # v4.3.4
         with:
           name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Wheels Python ${{ matrix.python }}
           path: ${{ env.WHEELS_OUTPUT_FOLDER }}${{ env.PACKAGE_NAME }}-*.whl
diff --git
a/.github/workflows/openssf-scorecard.yml b/.github/workflows/openssf-scorecard.yml
index 9658c7e3b2f..df1ff4f9590 100644
--- a/.github/workflows/openssf-scorecard.yml
+++ b/.github/workflows/openssf-scorecard.yml
@@ -60,7 +60,7 @@ jobs:
       # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
       # format to the repository Actions tab.
       - name: "Upload artifact"
-        uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3
+        uses: actions/upload-artifact@0b2256b8c012f0828dc542b3febcab082c67f72b # v4.3.4
         with:
           name: SARIF file
           path: results.sarif

From b64442b314b3b24f4a5a1f27fc1fb08b2db695d9 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 7 Jul 2024 13:13:57 +0200
Subject: [PATCH 49/49] Bump actions/download-artifact from 4.1.7 to 4.1.8 (#1910)

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.1.7 to 4.1.8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/65a9edc5881444af0b9093a5e628f2fe47ea3b2e...fa0a91b85d4f404e444e00e005971372dc801d16)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
---
 .github/workflows/conda-package.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/conda-package.yml b/.github/workflows/conda-package.yml
index 29ba04a40b6..d1e2ebf5492 100644
--- a/.github/workflows/conda-package.yml
+++ b/.github/workflows/conda-package.yml
@@ -180,7 +180,7 @@ jobs:
     steps:
       - name: Download artifact
-        uses: actions/download-artifact@65a9edc5881444af0b9093a5e628f2fe47ea3b2e # v4.1.7
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Python ${{ matrix.python }}
          path: ${{ env.pkg-path-in-channel }}
@@ -301,7 +301,7 @@ jobs:
     steps:
       - name: Download artifact
-        uses: actions/download-artifact@65a9edc5881444af0b9093a5e628f2fe47ea3b2e # v4.1.7
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Python ${{ matrix.python }}
          path: ${{ env.pkg-path-in-channel }}
@@ -453,12 +453,12 @@ jobs:
     steps:
       - name: Download artifact
-        uses: actions/download-artifact@65a9edc5881444af0b9093a5e628f2fe47ea3b2e # v4.1.7
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Python ${{ matrix.python }}

      - name: Download wheels artifact
-        uses: actions/download-artifact@65a9edc5881444af0b9093a5e628f2fe47ea3b2e # v4.1.7
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: ${{ env.PACKAGE_NAME }} ${{ runner.os }} Wheels Python ${{ matrix.python }}