
Implement asynchronous fill method using dpctl kernels #2055

Open · wants to merge 9 commits into base: master
Conversation

@ndgrigorian commented Sep 17, 2024

This PR proposes a change to the `dpnp_array.fill` method that leverages dpctl kernels to make `fill` asynchronous and more efficient, avoiding the repeated array indexing and per-element host-to-device scalar copies of the previous implementation.
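The difference between the two approaches can be illustrated with a hedged NumPy sketch (dpnp/dpctl are not used here, and the function names are hypothetical): the old path assigns element by element, each assignment indexing the array and copying the scalar separately, while the new path issues one bulk fill.

```python
import numpy as np

def fill_per_element(arr, value):
    # Old pattern (illustrative): each assignment indexes the array and
    # copies the scalar separately; on a device array this would mean one
    # host-to-device transfer per element.
    for i in range(arr.size):
        arr.flat[i] = value

def fill_vectorized(arr, value):
    # New pattern (illustrative): a single broadcast assignment, analogous
    # to dispatching one fill kernel over the whole array.
    arr[...] = value

a = np.empty(8, dtype=np.complex64)
fill_per_element(a, 10)
b = np.empty(8, dtype=np.complex64)
fill_vectorized(b, 10)
assert np.array_equal(a, b)
```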

It shows significant performance gains on Iris Xe in WSL:

Before

```
In [1]: import dpnp as dnp

In [2]: x_dnp = dnp.empty(10000, dtype="c8")

In [3]: %timeit x_dnp.fill(10)
1.25 s ± 47.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit x_dnp.fill(10)
1.26 s ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After

```
In [8]: %timeit x_dnp.fill(10); q.wait()
229 μs ± 37.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
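The `q.wait()` in the "After" timing matters because the new fill is asynchronous: without synchronizing, `%timeit` would measure only kernel submission, not completion. A hedged, pure-Python analogy (a thread pool stands in for a SYCL queue; all names here are illustrative, not part of dpnp or dpctl):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work():
    # Stand-in for an asynchronous device kernel.
    time.sleep(0.05)

with ThreadPoolExecutor() as pool:
    t0 = time.perf_counter()
    fut = pool.submit(work)
    submit_time = time.perf_counter() - t0  # tiny: just the enqueue cost
    fut.result()                            # synchronize, like q.wait()
    total_time = time.perf_counter() - t0   # includes the actual work

assert submit_time < total_time
```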
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@ndgrigorian ndgrigorian changed the title Implement efficient, asynchronous fill method using dpctl kernels Implement asynchronous fill method using dpctl kernels Sep 17, 2024
New fill implementation does not permit NumPy array values, consistent with fill_diagonal
@ndgrigorian (Author) commented:

@antonwolfy @vtavana @vlad-perevezentsev
I've added a commit to skip test_fill_with_numpy_scalar_ndarray from the CuPy tests.

dpnp.fill_diagonal does not permit NumPy arrays, so it made sense to do the same here. If it would be preferred to keep this feature, it can be implemented, but I would argue that for consistency, the two should behave the same.

@antonwolfy (Contributor) left a comment


@ndgrigorian, thank you for implementing such a great improvement.
Please find some comments below.

Review threads on dpnp/dpnp_algo/dpnp_fill.py (resolved)
@@ -903,8 +903,10 @@ def fill(self, value):


Could you please update the docstring a bit, like:

        """
        Fill the array with a scalar value.

        For full documentation refer to :obj:`numpy.ndarray.fill`.

        Parameters
        ----------
        value : {dpnp.ndarray, usm_ndarray, scalar}
            All elements of `a` will be assigned this value.

        Examples
        --------
        >>> import numpy as np
        >>> a = np.array([1, 2])
        >>> a.fill(0)
        >>> a
        array([0, 0])
        >>> a = np.empty(2)
        >>> a.fill(1)
        >>> a
        array([1.,  1.])

        """

if isinstance(val, (dpnp_array, dpt.usm_ndarray)):
    val = dpnp.get_usm_ndarray(val)
    if val.shape != ():
        raise ValueError("`val` must be a scalar")
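The scalar check above can be exercised in isolation. A hedged NumPy-based sketch of the same validation logic (the helper name is hypothetical; the real code operates on `dpnp_array`/`usm_ndarray` rather than `np.ndarray`):

```python
import numpy as np

def validate_fill_value(val):
    # Hypothetical helper mirroring the PR's check: an array-like fill
    # value is accepted only if it is 0-d (a scalar wrapped in an array).
    if isinstance(val, np.ndarray) and val.shape != ():
        raise ValueError("`val` must be a scalar")
    return val

ok = validate_fill_value(np.array(3.0))    # 0-d array: accepted
try:
    validate_fill_value(np.array([1, 2]))  # 1-d array: rejected
    rejected = False
except ValueError:
    rejected = True
```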

Could you please add tests to cover lines: 50, 71, 73?
