-
Notifications
You must be signed in to change notification settings - Fork 58
perf(autograd): optimize grey_dilation with striding #2589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
3 files reviewed, no comments
Edit PR Review Bot Settings | Greptile
Diff CoverageDiff: origin/develop...HEAD, staged and unstaged changes
Summary
tidy3d/plugins/autograd/functions.py
|
266d6a0
to
be0b9b0
Compare
raise ValueError("Either size or structure must be provided.") | ||
if structure is None: | ||
size_np = onp.atleast_1d(size) | ||
shape = (size_np[0], size_np[-1]) if size_np.size > 1 else (size_np[0], size_np[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case of a 1D structuring element or size, does this mean the dilation/erosion gets applied in 2D and then just a single dimension is extracted at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation is still applied to the full 2D array, but with a 1D-like structuring element. You always need to define a 2D structuring element of shape like (1, size)
or (size, 1)
. The dilation/erosion
operation slides this structuring element across the entire 2D array, effectively performing the operation along rows or columns respectively. I updated the docstring to clarify this, because we don't explicitly handle 1d structuring elements.
@@ -238,9 +204,27 @@ def convolve( | |||
return convolve_ag(array, kernel, axes=axes, mode=mode) | |||
|
|||
|
|||
def _get_footprint(size, structure, maxval): | |||
"""Helper to generate the morphological footprint from size or structure.""" | |||
if size is None and structure is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly the other case to catch is that both the size and the structure are specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good catch. i think this was discussed previously at some point and we went for silently having precedence of structuring element over sizes, but since it came up again i added a check instead to forbid this
|
||
padded_array_np = getval(padded_array) | ||
|
||
windows = sliding_window_view(padded_array_np, window_shape=(h, w)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking my understanding - this replaces the convolve call in the previous version which was being used to essentially create this same view but doing so through a bunch of unnecessary computation with an identity kernel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly. the previous implementation would create the sliding windows (which we now do directly), multiply each window element-wise with the kernel, and then sum the results. you can see how this scales horrendously with image and kernel sizes 😄
|
||
# normalize the gradient for cases where multiple elements are the maximum. | ||
multiplicity = onp.sum(is_max_mask, axis=(-2, -1), keepdims=True) | ||
is_max_mask /= onp.maximum(multiplicity, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not fully understanding this part - is it possible for values to come out of the operation as the max_val?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, values cannot equal max_val
in the output. i added a comment explaining this:
# Note: Values can never exceed maxval in the output since we add structure
# values (capped at maxval) to the input array values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this implementation, the speed up looks awesome especially for a function that will be in a lot of robust optimizations!
left some comments/questions, some just for my own understanding!
The previous implementation of `grey_dilation` was based on convolution, which was slow for both the forward and backward passes. This commit replaces it with a high-performance implementation that uses NumPy's `as_strided` to create sliding window views of the input array. This avoids redundant computations and memory allocations, leading to significant speedups. The VJP (gradient) for the primitive is also updated to use the same striding technique, ensuring the backward pass is also much faster. Benchmarks show speedups of 10-100x depending on the array and kernel size.
be0b9b0
to
21486df
Compare
The previous implementation of
grey_dilation
was based on convolution, which was slow for both the forward and backward passes.This PR replaces it with a high-performance implementation that uses NumPy's
sliding_window_view
to create sliding window views of the input array. I also wrote a custom VJP that uses the same striding technique to make the backward pass faster too.I also simplified the implementation of
grey_erosion
so thatgrey_dilation
is now the only function that does the heavy lifting.Benchmarks show speedups of 10-100x depending on the array and kernel size.
This should make these ops much more usable in topopt @groberts-flex
Greptile Summary
Significant performance optimization of the
grey_dilation
morphological operation by replacing convolution-based implementation with NumPy'ssliding_window_view
for strided array operations.tidy3d/plugins/autograd/functions.py
, achieving 10-100x speedupgrey_erosion
by expressing it through duality withgrey_dilation