Skip to content

perf(autograd): optimize grey_dilation with striding #2589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

yaugenst-flex
Copy link
Collaborator

@yaugenst-flex yaugenst-flex commented Jun 20, 2025

The previous implementation of grey_dilation was based on convolution, which was slow for both the forward and backward passes.

This PR replaces it with a high-performance implementation that uses NumPy's sliding_window_view to create sliding window views of the input array. I also wrote a custom VJP that uses the same striding technique to make the backward pass faster too.

I also simplified the implementation of grey_erosion so that grey_dilation is now the only function that does the heavy lifting.

Benchmarks show speedups of 10-100x depending on the array and kernel size.

dilation_image_scaling
dilation_kernel_scaling

This should make these ops much more usable in topopt @groberts-flex

Greptile Summary

Significant performance optimization of the grey_dilation morphological operation by replacing convolution-based implementation with NumPy's sliding_window_view for strided array operations.

  • Replaced convolution-based implementation with strided array approach in tidy3d/plugins/autograd/functions.py, achieving 10-100x speedup
  • Added custom VJP (vector-Jacobian product) for efficient backpropagation using the same striding technique
  • Simplified grey_erosion by expressing it through duality with grey_dilation
  • Updated morphology test cases to use first-order gradients and include kernel structure testing
  • Added comprehensive benchmarks showing performance improvements scaling with array and kernel sizes

@yaugenst-flex yaugenst-flex self-assigned this Jun 20, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

3 files reviewed, no comments
Edit PR Review Bot Settings | Greptile

Copy link
Contributor

github-actions bot commented Jun 20, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/plugins/autograd/functions.py (98.8%): Missing lines 67

Summary

  • Total: 83 lines
  • Missing: 1 line
  • Coverage: 98%

tidy3d/plugins/autograd/functions.py

  63         The indices for padding along the axis.
  64     """
  65     total_pad = sum(pad_width)
  66     if n == 0:
! 67         return numpy_module.zeros(total_pad, dtype=int)
  68 
  69     idx = numpy_module.arange(-pad_width[0], n + pad_width[1])
  70 
  71     if mode == "constant":

@yaugenst-flex yaugenst-flex force-pushed the yaugenst-flex/faster-morphology branch 2 times, most recently from 266d6a0 to be0b9b0 Compare June 20, 2025 15:08
raise ValueError("Either size or structure must be provided.")
if structure is None:
size_np = onp.atleast_1d(size)
shape = (size_np[0], size_np[-1]) if size_np.size > 1 else (size_np[0], size_np[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case of a 1D structuring element or size, does this mean the dilation/erosion gets applied in 2D and then just a single dimension is extracted at the end?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation is still applied to the full 2D array, but with a 1D-like structuring element. You always need to define a 2D structuring element of shape like (1, size) or (size, 1). The dilation/erosion
operation slides this structuring element across the entire 2D array, effectively performing the operation along rows or columns respectively. I updated the docstring to clarify this, because we don't explicitly handle 1d structuring elements.

@@ -238,9 +204,27 @@ def convolve(
return convolve_ag(array, kernel, axes=axes, mode=mode)


def _get_footprint(size, structure, maxval):
"""Helper to generate the morphological footprint from size or structure."""
if size is None and structure is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly the other case to catch is that both the size and the structure are specified

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good catch. i think this was discussed previously at some point and we went for silently having precedence of structuring element over sizes, but since it came up again i added a check instead to forbid this


padded_array_np = getval(padded_array)

windows = sliding_window_view(padded_array_np, window_shape=(h, w))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking my understanding - this replaces the convolve call in the previous version which was being used to essentially create this same view but doing so through a bunch of unnecessary computation with an identity kernel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly. the previous implementation would create the sliding windows (which we now do directly), multiply each window element-wise with the kernel, and then sum the results. you can see how this scales horrendously with image and kernel sizes 😄


# normalize the gradient for cases where multiple elements are the maximum.
multiplicity = onp.sum(is_max_mask, axis=(-2, -1), keepdims=True)
is_max_mask /= onp.maximum(multiplicity, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not fully understanding this part - is it possible for values to come out of the operation as the max_val?

Copy link
Collaborator Author

@yaugenst-flex yaugenst-flex Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, values cannot equal max_val in the output. i added a comment explaining this:

# Note: Values can never exceed maxval in the output since we add structure
# values (capped at maxval) to the input array values.

Copy link
Contributor

@groberts-flex groberts-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this implementation, the speed up looks awesome especially for a function that will be in a lot of robust optimizations!

left some comments/questions, some just for my own understanding!

The previous implementation of `grey_dilation` was based on convolution, which was slow for both the forward and backward passes.

This commit replaces it with a high-performance implementation that uses NumPy's `as_strided` to create sliding window views of the input array. This avoids redundant computations and memory allocations, leading to significant speedups.

The VJP (gradient) for the primitive is also updated to use the same striding technique, ensuring the backward pass is also much faster.

Benchmarks show speedups of 10-100x depending on the array and kernel size.
@yaugenst-flex yaugenst-flex force-pushed the yaugenst-flex/faster-morphology branch from be0b9b0 to 21486df Compare June 24, 2025 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants