Element-wise matrix multiplication performance? #1181
-
Running the unaltered vector addition tutorial on my A100 machine produces these benchmark results:

*(benchmark plot not shown)*

Now I make a minor 2-line modification to instead do element-wise multiplication. Line 45 of the kernel changes from addition to multiplication:

*(code snippet not shown)*

Finally, I make a three-line modification to do matrix-matrix element-wise multiplication. Lines 118 and 119 of the benchmark become:

```python
x = torch.rand((size, size), device='cuda', dtype=torch.float32)
y = torch.rand((size, size), device='cuda', dtype=torch.float32)
```

I also change line 104 to keep the matrices within memory limits:

*(code snippet not shown)*

The x axes of the plots are not directly comparable: the first two show […].

Is it expected that the element-wise matrix multiplication shows orders of magnitude lower throughput? Is this a poor way of implementing matrix-matrix element-wise multiplication in Triton? PyTorch performance appears to suffer just as much, so unless I'm doing something wrong across the board, this seems to be expected behavior. I may be missing some basic context; please recommend any relevant reading.
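For context on why a large throughput gap would be surprising: an element-wise product of two `(size, size)` matrices touches exactly the same number of elements as an element-wise product of two length-`size**2` vectors, so the kernel itself should move the same number of bytes either way. A minimal sketch of that equivalence, using NumPy as a GPU-free stand-in for the Triton/PyTorch code (the variable names here are illustrative, not from the tutorial):

```python
import numpy as np

size = 1024
rng = np.random.default_rng(0)
x = rng.random((size, size), dtype=np.float32)
y = rng.random((size, size), dtype=np.float32)

# Element-wise product of the matrices...
z_matrix = x * y
# ...is identical to the element-wise product of the flattened vectors:
z_vector = x.ravel() * y.ravel()
assert np.array_equal(z_matrix.ravel(), z_vector)

# Either way, three float32 tensors of size**2 elements move through
# memory: read x, read y, write the output (4 bytes per element).
bytes_moved = 3 * 4 * size * size
```

So if the measured GB/s differs by orders of magnitude between the two shapes, the first thing to check is how the benchmark counts bytes, not the kernel.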
Replies: 1 comment 1 reply
-
@xanderdunn Did you also modify the GB/s calculation? Line 124 would need to be changed to `12 * size^2` if you're changing it to a matrix.
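To make the reply concrete: the tutorial's benchmark converts runtime to GB/s by counting bytes moved, i.e. three float32 tensors (`x`, `y`, output) at 4 bytes per element. If the tensors become `size × size` matrices but the formula still counts `12 * size` bytes, the reported throughput is understated by exactly a factor of `size`. A sketch of the fix; the exact `gbps` lambda shape is an assumption based on the tutorial version under discussion:

```python
def gbps_vector(ms: float, size: int) -> float:
    """Tutorial-style GB/s for length-`size` vectors: 12 * size bytes."""
    return 12 * size / ms * 1e-6

def gbps_matrix(ms: float, size: int) -> float:
    """Corrected GB/s for (size, size) matrices: 12 * size**2 bytes."""
    return 12 * size * size / ms * 1e-6

# With the stale vector formula, a matrix benchmark under-reports
# throughput by exactly a factor of `size`:
ratio = gbps_matrix(1.0, 4096) / gbps_vector(1.0, 4096)
assert ratio == 4096
```

This would account for the "orders of magnitude" gap in the question without any actual kernel slowdown, and it applies equally to the PyTorch baseline measured by the same harness.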