Replies: 4 comments
-
So it's possible to use
-
Something like that... It's not strictly the number of terms. Here the terms are differentiated, so there are two levels of nesting. I think it has to do with the total "complexity" of the term, somehow.
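To make the "complexity" point concrete, here is a minimal toy sketch in plain Julia. It assumes the lazy operation tree reaches the GPU kernel by value as an `isbits` struct, so that every nested node adds to the size of the kernel argument. The types `Leaf`, `BinaryOp` and `Derivative` are hypothetical stand-ins invented for this illustration; they are not Oceananigans types.

```julia
# Toy stand-ins for lazy operation nodes. These are NOT Oceananigans types.
struct Leaf
    ptr::UInt64   # pretend device pointer
    nx::Int32     # pretend array size
    ny::Int32
end

struct BinaryOp{F, A, B}
    op::F         # function such as + or *
    a::A          # left operand, stored inline
    b::B          # right operand, stored inline
end

struct Derivative{A}
    arg::A        # operand, stored inline
    spacing::Float64
end

u = Leaf(0, 16, 16)
v = Leaf(0, 16, 16)

# Two terms, one level of nesting.
flat = BinaryOp(+, u, v)

# Still two top-level terms, but each one is a derivative of a product.
nested = BinaryOp(+, Derivative(BinaryOp(*, u, v), 0.1),
                     Derivative(BinaryOp(*, u, v), 0.1))

# Both trees are plain bits, so their full contents travel with the argument.
println("isbits(nested) = ", isbits(nested))
println("sizeof(flat)   = ", sizeof(flat), " bytes")
println("sizeof(nested) = ", sizeof(nested), " bytes")
```

Both expressions have two top-level terms, yet the nested one is larger because each operand is stored inline. If the real operations behave similarly, the argument size would track the depth and breadth of the whole tree rather than the number of terms.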
-
Thanks @glwagner. Yes, complexity is certainly the right word for this problem.
-
Okay, I think I may understand why this is a problem. I believe this is because such operation chains pass the
Here are two solutions:
-
This is a summary of the current salient issues discussed on #1241. Much of that discussion is out of date; however, one issue that remains is that complex GPU AbstractOperations can produce PTX code with function signatures that consume too much "parameter space".
To reproduce this issue:
and then
A possible solution is proposed at JuliaGPU/CUDA.jl#267.
One workaround within Oceananigans is to "stage" the computation:
By sharing memory between the `ComputedField`s, we avoid allocating more memory in this solution. It may still be more computationally expensive, though benchmarking is required to confirm that.

Another solution is to hand-write the kernel operation using `KernelFunctionOperation`.

cc @tomchor
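The staging idea above can be sketched with plain Julia arrays. This is only a toy under stated assumptions, not the Oceananigans snippet from the original post: `staged_sum!`, `scratch` and the inputs `u`, `v`, `w` are hypothetical names, and ordinary arrays stand in for fields. Each broadcast statement plays the role of one small kernel, and a single scratch buffer is reused for both intermediates so no extra memory is allocated per stage.

```julia
# Toy staging sketch: evaluate (u*v)*w + (v*w)*u in several small steps
# instead of one large fused expression. Plain arrays stand in for fields.
function staged_sum!(result, scratch, u, v, w)
    scratch .= u .* v          # stage 1: first intermediate
    result  .= scratch .* w    # stage 2: consume it
    scratch .= v .* w          # stage 3: reuse the same buffer
    result .+= scratch .* u    # stage 4: accumulate the second term
    return result
end

u, v, w = rand(16), rand(16), rand(16)
scratch = similar(u)           # one buffer shared by both intermediates
result  = similar(u)

staged_sum!(result, scratch, u, v, w)

# Compare against the single fused expression.
fused = (u .* v) .* w .+ (v .* w) .* u
println("staged matches fused: ", result ≈ fused)
```

Each stage only needs a few arrays as arguments, which is the point of the workaround. The cost is extra passes over memory, which is why benchmarking would be needed to judge the trade-off.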