-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CeedBasisApplyAdd #1644
Add CeedBasisApplyAdd #1644
Conversation
9aa6d80
to
855975c
Compare
Ok, design decision point - the usage really only makes sense for I think for GPU I think I'll want separate Apply vs ApplyTranspose kernels in some cases? |
8178602
to
a322f13
Compare
Agreed that having the transpose as an input, but verifying it to always be transpose works well. Might also include that in the docstring for it, ie:
|
a322f13
to
652a51b
Compare
eac281b
to
5e414ad
Compare
Update - CUDA/HIP ref and shared are good, need to tackle MAGMA |
I'm not able to follow libCEED activity as closely these days, but please ping me if you run into any issues with the MAGMA backend. |
Thanks! The TLDR here is that I want to sum into the target vector with CeedBasisApplyAdd for CEED_TRANSPOSE for CEED_EVAL_INTERP and CEED_EVAL_GRAD. It looks not too tricky for MAGMA at a glance but I haven't dug in on it |
30a674e
to
bd66db0
Compare
Only tensor product grad is correctly summing into the target vector with my changes for MAGMA, unclear why but I'll dig more Monday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick review, found some copy-paste bugs
bd66db0
to
18d93ab
Compare
As a side note, looking at the new tests in this PR reminded me that we might want to check on the test coverage for the non-tensor basis in MAGMA. Now that we have two options -- the RTC kernels for lower orders and standard library calls for the rest -- I'm not sure how often, if at all, the "standard" route is being tested. (For t363 I manually lowered |
56c8d78
to
f5cd871
Compare
f5cd871
to
311be37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this refactor. I noticed a few places where it is probably more clear to index an array instead of doing pointer arithmetic, but if you disagree that's fine.
include/ceed/jit-source/cuda/cuda-ref-basis-nontensor-templates.h
Outdated
Show resolved
Hide resolved
include/ceed/jit-source/hip/hip-ref-basis-nontensor-templates.h
Outdated
Show resolved
Hide resolved
include/ceed/jit-source/magma/magma-basis-interp-deriv-nontensor.h
Outdated
Show resolved
Hide resolved
style - consistently use indexing over pointer arithmatic Co-authored-by: Zach Atkins <[email protected]>
7b32c88
to
fbbb68f
Compare
WIP, GPU impl and CeedBasisApplyAddAtPoints to come