-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable multiple precisions in CeedVector #948
base: main
Are you sure you want to change the base?
Conversation
…tions to generic version with precision parameter
…for different precisions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm slow on this. It's looking really good. I made a couple comments that I hope will help keep CEED_SCALAR_TYPE
out of library code. My intent is to use C11 _Generic
to dispatch to the appropriate *FP32
or *FP64
when available, otherwise fall back to a default that the user can select by including ceed-f64.h
or ceed-f32.h
.
I think we should be consistent, and either call the functions *F32
or name the header ceed-fp32.h
; they're currently *FP32
and ceed-f32.h
.
include/ceed/ceed.h
Outdated
CEED_SCALAR_FP64, | ||
/// Total number of allowed scalar precision types | ||
CEED_NUM_PRECISIONS, | ||
} CeedScalarType; | ||
/// Struct for holding data in multiple precisions for mixed-precision-enabled | ||
/// backends | ||
typedef struct { | ||
void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS | ||
} CeedScalarArray; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This automatically gets the right sizes. Can we move CeedScalarArray
to ceed-impl.h
or backend.h
?
include/ceed/ceed.h
Outdated
/// backends | ||
typedef struct { | ||
void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS | ||
} CeedScalarArray; | ||
/// Base scalar type for the library to use: change which header is | ||
/// included to change the precision. | ||
#include "ceed-f64.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would allow a user to include the type-specific header before ceed.h
, and thus be able to select the default at file granularity.
#include "ceed-f64.h" | |
#ifndef CEED_SCALAR_TYPE | |
# include "ceed-f64.h" | |
#endif |
Do you think we can avoid referencing CEED_SCALAR_TYPE
in library code? I think that's the eventual goal.
int CeedVectorSetArray(CeedVector vec, CeedMemType mem_type, | ||
CeedCopyMode copy_mode, | ||
CeedScalar *array) { | ||
return CeedVectorSetArrayGeneric(vec, mem_type, CEED_SCALAR_TYPE, copy_mode, | ||
(void **) array); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have ceed.h
use a macro to make CeedVectorSetArray
map to either CeedVectorSetArrayFP32
or CeedVectorSetArrayFP64
so that we don't need an actual symbol in the library for this, and thus can remove this use of CEED_SCALAR_TYPE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I've spent a bit of time looking into this, because I didn't realize removal of CEED_SCALAR_TYPE
was a goal.
What would be the benefit of this? Just that the user wouldn't have to re-compile the library in order to reset the default precision?
One thing I am not sure about is how it would (or wouldn't) affect the bindings for other languages. I remember the CeedGetScalarType
interface function was added specifically for the Julia bindings, for example. We also use CEED_SCALAR_TYPE
in the Python bindings right now.
I went through and made a list of all the functions in headers that take CeedScalar
or CeedScalar *
in the prototype, which would need to have versions for every allowed precision if CeedScalar
is not set at library compile time (as is the case if no CEED_SCALAR_TYPE
/no re-compiling). They can be broken into several groups:
- Functions that we would likely already be modifying or adding multiple versions of for different precisions:
- the
Get/Set/Take/RestoreArray
functions for Vectors, obviously, as well asSetValue
,Norm
,Scale
, andAXPY
(which we haven't yet decided what to do with, but are clear candidates for multiple precisions). - basis creation:
CreateTensorH1
,CreateH1
,CreateHdiv
: makes sense that if we allow different computation and/or input/output precisions for Basis objects (mixed-precision setting), we'd maybe want versions allowing the user to create the object with quadrature/interp/gradient data in the precision that will be used in computation. So that makes sense - other basis stuff:
GetQRef
,GetQWeights
,GetInterp
,GetInterp1D
,GetGrad
,GetGrad1D
,GetDiv
,GaussQuadrature
,LobattoQuadrature
,QRFactorization
,SymmetricSchurDecomposition
,SimultaneousDiagonalization
-- most of these probably would also make sense to have versions for different precisions in a mixed-precision setting, but we are definitely starting to "explode" the number of added functions at this point (especially if we expand to more precisions in the future...) - the definition of
QFunctionUser
function pointer prototype -- we already need a solution for mixed-precision QFunctions anyway
- Miscellaneous other stuff:
- Two preconditioning Operator functions:
MultigridLevelCreateTensorH1
andMultigridLevelCreateH1
-- not sure about those - a few functions in
ceed/backend.h
:BasisGetCollocatedGrad
,HouseholderApplyQ
,TensorContractApply
,MatrixMultiply
. These are inbackend.h
instead ofceed.h
like the rest -- this may affect how easy it is to switch versions without re-compiling?
But at any rate, we'd need "generic" versions of most/all these functions, as well, for the library to use internally, to avoid re-compilation if we changeCeedScalar
, right? Though for many of the computational routines, we'd still need separate implementations for each precision, I think, unlike the vector access functions... - We would also need to change the definitions of some of the objects (private structs), like Basis and QFunctionContext, which include
CeedScalar
types.
use a macro to make CeedVectorSetArray map to either CeedVectorSetArrayFP32 or CeedVectorSetArrayFP64 so that we don't need an actual symbol in the library for this
Hmm, I guess I had thought we would define different versions of all the above functions from ceed.h
in ceed-f32
and ceed-f64
with the corresponding types -- to have a version that would still work with C99, too. Unless we want to require C11? I thought not? I guess a macro would achieve the same thing.
I'm not sure how any of these options with affect the bindings for other languages, though, as mentioned previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought CEED_SCALAR_TYPE
was a stop-gap until we could get rid of it with decent ergonomics. The foreign language bindings are fine (simpler even) because they'll just call to the correct type-specialized one.
I don't know that CreateH1
and other quadrature/basis stuff need specializations for f32 (they can easily be obtained by casting the f64 specification). Doing the setup in f64 and casting the result when it starts operating on problem-sized data is more accurate at equivalent cost. They would need specializations if we add __float128
support sometime. But not urgent in any case.
I think the internal stuff like tensor contraction needs specializations anyway to actually do the basis apply work in the selected precision.
IMO, we can't require C11 _Generic
(not least because we need to support C++ callers). The ceed-f{32,64}.h
headers would just define CeedScalar
so people can be oblivious to precision. Foreign language callers wouldn't touch it. C11 _Generic
or C++ templates would enable callers to do mixed precision stuff without explicitly acknowledging it when calling libceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm still missing something. What's the point of getting rid of CEED_SCALAR_TYPE
(which defines CeedScalar
at compile time) unless we want to select the type of CeedScalar
without re-compiling the library? In which case, we'd need multiple versions of and/or a generic version of any function with CeedScalar
in its prototype whether or not we intend to expose it to users, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if CeedScalar
isn't known at compile time, we couldn't use it internally for computation anywhere in the library...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I envision that everyone will be content using f64 to compute basis data, except a rare few who want to experiment with __float128
. Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar
and make that next step in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing to consider with the basis data, I guess, is that we may also want the storage to be CeedScalarArray
type, so if, for example, the compute precision doesn't match CeedBasisScalar
, we would have a convenient place to store the converted array and re-use every time the basis is applied.
Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar and make that next step in a future PR?
Not sure. I feel like this will already be a big PR -- I haven't even added/finalized any tests for making sure the multiprecision vectors are working, and once hip-ref is more or less finalized, I will add the same changes for cuda-ref (and applicable ones for ref).
We might also want to add in this PR the extra versions/changes for the other vector functions: SetValue
, Norm
, Scale
, and AXPY
...
I should also highlight a few other discussion points in this draft that I haven't pointed out yet; I'll do that now.
There's a merge conflict with main now thanks to the SyncArray update PR, making rebasing less straightforward (when I tried it previously). If we're okay with squash-merging this eventually, it will be fine, but that may also be something to consider...unless I start a "clean" branch with the finalized Vector changes, delete this branch, and we continue the PR from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question about the possible CeedBasisScalar
, this would be defined at libCEED compile time, right? And could not be changed without re-compiling? And only things that we don't switch to CeedBasisScalar
would need to be available in multiple versions in order to select the correct one with ceed-f32.h
?
(Also, I agree about the naming being consistent -- from your much earlier review comment. I like FP better, so I'd vote for ceed-fp32.h
, but would it be too much of a hassle to change the header names at this point since we've already released versions with the current names?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly right in the first paragraph.
I think our norms and axpy are mostly just used for testing without PETSc or MFEM -- libCEED doesn't intend to be a complete solvers library so any real caller has a library for that stuff. We could just use one precision (CeedBasisScalar
or maybe there's a better name for this concept that it's not problem-sized) or we could make it dispatch by having the internal computation always work in CeedBasisScalar
, but promote f32 inputs to AXPY and demote outputs of Norm (rather than plumbing both variants into backends).
I don't especially care on the names, just consistency. There may not be many direct includes in the wild. f32
and f64
are the Rust type names, fwiw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't want any unintuitive surprises for users...for example, in the norm implementation for CPU backends (done in the interface), we call GetArrayRead
followed by using that data to compute the norm. But in a multiprecision case, what precision should be used to get the array? If the vector only has float data and CeedBasisScalar
(or whatever we want to call it) is double, for example, calling GetArrayRead
in CeedBasisScalar
would do unnecessary conversion (and maybe allocation, too) in order to do the computation in CeedBasisScalar
when the expected behavior would be to use float.
So, maybe it would be useful to make the backend versions of Norm
, SetValue
, Scale
, and AXPY
be Generic
like the Get/Set/Take/Sync
functions, and have different interface versions which call the generic function with the correct precision? (And also move the Norm
implementation into the ref backend instead of interface-level.)
Then what if the interface level had, as an example for CeedVectorNorm
:
CeedVectorNormFP32
: calls the generic version withCEED_SCALAR_FP32
CeedVectorNormFP64
: calls the generic version withCEED_SCALAR_FP64
CeedVectorNorm
: result is alwaysCeedBasisScalar
(for the prototype, known at compile time), but computation happens in whatever precision the function deems "most likely to be intended" by the call (if only one precision is currently valid for the vector, use that precision; if two or more precisions are valid, pick one based on some heuristic: most precise? Least, for better performance?)
Of course, I guess that would ruin the plan to have *FP32/*FP64
/etc. functions defined as their corresponding "regular" versions in the ceed-fp32
or ceed-fp64
headers as a way to set CeedScalar
without recompiling the library, if CeedVectorNorm
already had a distinct use beyond being a stand-in for one of the specific precision versions. So maybe we shouldn't do that. But I think the intended behavior of Norm
based on whatever data is currently valid in the vector is unclear...
Then there is the issue with AXPY
: in the case where the input vectors do not have the same valid precisions, what should we do? Should we specify what precision we want to use, or let the function pick based on some heuristic?
In the same vein, we would also need to decide how we want PointwiseMult
to behave in the case of multiple precisions, even thought it doesn't take a scalar in its prototype, with the same questions as AXPY
.
For Scale
, I'd assume we'd want the behavior to be to scale all currently valid precisions in the vector, so that's easy. It could take the CeedBasisScalar
type to avoid needing multiple versions.
(We really probably should have a different name for that type...ideas? CeedBaseScalar
? CeedDataScalar
?)
static inline int CeedScalarTypeGetSize_Hip(Ceed ceed, CeedScalarType prec, | ||
size_t *size) { | ||
switch(prec) { | ||
case CEED_SCALAR_FP32: | ||
*size = sizeof(float); | ||
break; | ||
case CEED_SCALAR_FP64: | ||
*size = sizeof(double); | ||
break; | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in the backend for now while testing with hip-ref, but is this something that could be useful/necessary to have at an interface level for all backends, rather than each ref
backend duplicating it?
/** | ||
@brief Check the current status of the CeedVector's data arrays. | ||
|
||
This function sets unsigned int parameters indicating for which precisions | ||
the data for mem_type is not NULL, for borrowed and owned arrays, and also | ||
which precisions are currently valid. | ||
The values can be checked bitwise, such that, e.g., | ||
(valid_status & (1 << CEED_SCALAR_FP32)) will be true if the FP32 array is | ||
currently valid, and false otherwise. | ||
|
||
@param vec CeedVector for which to check status | ||
@param mem_type Mem type for which to check status | ||
@param valid_status Status indicator for valid data | ||
@param borrowed_status Status indicator for borrowed arrays | ||
@param owned_status Status indicator for owned arrays | ||
**/ | ||
int CeedVectorCheckArrayStatus(CeedVector vec, CeedMemType mem_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this function to the interface level, thinking it would be used by tests for multiprecision vector functionality (and potentially, users may want this feature).
It also has a backend implementation, so it's added to the CeedVector_private
struct. Since this affects the user interface, I wanted to make sure and point it out to see what people thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this, we may also want an interface function for convenient "pretty" printing of status for all vector arrays, or we could implement such a function in the tests directly.
703f404
to
112a04a
Compare
In this branch, I have attempted to start collecting the result of many discussions, particularly with @jeremylt and @jedbrown, about expanding CeedVectors to be able to contain multiple precisions. This PR is meant as a place for further discussion and refinement, whether or not we end up merging from it.
Summary of changes so far (as of opening of the draft PR, April 22):
*Generic
at the backend level, that is, the functions that get added to theCeedVector_private
struct. They now take void pointers instead ofCeedScalar
, and an additional parameter for the data type. (These can hopefully be used within other backend functions in the future where we allow mixed precision.) This has been changed in ref, cuda-ref, and hip-ref.Generic
version withCEED_SCALAR_TYPE
as the type parameter in their implementations ininterface/ceed-vector.c
*FP32
and*FP64
). These also just call theGeneric
version "under the hood"in the hip-ref backend ONLY so far:
Generic
functions, aimed at implementing the previously-discussed multiprecision vector access rules for validityCeedScalarArray
struct is used to store dataIn fd75fdf I added a user-interface-level function for checking the current array status, inspired by the way MFEM returns memory status flags. I was using this in my own local tests. We may or may not want to keep something like this in the future.
Related issue: #778