Half precision computations #29
Some further notes:
A patch may be a few hundred pixels across, and at half precision a coordinate
of that size is rounded off at the level of ~10% of a pixel (a few hundred
times the ~1/2048 relative precision), which would not be good. But the pixel
values themselves are integers and hence have no round-off. I think we could
express the center of the Gaussian as the nearest integer plus a fractional
part; then we could do the dx,dy math with much less loss of precision, as
sketched below.
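A minimal sketch of that split, with illustrative names (xcenter is the
full-precision Gaussian center, px a pixel index; neither is necessarily what
the code calls them):

```
#include <cuda_fp16.h>

// Sketch: split the Gaussian center into an exact integer pixel plus a
// small fractional part before any half-precision math touches it.
__device__ __half dx_half(float xcenter, int px) {
    int   xint  = __float2int_rn(xcenter);  // nearest integer pixel, exact
    float xfrac = xcenter - (float)xint;    // in [-0.5, 0.5], order unity
    // (px - xint) is a small exact integer and xfrac carries the sub-pixel
    // offset at full relative precision, so dx loses much less accuracy.
    return __float2half((float)(px - xint) - xfrac);
}
```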
It seems like the half2 library doesn't provide binary operators,
but rather a ton of functions. So this is not as simple as changing
PixFloat; we have to recode the kernel. And there is an FMA intrinsic, so
we'd need to be thoughtful about that coding; see the sketch below.
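For instance, the Gaussian exponent argument for a pixel pair might look like
this with the cuda_fp16.h intrinsics (a sketch; all names are illustrative):

```
#include <cuda_fp16.h>

// Sketch: -(dx*dx + dy*dy) * inv_two_sigma2 for two pixels at once.
// Each half2 argument holds values for a pixel pair.
__device__ __half2 exponent_arg(__half2 dx, __half2 dy, __half2 inv_two_sigma2) {
    __half2 r2 = __hfma2(dx, dx, __hmul2(dy, dy));  // fused dx*dx + dy*dy
    return __hmul2(__hneg2(r2), inv_two_sigma2);
}
```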
It seems that we'll need to store ImageGaussian items as a half2
doublet, so we don't save any space in there. I think we'd compute
the convolution in float and then compress to half2 only when
inserting into the ImageGaussian. That part looks easy to code:
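E.g., packing a pixel pair's float results on insertion (a sketch; the
destination slot is an assumed name, not the actual struct field):

```
#include <cuda_fp16.h>

// Sketch: convolution evaluated in float for a pixel pair, compressed to a
// single half2 only at the store into the ImageGaussian.
__device__ void store_pair(__half2 *image_gaussian_slot,
                           float result_lo, float result_hi) {
    *image_gaussian_slot = __floats2half2_rn(result_lo, result_hi);
}
```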
The comparison operators require both halves to be true, which is fine
for superpixel comparisons.
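That both-halves-true behavior is what the __hb*2 predicates in cuda_fp16.h
provide, e.g. (sketch, names illustrative):

```
#include <cuda_fp16.h>

// Sketch: __hblt2 returns true only if BOTH halves of a are below the
// corresponding halves of b, matching the per-superpixel semantics we want.
__device__ bool pair_below(__half2 a, __half2 threshold) {
    return __hblt2(a, threshold);
}
```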
There are some differences in the derivative computation, but I'm not
convinced it's particularly scary.
The accumulators could stay in float, but would start the reductions
from __high2float(x)+__low2float(x). Easy.
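That is, something like:

```
#include <cuda_fp16.h>

// Sketch: promote a half2 accumulator to one float before the usual
// float reduction proceeds.
__device__ float start_reduction(__half2 x) {
    return __high2float(x) + __low2float(x);
}
```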
The pixel indexing in exposure_N and exposure_start would need to
be adjusted, because we're dealing with pixel pairs. Easy to do on
the GPU at the top of the pixel loop.
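A sketch of that adjustment, assuming exposure_start and exposure_N count
single pixels, that each exposure's run starts on an even index, and that e
indexes the exposure (all assumptions, not the actual code):

```
// Sketch: walk half2 pixel PAIRS instead of single pixels.
int pair_start = exposure_start[e] / 2;
int pair_N     = (exposure_N[e] + 1) / 2;  // round up if the count is odd
for (int p = pair_start + threadIdx.x; p < pair_start + pair_N; p += blockDim.x) {
    // load the half2 pixel pair p and proceed as before ...
}
```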
I'm imagining that we'll have a set of preprocessor macros and definitions
that get redefined between float and half2, e.g.:

```
#ifdef HALF
#define MUL(a,b) __hmul2(a,b)
#else
#define MUL(a,b) ((a)*(b))
#endif
```
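Extending the same pattern to the add and FMA cases mentioned above (a sketch;
these macro names are ours, not existing ones):

```
#ifdef HALF
#define ADD(a,b)   __hadd2(a,b)
#define FMA(a,b,c) __hfma2(a,b,c)
#else
#define ADD(a,b)   ((a)+(b))
#define FMA(a,b,c) fmaf(a,b,c)
#endif

// The kernel body is then written once for both builds, e.g.
// r2 = FMA(dx, dx, MUL(dy, dy));
```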
On Fri, Jun 7, 2019 at 5:57 PM, Lehman Garrison wrote:
CUDA supports efficient computation with half-precision (16-bit) floats.
This is probably enough precision for the pixel data in our problem. We
might do this with CUDA's half2 type, but this would require using SIMD
intrinsics rather than plain math operators.
This will require care in two ways: making sure that our values stay
within 6e-5 to 6e4, and making sure that our operation count is small
enough that the build-up of round-off/truncation error is acceptable. The
significand precision is 1/2048, or 5e-4.
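For reference, those limits are easy to check numerically (a standalone
sketch, not project code):

```
#include <cstdio>
#include <cuda_fp16.h>

// Sketch: round-trip values through __half to see the quoted range
// (min normal ~6.1e-5, max 65504 ~ 6.5e4) and the granularity near 1.0.
int main() {
    const float vals[] = {6.0e-5f, 6.5e4f, 7.0e4f, 1.0f + 1.0f / 1024.0f};
    for (float v : vals) {
        std::printf("%.6g -> %.6g\n", v, __half2float(__float2half(v)));
    }
    return 0;
}
```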