Half precision computations #29
Some further notes:
A patch may be a few hundred pixels across, and at half precision a coordinate
of that size is rounded off at the level of ~10% of a pixel (a few hundred
times the ~1/2048 relative precision), which would not be good. But the pixel
values themselves are integers and hence have no round-off. I think we could
express the center of the Gaussian as the nearest integer plus a fractional
part; then we could do the dx,dy math with much less loss of precision, as
sketched below.
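A minimal sketch of that split, with illustrative names (xcenter is the
full-precision Gaussian center, px a pixel index; neither is necessarily what
the code calls them):

```
#include <cuda_fp16.h>

// Sketch: split the Gaussian center into an exact integer pixel plus a
// small fractional part before any half-precision math touches it.
__device__ __half dx_half(float xcenter, int px) {
    int   xint  = __float2int_rn(xcenter);  // nearest integer pixel, exact
    float xfrac = xcenter - (float)xint;    // in [-0.5, 0.5], order unity
    // (px - xint) is a small exact integer and xfrac carries the sub-pixel
    // offset at full relative precision, so dx loses much less accuracy.
    return __float2half((float)(px - xint) - xfrac);
}
```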
It seems like the half2 library doesn't provide binary operators,
but rather a ton of functions. So this is not as simple as changing
PixFloat; we have to recode the kernel. And there is an FMA intrinsic, so
we'd need to be thoughtful about that coding; see the sketch below.
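For instance, the Gaussian exponent argument for a pixel pair might look like
this with the cuda_fp16.h intrinsics (a sketch; all names are illustrative):

```
#include <cuda_fp16.h>

// Sketch: -(dx*dx + dy*dy) * inv_two_sigma2 for two pixels at once.
// Each half2 argument holds values for a pixel pair.
__device__ __half2 exponent_arg(__half2 dx, __half2 dy, __half2 inv_two_sigma2) {
    __half2 r2 = __hfma2(dx, dx, __hmul2(dy, dy));  // fused dx*dx + dy*dy
    return __hmul2(__hneg2(r2), inv_two_sigma2);
}
```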
It seems that we'll need to store ImageGaussian items as a half2
doublet, so we don't save any space in there. I think we'd compute
the convolution in float and then compress to half2 only when
inserting into the ImageGaussian. That part looks easy to code:
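E.g., packing a pixel pair's float results on insertion (a sketch; the
destination slot is an assumed name, not the actual struct field):

```
#include <cuda_fp16.h>

// Sketch: convolution evaluated in float for a pixel pair, compressed to a
// single half2 only at the store into the ImageGaussian.
__device__ void store_pair(__half2 *image_gaussian_slot,
                           float result_lo, float result_hi) {
    *image_gaussian_slot = __floats2half2_rn(result_lo, result_hi);
}
```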
The comparison operators require both halves to be true, which is fine
for superpixel comparisons.
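That both-halves-true behavior is what the __hb*2 predicates in cuda_fp16.h
provide, e.g. (sketch, names illustrative):

```
#include <cuda_fp16.h>

// Sketch: __hblt2 returns true only if BOTH halves of a are below the
// corresponding halves of b, matching the per-superpixel semantics we want.
__device__ bool pair_below(__half2 a, __half2 threshold) {
    return __hblt2(a, threshold);
}
```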
There are some differences in the derivative computation, but I'm not
convinced it's particularly scary.
The accumulators could stay in float, but would start the reductions
from __high2float(x)+__low2float(x). Easy.
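That is, something like:

```
#include <cuda_fp16.h>

// Sketch: promote a half2 accumulator to one float before the usual
// float reduction proceeds.
__device__ float start_reduction(__half2 x) {
    return __high2float(x) + __low2float(x);
}
```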
The pixel indexing in exposure_N and exposure_start would need to
be adjusted, because we're dealing with pixel pairs. Easy to do on
the GPU at the top of the pixel loop.
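A sketch of that adjustment, assuming exposure_start and exposure_N count
single pixels, that each exposure's run starts on an even index, and that e
indexes the exposure (all assumptions, not the actual code):

```
// Sketch: walk half2 pixel PAIRS instead of single pixels.
int pair_start = exposure_start[e] / 2;
int pair_N     = (exposure_N[e] + 1) / 2;  // round up if the count is odd
for (int p = pair_start + threadIdx.x; p < pair_start + pair_N; p += blockDim.x) {
    // load the half2 pixel pair p and proceed as before ...
}
```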
I'm imagining that we'll have a set of preprocessor macros and definitions
that get redefined between float and half2, e.g.:

```
#ifdef HALF
#define MUL(a,b) __hmul2(a,b)
#else
#define MUL(a,b) ((a)*(b))
#endif
```
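Extending the same pattern to the add and FMA cases mentioned above (a sketch;
these macro names are ours, not existing ones):

```
#ifdef HALF
#define ADD(a,b)   __hadd2(a,b)
#define FMA(a,b,c) __hfma2(a,b,c)
#else
#define ADD(a,b)   ((a)+(b))
#define FMA(a,b,c) fmaf(a,b,c)
#endif

// The kernel body is then written once for both builds, e.g.
// r2 = FMA(dx, dx, MUL(dy, dy));
```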
On Fri, Jun 7, 2019 at 5:57 PM, Lehman Garrison wrote:
CUDA supports efficient computation with half-precision (16-bit) floats.
This is probably enough precision for the pixel data in our problem. We
might do this with CUDA's half2 type, but this would require using SIMD
intrinsics rather than plain math operators.
This will require care in two ways: making sure that our values stay
within 6e-5 to 6e4, and making sure that our operation count is small
enough that the build-up of round-off/truncation error is acceptable. The
significand precision is 1/2048, or 5e-4.
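For reference, those limits are easy to check numerically (a standalone
sketch, not project code):

```
#include <cstdio>
#include <cuda_fp16.h>

// Sketch: round-trip values through __half to see the quoted range
// (min normal ~6.1e-5, max 65504 ~ 6.5e4) and the granularity near 1.0.
int main() {
    const float vals[] = {6.0e-5f, 6.5e4f, 7.0e4f, 1.0f + 1.0f / 1024.0f};
    for (float v : vals) {
        std::printf("%.6g -> %.6g\n", v, __half2float(__float2half(v)));
    }
    return 0;
}
```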