Fast Function Approximations lowering. #8566
base: main
```diff
@@ -38,6 +38,7 @@ extern "C" WEAK void *halide_opencl_get_symbol(void *user_context, const char *n
     "opencl.dll",
 #else
     "libOpenCL.so",
+    "libOpenCL.so.1",
```
I wonder whether `libOpenCL.so.1` should rather replace `libOpenCL.so`. The latter is a namelink that's only present when `-dev` packages are installed, and it should always point to the former.
I was thinking the same. I can fix it later, but I needed this on my local machine, and I didn't want to be destructive without consensus, so I only added it.
I removed it. We'll see what the build bots do.
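The ordering question in this thread can be made concrete with a minimal sketch, assuming a POSIX `dlopen`-based loader. `LoadedLibrary` and `open_first_available` are hypothetical names chosen for illustration; this is not Halide's runtime code.

```cpp
// Hypothetical helper (not Halide runtime code): try candidate sonames in
// order and keep the first one the dynamic loader accepts.
#include <dlfcn.h>

#include <string>
#include <vector>

struct LoadedLibrary {
    void *handle = nullptr;  // nullptr when no candidate could be opened
    std::string soname;      // the candidate that succeeded, if any
    std::string last_error;  // dlerror() text from the last failed attempt
};

LoadedLibrary open_first_available(const std::vector<std::string> &candidates) {
    LoadedLibrary result;
    for (const std::string &name : candidates) {
        // RTLD_LAZY defers symbol resolution until first use;
        // RTLD_LOCAL keeps the library's symbols out of the global namespace.
        void *handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_LOCAL);
        if (handle != nullptr) {
            result.handle = handle;
            result.soname = name;
            return result;
        }
        // Record why this candidate failed before moving on to the next one.
        const char *err = dlerror();
        result.last_error = err != nullptr ? err : "unknown dlopen failure";
    }
    return result;
}
```

Passing `{"libOpenCL.so.1", "libOpenCL.so"}` tries the soname installed by the runtime package first and only falls back to the `-dev` namelink. The caller owns the returned handle and should `dlclose()` it when done; on glibc older than 2.34, linking may also require `-ldl`.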
Force-pushed from efa7ddc to bea8612.
…nge (-1, 1) to test (-4, 4). Cleanup code/comments. Test performance for all approximations.
…optimization. Greatly improved accuracy testing framework.
…n test by precomputing arguments buffer.
… support for fast_tanh on all backends.
…st not touching input: prevents constant folding.
Force-pushed from bea8612 to 0de4dbc.
The big transcendental-lowering update! Replaces #8388.
TODO
I still have to do:
Overview:
- Fast transcendentals implemented for: sin, cos, tan, atan, exp, log, tanh.
- Simple API to specify precision requirements. A default-initialised precision (`AUTO` without constraints) means "don't care about precision, as long as it's reasonable and fast", which gives you the highest chance of selecting a high-performance implementation based on hardware instructions. The optimization objectives `MULPE` (max ULP error) and `MAE` (max absolute error) are available. Compared to the previous PR, I removed `MULPE_MAE`, as I didn't see a good purpose for it.
- Tabular info on intrinsics and native functions (their precision and speed), used to select a lowering that is definitely not slower while still satisfying the precision requirements.
- `native_cos`, `native_exp`, etc...
- `fast::cos`, `fast::exp`, etc...
- Performance tests validating that:
- Accuracy tests validating that:
- Drive-by fix: added `libOpenCL.so.1` to the list of sonames tried when loading the OpenCL runtime.

Review guide
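To make the selection rule from the Overview concrete for review (use the fastest tabulated implementation whose measured error meets the requested bound), here is a hypothetical sketch. `Objective`, `PrecisionRequest`, `Candidate`, and `select_cheapest` are invented names, not this PR's API (see the API section for that). Treating a zero tolerance as "unconstrained", and letting `AUTO` check whichever tolerances are set, are assumptions of the sketch.

```cpp
// Hypothetical sketch, not Halide's API: pick the cheapest tabulated
// implementation whose measured error satisfies the requested bound.
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the objectives named in the PR description (names assumed).
enum class Objective { AUTO, MULPE, MAE };

struct PrecisionRequest {
    Objective objective = Objective::AUTO;
    uint64_t max_ulp_error = 0;  // 0 means "no ULP constraint"
    double max_abs_error = 0.0;  // 0 means "no absolute-error constraint"
};

// One row of a (hypothetical) table of measured implementations.
struct Candidate {
    std::string name;
    double relative_cost;       // lower is faster
    uint64_t measured_max_ulp;  // worst-case error in ULPs over the tested range
    double measured_max_abs;    // worst-case absolute error over the tested range
};

bool satisfies(const Candidate &c, const PrecisionRequest &r) {
    const bool ulp_ok = r.max_ulp_error == 0 || c.measured_max_ulp <= r.max_ulp_error;
    const bool abs_ok = r.max_abs_error == 0.0 || c.measured_max_abs <= r.max_abs_error;
    switch (r.objective) {
    case Objective::MULPE:
        return ulp_ok;
    case Objective::MAE:
        return abs_ok;
    case Objective::AUTO:
        return ulp_ok && abs_ok;  // with no constraints set, every row qualifies
    }
    return false;
}

// Returns the index of the cheapest row that satisfies the request, or -1.
int select_cheapest(const std::vector<Candidate> &table, const PrecisionRequest &req) {
    int best = -1;
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (!satisfies(table[i], req)) {
            continue;
        }
        if (best < 0 || table[i].relative_cost < table[static_cast<std::size_t>(best)].relative_cost) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

For example, with hypothetical rows costing 1, 3, and 10 with worst-case errors of 4096, 8, and 1 ULP, a `MULPE` request bounded at 16 ULP selects the second row. If the precise library function is kept as a row that meets every bound, the search can never return something slower than it, which is the "definitely not slower" guarantee in miniature.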
- The approximation precision is represented as a `Call::make_struct` node with 4 parameters (see API below). This precision Call node survives until the lowering pass where the transcendentals are lowered; in that pass, the precision values are extracted again from the node's arguments. Conceptually, I like that this way they are bundled and clearly not at the same level as the actual mathematical arguments. Is this a good approach? For it to work, I had to stop `CSE` from extracting those precision arguments, and stop `StrictifyFloat` from recursing into that struct and littering `strict_float` on those numbers. I have also seen the `Call::bundle` intrinsic; perhaps that one is better suited for this purpose? @abadams
- `Float(16)` and `Float(64)`, but those are not yet implemented/tested. The polynomial approximations should work correctly (although untested) for these other data types.
- `native_tan()` compiles to the same three instructions as I implemented on CUDA: `sin.approx.f32`, `cos.approx.f32`, `div.approx.f32`. I haven't investigated AMD's documentation on available hardware instructions.

API