gpu: generic: sycl: lnorm Intel GPU precision issues #2071
base: main
To give a bit more context: I found that the division in this line makes v_variance slightly different from the reference computed in benchdnn (the threshold is 0, thr=0, for these cases). This may be caused by a compiler optimization.
Looks like an improvement. However, I would just set the threshold to 5e-7 for all cases, as 0 does not make sense once floating-point calculations are involved.
Do you know what is the default setting for
@kala855, benchdnn is very sensitive to numerical issues by design, so in cases like this it's important to understand where the difference comes from before considering a threshold change.
We have been checking this carefully over the past few days. Comparing the assembly that icpx generates for the OCL and SYCL versions of the implementation, we found that:
Any suggestions or feedback are more than welcome. Thanks.
Thanks for checking. I would suggest:
Description
When run on Intel PVC, the SYCL implementation of the layer normalization kernel shows precision issues in the variance computation.
[ 14][VAR][14] exp_f32: 2.68618e-05 exp: 2.68618e-05 got: 2.68618e-05 diff:1.81899e-12 rdiff:6.77165e-08
[COMPARE_STATS][VAR]: trh=0 err_max_diff:1.81899e-12 err_max_rdiff:6.77165e-08 all_max_diff:1.81899e-12 all_max_rdiff:6.77165e-08
8471:FAILED (errors:1 total:75) __REPRO: --lnorm --engine=gpu --dt=f32:bf16 --tag=axb --stat_tag=abx --flags=CH 15x3_n"lnorm_ci_0d:0"
The output above shows one of the failing examples.
As a proposed fix, the variance threshold is relaxed so that the failing tests pass.