#### Ref

Run instruction: @6 = ref::exp(@5) -> fp8e4m3fnuz_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1, 0.5625, 0.75, 0.8125, 0.5625, 1, 0.6875, 0.9375, 0.5625

Run instruction: @7 = ref::prefix_scan_sum[axis=1,exclusive=0,reverse=0](@6) -> fp8e4m3fnuz_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 2, 2.5, 3.25, 4, 0.5625, 1.5, 2.25, 3.25, 3.75

##### GPU

Run instruction: @5 = gpu::code_object[code_object=10224,symbol_name=reduce_max_sub_exp_convert_kernel,global=128,local=64,](input,@4) -> float_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1, 0.5625, 0.75, 0.8125, 0.5625, 1, 0.6875, 0.9375, 0.5625

Run instruction: @7 = gpu::prefix_scan_sum[axis=1,exclusive=0,reverse=0](@5,@6) -> float_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1.9375, 2.5, 3.25, 4.0625, 0.5625, 1.5625, 2.25, 3.1875, 3.75
#2510 adds an FP8 test for the multinomial op, which is failing because `prefix_scan_sum` produces different results for the "ref" and "gpu" targets.
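The numbers are close for both targets, but the GPU run computes `prefix_scan_sum` in float_type while the ref run computes it in fp8e4m3fnuz_type. Float allows higher precision, so rounding error accumulates differently through `prefix_scan_sum` (for example, the second element is 1.9375 on GPU but 2 on ref).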
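To illustrate how the two outputs can diverge, here is a minimal Python sketch that reruns the scan on the logged input rows, once in plain float and once rounding the running sum to 3 explicit mantissa bits after every addition. The helper names `round_to_e4m3` and `inclusive_scan` are hypothetical, and the per-step rounding is an assumption about how the ref target behaves, not a description of the MIGraphX implementation. The rounding helper also ignores the range limits, subnormals, and NaN encoding of fp8e4m3fnuz, which do not matter for values of this magnitude.

```python
import math


def round_to_e4m3(x: float) -> float:
    """Round x to 1 implicit + 3 explicit mantissa bits, ties to even.

    Simplified model (assumption): no overflow, subnormal, or NaN handling.
    """
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    # Scale so the 4 significant bits form an integer; Python's round() is ties-to-even.
    return math.ldexp(round(mantissa * 16.0), exponent - 4)


def inclusive_scan(values, quantize=None):
    """Inclusive prefix sum; optionally quantize the running sum after each add."""
    out, acc = [], 0.0
    for v in values:
        acc += v
        if quantize is not None:
            acc = quantize(acc)
        out.append(acc)
    return out


# Input rows taken from the logged exp / kernel output (shape {2, 5}, scan along axis=1).
rows = [
    [0.9375, 1.0, 0.5625, 0.75, 0.8125],
    [0.5625, 1.0, 0.6875, 0.9375, 0.5625],
]

float_scan = [inclusive_scan(r) for r in rows]
fp8_scan = [inclusive_scan(r, quantize=round_to_e4m3) for r in rows]

for f32_row, fp8_row in zip(float_scan, fp8_scan):
    print("float:", f32_row)
    print("fp8  :", fp8_row)
```

Under these assumptions the float scan matches the logged GPU output and the quantized scan matches the logged ref output, which is consistent with the ref target rounding to FP8 at each accumulation step.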
CharlieL7