Multinomial FP8 verification test is failing #2516

Open
umangyadav opened this issue Dec 5, 2023 · 0 comments
Labels: FP8 (issues related to FP8 implementation)

Comments

@umangyadav (Member)

#### Ref ######
Run instruction: @6 = ref::exp(@5) -> fp8e4m3fnuz_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1, 0.5625, 0.75, 0.8125, 0.5625, 1, 0.6875, 0.9375, 0.5625
Run instruction: @7 = ref::prefix_scan_sum[axis=1,exclusive=0,reverse=0](@6) -> fp8e4m3fnuz_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 2, 2.5, 3.25, 4, 0.5625, 1.5, 2.25, 3.25, 3.75


##### GPU ######
Run instruction: @5 = gpu::code_object[code_object=10224,symbol_name=reduce_max_sub_exp_convert_kernel,global=128,local=64,](input,@4) -> float_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1, 0.5625, 0.75, 0.8125, 0.5625, 1, 0.6875, 0.9375, 0.5625
Run instruction: @7 = gpu::prefix_scan_sum[axis=1,exclusive=0,reverse=0](@5,@6) -> float_type, {2, 5}, {5, 1}, target_id=0
Output: 0.9375, 1.9375, 2.5, 3.25, 4.0625, 0.5625, 1.5625, 2.25, 3.1875, 3.75

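#2510 adds an FP8 test for the multinomial op. The test fails because prefix_scan_sum produces different results on the "ref" and "gpu" targets.

The numbers are close on both targets, but the GPU target runs prefix_scan_sum in float_type while the ref target runs it in fp8e4m3fnuz_type. Float has higher precision, so the results diverge as rounding error accumulates through the scan. For example, the second element of the first row is 1.9375 on the GPU and 2 on ref, even though the exp outputs feeding the scan are identical.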
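The mismatch in the logs above can be reproduced outside MIGraphX with a small model of the rounding. This is a minimal sketch, not the library's implementation: `round_to_fp8_mantissa` and `inclusive_scan` are hypothetical helpers, the rounding only models the 3 explicit mantissa bits of an e4m3 format with ties to even (no exponent limits, saturation, or NaN handling), and it assumes the ref target rounds each partial sum to FP8.

```python
import math


def round_to_fp8_mantissa(x):
    """Hypothetical helper: round x to 4 significant bits
    (1 implicit + 3 explicit mantissa bits), ties to even.
    Exponent range, saturation and NaN are not modelled."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    # m * 16 is exact; Python's round() on floats rounds ties to even.
    return math.ldexp(round(m * 16), e - 4)


def inclusive_scan(row, quantize=lambda v: v):
    """Inclusive prefix sum; `quantize` is applied to every partial sum."""
    out, acc = [], 0.0
    for v in row:
        acc = quantize(acc + v)
        out.append(acc)
    return out


# exp outputs copied from the logs (identical on both targets), shape {2, 5}
ROWS = [
    [0.9375, 1, 0.5625, 0.75, 0.8125],
    [0.5625, 1, 0.6875, 0.9375, 0.5625],
]

fp8_like = [inclusive_scan(r, round_to_fp8_mantissa) for r in ROWS]
float_like = [inclusive_scan(r) for r in ROWS]

print("fp8-like  :", fp8_like)
print("float-like:", float_like)
```

Under these assumptions the FP8-like scan gives 0.9375, 2, 2.5, 3.25, 4 and 0.5625, 1.5, 2.25, 3.25, 3.75, matching the ref log, while the float scan gives 0.9375, 1.9375, 2.5, 3.25, 4.0625 and 0.5625, 1.5625, 2.25, 3.1875, 3.75, matching the GPU log. Rounding only the final float sums to FP8 would produce the same ref values for this input, so the sketch does not show which of the two the ref target actually does.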

@umangyadav added the FP8 label (issues related to FP8 implementation) on Dec 5, 2023