
Ref implementation of FP8 #2438

Merged
merged 34 commits into develop from ref_fp8 on Nov 17, 2023

Conversation

umangyadav (Member)

Handles all four FP8 dtypes listed here: https://onnx.ai/onnx/technical/float8.html
Follows the saturation/clipping logic from the cast table there as well: https://onnx.ai/onnx/technical/float8.html#cast

Only adding fp8e4m3fnuz in MIGraphX IR for now.
Other types can be added later if necessary.
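
For reference, a minimal sketch of the saturation idea for the fnuz type (not the MIGraphX implementation; the real cast additionally rounds to the nearest representable fp8 value). It assumes 240.0 as the largest finite fp8e4m3fnuz magnitude and that NaN is passed through, per the ONNX cast table.

```cpp
// Minimal sketch of the saturating ("SAT") cast behaviour for fp8e4m3fnuz.
// Not the actual MIGraphX code; for illustration of the clipping rule only.
#include <algorithm>
#include <cmath>
#include <limits>

float saturate_fp8e4m3fnuz(float x)
{
    constexpr float max_finite = 240.0f; // largest finite fp8e4m3fnuz magnitude
    if(std::isnan(x))
        return std::numeric_limits<float>::quiet_NaN(); // NaN stays NaN
    // FNUZ types have no infinity, so out-of-range values clamp to +/-max_finite
    return std::clamp(x, -max_finite, max_finite);
}
```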

@migraphx-bot (Collaborator) commented Nov 11, 2023

| Test | Batch | Rate new (9e6d86) | Rate old (0039b1) | Diff |
|---|---|---|---|---|
| torchvision-resnet50 | 64 | 2,828.05 | 2,830.56 | -0.09% |
| torchvision-resnet50_fp16 | 64 | 6,489.11 | 6,495.66 | -0.10% |
| torchvision-densenet121 | 32 | 2,098.00 | 2,094.28 | 0.18% |
| torchvision-densenet121_fp16 | 32 | 3,655.13 | 3,665.07 | -0.27% |
| torchvision-inceptionv3 | 32 | 1,586.85 | 1,582.01 | 0.31% |
| torchvision-inceptionv3_fp16 | 32 | 2,567.11 | 2,575.16 | -0.31% |
| cadene-inceptionv4 | 16 | 702.98 | 703.19 | -0.03% |
| cadene-resnext64x4 | 16 | 691.96 | 690.81 | 0.17% |
| slim-mobilenet | 64 | 8,331.53 | 8,337.30 | -0.07% |
| slim-nasnetalarge | 64 | 225.48 | 225.51 | -0.01% |
| slim-resnet50v2 | 64 | 2,666.74 | 2,664.32 | 0.09% |
| bert-mrpc-onnx | 8 | 822.50 | 822.87 | -0.04% |
| bert-mrpc-tf | 1 | 390.59 | 387.67 | 0.75% |
| pytorch-examples-wlang-gru | 1 | 302.34 | 301.83 | 0.17% |
| pytorch-examples-wlang-lstm | 1 | 315.69 | 313.98 | 0.54% |
| torchvision-resnet50_1 | 1 | 596.58 | 596.93 | -0.06% |
| torchvision-inceptionv3_1 | 1 | 343.60 | 343.95 | -0.10% |
| cadene-dpn92_1 | 1 | 401.76 | 397.79 | 1.00% |
| cadene-resnext101_1 | 1 | 329.27 | 329.44 | -0.05% |
| slim-vgg16_1 | 1 | 458.71 | 459.55 | -0.18% |
| slim-mobilenet_1 | 1 | 2,120.20 | 2,119.65 | 0.03% |
| slim-inceptionv4_1 | 1 | 219.65 | 219.88 | -0.11% |
| onnx-taau-downsample | 1 | 304.99 | 304.20 | 0.26% |
| dlrm-criteoterabyte | 1 | 21.59 | 21.62 | -0.14% |
| dlrm-criteoterabyte_fp16 | 1 | 40.66 | 40.62 | 0.09% |
| agentmodel | 1 | nan | nan | nan% |
| unet_fp16 | 2 | 54.71 | 54.71 | -0.00% |
| resnet50v1_fp16 | 1 | 940.70 | 953.28 | -1.32% |
| bert_base_cased_fp16 | 64 | 902.93 | 903.12 | -0.02% |
| bert_large_uncased_fp16 | 32 | 285.60 | 285.62 | -0.01% |
| bert_large_fp16 | 1 | 166.58 | 166.55 | 0.02% |
| distilgpt2_fp16 | 16 | 1,279.83 | 1,280.79 | -0.08% |

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator)


    :white_check_mark:bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

    :white_check_mark:bert-mrpc-tf: PASSED: MIGraphX meets tolerance

    :white_check_mark:pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

    :white_check_mark:pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

    :white_check_mark:torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:cadene-dpn92_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:cadene-resnext101_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:slim-vgg16_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:slim-mobilenet_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:slim-inceptionv4_1: PASSED: MIGraphX meets tolerance

    :white_check_mark:dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

❌ agentmodel: ERROR - check error output

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 336, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 254, in main
    pred_migx = np.array(model.run(params)[-1])
RuntimeError: /src/AMDMIGraphX/src/targets/gpu/device/include/migraphx/gpu/device/visit.hpp:140: hip_visit_views_impl: Ranks must be the same


    :white_check_mark:unet: PASSED: MIGraphX meets tolerance

    :white_check_mark:resnet50v1: PASSED: MIGraphX meets tolerance

🔴bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


    :white_check_mark:bert_large: PASSED: MIGraphX meets tolerance

🔴distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@umangyadav umangyadav added the "high priority" label (A PR with high priority for review and merging.) Nov 15, 2023
@TedThemistokleous TedThemistokleous linked an issue Nov 15, 2023 that may be closed by this pull request
@causten causten requested a review from pfultz2 November 15, 2023 17:24
@TedThemistokleous (Collaborator) left a comment


looks good and solid + helpful comments in the code @umangyadav

static std::string format()
{
    // following: https://docs.python.org/3/library/struct.html#format-characters
    return "z";
}
Collaborator

I don't see "z" in the documentation linked in the comment?

Member Author

Yes, that is not correct. I am not sure what the correct format should be, whether "B", "b", or "c". I'll have to check.

Member Author

I've opened an issue: this needs to be tested to verify that numpy buffers are created correctly.
#2447
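
For illustration only (this is not the fix, which is still to be verified in #2447): if the fp8 value is stored in a single unsigned byte, a standard choice from the struct format-character table would look like this.

```cpp
// Hypothetical sketch, assuming the fp8 type's storage is one uint8_t;
// "B" is the standard struct format character for an unsigned byte.
// The actual character to use is pending verification in #2447.
static std::string format()
{
    // see https://docs.python.org/3/library/struct.html#format-characters
    return "B";
}
```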

test/fp8e4m3fn.cpp: review thread outdated, resolved
test/fp8e5m2.cpp: review thread outdated, resolved
@causten causten merged commit 7f93a81 into develop Nov 17, 2023
@causten causten deleted the ref_fp8 branch November 17, 2023 14:32
Successfully merging this pull request may close these issues.

FP8 Support