adjust tolerance for xpu in utils #749
Conversation
test/xpu/test_decomp_xpu.py (outdated)
(torch.float16, torch.ops.aten.mv.default): 1e-5,
(torch.bfloat16, torch.ops.aten.mv.default): 1e-5,
(torch.float16, torch.ops.aten.log_sigmoid_backward.default): 2e-5,
(torch.float16, torch.ops.aten._batch_norm_with_update.default): 2e-7,
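For context, here is a minimal sketch of how a (dtype, op) to tolerance table like the one above is typically consulted before comparing results; `lookup_tol` and its default value are illustrative assumptions, not the test suite's actual helper.

```python
import torch

# (dtype, op) -> customized absolute tolerance; entries copied from the diff above.
tol_table = {
    (torch.float16, torch.ops.aten.mv.default): 1e-5,
    (torch.bfloat16, torch.ops.aten.mv.default): 1e-5,
    (torch.float16, torch.ops.aten.log_sigmoid_backward.default): 2e-5,
    (torch.float16, torch.ops.aten._batch_norm_with_update.default): 2e-7,
}

def lookup_tol(op, dtype, default=1e-7):
    # Fall back to a strict default when no customized threshold is registered.
    return tol_table.get((dtype, op), default)
```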
please add comments explaining each customized threshold.
done
@yuchengliu1 please run experiments to check whether the code works with ops that already have `toleranceOverride` defined.
@yuchengliu1 please help summarize the tolerance issues and discuss with @fengyuan14 how to fully handle them.
If multiple `toleranceOverride` decorators are activated on a test case, only the latest decorator actually takes effect.
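To illustrate the mechanism, here is a self-contained mock (an assumption-level sketch, not PyTorch's real `toleranceOverride` implementation): each wrapper sets the same attribute on the test instance, so whichever wrapper runs last before the test body determines the effective tolerance.

```python
import functools

class FakeToleranceOverride:
    # Simplified stand-in for a tolerance-override decorator.
    def __init__(self, atol):
        self.atol = atol

    def __call__(self, fn):
        @functools.wraps(fn)
        def wrapper(test_self, *args, **kwargs):
            test_self.precision = self.atol  # overwrites any earlier override
            return fn(test_self, *args, **kwargs)
        return wrapper

class FakeTest:
    precision = 0.0

@FakeToleranceOverride(1e-3)   # outer wrapper: runs first, gets overwritten
@FakeToleranceOverride(1e-5)   # inner wrapper: runs last, so its value wins
def test_case(self):
    return self.precision

print(test_case(FakeTest()))   # prints 1e-05: only one override is effective
```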
…-ops into tolerance_adjust
OK for me.
Fix some accuracy problems:
1. Add the `toleranceOverride` decorator to OpInfo entries when hooking opdb. This handles cases with accuracy problems that are wrapped by `ops` (except when the difference is NaN or Inf); see the sketch at the end of this description.
2. Fix test_decomp and test_torch accuracy problems.
3. Align with the latest PyTorch code of `ModuleTest_test`:
"test_Conv2d_dilated_with_long_tensor_cuda",
"test_Conv2d_groups_thnn_with_long_tensor_cuda",
"test_Conv2d_groups_with_long_tensor_cuda",
"test_Conv2d_no_bias_with_long_tensor_cuda",
"test_Conv2d_padding_with_long_tensor_cuda",
"test_Conv2d_strided_with_long_tensor_cuda",
"test_Conv2d_with_long_tensor_cuda",
"test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda",
"test_Conv3d_groups_with_long_tensor_cuda",
"test_Conv3d_no_bias_with_long_tensor_cuda",
"test_Conv3d_stride_padding_with_long_tensor_cuda",
"test_Conv3d_stride_with_long_tensor_cuda",
"test_Conv3d_with_long_tensor_cuda",
"test_ConvTranspose2d_dilated_with_long_tensor_cuda",
"test_ConvTranspose2d_groups_with_long_tensor_cuda",
"test_ConvTranspose2d_no_bias_with_long_tensor_cuda",
"test_ConvTranspose2d_with_long_tensor_cuda",
4. Add 'nn_AvgPool2d' to the CUDA xfail list:
"test_memory_format_nn_AvgPool2d_xpu_float32",
"test_memory_format_nn_AvgPool2d_xpu_float64",
5. Clean up the skip list; remove cases that now pass with the latest code:
"test_compare_cpu_native_dropout_backward_xpu_bool",
"test_compare_cpu_native_dropout_backward_xpu_int16",
"test_compare_cpu_native_dropout_backward_xpu_int32",
"test_compare_cpu_native_dropout_backward_xpu_int64",
"test_compare_cpu_native_dropout_backward_xpu_int8",
"test_compare_cpu_native_dropout_backward_xpu_uint8",
"test_compare_cpu_nn_functional_avg_pool2d_xpu_int64",
"test_compare_cpu_abs_xpu_bool",
"test_dtypes_nn_functional_linear_xpu",
"test_dtypes_nn_functional_pad_replicate_negative_xpu",
"test_dtypes_nn_functional_pad_replicate_xpu",
"test_dtypes_unique_consecutive_xpu",
"test_SmoothL1Loss_no_batch_dim_mean_cuda_half",
"test_SmoothL1Loss_no_batch_dim_none_cuda_half",
"test_SmoothL1Loss_no_batch_dim_sum_cuda_half",
"test_tensor_ctor_device_inference_xpu",
"test_trace_xpu_float16",
"test_fn_fwgrad_bwgrad_linalg_det_singular_xpu_float64",
"test_fn_fwgrad_bwgrad_linalg_pinv_singular_xpu_complex128",
"test_fn_fwgrad_bwgrad_linalg_vector_norm_xpu_complex128",
"test_fn_fwgrad_bwgrad_masked_normalize_xpu_complex128",
"test_fn_fwgrad_bwgrad_norm_inf_xpu_complex128",
"test_fn_fwgrad_bwgrad_renorm_xpu_complex128",
"test_forward_mode_AD_linalg_vector_norm_xpu_complex128",
"test_forward_mode_AD_masked_normalize_xpu_complex128",
"test_forward_mode_AD_norm_inf_xpu_complex128",
"test_forward_mode_AD_renorm_xpu_complex128",
"test_inplace_forward_mode_AD_renorm_xpu_complex128",
"test_fn_fwgrad_bwgrad_nn_functional_group_norm_xpu_float64",
"test_forward_mode_AD_nn_functional_group_norm_xpu_float64",
"test_fn_gradgrad_linalg_det_singular_xpu_float64",
"test_fn_gradgrad_linalg_pinv_singular_xpu_complex128",
"test_fn_grad_masked_normalize_xpu_complex128",
"test_fn_grad_renorm_xpu_complex128",
"test_fn_gradgrad_linalg_vector_norm_xpu_complex128",
"test_fn_gradgrad_masked_normalize_xpu_complex128",
"test_fn_gradgrad_renorm_xpu_complex128",
"test_inplace_grad_renorm_xpu_complex128",
"test_inplace_gradgrad_renorm_xpu_complex128",
"test_fn_grad_nn_functional_max_pool2d_xpu_float64",
"test_multihead_attn_fast_path_small_test_xpu_float64",