adjust tolerance for xpu in utils #749
Conversation
test/xpu/test_decomp_xpu.py (outdated)
(torch.float16, torch.ops.aten.mv.default): 1e-5,
(torch.bfloat16, torch.ops.aten.mv.default): 1e-5,
(torch.float16, torch.ops.aten.log_sigmoid_backward.default): 2e-5,
(torch.float16, torch.ops.aten._batch_norm_with_update.default): 2e-7,
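For context, here is a minimal sketch of how a (dtype, op) to tolerance table like the one above is typically consulted before comparing results; `lookup_tol` and its default value are illustrative assumptions, not the test suite's actual helper.

```python
import torch

# (dtype, op) -> customized absolute tolerance; entries copied from the diff above.
tol_table = {
    (torch.float16, torch.ops.aten.mv.default): 1e-5,
    (torch.bfloat16, torch.ops.aten.mv.default): 1e-5,
    (torch.float16, torch.ops.aten.log_sigmoid_backward.default): 2e-5,
    (torch.float16, torch.ops.aten._batch_norm_with_update.default): 2e-7,
}

def lookup_tol(op, dtype, default=1e-7):
    # Fall back to a strict default when no customized threshold is registered.
    return tol_table.get((dtype, op), default)
```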
please add comments explaining each customized threshold.
done
@yuchengliu1 please run experiments to check whether the code works with ops that already have `toleranceOverride` defined.
@yuchengliu1 please help summarize the tolerance issues and discuss with @fengyuan14 how to fully handle them.
If multiple `toleranceOverride` decorators are activated on a test case, only the latest decorator actually takes effect.
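To illustrate the mechanism, here is a self-contained mock (an assumption-level sketch, not PyTorch's real `toleranceOverride` implementation): each wrapper sets the same attribute on the test instance, so whichever wrapper runs last before the test body determines the effective tolerance.

```python
import functools

class FakeToleranceOverride:
    # Simplified stand-in for a tolerance-override decorator.
    def __init__(self, atol):
        self.atol = atol

    def __call__(self, fn):
        @functools.wraps(fn)
        def wrapper(test_self, *args, **kwargs):
            test_self.precision = self.atol  # overwrites any earlier override
            return fn(test_self, *args, **kwargs)
        return wrapper

class FakeTest:
    precision = 0.0

@FakeToleranceOverride(1e-3)   # outer wrapper: runs first, gets overwritten
@FakeToleranceOverride(1e-5)   # inner wrapper: runs last, so its value wins
def test_case(self):
    return self.precision

print(test_case(FakeTest()))   # prints 1e-05: only one override is effective
```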
…-ops into tolerance_adjust
OK for me.
Fix some accuracy problems:
1. Add the `toleranceOverride` decorator to OpInfo entries when hooking opdb. This handles cases with accuracy problems that are wrapped by `ops` (except when the difference is NaN or Inf); see the sketch at the end of this description.
2. Fix test_decomp and test_torch accuracy problems.
3. Align with the latest PyTorch code of `ModuleTest_test`:
"test_Conv2d_dilated_with_long_tensor_cuda",
"test_Conv2d_groups_thnn_with_long_tensor_cuda",
"test_Conv2d_groups_with_long_tensor_cuda",
"test_Conv2d_no_bias_with_long_tensor_cuda",
"test_Conv2d_padding_with_long_tensor_cuda",
"test_Conv2d_strided_with_long_tensor_cuda",
"test_Conv2d_with_long_tensor_cuda",
"test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda",
"test_Conv3d_groups_with_long_tensor_cuda",
"test_Conv3d_no_bias_with_long_tensor_cuda",
"test_Conv3d_stride_padding_with_long_tensor_cuda",
"test_Conv3d_stride_with_long_tensor_cuda",
"test_Conv3d_with_long_tensor_cuda",
"test_ConvTranspose2d_dilated_with_long_tensor_cuda",
"test_ConvTranspose2d_groups_with_long_tensor_cuda",
"test_ConvTranspose2d_no_bias_with_long_tensor_cuda",
"test_ConvTranspose2d_with_long_tensor_cuda",
4. Add 'nn_AvgPool2d' to the CUDA xfail list:
"test_memory_format_nn_AvgPool2d_xpu_float32",
"test_memory_format_nn_AvgPool2d_xpu_float64",
5. Clean up the skip list; remove cases that now pass with the latest code:
"test_compare_cpu_native_dropout_backward_xpu_bool",
"test_compare_cpu_native_dropout_backward_xpu_int16",
"test_compare_cpu_native_dropout_backward_xpu_int32",
"test_compare_cpu_native_dropout_backward_xpu_int64",
"test_compare_cpu_native_dropout_backward_xpu_int8",
"test_compare_cpu_native_dropout_backward_xpu_uint8",
"test_compare_cpu_nn_functional_avg_pool2d_xpu_int64",
"test_compare_cpu_abs_xpu_bool",
"test_dtypes_nn_functional_linear_xpu",
"test_dtypes_nn_functional_pad_replicate_negative_xpu",
"test_dtypes_nn_functional_pad_replicate_xpu",
"test_dtypes_unique_consecutive_xpu",
"test_SmoothL1Loss_no_batch_dim_mean_cuda_half",
"test_SmoothL1Loss_no_batch_dim_none_cuda_half",
"test_SmoothL1Loss_no_batch_dim_sum_cuda_half",
"test_tensor_ctor_device_inference_xpu",
"test_trace_xpu_float16",
"test_fn_fwgrad_bwgrad_linalg_det_singular_xpu_float64",
"test_fn_fwgrad_bwgrad_linalg_pinv_singular_xpu_complex128",
"test_fn_fwgrad_bwgrad_linalg_vector_norm_xpu_complex128",
"test_fn_fwgrad_bwgrad_masked_normalize_xpu_complex128",
"test_fn_fwgrad_bwgrad_norm_inf_xpu_complex128",
"test_fn_fwgrad_bwgrad_renorm_xpu_complex128",
"test_forward_mode_AD_linalg_vector_norm_xpu_complex128",
"test_forward_mode_AD_masked_normalize_xpu_complex128",
"test_forward_mode_AD_norm_inf_xpu_complex128",
"test_forward_mode_AD_renorm_xpu_complex128",
"test_inplace_forward_mode_AD_renorm_xpu_complex128",
"test_fn_fwgrad_bwgrad_nn_functional_group_norm_xpu_float64",
"test_forward_mode_AD_nn_functional_group_norm_xpu_float64",
"test_fn_gradgrad_linalg_det_singular_xpu_float64",
"test_fn_gradgrad_linalg_pinv_singular_xpu_complex128",
"test_fn_grad_masked_normalize_xpu_complex128",
"test_fn_grad_renorm_xpu_complex128",
"test_fn_gradgrad_linalg_vector_norm_xpu_complex128",
"test_fn_gradgrad_masked_normalize_xpu_complex128",
"test_fn_gradgrad_renorm_xpu_complex128",
"test_inplace_grad_renorm_xpu_complex128",
"test_inplace_gradgrad_renorm_xpu_complex128",
"test_fn_grad_nn_functional_max_pool2d_xpu_float64",
"test_multihead_attn_fast_path_small_test_xpu_float64",