Upgrade OP Precision to Float64 (English)
Specification summary:
- Section 1, Details of the Specification
- Section 2, Instructions of CI Check
- Section 3, Suggestions for CI Check Failure
- Section 4, Verify Meeting the Specification
Supplementary Note:
You may find some aspects that are not taken into account in the existing specifications, which need to be continuously supplemented and improved during the implementation process. Please feel free to give your feedback.
At present, most op tests use float32 precision, and their error thresholds are not unified. To guarantee the correctness of all ops, we propose a specification for upgrading the precision of op tests to float64.
All op tests that inherit from OpTest infer their precision from the inputs. If the precision is float32, the op test should be upgraded to float64 and must pass the forward and backward checks with the unified thresholds.
The inputs of an op test usually contain several elements, so the highest precision among all elements is used as the precision of the op test. The ordering is float64 > float32 > float16 > int64 > int32 > int16 > int8 > bool.
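The precision ordering above can be sketched in a few lines of NumPy. This is only an illustration of the rule as stated, not the code OpTest actually runs; the helper name infer_test_dtype and the sample inputs are hypothetical.

```python
import numpy as np

# Precision ranking taken from the text, highest first.
PRECISION_ORDER = [
    np.float64, np.float32, np.float16,
    np.int64, np.int32, np.int16, np.int8, np.bool_,
]


def infer_test_dtype(inputs):
    """Hypothetical helper: return the highest-ranked dtype among the inputs."""
    present = {np.dtype(arr.dtype) for arr in inputs.values()}
    for candidate in PRECISION_ORDER:
        if np.dtype(candidate) in present:
            return np.dtype(candidate)
    raise ValueError("no supported dtype found in inputs")


# Made-up inputs: one float32 tensor and one int64 index tensor.
inputs = {
    "X": np.zeros((2, 3), dtype=np.float32),
    "Index": np.array([0, 1], dtype=np.int64),
}
inferred = infer_test_dtype(inputs)
print(inferred)  # float32
```

Under this rule the test above counts as a float32 test, so it would have to be upgraded to float64.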
When the precision of op test is upgraded to float64, the absolute error threshold (atol) will be set to 0 in forward check, and the relative error threshold (max_relative_error) will be set to 1e-7 in backward check.
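To make the two thresholds concrete, here is a minimal NumPy sketch of a forward comparison with atol set to 0 and a backward comparison against a central-difference gradient with a 1e-7 relative error limit. It assumes a toy op (sum of squares) and a simplified relative error formula; the helper names, the step size delta, and the 1e-3 denominator guard are assumptions for illustration and may differ from what OpTest does internally.

```python
import numpy as np


def numeric_grad(f, x, delta=1e-3):
    """Central-difference gradient of a scalar-valued function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + delta
        plus = f(x)
        x[idx] = orig - delta
        minus = f(x)
        x[idx] = orig  # restore the original value
        grad[idx] = (plus - minus) / (2 * delta)
    return grad


def max_relative_error(analytic, numeric):
    """Largest element-wise relative difference (simplified, assumed formula)."""
    denom = np.abs(analytic)
    denom[denom < 1e-3] = 1.0  # avoid dividing by near-zero gradients
    return float(np.max(np.abs(analytic - numeric) / denom))


rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=(2, 3))  # float64 by default

# Forward check: with atol=0 only NumPy's default relative tolerance remains.
forward_ok = bool(np.allclose(np.square(x), x * x, atol=0))

# Backward check: analytic gradient of sum(x**2) is 2*x.
analytic = 2 * x
numeric = numeric_grad(lambda v: float(np.sum(v ** 2)), x)
backward_ok = max_relative_error(analytic, numeric) <= 1e-7
```

In float64 the rounding error of the finite difference is far below 1e-7 for this toy op, which is why the unified threshold is practical only after the inputs are upgraded from float32.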
All op tests that inherit from OpTest infer their precision from the inputs.
- If the precision of the op test is float32, the CI raises an error because check_grad has not been run with float64 precision. An example error message is: AssertionError: This test of ** op needs check_grad with fp64 precision.
- If the precision of the op test is float64, all error thresholds are unified in the forward and backward checks. When a check cannot meet the unified error threshold, the CI also raises an error, such as: AssertionError: ** Variable ** max gradient diff ** over limit 1e-7
Ops listed in EMPTY_GRAD_OP_LIST do not have backward propagation, so the CI uses this whitelist to skip check_grad with float64 precision for them. When an op test is decorated with skip_check_grad_ci, the CI also skips check_grad with float64 precision.
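One way to read the CI rules in this section is the decision function below. It is a sketch of the described behavior, not the actual CI code: the function name fp64_grad_status, its parameters, and the list entry are hypothetical, and only the error message text is taken from the example above.

```python
import numpy as np

# Placeholder entry for illustration; the real list lives in the Paddle test suite.
EMPTY_GRAD_OP_LIST = {"example_op_without_grad"}


def fp64_grad_status(op_type, grad_check_dtypes, skipped_by_decorator=False):
    """Hypothetical helper mirroring the rules described in the text.

    grad_check_dtypes is the set of dtypes for which the test ran check_grad.
    """
    if op_type in EMPTY_GRAD_OP_LIST or skipped_by_decorator:
        return "skipped"
    if np.dtype(np.float64) not in {np.dtype(d) for d in grad_check_dtypes}:
        raise AssertionError(
            "This test of %s op needs check_grad with fp64 precision." % op_type
        )
    return "checked"


status_no_grad = fp64_grad_status("example_op_without_grad", set())
status_decorated = fp64_grad_status("some_op", {np.float32}, skipped_by_decorator=True)
status_fp64 = fp64_grad_status("some_op", {np.float64})
```

A test that only ran check_grad with float32 inputs and is neither whitelisted nor decorated would raise the AssertionError.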
If an op test does not meet the specification for upgrading the precision to float64, refer to the following suggestions for modification:
- If the op test does not call any functions owned by the parent class OpTest, it should use unittest.TestCase as the parent class instead. Please refer to pr.
- If the op does not have backward propagation, the gradient check is unnecessary. You can add the op to EMPTY_GRAD_OP_LIST to skip the gradient check, which requires specific review. Please refer to pr.
- If the data type of the inputs in the op test is float32, change it to float64. After upgrading the precision to float64, run the op test to check whether it meets the specification. Please refer to pr.
- If the op test cannot meet the specification after upgrading the precision to float64, the errors in the op and its kernel must be fixed.
- If the op cannot perform the gradient check or meet the standard error threshold for special reasons, use the skip_check_grad_ci decorator, which also requires specific review. Please refer to pr.
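The decorator suggestion can be pictured with the self-contained stand-in below. It does not import Paddle: the skip_check_grad_ci defined here is a local imitation, and its reason argument and the no_need_check_grad class attribute are assumptions about how such a decorator might mark a test class. The test class and the reason string are made up for illustration.

```python
import unittest


def skip_check_grad_ci(reason=None):
    """Local stand-in for the decorator named in the text (assumed behavior)."""
    if not isinstance(reason, str):
        raise AssertionError("The reason for skipping check_grad is required.")

    def wrapper(cls):
        # Assumed marker attribute that a CI check could inspect.
        cls.no_need_check_grad = True
        return cls

    return wrapper


@skip_check_grad_ci(reason="Hypothetical: gradient is undefined at the test points.")
class TestExampleOp(unittest.TestCase):
    def test_forward(self):
        # Placeholder forward check for the made-up op.
        self.assertEqual(1 + 1, 2)
```

Requiring a reason string keeps every skipped gradient check documented, which fits the note that this route needs specific review.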
After fixing the op test according to these suggestions, delete the op from NO_FP64_CHECK_GRAD_OP_LIST and push your code to PaddlePaddle. If it passes the CI check, the op meets the specification.
If you have other questions, please contact @juncaipeng.