Upgrade OP Precision to Float64 (English)
Specification summary:
- Section 1, Details of the Specification
- Section 2, Instructions of CI Check
- Section 3, Suggestions for CI Check Failure
- Section 4, Verify Meeting the Specification
Supplementary Note:
You may find some aspects that are not taken into account in the existing specifications, which need to be continuously supplemented and improved during the implementation process. Please feel free to give your feedback.
At present, most op tests use float32 precision with differing error thresholds. To guarantee the correctness of all ops, we propose a specification for upgrading the precision of op tests from float32 to float64.
All op tests inherited from OpTest infer the precision from the inputs. If the precision is float32, the op test should upgrade it to float64 and then pass the forward and backward checks with the unified thresholds.
Generally, the inputs of an op test contain several elements. The precision of the op test is the highest precision among all elements, according to the order float64>float32>float16>int64>int32>int16>int8>bool.
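The precedence rule above can be sketched in a few lines of NumPy. This is only an illustration, not the OpTest implementation: the helper name `infer_test_dtype` and the assumption that the inputs form a flat dict of arrays are hypothetical.

```python
import numpy as np

# Precedence order taken from the specification, highest precision first.
PRECEDENCE = ["float64", "float32", "float16",
              "int64", "int32", "int16", "int8", "bool"]


def infer_test_dtype(inputs):
    """Return the highest-precedence dtype found among the input elements.

    `inputs` is assumed to be a flat dict mapping names to array-like values.
    """
    found = {np.asarray(value).dtype.name for value in inputs.values()}
    for name in PRECEDENCE:
        if name in found:
            return np.dtype(name)
    raise ValueError("no input has a dtype listed in the precedence order")


inputs = {
    "X": np.zeros((2, 3), dtype="float32"),
    "Index": np.zeros((2,), dtype="int64"),
}
print(infer_test_dtype(inputs))
```

Under this rule, a test whose inputs mix float32 data with int64 indices counts as a float32 test and therefore still needs to be upgraded.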
Once the precision of an op test is upgraded from float32 to float64, all error thresholds are unified as follows:
- The absolute error threshold in the forward check is atol=0.
- The relative error threshold in the backward check is max_relative_error=1e-7, but in practice a dynamic threshold is applied:
  - When the value is less than 1e-10, the absolute error must be less than 1e-10.
  - When the value is between 1e-10 and 1e-8, the relative error must be less than 1e-3.
  - When the value is between 1e-8 and 1e-6, the relative error must be less than 1e-5.
  - When the value is greater than 1e-6, the relative error must be less than 1e-7.
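The dynamic threshold for the backward check can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by OpTest: "the value" is taken to be the magnitude of the numeric (reference) gradient, values that fall exactly on a boundary are assigned to the upper range, and the function name `gradient_within_threshold` is hypothetical.

```python
import numpy as np


def gradient_within_threshold(analytic, numeric):
    """Check gradients element-wise against the dynamic thresholds."""
    analytic = np.asarray(analytic, dtype=np.float64)
    numeric = np.asarray(numeric, dtype=np.float64)

    magnitude = np.abs(numeric)          # assumed meaning of "the value"
    abs_err = np.abs(analytic - numeric)
    # Guard against division by zero; zero magnitudes use the absolute check.
    rel_err = abs_err / np.where(magnitude > 0, magnitude, 1.0)

    ok = np.where(
        magnitude < 1e-10, abs_err < 1e-10,
        np.where(
            magnitude < 1e-8, rel_err < 1e-3,
            np.where(magnitude < 1e-6, rel_err < 1e-5, rel_err < 1e-7),
        ),
    )
    return bool(np.all(ok))


print(gradient_within_threshold([1.0], [1.0 + 1e-9]))   # relative error about 1e-9
print(gradient_within_threshold([1.0], [1.0 + 1e-5]))   # relative error about 1e-5
```

The looser relative bounds for tiny values reflect that relative error becomes unstable as the reference gradient approaches zero, which is why the smallest range falls back to an absolute bound.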
All op tests inherited from OpTest will infer the precision from inputs.
- If the precision of the op test is float32, CI raises an error because check_grad was not run with float64 precision, such as AssertionError: This test of ** op needs check_grad with fp64 precision.
- If the precision of the op test is float64, all error thresholds are unified in the forward and backward checks. When a check cannot meet the unified error threshold, CI also raises an error, such as AssertionError: ** Variable ** max gradient diff ** over limit 1e-7.
In the following two cases, CI skips the float64 check_grad check:
- The op test is decorated by skip_check_grad_ci.
- The op does not have backward propagation.
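The two skip conditions can be illustrated with a stand-in decorator. Everything in this sketch is an assumption made so that it runs without Paddle: `skip_check_grad_ci_sketch`, the `no_need_check_grad` flag, and `needs_fp64_check_grad` are hypothetical names, and the real skip_check_grad_ci decorator and CI logic may differ.

```python
import unittest


def skip_check_grad_ci_sketch(reason=None):
    """Stand-in for a class decorator that opts a test out of the grad check.

    A written reason is required here so that every skip can be reviewed.
    """
    if not isinstance(reason, str) or not reason.strip():
        raise AssertionError("A reason for skipping check_grad is required.")

    def wrapper(cls):
        cls.no_need_check_grad = True   # hypothetical marker attribute
        cls.skip_reason = reason
        return cls

    return wrapper


def needs_fp64_check_grad(cls, has_backward=True):
    """Return True if CI should demand a float64 check_grad for this test."""
    return has_backward and not getattr(cls, "no_need_check_grad", False)


@skip_check_grad_ci_sketch(reason="Hypothetical example: gradient is undefined.")
class TestExampleOp(unittest.TestCase):
    pass


print(needs_fp64_check_grad(TestExampleOp))
print(needs_fp64_check_grad(unittest.TestCase, has_backward=False))
```

Requiring a reason string keeps each exemption explicit, which matches the requirement that use of the decorator goes through specific review.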
If the op test does not meet the specification for upgrading the precision to float64, you can refer to the following suggestions for modification:
- If the op test does not call any function of the parent class OpTest, it should inherit from unittest.TestCase instead. Please refer to pr.
- If the precision of the op test is float32, you can refer to "1. Details of the Specification" and change the data type of the inputs to float64. Afterwards, you should run the op test according to "4. Verify Meeting the Specification". Please refer to pr.
- If the op test has float64 precision and cannot meet the standard error threshold, it is necessary to fix the error in the op implementation. Afterwards, you should also refer to "4. Verify Meeting the Specification" and run the op test.
- If the op is unable to perform the gradient check or meet the standard error threshold for special reasons, use the skip_check_grad_ci decorator, which also requires specific review. Please refer to pr.
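For the second suggestion, changing the data type of the inputs usually means replacing float32 with float64 where the test data is created. The sketch below shows the idea on a plain dict of NumPy arrays; the helper `upgrade_inputs_to_float64` is hypothetical and is not part of OpTest.

```python
import numpy as np


def upgrade_inputs_to_float64(inputs):
    """Cast float32 inputs to float64 and leave every other dtype unchanged."""
    upgraded = {}
    for name, value in inputs.items():
        array = np.asarray(value)
        if array.dtype == np.float32:
            array = array.astype(np.float64)
        upgraded[name] = array
    return upgraded


old_inputs = {
    "X": np.random.random((3, 4)).astype("float32"),
    "Label": np.arange(3, dtype="int64"),
}
new_inputs = upgrade_inputs_to_float64(old_inputs)
print({name: array.dtype.name for name, array in new_inputs.items()})
```

In an actual test it is usually simpler to generate the data as float64 from the start, so that the expected outputs are also computed in float64 rather than cast afterwards.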
For an op test that upgrades the precision from float32 to float64:
- Delete the op from NO_FP64_CHECK_GRAD_OP_LIST in op_accuracy_white_list.
- If the op test is related to mkldnn, add the CI check of this specification for it. In detail, remove the condition and (not hasattr(cls, "use_mkldnn") or cls.use_mkldnn == False) in the tearDownClass function of the OpTest class.
- Push your code to paddlepaddle.
- If it passes the CI check, the op test meets the specification.
- If it raises errors for not meeting the standard error threshold, you should modify the op test again according to "3. Suggestions for CI Check Failure".
For an op test that has float64 precision and cannot meet the standard error threshold:
- Delete the op from NEED_FIX_FP64_CHECK_GRAD_THRESHOLD_OP_LIST in op_threshold_white_list.
- Push your code to paddlepaddle.
- If it passes the CI check, the op test meets the specification.
If you have other problems, please contact @juncaipeng