Add aten::index_reduce operator #1156
Open
+507
−1
Implemented the index_reduce, index_reduce_ and index_reduce.out operators.
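For reviewers less familiar with the operator, here is a minimal pure-Python sketch of what an index reduction does in the 1-D case. It is a simplification based on the documented behaviour of `torch.Tensor.index_reduce_`, not the kernel added in this PR. The helper name `index_reduce_1d` is hypothetical, and only the `prod`, `amax` and `amin` reductions are covered.

```python
def index_reduce_1d(self_vals, index, source, reduce, include_self=True):
    """Illustrative 1-D reference: fold source[i] into self_vals[index[i]].

    Hypothetical helper for explanation only; it is not part of PyTorch.
    """
    ops = {"prod": lambda a, b: a * b, "amax": max, "amin": min}
    op = ops[reduce]
    out = list(self_vals)
    seen = set()  # destination slots that have already received a source value
    for i, src in zip(index, source):
        if not include_self and i not in seen:
            # With include_self=False the original value is not part of the reduction.
            out[i] = src
        else:
            out[i] = op(out[i], src)
        seen.add(i)
    return out


result = index_reduce_1d([1.0, 2.0, 3.0], [0, 2, 0], [4.0, 5.0, 6.0], "prod")
result_no_self = index_reduce_1d(
    [1.0, 2.0, 3.0], [0, 2, 0], [4.0, 5.0, 6.0], "prod", include_self=False
)
print(result)          # [24.0, 2.0, 15.0]
print(result_no_self)  # [24.0, 2.0, 5.0]
```

Several source elements can target the same destination slot (index 0 above). On a parallel device those updates are typically combined with atomic operations, so the order in which they are applied is not fixed.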
All but three of the unit tests pass on the PVC platform. The three failing tests use the bf16 and float16 dtypes.
I checked the mismatches and they are small. For example:
Mismatched elements: 1 / 60 (1.7%)
Greatest absolute difference: 0.0625 at index (2, 1, 4) (up to 1e-05 allowed)
Greatest relative difference: 0.001125335693359375 at index (2, 1, 4) (up to 0.001 allowed)
To execute this test, run the following from the base repo dir:
python test/xpu/test_torch_xpu.py TestTorchDeviceTypeXPU.test_index_reduce_reduce_prod_xpu_float16
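To put the reported mismatch in perspective, the rough calculation below expresses the absolute difference in units of float16 spacing (ULPs). It assumes the magnitude of the expected value can be estimated as the ratio of the absolute to the relative difference (about 55.5), and that the comparison follows the usual `|actual - expected| <= atol + rtol * |expected|` rule. The helper `float_spacing` is hypothetical and only handles the normal range.

```python
import math


def float_spacing(x: float, mantissa_bits: int) -> float:
    """Gap between adjacent representable values near x (normal range only)."""
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, so x lies in [2**(e-1), 2**e).
    _, e = math.frexp(x)
    return 2.0 ** (e - 1 - mantissa_bits)


abs_diff = 0.0625                   # from the test output above
rel_diff = 0.001125335693359375     # from the test output above
atol, rtol = 1e-05, 0.001           # tolerances quoted in the test output

# Assumption: relative difference = absolute difference / |expected value|.
approx_value = abs_diff / rel_diff

fp16_ulp = float_spacing(approx_value, mantissa_bits=10)  # float16: 10 explicit bits
bf16_ulp = float_spacing(approx_value, mantissa_bits=7)   # bfloat16: 7 explicit bits
allowed = atol + rtol * approx_value

print(f"estimated value : {approx_value:.3f}")
print(f"float16 spacing : {fp16_ulp}")
print(f"bfloat16 spacing: {bf16_ulp}")
print(f"error in fp16 ULPs: {abs_diff / fp16_ulp}")
print(f"allowed difference: {allowed:.4f}")
```

Under these assumptions the mismatch is 2 float16 ULPs, while the tolerance at this magnitude admits more than 1 ULP but fewer than 2. A difference of this size is what a couple of extra rounding steps in a chained float16 product could produce, which is consistent with, but does not prove, the explanation suggested in this description.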
I suspect the cause is the software emulation of atomic operations for these low-precision types. Should we skip these tests?
Input from @xytintel and @fengyuan14 would be appreciated.
Side note: This function is in beta and may change in the near future. (Ref - https://pytorch.org/docs/stable/generated/torch.Tensor.index_reduce_.html)