Add autocast for nms, roi_align on CPU #8049
Conversation
Thanks @chunyuan-w , PR looks good, I made a few comments below. LMK what you think!
test/test_ops.py (outdated)

    @pytest.mark.parametrize("dtype", (torch.float, torch.bfloat16))
    def test_autocast_cpu(self, iou, dtype):
        with torch.cpu.amp.autocast():
            self.test_nms_cpu(iou=iou, dtype=dtype)
Would it be enough to just call test_nms_ref() here (perhaps with slight modifications)? If we really need a specific test as in test_nms_cpu, then maybe we can just inline it below instead of having test_nms_cpu as a standalone.
I modified test_nms_ref to check the case where the dtype is torch.bfloat16 on CPU. Could you please check if it looks good to you?
test/test_ops.py (outdated)

    if dtype == torch.bfloat16:
        keep_ref_float = ops.nms(boxes.to(dtype).float(), scores.to(dtype).float(), iou)
        keep_dtype = ops.nms(boxes.to(dtype), scores.to(dtype), iou)
        torch.testing.assert_close(keep_ref_float, keep_dtype)
Thanks for the update and sorry for the noise - shouldn't we instead assert that we get equivalent results when autocast is on and when it's off? e.g. something like
@pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
@pytest.mark.parametrize("seed", range(10))
def test_autocast_cpu(self, iou, seed):
torch.random.manual_seed(seed)
boxes, scores = self._create_tensors_with_iou(1000, iou)
keep = ops.nms(boxes, scores, iou)
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
keep_autocast = ops.nms(boxes, scores, iou)
torch.testing.assert_close(keep, keep_autocast)
The test for roi_align calls test_forward(), which has a slightly different logic: there we check our implementation against a reference, both with autocast. Unfortunately that same test seems to fail for nms, so maybe we can just use the snippet above. WDYT?
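For context, a test_forward-style check for roi_align under CPU autocast might look roughly like the hypothetical sketch below. Shapes, ROI values, and tolerances are illustrative, and the op is simply compared against its own FP32-input run inside the same autocast region, rather than against the separate reference implementation that the real test_forward uses:

    import torch
    from torchvision import ops

    # Hypothetical sketch: compare roi_align fed reduced-precision inputs
    # against the same op fed FP32 inputs, both inside a CPU autocast region.
    x = torch.rand(1, 3, 16, 16)
    rois = torch.tensor([[0.0, 1.0, 1.0, 10.0, 10.0]])  # (batch_idx, x1, y1, x2, y2)

    with torch.cpu.amp.autocast(dtype=torch.bfloat16):
        out_bf16_in = ops.roi_align(x.to(torch.bfloat16), rois.to(torch.bfloat16),
                                    output_size=(5, 5), spatial_scale=1.0)
        out_fp32_in = ops.roi_align(x, rois, output_size=(5, 5), spatial_scale=1.0)

    # Loose tolerances: the BF16 round trip already perturbs the inputs.
    torch.testing.assert_close(out_fp32_in.float(), out_bf16_in.float(),
                               rtol=1e-2, atol=1e-2)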
Thanks for the suggestion. Here are two concerns:

1. Since nms is not in the white list of autocast, simply running the above UT won't expose the issue that this PR is trying to fix, because the input to nms would still be FP32 in the above snippet. Instead, we need to feed nms a BF16 input under autocast: without this PR it throws RuntimeError: "nms_kernel" not implemented for 'BFloat16', while with this PR autocast converts the input back to FP32 so that it runs successfully (a minimal sketch of this mechanism follows at the end of this reply).
2. A modified version following item 1 would look like this:
@pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
@pytest.mark.parametrize("seed", range(10))
def test_autocast_cpu(self, iou, seed):
torch.random.manual_seed(seed)
boxes, scores = self._create_tensors_with_iou(1000, iou)
keep = ops.nms(boxes, scores, iou)
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
keep_autocast = ops.nms(boxes.to(torch.bfloat16), scores.to(torch.bfloat16), iou)
torch.testing.assert_close(keep, keep_autocast)
The issue with this version is that boxes.to(torch.bfloat16) and scores.to(torch.bfloat16) have converted the data from high precision to low precision, so directly comparing keep_autocast with keep fails with:

AssertionError: The values for attribute 'shape' do not match: torch.Size([434]) != torch.Size([436]).

That's why I used boxes.to(dtype).float() in the current implementation: it simulates the round trip from high precision to low precision and back to float, and that result serves as the reference to compare against the autocast result.
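To make the mechanism described in concern 1 concrete, here is a minimal, purely illustrative Python sketch of what a CPU autocast wrapper for nms effectively does; the helper name is made up, and the real change is presumably registered through the dispatcher rather than written like this:

    import torch
    from torchvision import ops

    # Illustrative helper (hypothetical name): leave the autocast region for
    # the inner call and hand FP32 copies of the (possibly BF16) inputs to
    # the FP32-only nms kernel.
    def nms_fp32_under_autocast(boxes, scores, iou_threshold):
        with torch.autocast(device_type="cpu", enabled=False):
            return ops.nms(boxes.float(), scores.float(), iou_threshold)

    boxes = torch.rand(100, 4) * 100
    boxes[:, 2:] += boxes[:, :2]  # make x2 >= x1 and y2 >= y1
    scores = torch.rand(100)

    # BF16 inputs inside CPU autocast run fine once they are cast back to FP32.
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):
        keep = nms_fp32_under_autocast(boxes.to(torch.bfloat16),
                                       scores.to(torch.bfloat16), 0.5)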
Thanks for the PR and for your explanations @chunyuan-w . LGTM, I'll merge when green
@NicolasHug I've updated the branch to merge the latest changes from the main branch. Could you please help approve the CI workflow runs?
@NicolasHug there's one failure in the CI which is marked as unrelated to this PR. May I know if it's fine to land it? I can update the branch to merge the latest changes from the main branch if needed.
Reviewed By: vmoens
Differential Revision: D50789096
fbshipit-source-id: ca847eded3b90409b0cd25e849034cb5dc83b6b8
Co-authored-by: Nicolas Hug <[email protected]>
nms and roi_align only support FP32. The current code only handled these ops for CUDA, using autocast to convert all the inputs to FP32. This PR adds the same autocast support on CPU. Fixes bfloat16 on CPU for the below models (pytorch/pytorch#110841):