
Add autocast for nms, roi_align on CPU #8049

Merged · 9 commits · Oct 27, 2023

Conversation

chunyuan-w (Contributor) commented Oct 17, 2023

nms and roi_align only support FP32. The existing code handled these ops only for CUDA, using autocast to convert all inputs to FP32. This PR adds the same autocast support on CPU.

Fixes bfloat16 support on CPU for the models below (pytorch/pytorch#110841):

  • detectron2_fasterrcnn_r_101_c4
  • detectron2_fasterrcnn_r_101_dc5
  • detectron2_fasterrcnn_r_101_fpn
  • detectron2_fasterrcnn_r_50_c4
  • detectron2_fasterrcnn_r_50_dc5
  • detectron2_fasterrcnn_r_50_fpn
  • detectron2_fcos_r_50_fpn
  • vision_maskrcnn
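Conceptually, the autocast support added here wraps each FP32-only op so that low-precision inputs are promoted back to FP32 before the kernel runs. The sketch below illustrates that pattern in plain Python; the decorator name and the `(value, dtype)` tagged-tuple "tensor" representation are purely illustrative, not the actual torchvision C++ autocast registration:

```python
from functools import wraps

def autocast_to_fp32(kernel):
    """Promote any 'bf16'-tagged input to 'fp32' before calling an
    fp32-only kernel — a toy stand-in for what an AutocastCPU
    registration does for nms/roi_align."""
    @wraps(kernel)
    def wrapper(*tensors):
        promoted = [
            (value, "fp32") if dtype == "bf16" else (value, dtype)
            for value, dtype in tensors
        ]
        return kernel(*promoted)
    return wrapper

@autocast_to_fp32
def nms_kernel(boxes, scores):
    # An fp32-only kernel: it rejects any non-fp32 input, mirroring the
    # real error `"nms_kernel" not implemented for 'BFloat16'`.
    for _, dtype in (boxes, scores):
        if dtype != "fp32":
            raise RuntimeError(f'"nms_kernel" not implemented for {dtype!r}')
    return "ok"

# bf16 inputs now run instead of raising, because the wrapper promotes them:
print(nms_kernel(([0.0, 0.0, 1.0, 1.0], "bf16"), ([0.9], "bf16")))  # prints "ok"
```

Without the decorator, the bf16 call above would raise; with it, the kernel only ever sees fp32-tagged inputs, which is the behavior this PR adds for the CPU dispatch key.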

pytorch-bot bot commented Oct 17, 2023
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8049.

As of commit 7e222c1 with merge base b80bdb7: 1 new failure, 25 unrelated failures (flaky jobs likely due to flakiness present on trunk).

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@chunyuan-w chunyuan-w marked this pull request as ready for review October 17, 2023 07:13

@chunyuan-w chunyuan-w marked this pull request as draft October 18, 2023 08:11
@chunyuan-w chunyuan-w marked this pull request as ready for review October 18, 2023 09:24
NicolasHug (Member) left a comment


Thanks @chunyuan-w , PR looks good, I made a few comments below. LMK what you think!

test/test_ops.py (outdated):

    @pytest.mark.parametrize("dtype", (torch.float, torch.bfloat16))
    def test_autocast_cpu(self, iou, dtype):
        with torch.cpu.amp.autocast():
            self.test_nms_cpu(iou=iou, dtype=dtype)
NicolasHug (Member):
Would it be enough to just call test_nms_ref() here (perhaps with slight modifications) ?

If we really need a specific test as in test_nms_cpu then maybe we can just inline it below instead of having test_nms_cpu as a standalone.

chunyuan-w (Contributor Author):

I modified test_nms_ref to check the case where the dtype is torch.bfloat16 on cpu. Could you please check if it looks good to you?

test/test_ops.py (outdated), comment on lines 740 to 743:

    if dtype == torch.bfloat16:
        keep_ref_float = ops.nms(boxes.to(dtype).float(), scores.to(dtype).float(), iou)
        keep_dtype = ops.nms(boxes.to(dtype), scores.to(dtype), iou)
        torch.testing.assert_close(keep_ref_float, keep_dtype)
NicolasHug (Member):

Thanks for the update and sorry for the noise - shouldn't we instead assert that we get equivalent results when autocast is on and when it's off? e.g. something like

    @pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
    @pytest.mark.parametrize("seed", range(10))
    def test_autocast_cpu(self, iou, seed):
        torch.random.manual_seed(seed)
        boxes, scores = self._create_tensors_with_iou(1000, iou)
        keep = ops.nms(boxes, scores, iou)
        with torch.cpu.amp.autocast(dtype=torch.bfloat16):
            keep_autocast = ops.nms(boxes, scores, iou)
        torch.testing.assert_close(keep, keep_autocast)

The test for roi_align calls test_forward(), which has slightly different logic: there we check our implementation against a reference, both under autocast. Unfortunately that same test seems to fail for nms, so maybe we can just use the snippet above. WDYT?

chunyuan-w (Contributor Author):

Thanks for the suggestion. Here are two concerns:

  1. Since nms is not in the autocast allow-list, simply running the above unit test won't expose the issue this PR fixes, because the input to nms will still be fp32 in that snippet. Instead, we need to feed nms a BF16 input under autocast: without this PR it throws RuntimeError: "nms_kernel" not implemented for 'BFloat16', while with this PR autocast converts the input back to FP32 so that it runs successfully.
  2. A modified version following item 1 would look like this:
    @pytest.mark.parametrize("iou", (0.2, 0.5, 0.8))
    @pytest.mark.parametrize("seed", range(10))
    def test_autocast_cpu(self, iou, seed):
        torch.random.manual_seed(seed)
        boxes, scores = self._create_tensors_with_iou(1000, iou)
        keep = ops.nms(boxes, scores, iou)
        with torch.cpu.amp.autocast(dtype=torch.bfloat16):
            keep_autocast = ops.nms(boxes.to(torch.bfloat16), scores.to(torch.bfloat16), iou)
        torch.testing.assert_close(keep, keep_autocast)

The issue with this version is that boxes.to(torch.bfloat16) and scores.to(torch.bfloat16) have already converted the data from high precision to low precision, so directly comparing keep_autocast with keep fails with:
AssertionError: The values for attribute 'shape' do not match: torch.Size([434]) != torch.Size([436]).
In the current implementation I therefore used boxes.to(dtype).float() to simulate the high-to-low precision conversion and cast back to float, producing the reference result to compare against the autocast result.
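To see why the two result sets can differ in shape: bfloat16 keeps only 7 mantissa bits, so box coordinates and the IoU values computed from them shift slightly after a round trip, and an overlap near the threshold can flip between kept and suppressed. A stdlib-only sketch of the effect, where `to_bf16` is an illustrative helper that truncates the float32 mantissa (real conversions round to nearest, but truncation is enough to show the precision loss):

```python
import struct

def to_bf16(x: float) -> float:
    """Approximate float -> bfloat16 -> float by zeroing the low 16 bits
    of the float32 representation, leaving 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# A value that survives the round trip unchanged vs one that does not:
exact = 0.5    # exactly representable in bfloat16
lossy = 0.9    # loses mantissa bits in bfloat16
assert to_bf16(exact) == exact
assert to_bf16(lossy) != lossy

# An IoU just above the threshold in fp32 drops below it in bf16, so the
# corresponding box is suppressed in one run but kept in the other —
# hence `keep` and `keep_autocast` ending up with different shapes.
iou_threshold = 0.5
iou_fp32 = 0.501
iou_bf16 = to_bf16(iou_fp32)
assert iou_fp32 > iou_threshold
assert iou_bf16 <= iou_threshold
```

This is why the reference in the test is built as ops.nms(boxes.to(dtype).float(), ...): baking the same rounding into the reference means both calls see bit-identical inputs, and only the kernel's dtype handling is under test.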

NicolasHug (Member) left a comment

Thanks for the PR and for your explanations @chunyuan-w . LGTM, I'll merge when green

chunyuan-w (Contributor Author):

@NicolasHug I've updated the branch to pull in the latest changes from the main branch. Could you please approve the CI workflow runs?

chunyuan-w (Contributor Author):

@NicolasHug there's one failure in the CI which is marked as unrelated to this PR. May I know if it's fine to land it? I can update the branch to merge the latest changes from the main branch if needed.

@NicolasHug NicolasHug merged commit 8a0b491 into pytorch:main Oct 27, 2023
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2023
Reviewed By: vmoens

Differential Revision: D50789096

fbshipit-source-id: ca847eded3b90409b0cd25e849034cb5dc83b6b8

Co-authored-by: Nicolas Hug <[email protected]>