Add opcheck testing for nms #7961

Merged
merged 9 commits into from
Oct 20, 2023

Contributor

@ezyang ezyang commented Sep 13, 2023

Signed-off-by: Edward Z. Yang [email protected]

cc @pmeier

pytorch-bot bot commented Sep 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7961

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 00673be with merge base e3fb8c0:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor Author

ezyang commented Sep 13, 2023

This adds testing for the meta registrations, as promised. It can't be merged yet because PyTorch core hasn't landed everything it needs; see pytorch/pytorch#108936.

ezyang added a commit to pytorch/pytorch that referenced this pull request Sep 14, 2023
…enerating optest"


Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for it: torchvision parametrizes these tests to get a spread of randomly generated inputs, but checking the schema or fake tensor behavior doesn't require testing different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, we sample only one value if you mark it with `opcheck_only_one`. There's a carveout for device parametrization, where we always run all those variants.

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s
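
To make the reduction concrete, here is a minimal sketch of the idea in plain Python. It is not the actual codegen from pytorch/pytorch#108936: the helper names `reduce_params` and `count_cases` are hypothetical, keeping the first value of each parameter is an assumption (the commit message only says "one value"), and the parameter grid is illustrative rather than taken from the nms tests.

```python
def reduce_params(params, only_one, always_keep=("device",)):
    """Collapse each parameter to a single value, except the carved-out ones.

    `params` maps an argument name to its list of parametrized values.
    Hypothetical helper; keeping the *first* value is an assumption.
    """
    reduced = {}
    for name, values in params.items():
        values = list(values)
        if only_one and name not in always_keep:
            reduced[name] = values[:1]  # sample only one value
        else:
            reduced[name] = values  # e.g. device: keep every variant
    return reduced


def count_cases(params):
    """Number of test cases produced by the cross product of all parameters."""
    total = 1
    for values in params.values():
        total *= len(values)
    return total


# Illustrative grid, not the real nms parametrization.
params = {
    "aligned": [True, False],
    "device": ["cpu", "cuda"],
    "x_dtype": ["float16", "float32", "float64"],
}

full_count = count_cases(params)
reduced_params = reduce_params(params, only_one=True)
reduced_count = count_cases(reduced_params)
print(full_count, reduced_count)
```

In this sketch the value-dependent parameters collapse to one entry each while `device` keeps all of its variants, so the generated opcheck variants grow with the number of devices rather than with the full cross product.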

Companion torchvision PR which uses this at pytorch/vision#7961

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Sep 14, 2023
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: #108936
Approved by: https://github.com/zou3519
test/test_ops.py Outdated
Comment on lines 847 to 854
optests.generate_opcheck_tests(
    TestNMS,
    ["torchvision"],
    {},
    "test/test_ops.py",
    [],
    data_dependent_torchvision_test_checks,
)
Contributor

Hmm, after pytorch/pytorch#109110 the failures dict is no longer a Python dict; generate_opcheck_tests now assumes that a JSON file exists.

The NMS tests work on everything, so there are no expected failures. It seems unfortunate that we would need a .json file in the repo just to use generate_opcheck_tests, but I don't have a better idea right now. We require a JSON file so that we can update it automatically by writing to it; a string is a bit more difficult because we haven't hooked into the expecttest mechanism.

Contributor Author

I don't mind having an empty JSON file, but I would have to do this for each test class (of which there are a lot).

Contributor

I have a similar problem with fbgemm, where there are a lot of test classes. Maybe everything should roll into the same JSON file.
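
As a rough illustration of the "same JSON file" idea in this thread, the sketch below keeps one file of expected failures keyed by test class and treats a missing file or a missing class as "no expected failures". Everything here is hypothetical: the function names, the file name, the `"xfail"` status string, and the on-disk layout are assumptions for illustration and are not the format that generate_opcheck_tests actually reads.

```python
import json
import os
import tempfile


def load_failures(path):
    """Return the whole failures mapping; a missing file means no failures."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def record_failure(path, test_class, test_name, status="xfail"):
    """Update the shared file in place, which is what makes JSON convenient."""
    failures = load_failures(path)
    failures.setdefault(test_class, {})[test_name] = status
    with open(path, "w") as f:
        json.dump(failures, f, indent=2, sort_keys=True)
        f.write("\n")
    return failures


def expected_failures_for(path, test_class):
    """Per-class view; classes with nothing recorded need no entry at all."""
    return load_failures(path).get(test_class, {})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "opcheck_failures.json")  # hypothetical file name
    missing = expected_failures_for(path, "TestNMS")  # file does not exist yet
    record_failure(path, "TestRoIAlign", "test_schema__test_forward")
    nms_failures = expected_failures_for(path, "TestNMS")
    roi_failures = expected_failures_for(path, "TestRoIAlign")

print(missing, nms_failures, roi_failures)
```

With a layout like this, a class whose tests all pass (as described for NMS above) would not need its own empty file, at the cost of every test class sharing one file.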

Member

@NicolasHug NicolasHug left a comment

Thanks for the PR @ezyang

Our CI is missing the expecttest dep, so I pushed a commit to add it. Let's see.

pytest.ini Outdated
test/test_ops.py Outdated
Contributor Author

ezyang commented Sep 25, 2023

Can't land this yet, because @zou3519 is going to make some more changes to the opcheck API that I want to land before I do this, but the review is appreciated.

@@ -462,9 +471,10 @@ def test_boxes_shape(self):

 @pytest.mark.parametrize("aligned", (True, False))
 @pytest.mark.parametrize("device", cpu_and_cuda_and_mps())
-@pytest.mark.parametrize("x_dtype", (torch.float16, torch.float32, torch.float64), ids=str)
+@pytest.mark.parametrize("x_dtype", (torch.float16, torch.float32, torch.float64))  # , ids=str)
Member

@NicolasHug NicolasHug left a comment

@ezyang I updated the PR according to the new opcheck, LGTM. LMK if there's anything more you wanted to add on your side, otherwise I'll merge.

Just FYI, we're hitting this "NYI" failure #7961 (comment)

Contributor Author

ezyang commented Oct 20, 2023

No, thank you so much for finishing it up! Please merge whenever you're ready.

@NicolasHug NicolasHug merged commit 68161e9 into pytorch:main Oct 20, 2023
@github-actions

Hey @NicolasHug!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@NicolasHug NicolasHug added the pt2 label Nov 13, 2023
facebook-github-bot pushed a commit that referenced this pull request Nov 13, 2023
Summary: Signed-off-by: Edward Z. Yang <[email protected]>

Reviewed By: vmoens

Differential Revision: D50789092

fbshipit-source-id: 614de3d6949a84ca576b9e7344de2e8e18152bf3

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>

4 participants