Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the resize anchor edge variable if size is scalar #2868

Closed
pmeier opened this issue Oct 22, 2020 · 11 comments
Closed

Make the resize anchor edge variable if size is scalar #2868

pmeier opened this issue Oct 22, 2020 · 11 comments

Comments

@pmeier
Copy link
Collaborator

pmeier commented Oct 22, 2020

🚀 Feature

Make the resize anchor edge variable if size is scalar.

Motivation

Right now torchvision.transforms.Resize() assumes that the smaller edge of the image should be resized to size if its an int.

(h, w), output size will be matched to this. If size is an int,
smaller edge of the image will be matched to this number.

That can be cumbersome if the user wants for example an image that is not larger than x or wants a specific edge to be this size.

Pitch

I propose we add a edge keyword argument which can be one of "short" (default), "long", "vert", and "horz". Based on this we calculate the size.

cc @vfdev-5

@fmassa
Copy link
Member

fmassa commented Oct 22, 2020

An alternative to this, with is widely used in detection models and is present already in Faster R-CNN in torchvision

def _resize_image_and_masks(image, self_min_size, self_max_size, target):

is to provide additionally a max_size argument, with the criteria that, if after rescaling the max size is larger than max_size, then the image will be rescaled back so that it respects max_size.

So the constraint would be that either min(sizes) == min_size or max(sizes) == max_size.

Thoughts?

@pmeier
Copy link
Collaborator Author

pmeier commented Oct 22, 2020

That covers the two (most-important) cases ("short" and "long") but falls flat if I want a specific edge to be that size ("vert" and "horz").

@NicolasHug
Copy link
Member

I might be missing something but it seems that the two proposals are orthogonal and compatible.
Maybe we can implement both, i.e. introduce edge and max_size?

@pmeier
Copy link
Collaborator Author

pmeier commented Feb 23, 2021

@NicolasHug max_size=size would be the same as size=size, edge="long". IMO there is no point of having both other than for minimal convenience.

@NicolasHug
Copy link
Member

My understanding of max_size from #2868 (comment) is that size must always be specified, and the max_size constraint is only evaluated after the image has been resized according to size (and potentially edge). Is this not the case?

Basically I think we can do what you proposed in #2868 (comment), and also add max_size whose logic would be applied after the initial resizing, ensuring that max(new_sizes) == max_size

@pmeier
Copy link
Collaborator Author

pmeier commented Feb 23, 2021

My understanding of max_size from #2868 (comment) is that size must always be specified, and the max_size constraint is only evaluated after the image has been resized according to size (and potentially edge). Is this not the case?

Yeah, I was wrong about that. Basically the two proposals would behave as follows individually:

  • Resize this image using a specific edge as anchor.
  • Resize this image using the short edge as anchor to the given size unless the new size would violate the limits. In that case, scale the new size accordingly.

Basically I think we can do what you proposed in #2868 (comment), and also add max_size whose logic would be applied after the initial resizing, ensuring that max(new_sizes) == max_size

The combinations edge="short", min_size=min_size and edge="long", max_size=max_size would be nonsensical unless the user specifies size that violates the limits directly. In the first case since we are already picking the short edge as anchor the new size will never be smaller than min_size unless size is. For the second case this is the same for the upper limit.

IMO putting both functionalities into resize makes the interface more confusing. My idea was to put my proposal into resize since it is simply a parametrization of the current fixed assumption that the user always wants to use the shorter edge as anchor. The case @fmassa proposed would be more fitting in a resize_clamp function which could be a thin wrapper around resize.

@NicolasHug
Copy link
Member

I agree that edge="long", max_size=max_size doesn't make much sense, we could warn or error in this case. Regarding min_size, I would also agree that this parameter is not needed (on top of the redundancy, it might be impossible to satisfy both min_size and max_size).

@fmassa would you mind sharing your thoughts on:

  • introducing edge only
  • introducing max_size only
  • introducing both

My personal opinion is that they're (mostly) orthogonal concepts and that usage wouldn't be confusing as long as docs are properly informative. But I'm not familiar enough with the use-cases to really assess whether the added complexity is worth it.

@pmeier
Copy link
Collaborator Author

pmeier commented Feb 23, 2021

Regarding min_size, I would also agree that this parameter is not needed (on top of the redundancy, it might be impossible to satisfy both min_size and max_size).

If we allow the user to chose an anchor, min_size is "needed" in case the selected edge is the longer one. Consider an image of size (10,2) that we want to resize to size=5 with edge="long" and min_size=2. Without min_size the resulting size would be (5,1) and with min_size=2 (10, 2).

@fmassa
Copy link
Member

fmassa commented Feb 23, 2021

@NicolasHug I think adding both would hurt readability and understandability of the function, even if the documentation tries to make it clear. It's just going to be too much branching in the function, and it will probably confuse more than one user.

Let's have a look at what some other libraries have done about this:

I'd love to see what some other libraries have done on this regard.

@NicolasHug
Copy link
Member

From what I can tell, detectron2's ResizeShortestEdge corresponds to torchvision's current resize with the max_size parameter.

Regarding the other transforms, they don't allow size to be an int, which is what we're concerned with (Detectron2.resize accepts an int but it will output a square size x size image). In tensorflow, preserve_aspect_ratio=True makes size act as a bounding box for the resized image, so it has a bit of a max_size flavour too.

I could not find similar things in classyvision or in PIL

@NicolasHug
Copy link
Member

I think we can close this now that #8459 allows Resize to resize the longest edge to max_size. (summary: #8358 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants