[docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / test+diff images / comments in individual functions #5194

Open
vadimkantorov opened this issue Jan 13, 2022 · 9 comments



vadimkantorov commented Jan 13, 2022

Original name: [docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma

📚 The doc issue

Many legacy GaussianBlur augmentations, like the ones in https://github.com/facebookresearch/dino/blob/d2f3156/utils.py#L36 and https://github.com/facebookresearch/unbiased-teacher/blob/main/ubteacher/data/transforms/augmentation_impl.py#L20, use PIL's GaussianBlur, which has only a single radius parameter. torchvision's new GaussianBlur has two parameters: a hard kernel_size and a soft sigma. It would be very useful if their semantics were explained in relation to the existing/popular/legacy PIL arguments. This would help port legacy code to torchvision's native GaussianBlur.

For reference, the native Pillow implementation: https://github.com/python-pillow/Pillow/blob/95cff6e959bb3c37848158ed2145d49d49806a31/src/libImaging/BoxBlur.c#L286. It also seems that Pillow's implementation is actually a uniformly weighted (box) blur, not a true Gaussian one.

Related question on SO: https://stackoverflow.com/questions/62968174/for-pil-imagefilter-gaussianblur-how-what-kernel-is-used-and-does-the-radius-par, which suggests that radius ~= sigma.
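To make the suggested mapping concrete, here is a minimal pure-Python sketch, assuming (per the SO answer above) that PIL's radius maps to torchvision's sigma, and picking an odd kernel_size covering ~3 sigma on each side. The helper name and the 3-sigma rule are illustrative assumptions, not torchvision's documented behavior:

```python
import math

def gaussian_kernel_1d(kernel_size, sigma):
    """1D Gaussian kernel normalized to sum to 1 (textbook formula;
    torchvision's functional implementation uses the same form)."""
    half = (kernel_size - 1) / 2.0
    weights = [math.exp(-((x - half) ** 2) / (2.0 * sigma ** 2))
               for x in range(kernel_size)]
    total = sum(weights)
    return [w / total for w in weights]

def pil_radius_to_torchvision_args(radius):
    """Hypothetical migration helper: assumes sigma ~= radius and derives
    an odd kernel_size from a common 3-sigma rule of thumb."""
    sigma = float(radius)
    kernel_size = 2 * math.ceil(3 * sigma) + 1
    return kernel_size, sigma

kernel_size, sigma = pil_radius_to_torchvision_args(2)  # -> (13, 2.0)
kernel = gaussian_kernel_1d(kernel_size, sigma)
```

A documented helper like this (with the actual equivalence verified against outputs) is roughly what the migration advice could boil down to.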

Related issues:

cc @vfdev-5 @datumbox

@vadimkantorov vadimkantorov changed the title F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma [docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma Jan 17, 2022

vadimkantorov commented Jan 18, 2022

Similar explicit comments would be very useful for migrating away from the PIL ImageOps / ImageEnhance usages found in many codebases (e.g. the recent https://github.com/microsoft/SoftTeacher/blob/main/ssod/datasets/pipelines/rand_aug.py). Explicit comments on the supported image value ranges would also be useful (e.g. uint8 [0; 255], int8 [-128; 127], float [0; 1], float [-1; 1], or a more general zero-centered float).
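For illustration, the range conventions listed above are simple affine rescalings; a hypothetical sketch on flat lists of values (the function names are mine, not any torchvision API):

```python
def uint8_to_unit_float(values):
    """uint8 [0, 255] -> float [0, 1]."""
    return [v / 255.0 for v in values]

def unit_float_to_centered(values):
    """float [0, 1] -> zero-centered float [-1, 1]."""
    return [2.0 * v - 1.0 for v in values]

# round-tripping a uint8 image into the zero-centered convention
pixels = [0, 128, 255]
centered = unit_float_to_centered(uint8_to_unit_float(pixels))
```

The point is that an op documented for one range silently produces wrong results when fed another, so stating the expected range per function would be cheap and valuable.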

@vadimkantorov vadimkantorov changed the title [docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma [docs] PIL image/enhance ops <> torchvision transforms migration advice / table / comments in individual functions Jan 18, 2022
@vadimkantorov vadimkantorov changed the title [docs] PIL image/enhance ops <> torchvision transforms migration advice / table / comments in individual functions [docs] PIL image/enhance ops <> torchvision transforms migration advice / summary table / comments in individual functions Jan 18, 2022
@vadimkantorov (Author)

Recent example of related PIL <> torchvision transforms confusion: #5204

@vadimkantorov (Author)

Ideally, there should be tests and image examples comparing PIL and torchvision calls of the same transforms.


vfdev-5 commented Jan 18, 2022

@vadimkantorov researchers have the freedom to use any kind of implementation and coin a name for it. When GaussianBlur was introduced to torchvision (#2635), the reference was the swav project and https://github.com/facebookresearch/swav/blob/d4970c83791c8bd9928ec4efcc25d4fd788c405f/src/multicropdataset.py#L65, where they use OpenCV with a fixed kernel size and sigma. swav is historically earlier than dino and the others. I do not know why they switched from one implementation to another (maybe to drop the OpenCV dependency?). Also, PIL's blur kernel size is limited, as far as I remember.

As for #5194 (comment), it is not about PIL vs torchvision (PIL does not provide a shear arg, only an affine matrix), but about how the parametrization (rotation, scale, shear, translation -> affine matrix) is done in torchvision vs scikit-image and vs common computer vision/graphics conventions.
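As a sketch of why such parametrizations disagree: one common graphics-style composition of (rotation, scale, shear) into a 2x3 affine matrix is shown below, and another library may compose the same factors in a different order or with different sign conventions, producing a different matrix for the same named parameters. This is illustrative, not torchvision's or scikit-image's exact formula:

```python
import math

def affine_matrix(angle_deg, scale, shear_deg):
    """One possible parametrization: rotation composed with an x-shear,
    then uniform scale, i.e. M = scale * R(angle) @ Shear(shear).
    Changing the composition order changes the result for the
    same (angle, scale, shear) triple."""
    a = math.radians(angle_deg)
    s = math.radians(shear_deg)
    return [
        [scale * math.cos(a), scale * (math.cos(a) * math.tan(s) - math.sin(a)), 0.0],
        [scale * math.sin(a), scale * (math.sin(a) * math.tan(s) + math.cos(a)), 0.0],
    ]
```

Documenting which composition a library uses (and in which direction the matrix maps, output-to-input or input-to-output) would remove most of the confusion.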


vadimkantorov commented Jan 18, 2022

> torchvision vs scikit-image

Yeah, I guess this comparison is also relevant...

My point is that there is plenty of code using legacy non-tensor ops implemented in PIL and OpenCV. When building on this code, migrating to the more modern torchvision implementations is often considered (it probably improves performance and sometimes enables differentiability and GPU usage). It would be nice to document how the torchvision versions differ from the existing versions in PIL / OpenCV / scikit-image.

> coin a name to it.

I think this should not be a reason to invite everyone to duplicate the effort of testing different versions of PIL's GaussianBlur vs torchvision's GaussianBlur, and OpenCV's GaussianBlur vs torchvision's GaussianBlur.

@vadimkantorov vadimkantorov changed the title [docs] PIL image/enhance ops <> torchvision transforms migration advice / summary table / comments in individual functions [docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / comments in individual functions Jan 18, 2022

vfdev-5 commented Jan 18, 2022

> It would be nice to document differences of torchvision versions from other existing versions in PIL / OpenCV / scikit-image.

This is a very large question :)
For instance, the resize op with all its options is not implemented identically in PyTorch, TF, OpenCV, PIL, and scikit-image. OpenCV is the reference for the PyTorch implementations, but OpenCV's nearest resize has a bug which is "fixed" by another option (to keep BC). For example: https://gist.github.com/vfdev-5/a5bd5b1477b1c82a87a0f9e25c727664
And imagine doing that for all the other ops :)
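As a toy illustration of the nearest-resize discrepancy, here are two source-index conventions that legitimately disagree on which pixel "nearest" picks. This is a sketch of the two conventions, not the exact code of OpenCV or PyTorch:

```python
import math

def nearest_src_index_floor(dst_i, scale):
    """Legacy-style nearest: floor(dst * scale), biased toward the
    top-left corner of the image."""
    return math.floor(dst_i * scale)

def nearest_src_index_half_pixel(dst_i, scale, src_size):
    """'Half-pixel centers' nearest: map the center of the destination
    pixel into source coordinates before rounding, then clamp."""
    src = (dst_i + 0.5) * scale - 0.5
    return min(max(int(src + 0.5), 0), src_size - 1)

# downscale a 4-pixel row to 2 pixels (scale = 2.0): the two
# conventions select entirely different source pixels
floor_picks = [nearest_src_index_floor(i, 2.0) for i in range(2)]       # [0, 2]
center_picks = [nearest_src_index_half_pixel(i, 2.0, 4) for i in range(2)]  # [1, 3]
```

Even for this trivial case the selected pixels differ, which is why side-by-side test images and diff images across libraries would be so informative.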


vadimkantorov commented Jan 18, 2022

Well, that's the problem :) It would be nice to start collecting a battery of test examples and small code snippets that evaluate these functions across libraries with different options (or at least the default options), then publish the resulting images, an HTML page with all of them, and diff images. Especially where the ops are implemented slightly differently, the differences in results should be presented even if the root cause of a difference hasn't been diagnosed / researched yet.
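Such a comparison page could start from something as simple as a per-pixel absolute difference with summary stats; a hypothetical sketch on flat pixel lists:

```python
def diff_summary(img_a, img_b):
    """Per-pixel absolute difference plus summary stats: the kind of
    artifact a cross-library comparison page could publish alongside
    the two library outputs themselves."""
    assert len(img_a) == len(img_b), "images must have the same shape"
    diff = [abs(a - b) for a, b in zip(img_a, img_b)]
    return {"diff": diff, "max": max(diff), "mean": sum(diff) / len(diff)}

# comparing two hypothetical library outputs for the same op
report = diff_summary([0, 10, 20], [0, 12, 16])  # max 4, mean 2.0
```

Even this crude max/mean pair would let users see at a glance which ops match across libraries and which silently diverge.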

Otherwise, users either duplicate this effort or abandon it and end up with hard-to-detect bugs in all these teacher-student self-supervised models, which rely on very particular augmentation hyper-parameters being set just right.

My point is that an incomplete testing result published in the docs is better than no testing at all, with the knowledge discussed in forums and passed from one PhD student to the next as dark knowledge :)


vfdev-5 commented Jan 18, 2022

Yeah, implementation details (and bugs) can be, as you say, the dark knowledge to pass on. Like why deeplab models take inputs of size 513 pixels: https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries

@vadimkantorov (Author)

Yeah, I'm a big fan of specifying in the docs (also for box_ops) exactly what the implementation does and exactly what coordinate format is expected, largely under the influence of that blog post :)

@vadimkantorov vadimkantorov changed the title [docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / comments in individual functions [docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / test+diff images / comments in individual functions Jan 18, 2022