[docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / test+diff images / comments in individual functions #5194
Comments
Similar explicit comments would be very useful for migrating away from PIL's ImageOps / ImageEnhance usages found in many codebases (e.g. the recent https://github.com/microsoft/SoftTeacher/blob/main/ssod/datasets/pipelines/rand_aug.py). Explicit comments on supported image ranges would also be useful (e.g. uint8 [0, 255], int8 [-128, 127], float [0, 1], float [-1, 1], or more general zero-centered float).
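For instance, a minimal sketch of one such side-by-side migration check, using contrast adjustment via ImageEnhance vs torchvision's adjust_contrast; the input path is a placeholder, and whether the two paths agree numerically is exactly the kind of thing the docs could state:

```python
# Sketch of a side-by-side check for one ImageEnhance op vs its torchvision
# counterpart; whether the outputs match bit-for-bit is the open question here.
import numpy as np
from PIL import Image, ImageEnhance
import torchvision.transforms.functional as TF

pil_img = Image.open("example.jpg").convert("RGB")         # placeholder input, uint8 [0, 255]
tensor_img = TF.pil_to_tensor(pil_img)                     # uint8 CHW tensor, [0, 255]

factor = 1.5
out_pil = ImageEnhance.Contrast(pil_img).enhance(factor)   # PIL path
out_tv = TF.adjust_contrast(tensor_img, factor)            # torchvision path (also accepts float [0, 1] tensors)

diff = np.abs(np.asarray(out_pil).astype(np.int16)
              - out_tv.permute(1, 2, 0).numpy().astype(np.int16))
print("max abs difference:", diff.max())
```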
Recent example of related PIL <> torchvision transforms confusion: #5204
Ideally, there should be tests and image examples of PIL vs torchvision calls of the same transforms.
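A sketch of what such a test might look like, using autocontrast as the example op; the tolerance is a placeholder rather than a known-correct value, and establishing the right one is the point of the test:

```python
# Hypothetical regression test comparing PIL's ImageOps.autocontrast against
# torchvision's tensor-path autocontrast on the same random image.
import numpy as np
import pytest
from PIL import Image, ImageOps
import torchvision.transforms.functional as TF

@pytest.mark.parametrize("size", [(64, 64), (127, 93)])
def test_autocontrast_close_to_pil(size):
    rng = np.random.default_rng(0)
    arr = rng.integers(0, 256, size=(*size, 3), dtype=np.uint8)
    pil_img = Image.fromarray(arr)

    out_pil = np.asarray(ImageOps.autocontrast(pil_img))
    out_tv = TF.autocontrast(TF.pil_to_tensor(pil_img)).permute(1, 2, 0).numpy()

    max_diff = np.abs(out_pil.astype(np.int16) - out_tv.astype(np.int16)).max()
    assert max_diff <= 1  # placeholder tolerance
```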
@vadimkantorov researchers are free to use any kind of implementation and coin a name for it. When GaussianBlur was introduced to torchvision (#2635), the reference was the swav project, https://github.com/facebookresearch/swav/blob/d4970c83791c8bd9928ec4efcc25d4fd788c405f/src/multicropdataset.py#L65, where they use opencv with a fixed kernel size and sigma. swav is historically earlier than dino and the others. I do not know why they switched from one implementation to another (maybe to drop the opencv dependency?). Also, PIL's blur kernel size is limited as far as I remember. As for #5194 (comment)
Yeah, I guess this comparison is also relevant... My point is that there is plenty of code using legacy non-tensor ops implemented in PIL and OpenCV. When building on this code, migrating to the more modern torchvision implementations is often considered (as it probably improves performance and sometimes enables differentiability and GPU usage). It would be nice to document how the torchvision versions differ from the existing versions in PIL / OpenCV / scikit-image.
I think this should not be a reason to invite everyone to duplicate the effort of testing PIL's GaussianBlur against torchvision's GaussianBlur and OpenCV's GaussianBlur against torchvision's GaussianBlur.
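As a concrete illustration of the kind of OpenCV vs torchvision comparison being discussed; the kernel size and sigma below are illustrative, not the exact values from swav or any other project, and the random image stands in for real data:

```python
# Hypothetical comparison of an OpenCV blur (swav-style fixed kernel size) with
# torchvision's gaussian_blur on the same image.
import cv2
import numpy as np
import torch
import torchvision.transforms.functional as TF

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in HWC uint8 image
kernel_size, sigma = 23, 1.5                                     # illustrative values only

out_cv = cv2.GaussianBlur(img, (kernel_size, kernel_size), sigma)
out_tv = TF.gaussian_blur(torch.from_numpy(img).permute(2, 0, 1),
                          kernel_size=kernel_size, sigma=sigma)

diff = np.abs(out_cv.astype(np.int16)
              - out_tv.permute(1, 2, 0).numpy().astype(np.int16))
print("max abs difference:", diff.max())  # border handling alone can produce nonzero diffs
```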
This is a very large question :)
Well, that's the problem :) It would be nice to start collecting a battery of test examples and small code snippets evaluating these functions across libraries with different options (or at least the default options), and to publish the resulting images, an HTML page gathering all of them, and diff images. Especially where the ops are implemented slightly differently, the differences in results should be presented even if the root cause hasn't been diagnosed or researched yet. Otherwise, users either duplicate this effort or just abandon it and end up with hard-to-detect bugs in all these teacher-student self-supervised models that rely on very particular augmentation hyper-parameters being set just right. My point is that incomplete testing results published in the docs are better than no testing at all, with the topic discussed on forums and passed from one PhD student to the next as dark knowledge :)
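A rough sketch of the diff-image part of that battery; transform_a / transform_b are placeholders for e.g. a PIL op and its torchvision counterpart, and the amplification factor is arbitrary, just to make small mismatches visible:

```python
# Sketch: turn two implementations' outputs into an amplified diff image
# that could be published alongside the originals.
import numpy as np
from PIL import Image

def diff_image(out_a: np.ndarray, out_b: np.ndarray, scale: int = 8) -> Image.Image:
    """Amplified absolute difference between two uint8 HWC outputs."""
    d = np.abs(out_a.astype(np.int16) - out_b.astype(np.int16))
    return Image.fromarray(np.clip(d * scale, 0, 255).astype(np.uint8))

# diff_image(np.asarray(pil_result), np.asarray(torchvision_result)).save("diff.png")
```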
Yeah, implementation details (and bugs) can be, as you say, dark knowledge to pass along. Like why deeplab models use an input size of 513 pixels: https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries
Yeah, heavily influenced by that blog post, I'm a big fan of specifying in the docs (also for box_ops) exactly what the implementation is doing and exactly what coordinate format is expected :)
Original name:
[docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma
📚 The doc issue
A lot of legacy GaussianBlur augmentations, like those in https://github.com/facebookresearch/dino/blob/d2f3156/utils.py#L36 and https://github.com/facebookresearch/unbiased-teacher/blob/main/ubteacher/data/transforms/augmentation_impl.py#L20, use PIL's GaussianBlur, which has only a single `radius` parameter. The new torchvision `GaussianBlur` has two parameters: a hard `kernel_size` and a soft `sigma`. It would be very useful if their semantics were explained in relation to the existing/popular/legacy PIL arguments. This would help in porting legacy code to torchvision's native GaussianBlur.

For reference, Pillow's native implementation: https://github.com/python-pillow/Pillow/blob/95cff6e959bb3c37848158ed2145d49d49806a31/src/libImaging/BoxBlur.c#L286. It also seems that Pillow's implementation is actually a uniform-weighted (box) blur, not a true gaussian one.
Related question on SO: https://stackoverflow.com/questions/62968174/for-pil-imagefilter-gaussianblur-how-what-kernel-is-used-and-does-the-radius-par which suggests that radius ~= sigma
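A sketch of the comparison this issue asks to document, treating radius ≈ sigma as the SO answer suggests; the kernel_size formula is a common 3-sigma heuristic, not something torchvision or PIL prescribes, and the input path is a placeholder:

```python
# PIL's single-radius GaussianBlur vs torchvision's kernel_size/sigma version,
# under the assumption radius ~= sigma.
import numpy as np
from PIL import Image, ImageFilter
import torchvision.transforms.functional as TF

pil_img = Image.open("example.jpg").convert("RGB")    # placeholder input
radius = 2.0                                          # PIL-style parameter
sigma = radius                                        # assumption: radius ~= sigma
kernel_size = 2 * int(round(3 * sigma)) + 1           # odd kernel covering ~3 sigma per side

out_pil = pil_img.filter(ImageFilter.GaussianBlur(radius=radius))
out_tv = TF.gaussian_blur(TF.pil_to_tensor(pil_img), kernel_size=kernel_size, sigma=sigma)

diff = np.abs(np.asarray(out_pil).astype(np.int16)
              - out_tv.permute(1, 2, 0).numpy().astype(np.int16))
print("max abs difference:", diff.max())  # expected nonzero, since the implementations differ
```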
Related issues:
cc @vfdev-5 @datumbox