
Question Regarding Image Downscaling Method #3263

Closed
wzy-99 opened this issue Jun 26, 2024 · 6 comments · May be fixed by #3297
Labels: enhancement, good first issue

Comments

@wzy-99

wzy-99 commented Jun 26, 2024


Why was the following method chosen for downscaling images instead of directly using F.resize?

def resize_image(image: torch.Tensor, d: int):
    """
    Downscale images using the same 'area' method in opencv

    :param image shape [H, W, C]
    :param d downscale factor (must be 2, 4, 8, etc.)

    return downscaled image in shape [H//d, W//d, C]
    """
    import torch.nn.functional as tf

    image = image.to(torch.float32)
    weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d), dtype=torch.float32, device=image.device)
    return tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)

My concern is that this method may lead to misaligned coordinates. For instance, if we input an image of size 19x19 and downscale it by a factor of 4, the last 3 pixels of each row are simply dropped, whereas ideally these 3 pixels should be distributed evenly across the row.

Additionally, I noticed that in another part of the code, linear interpolation (the FFMPEG default) is used for image downsampling. For consistency, I believe the same interpolation method should be used for both dataset preprocessing and training-phase downsampling.
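For reference, here is a minimal sketch of the alternative I have in mind (my own illustration only, assuming torchvision's resize and the same [H, W, C] layout as resize_image):

import torch
import torchvision.transforms.functional as F

def resize_image_interp(image: torch.Tensor, d: int) -> torch.Tensor:
    # Hypothetical interpolation-based downscaling, for comparison only.
    h, w, _ = image.shape
    chw = image.permute(2, 0, 1)                       # torchvision resize expects [C, H, W]
    out = F.resize(chw, [h // d, w // d], antialias=True)
    return out.permute(1, 2, 0)                        # back to [H//d, W//d, C]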

@ginazhouhuiwu
Contributor

ginazhouhuiwu commented Jun 26, 2024

Actually I think a while back we wanted to upgrade this part to support downscaling better (not just by a constant), so thanks for bringing it up! I would be happy to look into this and you are more than welcome to make a PR if you want to address this feature immediately.

@jb-ye
Collaborator

jb-ye commented Jul 11, 2024

Interpolation-based downscaling is known to have a detrimental effect on GS training when the downscale factor is greater than 2, because interpolation methods are not antialiased. The reason I use convolution is that it is antialiased and differentiable.

Yes, the current method only supports downscale factors of 2, 4, 8, etc., but this is not really a major issue for coarse-to-fine training.
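To make that concrete, here is a small sketch (illustration only, not code from the repo) showing that the box-filter convolution is exactly average ("area") pooling and that gradients flow through it:

import torch
import torch.nn.functional as tf

image = torch.rand(16, 16, 3, requires_grad=True)
d = 4

# The same box filter as resize_image: every output pixel is the mean of a d x d block.
weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d))
by_conv = tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)

# Equivalent to average pooling, i.e. OpenCV's 'area' downscaling for integer factors.
by_pool = tf.avg_pool2d(image.permute(2, 0, 1)[None, ...], kernel_size=d, stride=d)[0].permute(1, 2, 0)

assert torch.allclose(by_conv, by_pool, atol=1e-6)
by_conv.sum().backward()        # differentiable: gradients reach the input image
assert image.grad is not None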

@jb-ye
Collaborator

jb-ye commented Jul 11, 2024

Regarding your concerns about "misaligned coordinates": in the gsplat library, we use the convention of the graphics/calibration literature rather than the OpenCV convention. That is, the top-left pixel of the image represents the color at 2D coordinate (0.5, 0.5). See related discussion here.

Basically, under this convention (rather than the one commonly used in OpenCV), we won't have misaligned coordinates.
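A tiny sketch of the two conventions (my own illustration, not library code):

def pixel_center_graphics(i: int) -> float:
    # graphics/calibration convention: pixel i covers [i, i+1), its center is i + 0.5
    return i + 0.5

def pixel_center_opencv(i: int) -> float:
    # OpenCV convention: pixel i's center sits at the integer coordinate i
    return float(i)

# Under the graphics convention, downscaling by d maps a coordinate x -> x / d, so the
# mean of the centers of a d x d block lands exactly on the coarse pixel's center:
d = 4
fine_centers = [pixel_center_graphics(i) for i in range(d)]   # 0.5, 1.5, 2.5, 3.5
coarse_center = (sum(fine_centers) / d) / d                   # 0.5 == pixel_center_graphics(0) in the coarse image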

@wzy-99
Author

wzy-99 commented Jul 12, 2024

Hi, @jb-ye,

I understand the 0.5-pixel convention you mentioned.

But that is actually not my concern.

Here is a detailed explanation of what I mean.

Assume:

  • an image of size 19x19
  • downscale factor of 4
  • so the stride is 4

the original method

def resize_image(image: torch.Tensor, d: int):
    """
    Downscale images using the same 'area' method in opencv

    :param image shape [H, W, C]
    :param d downscale factor (must be 2, 4, 8, etc.)

    return downscaled image in shape [H//d, W//d, C]
    """
    import torch.nn.functional as tf

    image = image.to(torch.float32)
    weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d), dtype=torch.float32, device=image.device)
    return tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)

[Image: a 19x19 image, with each colored square representing one convolutional region; only one row is shown as an example.]

You can see that the 3 unsampled pixels all sit at the end of the row, which causes the misalignment.
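A quick numeric check of the block layout (my own arithmetic, not from the code base):

W, d = 19, 4
out_w = (W - d) // d + 1                        # conv2d output width with kernel = stride = d  ->  4
blocks = [(j * d, j * d + d - 1) for j in range(out_w)]
print(blocks)                                   # [(0, 3), (4, 7), (8, 11), (12, 15)]
print(list(range(out_w * d, W)))                # [16, 17, 18]  ->  the 3 columns that are cut off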

the better method

[Image: the same row with the 3 leftover pixels scattered evenly across the row.]

If we scatter those 3 pixels evenly across the row, the misalignment is reduced.
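For example, something like adaptive average pooling would give this behaviour. This is only a rough sketch of what I mean, not a concrete proposal:

import torch
import torch.nn.functional as tf

image = torch.rand(19, 19, 3)
d = 4
chw = image.permute(2, 0, 1)[None, ...]                           # [1, C, H, W]
out = tf.adaptive_avg_pool2d(chw, output_size=(19 // d, 19 // d))
out = out[0].permute(1, 2, 0)                                     # back to [H//d, W//d, C]
# Each output pixel now averages a window of neighbouring source columns/rows that
# together cover the whole image, instead of dropping columns 16-18.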

@jb-ye
Collaborator

jb-ye commented Jul 12, 2024

@wzy-99 Assume the principal point of the original resolution is at (9.5, 9.5) (the center of the original image). When we resize the image, we simply multiply the scale factor onto the principal point, which gives (2.375, 2.375). This is the convention used in nerfstudio and most other calibration libraries (e.g. colmap). Note that this also means the rescaled PP is not necessarily the image center unless the image resolution is a multiple of 4.

In the original method, the rightmost columns are cut off by design. We resize the 19x19 image to a 4x4 image with the principal point at (2.375, 2.375), where 2.375 = 9.5 / 4, so the downscaled image is consistent with how we scale the principal point. We end up with a 4x4 image whose principal point (2.375, 2.375) is shifted to the right of the image center, because of the aforementioned cut-off.

In your method, the pixels of the first row of the downsampled image represent 4x4 color regions centered at 2, 7, 12, 17 (instead of 2, 6, 10, 14). The effective scaling factor is therefore not 4 but 5. If we downscale the image as you suggest, the principal point after scaling should be 1.9 = 9.5 / 5 instead of 2.375.

In short, what really matters is making sure that the way we scale the principal point is consistent with the way we scale the image.
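A worked check of this consistency argument with the numbers above (my own arithmetic):

d = 4
cx_orig = 9.5                             # principal point of the 19x19 image
cx_scaled = cx_orig / d                   # 2.375, how the intrinsics are rescaled
# Cut-off method: output pixel j covers input columns [j*d, (j+1)*d), so an input
# coordinate x maps to x / d; 9.5 -> 2.375 matches cx_scaled exactly.
# Evenly-scattered method: first-row centers at 2, 7, 12, 17, i.e. an effective
# stride of 5, so the same point would need to map to 9.5 / 5 = 1.9 instead.
print(cx_scaled, 9.5 / 5)                 # 2.375 1.9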

@wzy-99
Author

wzy-99 commented Jul 13, 2024

Yes, you are right. I made a mistake: I ignored that the pixel plane will also be shifted.

[Image: the red box marks the pixel plane for a 4x4 render with cx=2, cy=2; the blue box marks the plane after shifting to cx=2.375, cy=2.375.]

As we know, when rendering a 4x4 image with cx=2, cy=2, the pixel plane is exactly the red box area.

But following the code below,

    def rescale_output_resolution(
        self,
        scaling_factor: Union[Shaped[Tensor, "*num_cameras"], Shaped[Tensor, "*num_cameras 1"], float, int],
        scale_rounding_mode: str = "floor",
    ) -> None:
        """Rescale the output resolution of the cameras.

        Args:
            scaling_factor: Scaling factor to apply to the output resolution.
            scale_rounding_mode: round down or round up when calculating the scaled image height and width
        """
        if isinstance(scaling_factor, (float, int)):
            scaling_factor = torch.tensor([scaling_factor]).to(self.device).broadcast_to((self.cx.shape))
        elif isinstance(scaling_factor, torch.Tensor) and scaling_factor.shape == self.shape:
            scaling_factor = scaling_factor.unsqueeze(-1)
        elif isinstance(scaling_factor, torch.Tensor) and scaling_factor.shape == (*self.shape, 1):
            pass
        else:
            raise ValueError(
                f"Scaling factor must be a float, int, or a tensor of shape {self.shape} or {(*self.shape, 1)}."
            )

        self.fx = self.fx * scaling_factor
        self.fy = self.fy * scaling_factor
        self.cx = self.cx * scaling_factor
        self.cy = self.cy * scaling_factor
        if scale_rounding_mode == "floor":
            self.height = (self.height * scaling_factor).to(torch.int64)
            self.width = (self.width * scaling_factor).to(torch.int64)
        elif scale_rounding_mode == "round":
            self.height = torch.floor(0.5 + (self.height * scaling_factor)).to(torch.int64)
            self.width = torch.floor(0.5 + (self.width * scaling_factor)).to(torch.int64)
        elif scale_rounding_mode == "ceil":
            self.height = torch.ceil(self.height * scaling_factor).to(torch.int64)
            self.width = torch.ceil(self.width * scaling_factor).to(torch.int64)
        else:
            raise ValueError("Scale rounding mode must be 'floor', 'round' or 'ceil'.")

we get cx=2.375, cy=2.375, which shifts the red box to the blue one.

And the blue box indeed matches the downsampled GT produced by the convolutional method.
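Tracing the snippet above with this example's numbers confirms it (my own arithmetic, default "floor" mode):

scaling_factor = 1.0 / 4
cx = cy = 9.5
height = width = 19

cx, cy = cx * scaling_factor, cy * scaling_factor                            # 2.375, 2.375
height, width = int(height * scaling_factor), int(width * scaling_factor)    # floor -> 4, 4
print(cx, cy, height, width)                                                 # 2.375 2.375 4 4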

Thank you very much!!!🌹

@jb-ye jb-ye closed this as completed Jul 16, 2024