The justification in the paper for using disparity is "scale normalization". I know t

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Alternate depth normalization about vit-lens HOT 4 OPEN

jbrownkramer commented on May 20, 2024

Alternate depth normalization

Comments (4)

jbrownkramer commented on May 20, 2024

Another point: using RandomResizedCrop for augmentation during training may largely break the connection between scale and the object being imaged. It might be worth preserving that information by applying the same (224/h) scaling factor during training, where h is the height of the crop. Since w can differ substantially from h in RandomResizedCrop, something like sqrt((224/h)*(224/w)) might be better, because it preserves information about the area of the object.
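To make the proposed factor concrete, here is a minimal sketch in plain Python. The function name `crop_scale_factor` and the `out_size` parameter are illustrative and not part of vit-lens; the sketch assumes the crop is resized to a square `out_size` x `out_size` output.

```python
import math


def crop_scale_factor(crop_h: int, crop_w: int, out_size: int = 224) -> float:
    """Geometric mean of the vertical and horizontal resize ratios.

    Hypothetical helper: equals sqrt((out_size / crop_h) * (out_size / crop_w)).
    """
    if crop_h <= 0 or crop_w <= 0:
        raise ValueError("crop dimensions must be positive")
    return math.sqrt((out_size / crop_h) * (out_size / crop_w))


# A square 112x112 crop is magnified 2x along both axes.
print(crop_scale_factor(112, 112))  # 2.0
# A 56x224 crop is magnified 4x vertically and 1x horizontally;
# the geometric mean of those two ratios is also 2.
print(crop_scale_factor(56, 224))   # 2.0
```

For a square crop this reduces to the plain (224/h) factor, so it should stay consistent with the evaluation-time scaling while still handling crops with very different aspect ratios.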

StanLei52 commented on May 20, 2024

Thank you so much for pointing this out and for your insightful suggestion! I will look into this and experiment with this normalization you mentioned in the depth-related experiments, to see whether it yields better performance.

StanLei52 commented on May 20, 2024

@jbrownkramer Thank you for your comments. If possible, could you please provide your implementation as you mentioned so that I can find some time later to conduct experiments on this? Thanks.

jbrownkramer commented on May 20, 2024

Here you go. You should be able to replace RandomResizedCrop in RGBD_Processor_Train with this. It is untested code, FYI.

import torch
import torchvision.transforms.functional as F
from torchvision.transforms import RandomResizedCrop

class RandomResizedCropAndScale(RandomResizedCrop):
    """
    Crop a random portion of image and resize it to a given size. Scale it by the sqrt of the ratio in areas between the new image and the crop size in the original image, but only apply this scaling to the final channel.
    """

    def __init__(self, size, *args, **kwargs):
        super().__init__(size, *args, **kwargs)

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped and resized.

        Returns:
            Tensor: Randomly cropped, resized, and scaled image.
        """
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        cropped_and_resized_img = F.resized_crop(img, i, j, h, w, self.size, self.interpolation, antialias=self.antialias)

        # Convert the cropped and resized image to a tensor if it's not already
        if not isinstance(cropped_and_resized_img, torch.Tensor):
            cropped_and_resized_img = F.to_tensor(cropped_and_resized_img)

        _, height, width = F.get_dimensions(cropped_and_resized_img)
        
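        # height, width, h and w are plain Python ints, and torch.sqrt only accepts tensors,
        # so take the square root with ** 0.5 instead.
        scale_factor = ((height * width) / (h * w)) ** 0.5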
        
        scaled_img = cropped_and_resized_img.clone()  # Clone to avoid in-place modification issues
        scaled_img[-1, :, :] *= scale_factor  # This applies scale_factor to the last channel


        return scaled_img
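
As a quick sanity check of the scaling step on its own, the following sketch applies the same idea to a dummy RGBD tensor using only torch. The helper name `scale_last_channel` and the 112x112 crop size are hypothetical; it assumes a (C, H, W) tensor whose last channel is the depth/disparity map.

```python
import torch


def scale_last_channel(rgbd: torch.Tensor, factor: float) -> torch.Tensor:
    """Return a float copy of a (C, H, W) tensor with only the last channel scaled."""
    out = rgbd.clone().float()  # copy so the caller's tensor is left untouched
    out[-1] *= factor           # the RGB channels stay as they are
    return out


rgbd = torch.ones(4, 224, 224)   # dummy RGB + depth, already resized to 224x224
crop_h, crop_w = 112, 112        # hypothetical crop size before resizing
factor = ((224 * 224) / (crop_h * crop_w)) ** 0.5

scaled = scale_last_channel(rgbd, factor)
print(scaled[:3].unique(), scaled[-1].unique())
```

With these inputs the RGB channels should remain all ones while the depth channel is doubled.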

You should also be able to replace the __call__ function in RGBD_Processor_Eval with the following (also untested):

    def __call__(self, depth):
        # here depth refers to disparity, in torch savefile format
        # note use ToTensor to scale image to [0,1] first
        img = torch.randn((3, 224, 224))

        if depth.ndim == 2:
            depth = depth.unsqueeze(0)
			
        # depth is (1, H, W) at this point, so the image height is shape[-2], not shape[0]
        scale = 224 / depth.shape[-2]

        rgbd = torch.cat([img, depth * scale], dim=0)
        transform_rgbd = self.rgbd_transform(rgbd)
        img = transform_rgbd[0:3, ...]
        depth = transform_rgbd[3:4, ...]
        
        return depth
