
Comments (4)

jbrownkramer commented on May 20, 2024

Another comment is that the use of RandomResizedCrop in augmentation during training might largely break the connection between scale and the object being imaged. It might be good to maintain that information by applying the same (224/h) scaling factor during training (where h is the height of the crop). Actually, since w can be very different from h in RandomResizedCrop, something like sqrt((224/h)*(224/w)) might be better since it preserves information about the area of the object.
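To make the difference between the two proposed factors concrete, here is a minimal sketch in plain Python. The function names `height_scale` and `area_scale` and the default `out_size=224` are illustrative assumptions, not part of the vit-lens code; `h` and `w` are the crop height and width as in the comment above.

```python
import math

def height_scale(h, out_size=224):
    """Scale factor based on crop height only: out_size / h."""
    return out_size / h

def area_scale(h, w, out_size=224):
    """Geometric mean of the per-axis factors: sqrt((out_size/h) * (out_size/w))."""
    return math.sqrt((out_size / h) * (out_size / w))

# Square crop: the two factors agree.
print(height_scale(112), area_scale(112, 112))  # 2.0 2.0

# Short, wide crop: the height-only factor is twice the area-based one.
print(height_scale(56), area_scale(56, 224))    # 4.0 2.0
```

For crops whose aspect ratio is far from 1, the height-only factor can differ substantially from the area-based one, which is the motivation for the square-root form.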

from vit-lens.

StanLei52 commented on May 20, 2024

Thank you so much for pointing this out and for your insightful suggestion! I will look into this and experiment with this normalization you mentioned in the depth-related experiments, to see whether it yields better performance.


StanLei52 commented on May 20, 2024

@jbrownkramer Thank you for your comments. If possible, could you please provide your implementation as you mentioned so that I can find some time later to conduct experiments on this? Thanks.


jbrownkramer commented on May 20, 2024

Here you go. You should be able to replace RandomResizedCrop in RGBD_Processor_Train with this. It is untested code, FYI.

import torch
import torchvision.transforms.functional as F
from torchvision.transforms import RandomResizedCrop

class RandomResizedCropAndScale(RandomResizedCrop):
    """
    Crop a random portion of image and resize it to a given size. Scale it by the sqrt of the ratio in areas between the new image and the crop size in the original image, but only apply this scaling to the final channel.
    """

    def __init__(self, size, *args, **kwargs):
        super().__init__(size, *args, **kwargs)

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped and resized.

        Returns:
            Tensor: Randomly cropped, resized, and scaled image.
        """
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        cropped_and_resized_img = F.resized_crop(img, i, j, h, w, self.size, self.interpolation, antialias=self.antialias)

        # Convert the cropped and resized image to a tensor if it's not already
        if not isinstance(cropped_and_resized_img, torch.Tensor):
            cropped_and_resized_img = F.to_tensor(cropped_and_resized_img)

        _, height, width = F.get_dimensions(cropped_and_resized_img)
        
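        # height, width, h and w are plain Python numbers, so torch.sqrt (which needs a Tensor) would raise here
        scale_factor = ((height * width) / (h * w)) ** 0.5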
        
        scaled_img = cropped_and_resized_img.clone()  # Clone to avoid in-place modification issues
        scaled_img[-1, :, :] *= scale_factor  # This applies scale_factor to the last channel


        return scaled_img

You should also be able to replace the __call__ function in RGBD_Processor_Eval with

    def __call__(self, depth):
        # here depth refers to disparity, in torch savefile format
        # note use ToTensor to scale image to [0,1] first
        img = torch.randn((3, 224, 224))

        if depth.ndim == 2:
            depth = depth.unsqueeze(0)

        # depth is (1, H, W) after unsqueeze, so shape[0] is the channel count; use the height instead
        scale = 224 / depth.shape[-2]

        rgbd = torch.cat([img, depth * scale], dim=0)
        transform_rgbd = self.rgbd_transform(rgbd)
        img = transform_rgbd[0:3, ...]
        depth = transform_rgbd[3:4, ...]
        
        return depth

