Comments (4)
Another comment is that the use of RandomResizedCrop in augmentation during training might largely break the connection between scale and the object being imaged. It might be good to maintain that information by applying the same (224/h) scaling factor during training (where h is the height of the crop). Actually, since w can be very different from h in RandomResizedCrop, something like sqrt((224/h)*(224/w)) might be better since it preserves information about the area of the object.
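To make the two candidate factors concrete, here is a minimal sketch in plain Python. The function names and the `out_size` parameter are hypothetical and not part of the vit-lens code; the sketch only assumes a square output of side 224, as in the comment above.

```python
import math


def height_scale(h, out_size=224):
    """Hypothetical helper: scale factor based on crop height only (224/h)."""
    return out_size / h


def area_scale(h, w, out_size=224):
    """Hypothetical helper: sqrt((224/h) * (224/w)), which tracks the crop's area."""
    return math.sqrt((out_size / h) * (out_size / w))


# For a square crop the two factors agree.
square = (height_scale(112), area_scale(112, 112))

# For an elongated crop they differ: the height-only factor ignores the width.
elongated = (height_scale(56), area_scale(56, 224))
```

For a 112x112 crop both factors are 2.0. For a 56x224 crop the height-only factor is 4.0 while the area-based factor is 2.0, which is why the area-based form may be the safer choice when the aspect ratio varies.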
from vit-lens.
Thank you so much for pointing this out and for your insightful suggestion! I will look into this and experiment with this normalization you mentioned in the depth-related experiments, to see whether it yields better performance.
@jbrownkramer Thank you for your comments. If possible, could you please share the implementation you mentioned, so that I can find some time later to run experiments with it? Thanks.
Here you go. You should be able to replace RandomResizedCrop in RGBD_Processor_Train with this. It is untested code, FYI.
import math

import torch
import torchvision.transforms.functional as F
from torchvision.transforms import RandomResizedCrop


class RandomResizedCropAndScale(RandomResizedCrop):
    """
    Crop a random portion of the image and resize it to a given size. Then scale
    the final channel only (depth) by the square root of the ratio between the
    area of the resized image and the area of the crop in the original image.
    """

    def __init__(self, size, *args, **kwargs):
        super().__init__(size, *args, **kwargs)

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped and resized.
        Returns:
            Tensor: Randomly cropped, resized, and scaled image.
        """
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        cropped_and_resized_img = F.resized_crop(img, i, j, h, w, self.size, self.interpolation, antialias=self.antialias)
        # Convert the cropped and resized image to a tensor if it's not already
        if not isinstance(cropped_and_resized_img, torch.Tensor):
            cropped_and_resized_img = F.to_tensor(cropped_and_resized_img)
        _, height, width = F.get_dimensions(cropped_and_resized_img)
        # h, w, height, and width are plain ints, so use math.sqrt (torch.sqrt only accepts tensors)
        scale_factor = math.sqrt((height * width) / (h * w))
        scaled_img = cropped_and_resized_img.clone()  # Clone to avoid in-place modification issues
        scaled_img[-1, :, :] *= scale_factor  # Apply scale_factor to the last channel only (assumes a float tensor)
        return scaled_img
You should also be able to replace the __call__ function in RGBD_Processor_Eval with
def __call__(self, depth):
    # here depth refers to disparity, in torch savefile format
    # note use ToTensor to scale image to [0,1] first
    img = torch.randn((3, 224, 224))
    if depth.ndim == 2:
        depth = depth.unsqueeze(0)
    # Intended factor is 224 / image height. After unsqueeze, shape[0] is the
    # channel dimension (1), so read the height from shape[-2] instead.
    scale = 224 / depth.shape[-2]
    rgbd = torch.cat([img, depth * scale], dim=0)
    transform_rgbd = self.rgbd_transform(rgbd)
    img = transform_rgbd[0:3, ...]
    depth = transform_rgbd[3:4, ...]
    return depth