I was trying to apply this model to my own data and not getting good results. I ran t


SUN RGB-D is not in millimeters (vit-lens issue, open)

jbrownkramer commented on May 20, 2024

SUN RGB-D is not in millimeters


Comments (4)

jbrownkramer commented on May 20, 2024

I will look into LanguageBind.

I will say this: I updated the processing in my pipeline to match the NYU data's circular shift, quantization, and camera intrinsics. The results on our data are still not very good. My suspicion is that SUN RGB-D contains no people, while the text labels I am trying to match describe where people are in the scene.


StanLei52 commented on May 20, 2024

Thank you for pointing this out; figuring this out is important for a more general depth model. Could you please also check LanguageBind and their uploaded NYU-D data? If it works on your own data, I will look into their preprocessing pipeline instead of following ImageBind.


jbrownkramer commented on May 20, 2024

Below is the transformation pipeline in LanguageBind. The expected input is depth in mm (NOT disparity). I ran the inference example from their repository homepage, where max_depth is configured to 10. In summary: read in the depth in mm, convert to meters, clamp to [0.01, 10] m, divide by 10 m so values fall in [0.001, 1], resize and center-crop to 224, and normalize with OPENAI_DATASET_MEAN and OPENAI_DATASET_STD.
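To make the arithmetic of this summary concrete, here is a minimal NumPy sketch of the per-pixel steps, with resize and center crop omitted. The function name `depth_mm_to_model_input` is hypothetical, and the mean/std values are the CLIP statistics as defined in open_clip's constants; it is an assumption that these are what OPENAI_DATASET_MEAN and OPENAI_DATASET_STD refer to here.

```python
import numpy as np

# Assumed CLIP image statistics (as defined in open_clip's constants module).
OPENAI_DATASET_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
OPENAI_DATASET_STD = np.array([0.26862954, 0.26130258, 0.27577711])


def depth_mm_to_model_input(depth_mm, min_depth=0.01, max_depth=10.0):
    """Hypothetical helper: per-pixel steps of the summarized pipeline (no resize/crop)."""
    depth_m = np.asarray(depth_mm, dtype=np.float64) / 1000.0    # mm -> m
    scaled = np.clip(depth_m, min_depth, max_depth) / max_depth   # [0.01, 10] m -> [0.001, 1]
    chw = np.repeat(scaled[np.newaxis], 3, axis=0)                # (H, W) -> (3, H, W)
    # Per-channel normalization, treating the replicated depth like an RGB image.
    return (chw - OPENAI_DATASET_MEAN[:, None, None]) / OPENAI_DATASET_STD[:, None, None]


x = depth_mm_to_model_input(np.array([[2500]], dtype=np.uint16))
```

A pixel at 2500 mm becomes 0.25 after scaling, and about -0.86 in the first channel after normalization. Anything beyond 10 m saturates at 1.0 before normalization.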

I tried running LanguageBind directly on the SUN RGB-D versions of the NYUv2 data, and it gave bad outputs. When I applied the circular shift first (to put the values back into mm), it gave good results, so they must be doing some preprocessing to convert the NYU data to mm first.
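The circular shift mentioned here can be sketched as a 16-bit bit rotation. This assumes the SUN RGB-D toolbox convention, where the stored PNG value is the depth in mm rotated left by 3 bits, so rotating right by 3 recovers mm; the shift amount comes from that toolbox, not from this thread, and both function names are hypothetical.

```python
import numpy as np


def sunrgbd_png_to_mm(raw):
    """Hypothetical helper: rotate each uint16 value right by 3 bits (assumed encoding)."""
    v = raw.astype(np.uint32)                    # widen so the left shift cannot overflow
    rotated = (v >> 3) | ((v << 13) & 0xFFFF)    # low 3 bits wrap around to the top
    return rotated.astype(np.uint16)


def mm_to_sunrgbd_png(depth_mm):
    """Hypothetical inverse: rotate each uint16 value left by 3 bits."""
    v = depth_mm.astype(np.uint32)
    rotated = ((v << 3) & 0xFFFF) | (v >> 13)    # top 3 bits wrap around to the bottom
    return rotated.astype(np.uint16)
```

For depths under 8192 mm the top 3 bits are zero, so the stored value is simply 8x the depth in mm (1000 mm is stored as 8000), which is why skipping this step inflates the apparent distance.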

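```python
import torch
import torch.nn as nn
from torchvision import transforms

# CLIP image normalization statistics (as defined in open_clip's constants module).
OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)


class DepthNorm(nn.Module):
    """Convert an (H, W) NumPy depth map in mm to a 3-channel tensor scaled to [0, 1]."""

    def __init__(
        self,
        max_depth=0,
        min_depth=0.01,
    ):
        super().__init__()
        self.max_depth = max_depth
        self.min_depth = min_depth
        self.scale = 1000.0  # NYUv2 absolute depth: mm -> m

    def forward(self, image):
        # `image` must be a NumPy array of depth in mm (not disparity).
        depth_img = image / self.scale  # (H, W) in meters
        depth_img = depth_img.clip(min=self.min_depth)
        if self.max_depth != 0:
            depth_img = depth_img.clip(max=self.max_depth)
            depth_img /= self.max_depth  # 0-1
        else:
            depth_img /= depth_img.max()  # no fixed range: scale by the per-image maximum
        # Replicate to 3 channels so the depth map can be fed like an RGB image.
        depth_img = torch.from_numpy(depth_img).unsqueeze(0).repeat(3, 1, 1)
        return depth_img.to(torch.get_default_dtype())


def get_depth_transform(config):
    config = config.vision_config  # max_depth is 10 in the inference example
    transform = transforms.Compose(
        [
            DepthNorm(max_depth=config.max_depth),
            transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
            transforms.CenterCrop(224),
            transforms.Normalize(OPENAI_DATASET_MEAN, OPENAI_DATASET_STD),  # treat depth like an RGB image
            # transforms.Normalize((0.5, ), (0.5, ))  # 0-1 to norm distribution
            # transforms.Normalize((0.0418, ), (0.0295, ))  # sun rgb-d  imagebind
            # transforms.Normalize((0.02, ), (0.00295, ))  # nyuv2
        ]
    )
    return transform
```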
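One plausible reading of the bad outputs on the unconverted SUN RGB-D files: if the stored values use the 3-bit rotation (an assumption based on the SUN RGB-D toolbox convention, not something stated in this thread), then DepthNorm reads depths under 8192 mm as 8x their true distance, and many pixels saturate at max_depth. The sketch below uses hypothetical depth values and re-implements only the DepthNorm arithmetic in NumPy (max_depth=10).

```python
import numpy as np


def depth_norm_mm(depth_mm, min_depth=0.01, max_depth=10.0):
    # Same arithmetic as DepthNorm with max_depth=10: mm -> m, clamp, scale to [0.001, 1].
    depth_m = np.asarray(depth_mm, dtype=np.float64) / 1000.0
    return np.clip(depth_m, min_depth, max_depth) / max_depth


true_mm = np.array([800, 2500, 6000], dtype=np.uint16)  # hypothetical depths

# Hypothetical stored values under the assumed 3-bit left rotation.
v = true_mm.astype(np.uint32)
stored = (((v << 3) & 0xFFFF) | (v >> 13)).astype(np.uint16)  # 6400, 20000, 48000

correct = depth_norm_mm(true_mm)  # 0.8 m, 2.5 m, 6 m -> 0.08, 0.25, 0.6
misread = depth_norm_mm(stored)   # read as 6.4 m, 20 m, 48 m -> 0.64, 1.0, 1.0
```

Under this assumption, two of the three pixels collapse to the maximum value, so most of the scene would look uniformly far away to the model.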


StanLei52 commented on May 20, 2024

Got it, thanks @jbrownkramer! I will look into this.
