
scene-representation-networks's People

Contributors

kailas-v, vsitzmann

scene-representation-networks's Issues

specific_observation_idcs option format

Hey,
For comparison purposes, I need to evaluate the reconstruction quality of this network on a specific output view from one specific input view of each object (i.e., single-image reconstruction). It looks as though the specific_observation_idcs option may be the right way to achieve this? Could you share an example file for this option so I can see the format? I'm assuming it specifies an input id and output id for each object.
Thanks!
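Edit: for anyone landing here later, this is the kind of selection I have in mind. The flag name is real, but the comma-separated format and both helper functions below are my guesses, not confirmed repo behavior:

# Hypothetical sketch only: assumes --specific_observation_idcs takes a
# comma-separated list of observation indices (e.g. "0" or "0,5,12"),
# applied to every object instance. Both helpers below are made up.
def parse_idcs(flag_value):
    """Parse a comma-separated index string like '0,5,12' into [0, 5, 12]."""
    return [int(i) for i in flag_value.split(',')]

def select_observations(all_observations, flag_value):
    """Keep only the observations whose index appears in the flag."""
    idcs = parse_idcs(flag_value)
    return [all_observations[i] for i in idcs]

# e.g. for single-image reconstruction from view 0 only:
# views = select_observations(instance_views, "0")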

One-shot Scene Reconstruction

Hey there,

First of all, I'd like to congratulate @vsitzmann on the awesome work. Keep it up!

I'm trying to reconstruct scenes from a single view with the one-shot implementation. I have rendered my own dataset and, while scene reconstruction with multi-view training yields decent results (see the left gt_comparison below), I am getting nowhere with one-shot reconstruction (see the right gt_comparison below).

[gt_comparison images 000026 and 000022]

Has anyone obtained successful results with one-shot reconstruction? If so, would you care to share how?
I have trained the model for one-shot reconstruction for 100,000 iterations so far.

Can you upload the trained model?

Hello,
I have read the paper and found that your model requires at least 48GB of memory to train on an RTX 6000 GPU. Would it be possible to upload the trained model for the cars or chairs dataset?
Thanks !

The original 3D models and rendering tools?

Hi,

Would it also be possible to release the original 3D models (e.g., the ShapeNet chairs) as well as the rendering tools/scripts used to generate the training images, so that we can generate new data in the same format ourselves?

Thanks

How do you run evaluation with unseen images? Currently getting the training images in the reconstruction.

Hey, I trained the model on my own training data. When I run it on the test dataset, what I get are reconstructions of the training data, not of the objects in the test dataset. I am using the following options for training:

python train.py --data_root data/cars/train --val_root data/cars/val \
    --logging_root logs_cars --batch_size 16

And for evaluation (6210 is the number of images in my training set):

python test.py \
    --data_root data/cars/test \
    --logging_root logs_cars \
    --num_instances 6210 \
    --checkpoint logs_cars/checkpoints/ \
    --specific_observation_idcs 0

The output I'm getting is reconstructions of images in the training set, rendered in the poses specified by the test set. What I need is, for the unseen objects in the test set, specific output views predicted from specific input views. How should I do that?
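Edit: from re-reading the paper, unseen objects are handled by freezing the trained network weights and optimizing a fresh latent code per test instance against its given input view(s). A minimal sketch of that loop; model.render and the other names below are placeholders I made up, not the repo's actual API:

import torch

def fit_latent_code(model, input_views, latent_dim=256, steps=1000, lr=1e-3):
    """Fit a per-instance latent code to a few posed views of an unseen object.

    Network weights stay frozen; only the latent code is optimized, as in the
    paper's test-time procedure. All names here are illustrative placeholders."""
    for p in model.parameters():
        p.requires_grad_(False)  # freeze the trained SRN / hypernetwork

    z = torch.zeros(1, latent_dim, requires_grad=True)  # fresh latent code
    optim = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        loss = 0.0
        for pose, intrinsics, rgb_gt in input_views:
            rgb_pred = model.render(z, pose, intrinsics)  # placeholder call
            loss = loss + torch.nn.functional.mse_loss(rgb_pred, rgb_gt)
        optim.zero_grad()
        loss.backward()
        optim.step()

    return z.detach()  # render novel test poses with this code afterwards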

Questions about the paper: are dataset-specific model parameters necessary?

Hi,

First of all, thanks for your work, which has been a great help to me. I'm currently working on using such scene representations (possibly with some modification) as input to mobile-robot policy networks.

I've done some reading of your paper. If I've understood it correctly, the method needs to train a specific model for every category of objects/dataset: the latent code z, the mapping function psi, and the neural rendering function theta are all per-dataset.
I believe this is the case, since you provide different pretrained models for the car and chair datasets.
It seems to be the same with other methods in this area, such as GQN.

I'm wondering whether it's possible that only the prior initial latent code z needs to be dataset-specific, while the other networks (the mapping and the rendering networks) are shared across all types of datasets.

Intuitively, I think the latent code should contain enough prior information, and it would save a great deal of time if different types of objects shared the other networks. Since I'm trying to extract representations for complex scenes composed of all types of objects, and I want a representation for each detected object, this would be a lot more convenient.

What's the cost of making this assumption? Loss of accuracy?

BTW, I see that you are currently working on compositional SRNs, which is of huge interest to me.
May I ask whether you are using topological graphs to model such compositional relations? Feel free not to answer if you'd rather not say.
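Edit: to make the proposed split concrete, here is a rough sketch of what I mean; all module names and dimensions are hypothetical stand-ins, not the repo's code:

import torch
import torch.nn as nn

class SharedSRN(nn.Module):
    """Sketch of the proposed split: per-dataset latent codes are the only
    dataset-specific parameters, while the mapping (psi) and rendering (theta)
    networks are shared across all datasets. Everything here is illustrative."""

    def __init__(self, num_instances_per_dataset, latent_dim=256):
        super().__init__()
        # one embedding table (the prior latent codes z) per dataset
        self.latents = nn.ModuleDict({
            name: nn.Embedding(n, latent_dim)
            for name, n in num_instances_per_dataset.items()
        })
        self.hypernet = nn.Linear(latent_dim, 1024)   # stand-in for psi
        self.renderer = nn.Linear(1024, 3)            # stand-in for theta

    def forward(self, dataset_name, instance_idx):
        z = self.latents[dataset_name](instance_idx)  # dataset-specific
        phi = self.hypernet(z)                        # shared mapping
        return self.renderer(phi)                     # shared rendering

# e.g. (instance counts are placeholders):
# model = SharedSRN({'cars': 100, 'chairs': 100})
# out = model('cars', torch.tensor([0]))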

Issues with extrinsics

Hello, love your work on this repo.

I have an issue where I use a modified version of the Stanford rendering script for my car .obj, but when I predict on cars with your pretrained model, I don't see any prediction in gt_compare.

Is this occurring because Blender's coordinate system is not OpenCV's? How should we approach this issue?
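Edit: in case this helps, Blender cameras look down -Z with +Y up, while OpenCV uses +Z forward and +Y down, so a camera-to-world matrix exported from Blender needs its camera Y and Z axes flipped. A minimal sketch of that conversion, assuming a plain 4x4 camera-to-world matrix straight out of Blender:

import numpy as np

def blender_to_opencv(cam2world_blender):
    """Flip the camera's local Y and Z axes to go from Blender's convention
    (look down -Z, +Y up) to OpenCV's (+Z forward, +Y down).

    Right-multiplying by diag(1, -1, -1, 1) negates the Y and Z columns of
    the camera-to-world matrix, i.e. the camera's own Y and Z axes, leaving
    the same camera position expressed in OpenCV's axis convention."""
    flip_yz = np.diag([1.0, -1.0, -1.0, 1.0])
    return cam2world_blender @ flip_yz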

PyTorch3D Camera Convention for ShapeNet Chairs / Cars

Hi Vincent,

Thank you very much for sharing your excellent work!

I am trying to render the ShapeNet v2 Chairs and Cars data using PyTorch3D cameras. However, I'm unable to find a suitable coordinate transformation for the extrinsics provided in the pose files.

I tried to follow the convention mentioned in the README to render the point cloud from the ShapeNet (NMR) dataset, but the rendered images do not match the given views. Here's what I did:

Given Pose: camera2world with (+X right, +Y down, +Z into the image plane) [ref]

Required Pose: PyTorch3D world2camera with (+X left, +Y up, +Z into the image plane) [ref]

Attempt:

import torch
import pytorch3d.renderer

def srn_to_pytorch3d(srn_pose):
    """srn_pose: 4x4 camera2world matrix (float tensor) with last row [0, 0, 0, 1]."""

    # Take the inverse to go from camera2world to world2camera.
    world2camera_srn = torch.linalg.inv(srn_pose)

    # X and Y axes change sign (convention as described above).
    # Note the float diagonal: an integer tensor would fail the matmul.
    flip_xy = torch.diag(torch.tensor([-1.0, -1.0, 1.0, 1.0]))
    world2camera_transformed = world2camera_srn @ flip_xy
    R = world2camera_transformed[:3, :3]
    T = world2camera_transformed[:3, 3]

    # PyTorch3D performs right multiplication (X_cam = X_w @ R + T), so pass
    # the transpose of R; cameras also expect batched (N, 3, 3) / (N, 3) inputs.
    camera = pytorch3d.renderer.cameras.PerspectiveCameras(R=R.T[None], T=T[None])

    return camera

If this approach is incorrect, could you please point me to the mesh / point cloud data that was used to render the views in the Chairs and Cars dataset with the given camera poses?

Thanks for your time!

Naveen.

Intrinsics for custom images

Hello, I love what you have built and would like to get camera intrinsics for my own images. Would you kindly tell me how to generate them for custom images?
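Edit: for a standard pinhole camera you can at least derive the matrix from the image size and field of view; whether the repo's intrinsics file wants exactly these numbers is an assumption on my part. A generic sketch:

import math
import numpy as np

def pinhole_intrinsics(width, height, fov_x_degrees):
    """Build a 3x3 pinhole intrinsics matrix K from image size and horizontal
    field of view. Assumes square pixels and a principal point at the image
    center; matching the repo's expected file format is not guaranteed."""
    fx = width / (2.0 * math.tan(math.radians(fov_x_degrees) / 2.0))
    fy = fx                           # square pixels
    cx, cy = width / 2.0, height / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# e.g. K = pinhole_intrinsics(128, 128, 50.0) for a 128x128 render with 50 deg FOV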

about inference

Hi, I have a question: if I want to test with my own image, and I only have one image, should I provide additional geometry or depth for it?

Few-shot learning is currently crashing

Hello, keep up the great work! I am trying to replicate your few-shot results from the paper using the pretrained model provided, but I believe the config file is out of date, and it crashes: no validation root is specified, there is no parameter called specific_samples, some values are strings where integers are expected, overriding the embedding crashes due to a parameter not being found, and I'm unable to fine-tune because freezing the networks crashes.

Thanks for your help. Love your work

Bugs in dataset

Hello, after downloading your data from Google Drive, I got errors when extracting 'cars_train.zip':

[screenshot of the extraction errors]

Could you please fix it? All the other files are fine.
