vsitzmann / scene-representation-networks
Official PyTorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
License: MIT License
Hey,
For comparison purposes, I need to evaluate the reconstruction quality of this network on one specific output view given one specific input view, for each object (i.e., single-image reconstruction). It looks as though the specific_observation_idcs option may be the right way to achieve this? Could you share an example file for this option so I can see the format? I'm assuming it specifies an input ID and an output ID for each object.
Thanks!
I am wondering how you obtained the train/test split on ShapeNet. Did you do a random split?
Hey there,
First of all I'd like to congratulate @vsitzmann for the awesome work. Keep it up!
I'm trying to reconstruct scenes from a single view with the one-shot implementation. I have rendered my own dataset and, while scene reconstruction with multi-view training yields decent results (see left gt_comparison below), I am getting nowhere with one-shot reconstruction (see right gt_comparison below).
Has anyone obtained successful results with one-shot reconstruction? If so, would you care to share how?
I have trained the model for one-shot reconstruction for 100,000 iterations so far.
Hello,
I have read the paper and found that your model requires at least 48 GB of memory to train on an RTX 6000 GPU. Would it be possible to upload the trained model for the cars or chairs dataset?
Thanks !
Hi,
Would it also be possible to release the original 3D models (e.g. ShapeNet chairs) as well as the rendering tools/scripts used to generate the training images, so that we can generate new data in the same format ourselves?
Thanks
Hey, I trained the model on my own training data. When I run it on the test dataset, what I get is reconstructions of the training data, not of the images in the test dataset. I am using the following options for training:
python train.py --data_root data/cars/train --val_root data/cars/val \
    --logging_root logs_cars --batch_size 16
And for evaluation (--num_instances is set to 6210, the number of images in the training set):
python test.py \
    --data_root data/cars/test \
    --logging_root logs_cars \
    --num_instances 6210 \
    --checkpoint logs_cars/checkpoints/ \
    --specific_observation_idcs 0
The output I'm getting is reconstructions of images in the training set, rendered in the poses specified by the test set. What I need is, for the unseen objects in the test set, specific output views predicted from specific input views. How should I do that?
Hi,
First of all, thanks for your work, which has been a lot of help to me. I'm currently working on using such scene representations as input (possibly with some modifications) for mobile robot policy networks.
I've done some reading on your paper. If I've understood it correctly, for every category of objects/datasets the method needs to train a specific model, i.e. the latent codes z, the mapping function psi, and the neural rendering function theta, for each dataset.
I think this is true, since you provide different pretrained models for the car and chair datasets.
It seems to be the same with other methods in this area, such as GQN.
I'm wondering whether it would be possible for only the prior/initial latent code z to be dataset-specific, while the other networks (the mapping and the rendering networks) are shared across all datasets.
Intuitively, I think the latent code should contain enough prior information, and it would save a great deal of time if different types of objects shared the same remaining networks. Since I'm trying to extract representations for complex scenes composed of many types of objects, and I want a representation for each detected object, this would be much more convenient.
What's the cost of making this assumption? Loss of accuracy?
BTW, I see that you are currently working on compositional SRNs, which is of huge interest to me.
May I ask whether you are using topological graphs to model such compositional relations? You don't need to answer this question if you'd rather not.
For rendering scenes like the Minecraft one, how do you place the cameras?
We cannot put them on a sphere any more, right?
Thank you very much
https://vsitzmann.github.io/srns/img/minecraft.mp4
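For context, here is a minimal sketch of the sphere-based camera placement the question refers to, as typically used for single-object renderings. The look_at helper is hypothetical (not from this repo) and assumes the repo's stated convention of +X right, +Y down, +Z into the image plane:

```python
import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world matrix looking from cam_pos at target.
    Columns follow (+X right, +Y down, +Z into the image plane).
    Degenerates when the view direction is parallel to `up`."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed frame
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = down
    c2w[:3, 2] = forward
    c2w[:3, 3] = cam_pos
    return c2w

def sample_sphere(n, radius=1.3):
    """n camera positions roughly uniform on a sphere around the origin."""
    u = np.random.uniform(-1.0, 1.0, n)          # cos of the polar angle
    phi = np.random.uniform(0.0, 2.0 * np.pi, n)
    r_xy = np.sqrt(1.0 - u ** 2)
    return radius * np.stack([r_xy * np.cos(phi), r_xy * np.sin(phi), u], axis=-1)

poses = [look_at(p) for p in sample_sphere(50)]
```

For room-scale scenes like the Minecraft one, this sphere assumption indeed breaks down; free trajectories inside the scene would be sampled instead.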
Hello, love your work on this repo.
I have an issue where I use a modified version of the Stanford render script for my car .obj, but when I run prediction on cars with your pretrained model, I don't see any prediction in the gt_compare output.
Is this occurring because Blender's coordinate system is not the OpenCV one? How should we approach this issue?
Hi,
May I ask why car_test has a larger resolution than chairs_test?
Hi Vincent,
Thank you very much for sharing your excellent work!
I am trying to render the ShapeNet v2 Chairs and Cars data using PyTorch3D cameras. However, I'm unable to find a suitable coordinate transformation for the extrinsics provided in the pose files.
I tried to follow the convention mentioned in the README to render the point cloud from the ShapeNet (NMR) dataset, but the rendered images do not match the given views. Here's what I did:
Given Pose: camera2world with (+X right, +Y down, +Z into the image plane) [ref]
Required Pose: PyTorch3D world2camera with (+X left, +Y up, +Z into the image plane) [ref]
Attempt:
def srn_to_pytorch3d(srn_pose):
    """srn_pose: 4x4 camera2world matrix with last row [0, 0, 0, 1], float tensor."""
    # Invert to go from camera2world to world2camera
    world2camera_srn = torch.linalg.inv(srn_pose)
    # X and Y axes change sign (convention change described above)
    world2camera_transformed = world2camera_srn @ torch.diag(torch.tensor([-1.0, -1.0, 1.0, 1.0]))
    R = world2camera_transformed[:3, :3]
    T = world2camera_transformed[:3, 3]
    # PyTorch3D performs right multiplication (X_cam = X_w @ R + T), so pass the
    # transpose of R; PerspectiveCameras expects batched R (N, 3, 3) and T (N, 3).
    camera = pytorch3d.renderer.cameras.PerspectiveCameras(R=R.T[None], T=T[None])
    return camera
If this approach is incorrect, could you please point me to the mesh / point cloud data that was used to render the views in the Chairs and Cars dataset with the given camera poses?
Thanks for your time!
Naveen.
Hello, I love what you have built and would like to get camera intrinsics for my own images. Would you kindly tell me how to generate them for custom images?
Hi, I have a question: if I want to test with my own image, and I only have one image, should I provide additional geometry or depth for my image?
Hello, keep up the great work! I am trying to replicate the few-shot results in the paper using the pretrained model provided, but I believe the config file is not up to date, and it crashes: no validation root is specified, there is no parameter called "specific samples", some values are strings where integers are expected, overriding the embedding crashes due to a parameter not being found, and I am unable to fine-tune because freezing the networks crashes.
Thanks for your help. Love your work
I apologize if this is a trivial question, as I am a beginner in the field. I'd like to know whether there is a way to combine the information in intrinsics.txt and in the pose .txt for each image and convert it into the COLMAP/LLFF 1x17 vector format (https://github.com/Fyusion/LLFF#using-your-own-poses-without-running-colmap). Thanks in advance.
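Per the linked LLFF README, each 1x17 row is the flattened 3x5 pose matrix (a 3x4 camera-to-world matrix with a fifth [height, width, focal] column) concatenated with the scene's near and far depth bounds. A sketch of the packing is below; it assumes the pose file provides a row-major 4x4 camera-to-world matrix in OpenCV-style axes (+X right, +Y down, +Z forward), and the axis permutation into LLFF's [down, right, backwards] columns is my assumption and worth double-checking against the repo's data loader:

```python
import numpy as np

def to_llff_row(c2w_cv, h, w, focal, near, far):
    """Hypothetical helper: pack one image's pose into the LLFF 1x17 format.
    c2w_cv: 3x4 camera-to-world with OpenCV-style axes (+X right, +Y down,
    +Z forward). near/far: scene depth bounds for this view."""
    R, t = c2w_cv[:, :3], c2w_cv[:, 3:4]
    # Permute axes into LLFF's [down, right, backwards] columns (assumption).
    R_llff = np.stack([R[:, 1], R[:, 0], -R[:, 2]], axis=1)
    pose_3x4 = np.concatenate([R_llff, t], axis=1)
    hwf = np.array([[h], [w], [focal]], dtype=np.float64)
    pose_3x5 = np.concatenate([pose_3x4, hwf], axis=1)       # 3x5
    return np.concatenate([pose_3x5.ravel(), [near, far]])   # 17 values

# Illustrative call with an identity pose and made-up intrinsics/bounds:
row = to_llff_row(np.eye(4)[:3], h=128, w=128, focal=120.0, near=0.8, far=1.8)
```

The flattening is row-major, matching how LLFF reshapes poses_bounds.npy rows back into 3x5 matrices.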
Can you please guide me on how to visualize the normal images? Which part of the code does that?