vsitzmann / scene-representation-networks
Official PyTorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
License: MIT License
Hey,
For comparison purposes, I need to evaluate the reconstruction quality of this network on one specific output view given one specific input view, for each object (i.e., single-image reconstruction). It looks as though the specific_observation_idcs option may be the right way to achieve this? Could you share an example file for this option so I can see the format? I'm assuming it specifies an input ID and an output ID for each object.
Thanks!
I am wondering how you obtained the train/test split on ShapeNet. Did you do a random split?
Hey there,
First of all I'd like to congratulate @vsitzmann for the awesome work. Keep it up!
I'm trying to reconstruct scenes from a single view with the one-shot implementation. I have rendered my own dataset and, while scene reconstruction with multi-view training yields decent results (see left gt_comparison below), I am getting nowhere with one-shot reconstruction (see right gt_comparison below).
Has anyone obtained successful results with one-shot reconstruction? If so, would you care to share how?
I have trained the model for one-shot reconstruction for 100,000 iterations so far.
Hello,
I have read the paper and found that your model requires at least 48 GB of memory to train on an RTX 6000 GPU. Would it be possible to upload the trained model for the cars or chairs dataset?
Thanks !
Hi,
Would it also be possible to release the original 3D models (e.g. ShapeNet chairs) as well as the rendering tools/scripts used to generate the training images, so that we can generate new data in the same format ourselves?
Thanks
Hey, I trained the model on my own training data. When I run it on the test dataset, what I get is reconstructions of the training data, not of the images in the test dataset. I am using the following options for training:
python train.py --data_root data/cars/train --val_root data/cars/val \
    --logging_root logs_cars --batch_size 16
And for evaluation (--num_instances is set to 6210, the number of images in the training set):
python test.py \
    --data_root data/cars/test \
    --logging_root logs_cars \
    --num_instances 6210 \
    --checkpoint logs_cars/checkpoints/ \
    --specific_observation_idcs 0
The output I'm getting is reconstructions of images in the training set, rendered in the poses specified by the test set. What I need is, for the unseen objects in the test set, specific output views predicted from specific input views. How should I do that?
Hi,
First of all, thanks for your work, which has been a lot of help to me. I'm currently working on using such scene representations as input (possibly with some modifications) for mobile robot policy networks.
I've done some reading on your paper. If I've understood it correctly, for every category of objects/datasets the method needs to train a specific model, i.e. the latent codes z, the mapping function psi, and the neural rendering function theta, for each dataset.
I think this is true, since you provide different pretrained models for the car and chair datasets.
It seems to be the same with other methods in this area, such as GQN.
I'm wondering whether it would be possible for only the prior/initial latent code z to be dataset-specific, while the other networks (the mapping and the rendering networks) are shared across all datasets.
Intuitively, I think the latent code should contain enough prior information, and it would save a great deal of time if different types of objects shared the same remaining networks. Since I'm trying to extract representations for complex scenes composed of many types of objects, and I want a representation for each detected object, this would be much more convenient.
What's the cost of making this assumption? Loss of accuracy?
BTW, I see that you are currently working on compositional SRNs, which is of huge interest to me.
May I ask whether you are using topological graphs to model such compositional relations? You don't need to answer this question if you'd rather not.
For rendering scenes like the Minecraft one, how do you place the cameras?
We cannot put them on a sphere any more, right?
Thank you very much
https://vsitzmann.github.io/srns/img/minecraft.mp4
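For context, here is a minimal sketch of the sphere-based camera placement the question refers to, as typically used for single-object renderings. The look_at helper is hypothetical (not from this repo) and assumes the repo's stated convention of +X right, +Y down, +Z into the image plane:

```python
import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world matrix looking from cam_pos at target.
    Columns follow (+X right, +Y down, +Z into the image plane).
    Degenerates when the view direction is parallel to `up`."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed frame
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = down
    c2w[:3, 2] = forward
    c2w[:3, 3] = cam_pos
    return c2w

def sample_sphere(n, radius=1.3):
    """n camera positions roughly uniform on a sphere around the origin."""
    u = np.random.uniform(-1.0, 1.0, n)          # cos of the polar angle
    phi = np.random.uniform(0.0, 2.0 * np.pi, n)
    r_xy = np.sqrt(1.0 - u ** 2)
    return radius * np.stack([r_xy * np.cos(phi), r_xy * np.sin(phi), u], axis=-1)

poses = [look_at(p) for p in sample_sphere(50)]
```

For room-scale scenes like the Minecraft one, this sphere assumption indeed breaks down; free trajectories inside the scene would be sampled instead.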
Hello, love your work on this repo.
I have an issue where I use a modified version of the Stanford render script for my car .obj, but when I run prediction on cars with your pretrained model, I don't see any prediction in the gt_compare output.
Is this occurring because Blender's coordinate system is not the OpenCV one? How should we approach this issue?
Hi,
May I ask why car_test has a larger resolution than chairs_test?
Hi Vincent,
Thank you very much for sharing your excellent work!
I am trying to render the ShapeNet v2 Chairs and Cars data using PyTorch3D cameras. However, I'm unable to find a suitable coordinate transformation for the extrinsics provided in the pose files.
I tried to follow the convention mentioned in the README to render the point cloud from the ShapeNet (NMR) dataset, but the rendered images do not match the given views. Here's what I did:
Given Pose: camera2world with (+X right, +Y down, +Z into the image plane) [ref]
Required Pose: PyTorch3D world2camera with (+X left, +Y up, +Z into the image plane) [ref]
Attempt:
def srn_to_pytorch3d(srn_pose):
    """srn_pose: 4x4 camera2world matrix with last row [0, 0, 0, 1], float tensor."""
    # Invert to go from camera2world to world2camera
    world2camera_srn = torch.linalg.inv(srn_pose)
    # X and Y axes change sign (convention change described above)
    world2camera_transformed = world2camera_srn @ torch.diag(torch.tensor([-1.0, -1.0, 1.0, 1.0]))
    R = world2camera_transformed[:3, :3]
    T = world2camera_transformed[:3, 3]
    # PyTorch3D performs right multiplication (X_cam = X_w @ R + T), so pass the
    # transpose of R; PerspectiveCameras expects batched R (N, 3, 3) and T (N, 3).
    camera = pytorch3d.renderer.cameras.PerspectiveCameras(R=R.T[None], T=T[None])
    return camera
If this approach is incorrect, could you please point me to the mesh / point cloud data that was used to render the views in the Chairs and Cars dataset with the given camera poses?
Thanks for your time!
Naveen.
Hello, I love what you have built and would like to get camera intrinsics for my own images. Would you kindly tell me how to generate them for custom images?
Hi, I have a question: if I want to test with my own image, and I only have one image, should I provide additional geometry or depth for my image?
Hello, keep up the great work! I am trying to replicate the few-shot results in the paper using the pretrained model provided, but I believe the config file is not up to date, and it crashes: no validation root is specified, there is no parameter called "specific samples", some values are strings where integers are expected, overriding the embedding crashes due to a parameter not being found, and I am unable to fine-tune because freezing the networks crashes.
Thanks for your help. Love your work
I apologize if this is a trivial question, as I am a beginner in the field. I'd like to know whether there is a way to combine the information in intrinsics.txt and in the pose .txt for each image and convert it into the COLMAP/LLFF 1x17 vector format (https://github.com/Fyusion/LLFF#using-your-own-poses-without-running-colmap). Thanks in advance.
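Per the linked LLFF README, each 1x17 row is the flattened 3x5 pose matrix (a 3x4 camera-to-world matrix with a fifth [height, width, focal] column) concatenated with the scene's near and far depth bounds. A sketch of the packing is below; it assumes the pose file provides a row-major 4x4 camera-to-world matrix in OpenCV-style axes (+X right, +Y down, +Z forward), and the axis permutation into LLFF's [down, right, backwards] columns is my assumption and worth double-checking against the repo's data loader:

```python
import numpy as np

def to_llff_row(c2w_cv, h, w, focal, near, far):
    """Hypothetical helper: pack one image's pose into the LLFF 1x17 format.
    c2w_cv: 3x4 camera-to-world with OpenCV-style axes (+X right, +Y down,
    +Z forward). near/far: scene depth bounds for this view."""
    R, t = c2w_cv[:, :3], c2w_cv[:, 3:4]
    # Permute axes into LLFF's [down, right, backwards] columns (assumption).
    R_llff = np.stack([R[:, 1], R[:, 0], -R[:, 2]], axis=1)
    pose_3x4 = np.concatenate([R_llff, t], axis=1)
    hwf = np.array([[h], [w], [focal]], dtype=np.float64)
    pose_3x5 = np.concatenate([pose_3x4, hwf], axis=1)       # 3x5
    return np.concatenate([pose_3x5.ravel(), [near, far]])   # 17 values

# Illustrative call with an identity pose and made-up intrinsics/bounds:
row = to_llff_row(np.eye(4)[:3], h=128, w=128, focal=120.0, near=0.8, far=1.8)
```

The flattening is row-major, matching how LLFF reshapes poses_bounds.npy rows back into 3x5 matrices.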
Can you please guide me on how to visualize the normal images? Which part of the code does that?