
mcc's Introduction

Multiview Compressive Coding for 3D Reconstruction

Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari

mcc.mp4

This is a PyTorch implementation of Multiview Compressive Coding (MCC):

@article{wu2023multiview,
  author    = {Wu, Chao-Yuan and Johnson, Justin and Malik, Jitendra and Feichtenhofer, Christoph and Gkioxari, Georgia},
  title     = {Multiview Compressive Coding for 3{D} Reconstruction},
  journal   = {arXiv preprint arXiv:2301.08247},
  year      = {2023},
}

Installation

This repo is modified from the MAE repo. Installation and preparation follow that repo. Please also install PyTorch3D for 3D-related functionality.

pip install h5py omegaconf submitit
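
One common way to install PyTorch3D (per its own install instructions; the exact conda channels and version pins depend on your PyTorch/CUDA setup, so treat this as a sketch):

conda install pytorch3d -c pytorch3d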

Data

Please see DATASET.md for information on data preparation.

Running MCC on CO3D v2

To train an MCC model on the CO3D v2 dataset, please run

python submitit_mcc.py \
    --use_volta32 \
    --job_dir ./output \
    --nodes 4 \
    --co3d_path [path to CO3D dataset] \
    --resume [path to pretrained weights for RGB encoder (optional)] \
    --holdout_categories \
  • With --holdout_categories, we hold out a subset of categories during training and evaluate on the held-out categories only. To train on all categories, please remove the argument.
  • Here we use 4 nodes (machines) by default; users may use a different value.
  • Optional: The RGB encoder may be initialized by a pre-trained image model. An ImageNet1K-MAE-pretrained model is available [here]. Using a pre-trained model may speed up training but does not affect the final results much.

Running MCC on Hypersim

To train an MCC model on the Hypersim dataset, please run

python submitit_mcc.py \
    --use_volta32 \
    --job_dir ./output \
    --nodes 4 \
    --hypersim_path [path to Hypersim dataset] \
    --resume [path to pretrained weights for RGB encoder (optional)] \
    --use_hypersim \
    --viz_granularity 0.2 --eval_granularity 0.2 \
    --blr 5e-5 \
    --epochs 50 \
    --train_epoch_len_multiplier 3200 \
    --eval_epoch_len_multiplier 200 \
  • Here we additionally specify --use_hypersim for running Hypersim scene reconstruction experiments.
  • We use slightly different hyperparameters to accommodate the scene reconstruction task.

Testing on iPhone captures

To test on iPhone captures, please use the Record3D app on an iPhone to capture an RGB image and the corresponding point cloud (.obj) file. To generate the segmentation mask, we used a private segmentation model; users may use other tools/models to obtain the mask. Two example captures are available in the demo folder.

To run MCC inference on the example, please use, e.g.,

python demo.py --image demo/quest2.jpg --point_cloud demo/quest2.obj --seg demo/quest2_seg.png \
--checkpoint [path to model checkpoint] \

One may use a checkpoint from the training step above or download a model already trained on all CO3D v2 categories [here]. One may set the --score_thresholds argument to specify the score thresholds (more points are shown with a lower threshold, but the predictions might be noisier). The script will generate an HTML file showing an interactive visualization of the MCC output with plotly.
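
For example (a sketch; the exact value format accepted by --score_thresholds is an assumption based on the flag name mentioned above), lower thresholds could be passed as:

python demo.py --image demo/quest2.jpg --point_cloud demo/quest2.obj --seg demo/quest2_seg.png \
    --checkpoint [path to model checkpoint] \
    --score_thresholds 0.1 0.3 0.5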


Acknowledgement

Part of this implementation is based on the MAE codebase. We thank Sasha Sax for help on loading Hypersim and Taskonomy data.

License

Multiview Compressive Coding is released under the CC-BY-NC 4.0 license.


mcc's Issues

MCC reconstruction to input depth cloud

Thank you for your insightful work!

Playing around with the model, I noticed that its outputs appear to be normalized, so the reconstruction looks stretched or shrunk relative to the input image.

Is there any straightforward way to convert the point clouds outputted by MCC into the same coordinate space as the input point cloud used for depth?
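
As a rough, hedged sketch (not an official answer): if the prediction differs from the input cloud only by an axis-aligned scale and translation, one could align their bounding boxes, e.g.:

import numpy as np

def align_to_reference(pred_xyz, ref_xyz):
    """Roughly map pred_xyz into ref_xyz's coordinate frame by matching
    axis-aligned bounding boxes (scale + translation only, no rotation)."""
    pred_min, pred_max = pred_xyz.min(0), pred_xyz.max(0)
    ref_min, ref_max = ref_xyz.min(0), ref_xyz.max(0)
    scale = (ref_max - ref_min) / np.maximum(pred_max - pred_min, 1e-8)
    return (pred_xyz - pred_min) * scale + ref_min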

Using iPhone captures for inference

Very insightful work! I am trying to test this model's inference with RGB-D images captured on Record3D as detailed in the readme. However, it doesn't seem like there is an option to export a Record3D video into both a sequence of jpg images and an obj point cloud. How might I generate examples similar to the oculus and spyro examples in the repository?

Thank you for your help!

Get *.obj file of images generated by DALLE-2

MCC works on images generated from DALL·E 2. Since the generated images do not include depth, we use an off-the-shelf depth prediction model to estimate depth for these images before feeding them to MCC.

Hi, may I ask how to get the *.obj file corresponding to an image generated by DALL·E 2?

I've tried:

  • Only a *.png depth image, a segmentation image, and a *.pfm file can be obtained from the DPT repo you mentioned.

Question: Could you please add another demo to help solve this problem?

Thanks a lot. :-)
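
For reference, a minimal sketch of back-projecting a predicted depth map into an .obj point cloud (the pinhole intrinsics fx/fy and the centered principal point are placeholder assumptions; DPT produces relative depth, so the absolute scale is arbitrary):

import cv2
import numpy as np

def depth_to_obj(depth_path, out_path, fx=500.0, fy=500.0):
    """Back-project a depth map into a point cloud written as 'v x y z' lines."""
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    if depth.ndim == 3:               # collapse RGB-encoded depth to one channel
        depth = depth[..., 0]
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0         # assume principal point at the image center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    with open(out_path, "w") as f:
        for px, py, pz in points:
            f.write(f"v {px} {py} {pz}\n")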

Camera intrinsics

Hi,

Since the paper performs unprojection of RGB-D images to point clouds, I assume intrinsics are required during inference. But it seems this project does not rely on that information. Is the underlying assumption here an orthographic camera?

Thank you!

Cannot be reimplemented

[19:56:29.655498] Loading dataset map (ball)
[19:56:29.940445] Loaded 0 categores for train
[19:56:29.940480] Loaded 1 categores for val
[19:56:29.945834] 1 categories loaded
[19:56:29.945937] containing 495 examples
[19:56:29.946329] 0 categories loaded
[19:56:29.946344] containing 0 examples
[19:56:29.946444] 1 categories loaded
[19:56:29.946513] containing 495 examples
[19:56:29.946606] Start training for 100 epochs
Backend QtAgg is interactive backend. Turning interactive mode on.
[19:56:39.632502] Epoch 0:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

No training data was found.
[19:56:29.940445] Loaded 0 categories for train

1. What I did

dict_keys(['train_known', 'train_unseen', 'test_known', 'test_unseen'])

/home/.../lib/python3.8/site-packages/pytorch3d/implicitron/dataset/json_index_dataset_map_provider_v2.py:327: UserWarning: 
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Some eval batches are missing from the test dataset.
The evaluation results will be incomparable to the
evaluation results calculated on the original dataset.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  warnings.warn(

Process finished with exit code 0

**Could you please revisit the data preparation instructions? I tried several times but failed; many problems remain.**

Before this, I downloaded the sub-dataset 'ball', then I arranged the data directory as shown in json_index_dataset_map_provider_v2:

        self.dataset_root
            ├── <category_0>
            │   ├── <sequence_name_0>
            │   │   ├── depth_masks
            │   │   ├── depths
            │   │   ├── images
            │   │   ├── masks
            │   │   └── pointcloud.ply
            │   ├── <sequence_name_1>
            │   │   ├── depth_masks
            │   │   ├── depths
            │   │   ├── images
            │   │   ├── masks
            │   │   └── pointcloud.ply
            │   ├── ...
            │   ├── <sequence_name_N>
            │   ├── set_lists
            │       ├── set_lists_<subset_name_0>.json
            │       ├── set_lists_<subset_name_1>.json
            │       ├── ...
            │       ├── set_lists_<subset_name_M>.json
            │   ├── eval_batches
            │   │   ├── eval_batches_<subset_name_0>.json
            │   │   ├── eval_batches_<subset_name_1>.json
            │   │   ├── ...
            │   │   ├── eval_batches_<subset_name_M>.json
            │   ├── frame_annotations.jgz
            │   ├── sequence_annotations.jgz
            ├── <category_1>
            ├── ...
            ├── <category_K>

Then the keys in the JSON files do not match the expected train, val, and test splits.

2. Suggestion

Could you please give clearer instructions for data preparation?

Thanks a lot!

RuntimeError: Could not infer dtype of NoneType

Thanks for the great work! I'm trying to test it on my own data, but I got the following error:

File "C:\MCC\demo.py", line 104, in main
seen_rgb = (torch.tensor(rgb).float() / 255)[..., [2, 1, 0]]
RuntimeError: Could not infer dtype of NoneType

Is it the problem of the input obj file? Do you have any idea how to solve it?

Many thanks!
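
The BGR-to-RGB swap on that line suggests the image is read with OpenCV, and cv2.imread returns None (rather than raising) when the path is wrong or the file is unreadable. A quick hedged check before the tensor conversion (a sketch; load_rgb is a hypothetical helper, not part of the repo):

import cv2

def load_rgb(image_path):
    """cv2.imread returns None (instead of raising) for a missing/unreadable file."""
    rgb = cv2.imread(image_path)
    if rgb is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return rgb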

Question about the paper - Occupancy

Thank you for this wonderful work,

I have a question about the model: why do you predict the occupancy of the object's surface rather than the occupancy of points lying inside the object, as OccNet does?

A query is considered “occupied” (positive) if it is located within radius τ = 0.1 to a ground truth point, and “unoccupied” (negative) otherwise.

And do you think it can be adapted to this task in a setting without RGB images?
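
For illustration, the quoted rule boils down to thresholding each query's nearest-neighbor distance to the ground-truth points; a sketch (not the repo's actual implementation):

import numpy as np
from scipy.spatial import cKDTree

def occupancy_labels(queries, gt_points, tau=0.1):
    """Label a query 1 ("occupied") if its nearest ground-truth point lies
    within radius tau, and 0 ("unoccupied") otherwise."""
    dist, _ = cKDTree(gt_points).query(queries, k=1)
    return (dist < tau).astype(np.int64)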

Unknown subset name: fewview_dev

Hi, I downloaded the dataset and tried to load it, but encountered the following error from pytorch3d when using the provided prepare_co3d.py script.
The data was downloaded using the script provided here for the single-sequence subset, using the command python ./co3d/download_dataset.py --download_folder <DOWNLOAD_FOLDER> --single_sequence_subset .
The following is the complete stack trace. Any guidance would be highly appreciated.

Traceback (most recent call last):
  File "/home/aradhya/mcc/MCC/scripts/prepare_co3d.py", line 46, in <module>
    main(args)
  File "/home/aradhya/mcc/MCC/scripts/prepare_co3d.py", line 27, in main
    dataset_map = get_dataset_map(
  File "/home/aradhya/mcc/MCC/scripts/../util/co3d_utils.py", line 49, in get_dataset_map
    dataset_map_provider = JsonIndexDatasetMapProviderV2(
  File "<string>", line 15, in __init__
  File "/home/aradhya/anaconda3/envs/mccenv/lib/python3.9/site-packages/pytorch3d/implicitron/dataset/json_index_dataset_map_provider_v2.py", line 214, in __post_init__
    dataset_map = self._load_category(self.category)
  File "/home/aradhya/anaconda3/envs/mccenv/lib/python3.9/site-packages/pytorch3d/implicitron/dataset/json_index_dataset_map_provider_v2.py", line 264, in _load_category
    raise ValueError(
ValueError: Unknown subset name fewview_dev. Choose one of available subsets: ['manyview_dev_0', 'manyview_dev_1', 'manyview_test_0'].

Type of GPU?

| We train with Adam [29] for 150k iterations with an effective batch size of 512 using 32 GPUs. Training takes ~2.5 days.
Do you mean V100s or A100s exactly? And how much memory do these GPUs have?
Please let me know the computational cost of training such a model. Thanks.

Question about the paper -- Granularity

First of all, thanks for sharing this amazing work!

I was wondering if you have tried a more fine-grained sampling strategy, i.e., lowering the granularity when defining the positive and negative samples from the ground truth and increasing the number of samples/queries?

Currently the granularity $\sigma=0.1$ and you sample 550 points during training. I was wondering whether decreasing $\sigma$ to 0.01 or 0.001 would produce reconstructions with more detail, and how much more time it would cost. Thanks.

Bug on F1-score calculation

As pointed out in NU-MCC (https://github.com/sail-sg/numcc), there is a bug in MCC's F1-score calculation.

The bug is located in

for i in range(int(np.ceil(predicted_xyz.shape[0] / slice_size))):

where predicted_xyz should be gt_xyz.

This bug hurts MCC's metrics when the number of predicted points is smaller than the number of ground-truth points. After fixing it, a higher F1-score can be obtained by setting a higher --eval_score_threshold.
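
For context, a hedged sketch of the symmetric point-cloud F1 metric (precision from predicted-to-GT distances, recall from GT-to-predicted distances); it is not the repo's exact code, but it shows why the recall-side loop must iterate over gt_xyz:

import numpy as np
from scipy.spatial import cKDTree

def point_cloud_f1(pred_xyz, gt_xyz, thresh=0.1):
    """Precision: fraction of predicted points within thresh of some GT point.
    Recall: fraction of GT points within thresh of some predicted point."""
    d_pred_to_gt, _ = cKDTree(gt_xyz).query(pred_xyz, k=1)
    d_gt_to_pred, _ = cKDTree(pred_xyz).query(gt_xyz, k=1)
    precision = (d_pred_to_gt < thresh).mean()
    recall = (d_gt_to_pred < thresh).mean()
    return 2 * precision * recall / max(precision + recall, 1e-8)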

RuntimeError: No shared folder available

Traceback (most recent call last):
  File "submitit_mcc.py", line 133, in <module>
    main()
  File "submitit_mcc.py", line 123, in main
    args.dist_url = get_init_file().as_uri()
  File "submitit_mcc.py", line 48, in get_init_file
    os.makedirs(str(get_shared_folder()), exist_ok=True)
  File "submitit_mcc.py", line 43, in get_shared_folder
    raise RuntimeError("No shared folder available")
RuntimeError: No shared folder available
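
For context, get_shared_folder in submitit-based launchers typically returns a directory on a filesystem visible to all nodes and raises this error when no such directory is found. A hedged sketch of a workaround (the path below is hypothetical and must be replaced with a directory every node in the job can access):

from pathlib import Path

def get_shared_folder() -> Path:
    shared = Path("/path/to/shared/folder")   # hypothetical: any directory visible to all nodes
    if shared.is_dir():
        job_dir = shared / "mcc"
        job_dir.mkdir(exist_ok=True)
        return job_dir
    raise RuntimeError("No shared folder available")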

Inference succeeded: how to visualize the newly reconstructed 3D model and distinguish it from "demo/quest2.obj"?

Thanks so much for this inspiring work!
When I execute the command below to run inference as you recommended, a new file "demo/output.html" is generated.
But could you describe how I can visualize the newly reconstructed 3D model and distinguish it from "demo/quest2.obj"?

python demo.py --image demo/quest2.jpg --point_cloud demo/quest2.obj --seg demo/quest2_seg.png \
    --checkpoint [path to model checkpoint]

Problem with libtorch_cuda_cu.so

I tried to run your demo using the conda pytorch3d environment but got this problem:
Traceback (most recent call last):
  File "/home/ubuntu/Philip/MCC/demo.py", line 12, in <module>
    from pytorch3d.io.obj_io import load_obj
  File "/opt/conda/envs/pytorch3d/lib/python3.9/site-packages/pytorch3d/io/__init__.py", line 8, in <module>
    from .obj_io import load_obj, load_objs_as_meshes, save_obj
  File "/opt/conda/envs/pytorch3d/lib/python3.9/site-packages/pytorch3d/io/obj_io.py", line 22, in <module>
    from pytorch3d.renderer import TexturesAtlas, TexturesUV
  File "/opt/conda/envs/pytorch3d/lib/python3.9/site-packages/pytorch3d/renderer/__init__.py", line 7, in <module>
    from .blending import (
  File "/opt/conda/envs/pytorch3d/lib/python3.9/site-packages/pytorch3d/renderer/blending.py", line 10, in <module>
    from pytorch3d import _C
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
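
This import error typically points to a CUDA-build mismatch between PyTorch and the PyTorch3D wheel. A quick way to check which CUDA build PyTorch was compiled against, for comparison with the build PyTorch3D expects:

python -c "import torch; print(torch.__version__, torch.version.cuda)"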
