cad-estate's Introduction

This is not an officially supported Google product.

CAD-Estate dataset: 3D object and room layout annotations on RGB videos of real estate scenes

Current state-of-the-art methods for 3D scene understanding are driven by large annotated datasets. To meet this need, we propose CAD-Estate, a large dataset of complex multi-object RGB videos, each annotated with a globally consistent 3D representation of its objects and with a room layout consisting of structural elements in 3D, such as walls, floors, and ceilings.

We annotate each object with a CAD model from a database and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method [1] works on commonly available RGB videos, without requiring a depth sensor. Many steps are performed automatically, and the tasks performed by humans are simple, well-specified, and require only limited reasoning in 3D. This makes them feasible for crowd-sourcing and has allowed us to construct a large-scale dataset. CAD-Estate offers 101K instances of 12K unique CAD models placed in the 3D representations of 20K videos. The videos of CAD-Estate offer wide, complex views of real estate properties. They pose a difficult challenge for automatic scene understanding methods, as they contain numerous objects in each frame, many of which are far from the camera and thus appear small. In comparison to Scan2CAD, the largest existing dataset with CAD model annotations on real scenes, CAD-Estate has 7x more instances and 4x more unique CAD models.
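To make the 9-DoF pose concrete, here is a minimal sketch of how three translation, three rotation, and three scale parameters can be combined into one 4x4 object-to-scene matrix. The function names, the Z-Y-X Euler angle convention, and the composition order T·R·S are assumptions chosen for illustration; the annotation files may store poses differently.

```python
import math


def rotation_zyx(yaw, pitch, roll):
    """Return the 3x3 rotation Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def pose_matrix(translation, euler_zyx, scale):
    """Build a 4x4 matrix M = T @ R @ S from 3 + 3 + 3 = 9 parameters.

    Assumed (hypothetical) convention: scale is applied first, then
    rotation, then translation.
    """
    r = rotation_zyx(*euler_zyx)
    # R @ S multiplies column j of R by scale[j]; translation is the last column.
    rows = [
        [r[i][j] * scale[j] for j in range(3)] + [translation[i]]
        for i in range(3)
    ]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def transform_point(matrix, point):
    """Map a 3D point from CAD model coordinates to scene coordinates."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
    )
```

For example, `transform_point(pose_matrix((1, 2, 3), (math.pi / 2, 0, 0), (2, 1, 1)), (1, 0, 0))` stretches the point along x, rotates it a quarter turn about the z axis, and then shifts it by the translation.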

We produce generic 3D room layouts from 2D segmentation masks, which are easy for humans to annotate. Our method [2] automatically reconstructs 3D plane equations and spatial extents for the structural elements from the annotations, and connects adjacent elements at the appropriate contact edges. CAD-Estate offers room layouts for 2246 videos. The videos contain complex topologies, with multiple rooms connected by open doors, multiple floors connected by stairs, and generic geometry with slanted structural elements. Our automatic quality control procedure guarantees high quality of the resulting 3D room layouts.
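As an illustration of what a 3D plane equation for a structural element looks like, the sketch below builds the plane n·x + d = 0 through three points and measures the signed distance of another point to it. This is only a geometric illustration with hypothetical function names; it is not the reconstruction method of [2], which starts from 2D annotations.

```python
import math


def plane_from_points(p0, p1, p2):
    """Return (normal, offset) with a unit normal n such that n . x + offset = 0."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    # The cross product u x v is perpendicular to the plane.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear and do not define a plane")
    n = [c / length for c in n]
    offset = -sum(c * a for c, a in zip(n, p0))
    return n, offset


def signed_distance(plane, point):
    """Signed distance from a point to the plane, positive on the normal's side."""
    n, offset = plane
    return sum(c * a for c, a in zip(n, point)) + offset
```

With a floor through `(0, 0, 0)`, `(1, 0, 0)`, and `(0, 1, 0)`, the normal points along +z, and a point at height 2.5 has signed distance 2.5.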

This repository contains both datasets, described in [1] and [2].


Examples of objects dataset [1]


Examples of layouts dataset [2]

How to use the dataset

First download the dataset and the accompanying source code, following the instructions here. The text below assumes that the code lives in ${WORKDIR}/cad_estate and the dataset in ${WORKDIR}/cad_estate/data. Please set the environment variable WORKDIR first, according to the instructions.
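Before running anything else, it can help to confirm that WORKDIR points where the text above expects. The following is a minimal sketch; the function name `cad_estate_paths` is hypothetical and not part of the released code. It only derives the two directories from the environment variable and does not touch the file system.

```python
import os
from pathlib import Path


def cad_estate_paths(environ=os.environ):
    """Return (code_dir, data_dir) derived from the WORKDIR variable.

    The layout follows the README: code in ${WORKDIR}/cad_estate and
    data in ${WORKDIR}/cad_estate/data.
    """
    workdir = environ.get("WORKDIR")
    if not workdir:
        raise RuntimeError("WORKDIR is not set; set it before using the dataset")
    code_dir = Path(workdir) / "cad_estate"
    data_dir = code_dir / "data"
    return code_dir, data_dir
```

Calling `cad_estate_paths()` with no arguments reads the real environment; checking `data_dir.is_dir()` on the result tells you whether the dataset has been placed in the expected location.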

You can visualize individual scenes with the included Jupyter notebooks: objects_notebook.ipynb and room_structure_notebook.ipynb. To start a Jupyter kernel for them, use:

cd ${WORKDIR}/cad_estate/src
jupyter notebook

The kernel requires a CUDA-capable GPU.

There are also two PyTorch dataset classes for reading video frames and their object or room structure annotations. You can find more details in the source code.

Finally, this file describes the structure of the CAD-Estate annotation files.

How to cite

If you use the object annotations, please cite [1,3,4]. CAD-Estate contains object annotations [1] that align ShapeNet [3] models over RealEstate10K videos [4]. If you use the 3D room layouts, please cite [2,4]. CAD-Estate contains 3D room layouts [2] over RealEstate10K videos [4].

[1] K.-K. Maninis, S. Popov, M. Nießner, and V. Ferrari. CAD-Estate: Large-scale CAD Model Annotation in RGB Videos. In ICCV, 2023 (to appear).
[2] D. Rozumnyi, S. Popov, K.-K. Maninis, M. Nießner, and V. Ferrari. Estimating Generic 3D Room Structures from 2D Annotations. In NeurIPS, 2023.
[3] A. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. arXiv preprint, 2015.
[4] T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely. Stereo Magnification: Learning view synthesis using multiplane images. In SIGGRAPH, 2018.

cad-estate's Issues

Code release

I wonder if the algorithm described in the paper will be released too?

When will you release the object annotation?

Hi, thanks for your excellent work! Your dataset will definitely promote the development of indoor scene reconstruction. I'm wondering when you will release the object annotations. Will they be released by July?

Failed to download the videos

Thanks for your great efforts in releasing the dataset!

I was able to download and extract the CAD-Estate annotations.
But I ran into the following issue when downloading the videos:

$ PYTHONPATH=src python -m cad_estate.download_and_extract_frames --cad_estate_dir="$(realpath data/)" --skip_extract 
Loading frame timestamps:   0%|          | 0/21452 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/localhome/qiruiw/miniconda3/envs/estate/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/localhome/qiruiw/miniconda3/envs/estate/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/cs/3dlg-project/3dlg-hcvc/rlsd/data/cad_estate/src/cad_estate/download_and_extract_frames.py", line 267, in <module>
    main()
  File "/cs/3dlg-project/3dlg-hcvc/rlsd/data/cad_estate/src/cad_estate/download_and_extract_frames.py", line 221, in main
    video_timestamps = [
  File "/cs/3dlg-project/3dlg-hcvc/rlsd/data/cad_estate/src/cad_estate/download_and_extract_frames.py", line 222, in <listcomp>
    get_video_timestamps(v)
  File "/cs/3dlg-project/3dlg-hcvc/rlsd/data/cad_estate/src/cad_estate/download_and_extract_frames.py", line 135, in get_video_timestamps
    video_id = ann_json["video_id"]
KeyError: 'video_id'
Loading frame timestamps:   0%|          | 0/21452 [00:00<?, ?it/s]

It seems that the annotation json files don't contain the required field "video_id".
Can you help check it?
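One way to confirm which files trigger the KeyError is to scan the annotation directory for JSON files that lack the key. The sketch below is a diagnostic only, and `files_missing_key` is a hypothetical helper that is not part of the CAD-Estate code. It assumes the annotations are plain JSON files somewhere under the given directory, which the traceback suggests but does not prove.

```python
import json
from pathlib import Path


def files_missing_key(root, key):
    """Return the JSON files under root whose top-level object lacks key."""
    missing = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # A file counts as missing the key if it is not a JSON object at all.
        if not isinstance(data, dict) or key not in data:
            missing.append(path)
    return missing
```

Running `files_missing_key("data", "video_id")` from the repository root would list the affected files, which shows whether every annotation lacks the field or only some of them.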
