
NonCuboidRoom

Paper

Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiaojun Yuan.

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022

[arXiv] [Paper] [Supplementary Material]

(*: Equal contribution)

Installation

The code is tested with Ubuntu 16.04, PyTorch v1.5, CUDA 10.1 and cuDNN v7.6.

# create conda env
conda create -n layout python=3.6
# activate conda env
conda activate layout
# install pytorch
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
# install dependencies
pip install -r requirements.txt

Data Preparation

Structured3D Dataset

Please download Structured3D dataset and our processed 2D line annotations. The directory structure should look like:

data
└── Structured3D
    ├── Structured3D
    │   ├── scene_00000
    │   ├── scene_00001
    │   ├── scene_00002
    │   └── ...
    └── line_annotations.json

SUN RGB-D Dataset

Please download SUN RGB-D dataset, our processed 2D line annotation for SUN RGB-D dataset, and layout annotations of NYUv2 303 dataset. The directory structure should look like:

data
└── SUNRGBD
    ├── SUNRGBD
    │    ├── kv1
    │    ├── kv2
    │    ├── realsense
    │    └── xtion
    ├── sunrgbd_train.json      // our extracted 2D line annotations of the SUN RGB-D train set
    ├── sunrgbd_test.json       // our extracted 2D line annotations of the SUN RGB-D test set
    └── nyu303_layout_test.npz  // 2D ground-truth layout annotations provided by the NYUv2 303 dataset
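Once everything is downloaded, the layout above can be sanity-checked with a few lines of Python (a minimal sketch; the paths are exactly those shown in the two trees, resolved relative to the repository root):

```python
import os

# Expected paths from the directory trees above (relative to the repo root)
EXPECTED = [
    "data/Structured3D/Structured3D",
    "data/Structured3D/line_annotations.json",
    "data/SUNRGBD/SUNRGBD",
    "data/SUNRGBD/sunrgbd_train.json",
    "data/SUNRGBD/sunrgbd_test.json",
    "data/SUNRGBD/nyu303_layout_test.npz",
]

def missing_paths(root="."):
    """Return the expected dataset paths that are missing under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    for p in missing_paths():
        print("missing:", p)
```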

Pre-trained Models

You can download our pre-trained models here:

  • The model trained on the Structured3D dataset.
  • The model trained on the SUN RGB-D and NYUv2 303 datasets.

Structured3D Dataset

To train the model on the Structured3D dataset, run this command:

python train.py --model_name s3d --data Structured3D

To evaluate the model on the Structured3D dataset, run this command:

python test.py --pretrained DIR --data Structured3D

NYUv2 303 Dataset

To train the model on the SUN RGB-D dataset and NYUv2 303 dataset, run this command:

# first, fine-tune the model on the SUN RGB-D dataset
python train.py --model_name sunrgbd --data SUNRGBD --pretrained Structure3D_DIR --split all --lr_step []
# then, fine-tune the model on the NYUv2 subset
python train.py --model_name nyu --data SUNRGBD --pretrained SUNRGBD_DIR --split nyu --lr_step [] --epochs 10

To evaluate the model on the NYUv2 303 dataset, run this command:

python test.py --pretrained DIR --data NYU303

Inference on Custom Data

To predict layouts for custom images, run this command:

python test.py --pretrained DIR --data CUSTOM

Citation

@inproceedings{NonCuboidRoom,
  title     = {Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image},
  author    = {Cheng Yang and
              Jia Zheng and
              Xili Dai and
              Rui Tang and
              Yi Ma and
              Xiaojun Yuan},
  booktitle = {WACV},
  year      = {2022}
}

LICENSE

The code is released under the MIT license. Portions of the code are borrowed from HRNet-Object-Detection and CenterNet.

Acknowledgements

We would like to thank Lei Jin for providing us the code for parsing the layout annotations in the SUN RGB-D dataset.


Issues

Depth Pixelwise

While reviewing the code, I noticed that dt_params3d_pixelwise is not used. Line 326 in test.py:

# post-process the output feature maps and extract planes, lines,
# per-instance plane params, and pixelwise plane params
dt_planes, dt_lines, dt_params3d_instance, dt_params3d_pixelwise = post_process(x, Mnms=1)

In the function ConvertLayout in reconstruction.py there is a parameter called pixelwise, which is set to None. If I try to pass dt_params3d_pixelwise as the pixelwise argument of ConvertLayout, I get an error.

# opt results
seg, depth, img, polys = ConvertLayout(
    inputs['img'][i], ups, downs, attribution,
    K=inputs['intri'][i].cpu().numpy(),
    pwalls=params_layout, pfloor=pfloor, pceiling=pceiling,
    ixy1map=inputs['ixy1map'][i].cpu().numpy(),
    valid=inputs['iseg'][i].cpu().numpy(),
    oxy1map=inputs['oxy1map'][i].cpu().numpy(),
    pixelwise=None)

So my questions are:

  • What is dt_params3d_pixelwise used for?
  • What should be passed to ConvertLayout as the pixelwise argument when it is not set to None?

Thanks

Structured 3D Dataset Corrupted

I tried to download the Structured3D dataset, but when I try to extract the zip file it says it is corrupted. I tried several times, but the result is always the same. Has anyone had the same problem?
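As a side note, corrupted members can be located without a full extraction using Python's standard zipfile module (a generic sketch; the archive path below is a placeholder):

```python
import zipfile

def first_bad_member(path):
    """Return the name of the first member that fails its CRC check,
    or None if the whole archive passes."""
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()

# Example (hypothetical path):
# print(first_bad_member("Structured3D_panorama_13.zip"))
```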

Curved Depth Map

Hi, I am trying to create a visualizer for the room layout, using a custom image for inference.
The results look good when I check the segmentation overlaid on my image. The main issue is that when I use Open3D to plot the result, the depth map appears curved. Do you have any explanation for this?

This is the overlaid image:
[overlay image]

This is my Open3D visualization of the polygons:

[screenshots of the Open3D visualization]

The main issue is the curvature, which comes from the depth map predicted by the network. Any idea how to solve it?
Thanks
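For reference, a generic pinhole back-projection of a depth map into a point cloud looks like the sketch below (not the repository's code; the depth map and K here are made up). If the intrinsics passed to the back-projection do not match the ones the network assumed, the resulting cloud will look warped:

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud
    using pinhole intrinsics K (3x3); depth is along the camera z-axis."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T   # per-pixel ray with unit z
    return rays * depth.reshape(-1, 1)

# Tiny example: a flat wall at constant depth should stay flat (z == 2 everywhere)
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
pts = backproject_depth(np.full((4, 4), 2.0), K)
```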

Future Improvement Ideas

Hello, I really appreciate this work, and I was finally able to reconstruct and visualize my room from a 2D picture.

[screenshot of the reconstruction]

I was wondering whether you have any directions/ideas on how to combine predictions from different images of the same room to obtain a complete layout reconstruction of the full room. The idea would be to take 3 or 4 partially overlapping pictures that cover the whole room and then build a full 3D reconstruction, perhaps by using the camera intrinsics and extrinsics and joining the extracted planes. This is just an idea, but I wanted to know whether you have thought about it and whether you think it is feasible. Thanks a lot.

Wrong Predictions using custom Images

I tried the Structured3D pretrained model on some custom images taken with my phone and found online.
I noticed that the predictions are not very accurate, and I wanted to know whether any preprocessing step needs to be applied to the images. Here are some results I obtained:

[result images]

Any help would be appreciated

Something wrong with testing Structured3D dataset

I downloaded the Structured3D dataset and the 2D line annotations, which contain 6280 labels, but I found that scene_03499_142535_2 does not exist in my downloaded dataset.
I also found some completely wrong labels like the following (the second column is the GT):

  • scene_03253_533743_4
  • scene_03440_527_3
  • scene_03394_4372_0

Inference on custom data

I am running inference with the Structure3D_pretrained.pt model downloaded from this repo. The custom images come from the InteriorNet dataset, which was also introduced by KuJiaLe.
The image size is (480, 640), and the camera intrinsic matrix is [[600, 0, 320], [0, 600, 240], [0, 0, 1]].

Part of the model output seems reasonable, but the rest is hard to accept.
Should I change the intrinsic matrix, or how should I modify the hyperparameter settings?
Any suggestion would be appreciated.

In the following pictures, red is the GT edge and green is the model's prediction:

[comparison images for several InteriorNet scenes]
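One generic thing worth double-checking (a sketch, not specific to this repository): if the input image is resized before being fed to the network, the intrinsic matrix has to be scaled by the same factors, otherwise the predicted geometry will be off:

```python
import numpy as np

def scale_intrinsics(K, sx, sy):
    """Scale a 3x3 pinhole intrinsic matrix for an image resized by
    factors sx (width) and sy (height): fx, cx scale by sx; fy, cy by sy."""
    S = np.diag([sx, sy, 1.0])
    return S @ K

# The InteriorNet-style intrinsics from above, for a 640x480 image
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
# Resizing 640x480 -> 1280x960 doubles fx, fy, cx, cy
K2 = scale_intrinsics(K, 1280 / 640, 960 / 480)
```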

How is the PE metric computed without semantic information?

Hello, thanks to you and your team for the impressive work!

The outputs of your network are planes, lines, and plane parameters, with no semantic information included, so I am confused about how the PE metric is computed. Looking forward to your reply.

Thanks again.

Question about the optimization process

Hi,
I don't really understand the process of optimizing the plane parameters with the detected lines. In the paper, the objective seems to make the line where two walls intersect fit the detected one.

[equation from the paper]

line = np.dot((p0[:3] / p0[3] - p1[:3] / p1[3]), K_inv)

line = np.dot((p0[:3] * p1[3] - p1[:3] * p0[3]), K_inv) # 3

These two lines of code look different but produce the same result after

line = -1 * line[1:] / line[0]

What do these two lines and the result mean? As I understand it, they compute a vector in camera coordinates, but what does the dot product with K_inv mean?
The result -1 * line[1:] / line[0] seems to compute the m and b of x = my + b, but I don't really understand why.
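For what it's worth, the two expressions differ only by the scalar p0[3] * p1[3], since (p0[:3]/p0[3] - p1[:3]/p1[3]) * p0[3] * p1[3] = p0[:3]*p1[3] - p1[:3]*p0[3]; the normalization by line[0] then cancels that scale, which is why both give the same m and b. A small numerical check (the plane vectors and K below are made up for illustration):

```python
import numpy as np

# Hypothetical homogeneous plane vectors p = (n, d) and intrinsics, for illustration
p0 = np.array([0.1, -0.3, 0.9, 1.5])
p1 = np.array([-0.4, 0.2, 0.8, 2.0])
K_inv = np.linalg.inv(np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]]))

line_a = np.dot(p0[:3] / p0[3] - p1[:3] / p1[3], K_inv)
line_b = np.dot(p0[:3] * p1[3] - p1[:3] * p0[3], K_inv)

# line_b equals (p0[3] * p1[3]) * line_a, so the normalized slope/intercept agree
mb_a = -1 * line_a[1:] / line_a[0]
mb_b = -1 * line_b[1:] / line_b[0]
```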

Having a hard time getting the right output as explained in the paper.

Hello.
I really enjoyed reading your paper and am so excited to test your code.

Here are some outputs using your pretrained models, both Structured3D and NYU303:

[result images]

I am having a hard time getting the right output as explained in the paper.
It was happening for both pretrained models.

  • Do you have any restrictions on input images for the pretrained models?
  • The two models output different results. Which one do you recommend?
  • Do you have any restrictions on the environment settings?

Could you please share more details on how to use the pretrained models you provided?
It would also be great if you could share some sample images you used for inference.

Thank you.

Structured 3D dataset - corruptions in zip files

Many thanks for making this dataset available!
I have downloaded Structured3D_panorama_00.zip through Structured3D_panorama_17.zip from the Azure cloud and found that all except 14, 15, 16, and 17 have some corrupted files inside the zip archives. I downloaded the archives on multiple university connections and my private internet connection, and used different applications to unzip them (unzip and 7z on Ubuntu 22.04, and the Windows 10 archive manager). In all cases the same errors were encountered. For example, unzipping Structured3D_panorama_13.zip results in the following errors:

unzip -q Structured3D_panorama_13.zip -d /media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/
error: invalid compressed data to inflate /media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/Structured3D/scene_02601/2D_rendering/843/panorama/simple/normal.png
/media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/Structured3D/scene_02608/2D_rendering/345/panorama/full/rgb_coldlight.png bad CRC 406b2c0b (should be 53007390)
file #1444: bad zipfile offset (local header sig): 536390646
/media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/Structured3D/scene_02609/2D_rendering/1235/panorama/simple/rgb_warmlight.png bad CRC fbc88119 (should be 35d6a845)
file #1486: bad zipfile offset (local header sig): 553273189
file #1487: bad zipfile offset (local header sig): 553279282
file #1488: bad zipfile offset (local header sig): 553279355
file #1489: bad zipfile offset (local header sig): 553279437
file #1490: bad zipfile offset (local header sig): 553279568
file #1491: bad zipfile offset (local header sig): 553279656
file #1492: bad zipfile offset (local header sig): 553519416
/media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/Structured3D/scene_02609/2D_rendering/4681/panorama/empty/rgb_coldlight.png bad CRC c4090c4d (should be 02b78ee6)
file #1579: bad zipfile offset (local header sig): 586400993

and with 7z

7z x -o/media/jiri/Pluto/datasets/Structured3D/Structured3D_panorama/13/ Structured3D_panorama_13.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (306F2),ASM,AES-NI)

Scanning the drive for archives:
1 file, 12381936537 bytes (12 GiB)

Extracting archive: Structured3D_panorama_13.zip

--
Path = Structured3D_panorama_13.zip
Type = zip
Physical Size = 12381936537
64-bit = +

ERROR: Data Error : Structured3D/scene_02601/2D_rendering/843/panorama/simple/normal.png
ERROR: CRC Failed : Structured3D/scene_02608/2D_rendering/345/panorama/full/rgb_coldlight.png
ERROR: Headers Error : Structured3D/scene_02608/2D_rendering/345/panorama/full/rgb_rawlight.png
ERROR: CRC Failed : Structured3D/scene_02609/2D_rendering/1235/panorama/simple/rgb_warmlight.png
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1235/panorama/simple/semantic.png
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419/panorama
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419/panorama/camera_xyz.txt
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419/panorama/empty
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419/panorama/empty/albedo.png
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/1419/panorama/empty/depth.png
ERROR: CRC Failed : Structured3D/scene_02609/2D_rendering/4681/panorama/empty/rgb_coldlight.png
ERROR: Headers Error : Structured3D/scene_02609/2D_rendering/4681/panorama/empty/rgb_rawlight.png

Sub items Errors: 13
Archives with Errors: 1
Sub items Errors: 13

Could you please verify that these files are fine in the Azure cloud? If yes, then perhaps the files are corrupted only on the Azure mirror in the UK, which would be a very serious fault in Azure data management.

Kind Regards
Jiri

Inclination is incorrect

Thanks for the amazing work. I tried running test.py on the NYU dataset and blended the segmentation with the image to check the alignment. It seems to be off. Can you suggest how to fix it? Please find the example below.

Blended segmentation and image:
[image]

Segmentation and image:
[image]

Pipeline for inference on custom data

I would like to know the steps to run inference on custom data. More precisely, if we run the command specified in the README, it will use the custom dataset; what needs to be modified in the dataset and in cfg.yaml so we can run the model on custom data?

question regarding the network design choice

Hi author,

I have a question regarding the network design.

In the plane detection section (Section 3.1), you state that "Each channel of the center likelihood map C represents different categories", and it looks like you attempt to solve plane detection and wall/ceiling/floor classification together via the center likelihood map.

This confuses me: the offset channel is still H×W×2 rather than H×W×6, which means wall, ceiling, and floor share the same offset. I don't quite understand this design, since in general the offsets for wall, ceiling, and floor may not be the same. Could you comment on this? Would it make more sense to decouple it into keypoint detection plus classification?

Thank you for considering my question.

About the training

May I ask about the training time and the GPU device you used for training?
