
MMSNet: Multi-Modal scene recognition using multi-scale encoded features

This repository provides the implementation of the following paper:

MMSNet: Multi-Modal scene recognition using multi-scale encoded features
Ali Caglayan *, Nevrez Imamoglu *, Ryosuke Nakamura
[Paper]


Graphical abstract

Requirements

Before starting, install the following libraries. Note that the package versions might need to be adjusted depending on the system:

conda create -n mmsnet python=3.7
conda activate mmsnet
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -U scikit-learn
pip install opencv-python
pip install psutil
pip install h5py
pip install seaborn

Also, the source code path might need to be added to the PYTHONPATH (e.g. export PYTHONPATH=$PYTHONPATH:/path_to_project/MMSNet/src/utils).
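If modifying PYTHONPATH is not convenient, the same effect can be achieved inside a script. A minimal sketch, where the path is a placeholder for your local checkout:

import sys
# Make the project's utility modules importable (adjust the path to your local checkout).
sys.path.append('/path_to_project/MMSNet/src/utils')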

Data Preparation

SUN RGB-D Scene

The SUN RGB-D Scene dataset is the largest real-world RGB-D indoor dataset to date. Download the dataset from here and keep the file structure as is after extracting the files. In addition, the allsplit.mat and SUNRGBDMeta.mat files need to be downloaded from the SUN RGB-D toolbox. allsplit.mat is under SUNRGBDtoolbox/traintestSUNRGBD and SUNRGBDMeta.mat is under SUNRGBDtoolbox/Metadata. Both files should be placed under the root folder of the SUN RGB-D dataset, e.g.:

SUNRGBD ROOT PATH
├── SUNRGBD
│   ├── kv1 ...
│   ├── kv2 ...
│   ├── realsense ...
│   ├── xtion ...
├── allsplit.mat
├── SUNRGBDMeta.mat

The dataset comes in a complex hierarchy. Therefore, it is adapted to the local file system as follows:

python utils/organize_sunrgb_scene.py --dataset-path <SUNRGBD ROOT PATH>

This creates the train/eval splits and copies the RGB and depth files, together with the camera calibration parameter files for the depth data, under the corresponding split structure. Then, depth colorization is applied as below, which takes a couple of hours.

python utils/depth_colorize.py --dataset "sunrgbd" --dataset-path <SUNRGBD ROOT PATH>
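For orientation, depth colorization converts single-channel depth maps into three-channel images so that ImageNet-pretrained RGB backbones can process them. The sketch below is only a minimal illustration of the idea using OpenCV's color mapping, not the exact scheme implemented in utils/depth_colorize.py:

import cv2
import numpy as np

def colorize_depth(depth, max_depth=10.0):
    # Clip and rescale metric depth to 8-bit, then map it to a 3-channel color image.
    # Illustrative only; the repository's depth_colorize.py may use a different scheme.
    depth_8bit = (255 * np.clip(depth, 0, max_depth) / max_depth).astype(np.uint8)
    return cv2.applyColorMap(depth_8bit, cv2.COLORMAP_JET)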

NYUV2 RGB-D Scene

The NYUV2 RGB-D Scene dataset is available here. In addition, the splits.mat file needs to be downloaded from here, together with sceneTypes.txt from here. The dataset structure should look like the following:

NYUV2 ROOT PATH
├── nyu_depth_v2_labeled.mat
├── splits.mat
├── sceneTypes.txt

Unlike the other datasets, NYUV2 is provided as a single Matlab .mat file, nyu_depth_v2_labeled.mat. This work uses the provided in-painted depth maps and RGB images. To prepare the depth data offline, depth colorization can be applied as follows:

python utils/depth_colorize.py --dataset "nyuv2" --dataset-path <NYUV2 ROOT PATH>
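For reference, nyu_depth_v2_labeled.mat is a MATLAB v7.3 (HDF5) file and can be read with h5py, which is installed above. A minimal sketch, assuming the commonly documented key names 'images' and 'depths' (verify them against your copy):

import h5py
import numpy as np

# Read the RGB images and in-painted depth maps from the labeled NYUV2 .mat file.
# MATLAB stores arrays in column-major order, so axes may need to be transposed.
with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    images = np.array(f['images'])  # RGB images
    depths = np.array(f['depths'])  # in-painted depth maps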

Fukuoka RGB-D Scene

This work is the first in the literature to use the Fukuoka RGB-D Indoor Scene dataset for benchmarking. There are 6 categories: corridor, kitchen, lab, office, study room, and toilet (see the download links below). The files should be extracted into a parent folder (e.g. fukuoka). The dataset structure should look like the following:

Fukuoka ROOT PATH
├── fukuoka
│   ├── corridors ...
│   ├── kitchens ...
│   ├── labs ...
│   ├── offices ...
│   ├── studyrooms ...
│   ├── toilets ...

The dataset is organized using the following command, which creates an eval-set under the root path:

python utils/organize_fukuoka_scene.py --dataset-path <Fukuoka ROOT PATH> 

Then, depth colorization is applied as for the other datasets.

python utils/depth_colorize.py --dataset "fukuoka" --dataset-path <Fukuoka ROOT PATH>

Evaluation

Trained Models

The trained models that produce the results in the paper are provided below in a tree hierarchy. Download the models to run the evaluation code. Note that we also share the random weights used in this work. However, it is possible to generate a new set of random weights with the parameter --reuse-randoms 0 (default 1). The results might then change slightly (they could be higher or lower). We discuss the effect of randomness in our previous paper here. Note that this (random modeling) should be done during the training process, not only for evaluation, as a new random set naturally creates a new distribution.

ROOT PATH TO MODELS
├── models
│   ├── resnet101_sun_rgb_best_checkpoint.pth
│   ├── resnet101_sun_depth_best_checkpoint.pth
│   ├── sunrgbd_mms_best_checkpoint.pth
│   ├── nyuv2_mms_best_checkpoint.pth
│   ├── fukuoka_mms_best_checkpoint.pth
├── random_weights
│   ├── resnet101_reduction_random_weights.pkl
│   ├── resnet101_rnn_random_weights.pkl
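As a rough, hypothetical illustration of how such fixed random weights could be generated once and reused (the shapes and key names below are assumptions, not the repository's actual code):

import pickle
import torch

torch.manual_seed(0)  # fix the seed so the generated set is reproducible
# Hypothetical random projection matrix; the actual shapes depend on the model.
random_weights = {'reduction': torch.randn(2048, 512)}
with open('resnet101_reduction_random_weights.pkl', 'wb') as f:
    pickle.dump(random_weights, f)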

Evaluation

After preparing the data and downloading the models, run the following commands to evaluate the models on SUN RGB-D, NYUV2, and Fukuoka RGB-D:

python eval_models.py --dataset "sunrgbd" --dataset-path <SUNRGBD ROOT PATH> --models-path <ROOT PATH TO MODELS>
python eval_models.py --dataset "nyuv2" --dataset-path <NYUV2 ROOT PATH> --models-path <ROOT PATH TO MODELS>
python eval_models.py --dataset "fukuoka" --dataset-path <Fukuoka ROOT PATH> --models-path <ROOT PATH TO MODELS>
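For orientation only, the checkpoints above are standard PyTorch files and can be inspected as below; the actual model construction and evaluation logic lives in eval_models.py:

import torch

# Load a checkpoint on CPU and show its top-level structure.
checkpoint = torch.load('models/sunrgbd_mms_best_checkpoint.pth', map_location='cpu')
print(list(checkpoint.keys()) if isinstance(checkpoint, dict) else type(checkpoint))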

Results

Multi-modal performance comparison of this work (MMSNet) with related methods on the SUN RGB-D, NYUV2 RGB-D, and Fukuoka RGB-D Scene datasets in terms of accuracy (%).

| Method | Paper | SUN RGB-D | NYUV2 RGB-D | Fukuoka RGB-D |
|---|---|---|---|---|
| Places CNN-RBF SVM | NeurIPS’14 | 39.0 | - | - |
| SS-CNN-R6 | ICRA’16 | 41.3 | - | - |
| DMFF | CVPR’16 | 41.5 | - | - |
| Places CNN-RCNN | CVPR’16 | 48.1 | 63.9 | - |
| MSMM | IJCAI’17 | 52.3 | 66.7 | - |
| RGB-D-CNN | AAAI’17 | 52.4 | 65.8 | - |
| D-BCNN | RAS’17 | 55.5 | 64.1 | - |
| MDSI-CNN | TPAMI’18 | 45.2 | 50.1 | - |
| DF2Net | AAAI’18 | 54.6 | 65.4 | - |
| HP-CNN-T | Auton.’19 | 42.2 | - | - |
| LM-CNN | Cogn. Comput.’19 | 48.7 | - | - |
| RGB-D-OB | TIP’19 | 53.8 | 67.5 | - |
| Cross-Modal Graph | AAAI’19 | 55.1 | 67.4 | - |
| RAGC | ICCVW’19 | 42.1 | - | - |
| MAPNet | PR’19 | 56.2 | 67.7 | - |
| TRecgNet Aug | CVPR’19 | 56.7 | 69.2 | - |
| G-L-SOOR | TIP’20 | 55.5 | 67.4 | - |
| MSN | Neurocomp.’20 | 56.2 | 68.1 | - |
| CBCL | BMVC’20 | 59.5 | 70.9 | - |
| ASK | TIP’21 | 57.3 | 69.3 | - |
| 2D-3D FusionNet | Inf. Fusion’21 | 58.6 | 75.1 | - |
| TRecgNet Aug | IJCV’21 | 59.8 | 71.8 | - |
| CNN-randRNN | CVIU’22 | 60.7 | 69.1 | 78.3 |
| MMSNet | This work | 62.0 | 72.2 | 81.7 |

We also share our LaTeX comparison tables, together with the BibTeX file, for the SUN RGB-D and NYUV2 benchmarks (see the LaTeX directory). Feel free to use them.

Citation

If you find this work useful in your research, please cite the following papers:

@article{Caglayan2022MMSNet,
    title={MMSNet: Multi-Modal Scene Recognition Using Multi-Scale Encoded Features},
    journal = {SSRN},
    author={Ali Caglayan and Nevrez Imamoglu and Ryosuke Nakamura},
    doi = {http://dx.doi.org/10.2139/ssrn.4032570},
    year={2022}
}

@article{Caglayan2022CNNrandRNN,
    title={When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition},
    journal = {Computer Vision and Image Understanding},
    author={Ali Caglayan and Nevrez Imamoglu and Ahmet Burak Can and Ryosuke Nakamura},
    volume = {217},
    pages = {103373},
    issn = {1077-3142},
    doi = {https://doi.org/10.1016/j.cviu.2022.103373},
    year={2022}
}

License

This project is released under the MIT License (see the LICENSE file for details).

Acknowledgment

This paper is based on the results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
