
MMSNet: Multi-Modal scene recognition using multi-scale encoded features

This repository provides the implementation of the following paper:

MMSNet: Multi-Modal scene recognition using multi-scale encoded features
Ali Caglayan *, Nevrez Imamoglu *, Ryosuke Nakamura
[Paper]


Graphical abstract

Requirements

Before starting, install the following libraries. Note that the package versions might need to be adjusted depending on the system:

conda create -n mmsnet python=3.7
conda activate mmsnet
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -U scikit-learn
pip install opencv-python
pip install psutil
pip install h5py
pip install seaborn

Also, the source code path might need to be added to the PYTHONPATH (e.g. export PYTHONPATH=$PYTHONPATH:/path_to_project/MMSNet/src/utils).
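If modifying PYTHONPATH is not convenient, the same effect can be achieved inside a script. A minimal sketch, where the path is a placeholder for your local checkout:

import sys
# Make the project's utility modules importable (adjust the path to your local checkout).
sys.path.append('/path_to_project/MMSNet/src/utils')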

Data Preparation

SUN RGB-D Scene

The SUN RGB-D Scene dataset is the largest real-world RGB-D indoor dataset to date. Download the dataset from here and keep the file structure as is after extracting the files. In addition, the allsplit.mat and SUNRGBDMeta.mat files need to be downloaded from the SUN RGB-D toolbox. allsplit.mat is under SUNRGBDtoolbox/traintestSUNRGBD and SUNRGBDMeta.mat is under SUNRGBDtoolbox/Metadata. Both files should be placed under the root folder of the SUN RGB-D dataset, e.g.:

SUNRGBD ROOT PATH
├── SUNRGBD
│   ├── kv1 ...
│   ├── kv2 ...
│   ├── realsense ...
│   ├── xtion ...
├── allsplit.mat
├── SUNRGBDMeta.mat

The dataset comes in a complex hierarchy. Therefore, it is adapted to the local file system as follows:

python utils/organize_sunrgb_scene.py --dataset-path <SUNRGBD ROOT PATH>

This creates the train/eval splits and copies the RGB and depth files, together with the camera calibration parameter files for the depth data, under the corresponding split structure. Then, depth colorization is applied as below, which takes a couple of hours.

python utils/depth_colorize.py --dataset "sunrgbd" --dataset-path <SUNRGBD ROOT PATH>
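For orientation, depth colorization converts single-channel depth maps into three-channel images so that ImageNet-pretrained RGB backbones can process them. The sketch below is only a minimal illustration of the idea using OpenCV's color mapping, not the exact scheme implemented in utils/depth_colorize.py:

import cv2
import numpy as np

def colorize_depth(depth, max_depth=10.0):
    # Clip and rescale metric depth to 8-bit, then map it to a 3-channel color image.
    # Illustrative only; the repository's depth_colorize.py may use a different scheme.
    depth_8bit = (255 * np.clip(depth, 0, max_depth) / max_depth).astype(np.uint8)
    return cv2.applyColorMap(depth_8bit, cv2.COLORMAP_JET)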

NYUV2 RGB-D Scene

The NYUV2 RGB-D Scene dataset is available here. In addition, the splits.mat file needs to be downloaded from here, together with sceneTypes.txt from here. The dataset structure should look like the following:

NYUV2 ROOT PATH
├── nyu_depth_v2_labeled.mat
├── splits.mat
├── sceneTypes.txt

Unlike the other datasets, NYUV2 is provided as a single Matlab .mat file, nyu_depth_v2_labeled.mat. This work uses the provided in-painted depth maps and RGB images. To prepare the depth data offline, depth colorization can be applied as follows:

python utils/depth_colorize.py --dataset "nyuv2" --dataset-path <NYUV2 ROOT PATH>
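For reference, nyu_depth_v2_labeled.mat is a MATLAB v7.3 (HDF5) file and can be read with h5py, which is installed above. A minimal sketch, assuming the commonly documented key names 'images' and 'depths' (verify them against your copy):

import h5py
import numpy as np

# Read the RGB images and in-painted depth maps from the labeled NYUV2 .mat file.
# MATLAB stores arrays in column-major order, so axes may need to be transposed.
with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    images = np.array(f['images'])  # RGB images
    depths = np.array(f['depths'])  # in-painted depth maps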

Fukuoka RGB-D Scene

This work is the first in the literature to use the Fukuoka RGB-D Indoor Scene dataset for benchmarking. There are 6 categories: corridor, kitchen, lab, office, study room, and toilet (see the download links below). The files should be extracted into a parent folder (e.g. fukuoka). The dataset structure should look like the following:

Fukuoka ROOT PATH
├── fukuoka
│   ├── corridors ...
│   ├── kitchens ...
│   ├── labs ...
│   ├── offices ...
│   ├── studyrooms ...
│   ├── toilets ...

The dataset is organized using the following command, which creates an eval-set under the root path:

python utils/organize_fukuoka_scene.py --dataset-path <Fukuoka ROOT PATH> 

Then, depth colorization is applied as for the other datasets.

python utils/depth_colorize.py --dataset "fukuoka" --dataset-path <Fukuoka ROOT PATH>

Evaluation

Trained Models

The trained models that produce the results in the paper are provided below in a tree hierarchy. Download the models to run the evaluation code. Note that we also share the random weights used in this work. However, it is possible to generate a new set of random weights with the parameter --reuse-randoms 0 (default 1). The results might then change slightly (they could be higher or lower). We discuss the effect of randomness in our previous paper here. Note that this (random modeling) should be done during the training process, not only for evaluation, as a new random set naturally creates a new distribution.

ROOT PATH TO MODELS
├── models
│   ├── resnet101_sun_rgb_best_checkpoint.pth
│   ├── resnet101_sun_depth_best_checkpoint.pth
│   ├── sunrgbd_mms_best_checkpoint.pth
│   ├── nyuv2_mms_best_checkpoint.pth
│   ├── fukuoka_mms_best_checkpoint.pth
├── random_weights
│   ├── resnet101_reduction_random_weights.pkl
│   ├── resnet101_rnn_random_weights.pkl
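As a rough, hypothetical illustration of how such fixed random weights could be generated once and reused (the shapes and key names below are assumptions, not the repository's actual code):

import pickle
import torch

torch.manual_seed(0)  # fix the seed so the generated set is reproducible
# Hypothetical random projection matrix; the actual shapes depend on the model.
random_weights = {'reduction': torch.randn(2048, 512)}
with open('resnet101_reduction_random_weights.pkl', 'wb') as f:
    pickle.dump(random_weights, f)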

Evaluation

After preparing the data and downloading the models, run the following commands to evaluate the models on SUN RGB-D, NYUV2, and Fukuoka RGB-D:

python eval_models.py --dataset "sunrgbd" --dataset-path <SUNRGBD ROOT PATH> --models-path <ROOT PATH TO MODELS>
python eval_models.py --dataset "nyuv2" --dataset-path <NYUV2 ROOT PATH> --models-path <ROOT PATH TO MODELS>
python eval_models.py --dataset "fukuoka" --dataset-path <Fukuoka ROOT PATH> --models-path <ROOT PATH TO MODELS>
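For orientation only, the checkpoints above are standard PyTorch files and can be inspected as below; the actual model construction and evaluation logic lives in eval_models.py:

import torch

# Load a checkpoint on CPU and show its top-level structure.
checkpoint = torch.load('models/sunrgbd_mms_best_checkpoint.pth', map_location='cpu')
print(list(checkpoint.keys()) if isinstance(checkpoint, dict) else type(checkpoint))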

Results

Multi-modal performance comparison of this work (MMSNet) with related methods on the SUN RGB-D, NYUV2 RGB-D, and Fukuoka RGB-D Scene datasets in terms of accuracy (%).

| Method | Paper | SUN RGB-D | NYUV2 RGB-D | Fukuoka RGB-D |
|---|---|---|---|---|
| Places CNN-RBF SVM | NeurIPS’14 | 39.0 | - | - |
| SS-CNN-R6 | ICRA’16 | 41.3 | - | - |
| DMFF | CVPR’16 | 41.5 | - | - |
| Places CNN-RCNN | CVPR’16 | 48.1 | 63.9 | - |
| MSMM | IJCAI’17 | 52.3 | 66.7 | - |
| RGB-D-CNN | AAAI’17 | 52.4 | 65.8 | - |
| D-BCNN | RAS’17 | 55.5 | 64.1 | - |
| MDSI-CNN | TPAMI’18 | 45.2 | 50.1 | - |
| DF2Net | AAAI’18 | 54.6 | 65.4 | - |
| HP-CNN-T | Auton.’19 | 42.2 | - | - |
| LM-CNN | Cogn. Comput.’19 | 48.7 | - | - |
| RGB-D-OB | TIP’19 | 53.8 | 67.5 | - |
| Cross-Modal Graph | AAAI’19 | 55.1 | 67.4 | - |
| RAGC | ICCVW’19 | 42.1 | - | - |
| MAPNet | PR’19 | 56.2 | 67.7 | - |
| TRecgNet Aug | CVPR’19 | 56.7 | 69.2 | - |
| G-L-SOOR | TIP’20 | 55.5 | 67.4 | - |
| MSN | Neurocomp.’20 | 56.2 | 68.1 | - |
| CBCL | BMVC’20 | 59.5 | 70.9 | - |
| ASK | TIP’21 | 57.3 | 69.3 | - |
| 2D-3D FusionNet | Inf. Fusion’21 | 58.6 | 75.1 | - |
| TRecgNet Aug | IJCV’21 | 59.8 | 71.8 | - |
| CNN-randRNN | CVIU’22 | 60.7 | 69.1 | 78.3 |
| MMSNet | This work | 62.0 | 72.2 | 81.7 |

We also share our LaTeX comparison tables, together with the BibTeX file, for the SUN RGB-D and NYUV2 benchmarks (see the LaTeX directory). Feel free to use them.

Citation

If you find this work useful in your research, please cite the following papers:

@article{Caglayan2022MMSNet,
    title={MMSNet: Multi-Modal Scene Recognition Using Multi-Scale Encoded Features},
    journal = {SSRN},
    author={Ali Caglayan and Nevrez Imamoglu and Ryosuke Nakamura},
    doi = {http://dx.doi.org/10.2139/ssrn.4032570},
    year={2022}
}

@article{Caglayan2022CNNrandRNN,
    title={When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition},
    journal = {Computer Vision and Image Understanding},
    author={Ali Caglayan and Nevrez Imamoglu and Ahmet Burak Can and Ryosuke Nakamura},
    volume = {217},
    pages = {103373},
    issn = {1077-3142},
    doi = {https://doi.org/10.1016/j.cviu.2022.103373},
    year={2022}
}

License

This project is released under the MIT License (see the LICENSE file for details).

Acknowledgment

This paper is based on the results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
