
cnn_randrnn's Introduction

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

This repository presents the implementation of a general two-stage framework for RGB-D object and scene recognition tasks. The framework employs a convolutional neural network (CNN) model as the underlying feature extractor and a random recursive neural network (RNN) to encode these features into high-level representations. For details, please refer to:

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition
Ali Caglayan, Nevrez Imamoglu, Ahmet Burak Can, Ryosuke Nakamura
[arXiv] [Paper] [Demo]


Figure: Overview of the two-stage framework

Introduction

The framework is a general PyTorch-based codebase for RGB-D object and scene recognition. The overall structure is designed in a modular and extensible way around a unified CNN and RNN process, which makes it easy and flexible to use. It can be extended with new capabilities, combined with different setups, and used with other models to implement new ideas.

This work has been tested on the popular Washington RGB-D Object and SUN RGB-D Scene datasets, demonstrating state-of-the-art results in both object and scene recognition tasks.
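To make the two-stage idea concrete, below is a minimal NumPy sketch of random recursive encoding: a weight matrix drawn once at random and never trained recursively merges 2x2 spatial blocks of a CNN feature map through tanh until a single high-level vector remains. The shapes, names, and merge order here are illustrative assumptions, not the repository's exact implementation.

import numpy as np

def random_rnn_encode(feats, rng):
    """Recursively merge 2x2 blocks of a (K, r, r) CNN feature map
    with one fixed random weight matrix and tanh; returns a K-dim vector.
    Illustrative sketch only, not the repository's exact code."""
    K = feats.shape[0]
    W = rng.uniform(-1.0, 1.0, size=(K, 4 * K))  # drawn once, never trained
    x = feats
    while x.shape[1] > 1:
        r = x.shape[1]
        merged = np.empty((K, r // 2, r // 2))
        for i in range(0, r, 2):
            for j in range(0, r, 2):
                children = x[:, i:i + 2, j:j + 2].reshape(4 * K)  # 2x2 block -> 4K vector
                merged[:, i // 2, j // 2] = np.tanh(W @ children)
        x = merged
    return x[:, 0, 0]

rng = np.random.default_rng(0)
print(random_rnn_encode(rng.standard_normal((64, 4, 4)), rng).shape)  # (64,)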

Feature Highlights

  • Supports both one-stage CNN feature extraction and two-stage CNN-randRNN feature extraction.
  • Applicable to AlexNet, VGGNet-16, ResNet-50, ResNet-101, and DenseNet-121 as backbone CNN models.
  • Pretrained models can be used as fast fixed feature extractors, or after fine-tuning.
  • A novel random pooling strategy, which extends the uniform randomness in RNNs and is applicable to both spatial and channel-wise downsampling, copes with the high dimensionality of CNN activations (see the sketch after this list).
  • A soft voting approach based on individual SVM confidences for multi-modal fusion (sketched under Run Overall Pipeline below).
  • An effective depth colorization based on surface normals.
  • Clear and extensible code structure for supporting more datasets and implementing new ideas.
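As a rough illustration of the channel-wise case of random pooling, the sketch below projects K channels down to a smaller number with one fixed uniform random matrix, drawn once and never trained. The tanh squashing and the shapes are assumptions for illustration; the paper's pooling strategy also covers spatial downsampling and differs in detail.

import numpy as np

def random_channel_reduce(feats, k_out, rng):
    """Reduce a (K, H, W) activation tensor to (k_out, H, W)
    with fixed random weights (illustrative, not the repo's exact code)."""
    K = feats.shape[0]
    W_red = rng.uniform(-1.0, 1.0, size=(k_out, K))  # drawn once, never trained
    return np.tanh(np.einsum("ok,khw->ohw", W_red, feats))

rng = np.random.default_rng(0)
print(random_channel_reduce(rng.standard_normal((2048, 7, 7)), 512, rng).shape)  # (512, 7, 7)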

Installation

System Requirements

System requirements for each model are reported in the supplementary material. Ideally, you should have a multi-core processor, 32 GB RAM, a graphics card with at least 10 GB of memory, and enough disk space to store models, features, etc., depending on saving choices and initial parameters.

Setup

conda has been used as the virtual environment manager and pip as the package manager. It is possible to use either pip or conda (or both) for package management. Before starting, install the following libraries:

  • PyTorch
  • Scikit-learn and OpenCV
  • psutil, h5py, seaborn, and matplotlib

We have installed these libraries with pip as below:

  1. Create and activate a virtual environment:

conda create -n cnnrandrnn python=3.7
conda activate cnnrandrnn

  2. Install PyTorch according to your system preferences such as OS, package manager, and CUDA version (see more details here), e.g.:

pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

This will also install other libraries including numpy, pillow, etc.

  3. Install the scikit-learn and OpenCV libraries:

pip install -U scikit-learn
pip install opencv-python

  4. Install the psutil, h5py, seaborn, and matplotlib libraries:

pip install psutil
pip install h5py
pip install seaborn
pip install -U matplotlib

Getting Started

File Structure

The following directory structure is a reference to run the code as described in this documentation. This structure can be changed according to the command line parameters.

CNN_randRNN
├── data
│   ├── wrgbd
│   │   │──eval-set
│   │   │   ├──apple
│   │   │   ├──ball
│   │   │   ├──...
│   │   │   ├──water_bottle
│   │   │──split.mat
│   ├── sunrgbd
│   │   │──SUNRGBD
│   │   │   ├──kv1
│   │   │   ├──kv2
│   │   │   ├──realsense
│   │   │   ├──xtion
│   │   │──allsplit.mat
│   │   │──SUNRGBDMeta.mat
│   │   │──organized-set
│   │   │   ├──Depth_Colorized_HDF5
│   │   │   │  ├──test
│   │   │   │  ├──train
│   │   │   ├──RGB_JPG
│   │   │   │  ├──test
│   │   │   │  ├──train
│   │   │──models-features
│   │   │   ├──fine_tuning
│   │   │   │  ├──resnet101_Depth_Colorized_HDF5_best_checkpoint.pth
│   │   │   │  ├──resnet101_RGB_JPG_best_checkpoint.pth
│   │   │   ├──overall_pipeline_run
│   │   │   │  ├──svm_estimators
│   │   │   │  │  ├──resnet101_Depth_Colorized_HDF5_l5.sav
│   │   │   │  │  ├──resnet101_Depth_Colorized_HDF5_l6.sav
│   │   │   │  │  ├──resnet101_Depth_Colorized_HDF5_l7.sav
│   │   │   │  │  ├──resnet101_RGB_JPG_l5.sav
│   │   │   │  │  ├──resnet101_RGB_JPG_l6.sav
│   │   │   │  │  ├──resnet101_RGB_JPG_l7.sav
│   │   │   │  ├──demo_images
│   │   │   ├──random_weights
│   │   │   │  ├──resnet101_reduction_random_weights.pkl
│   │   │   │  ├──resnet101_rnn_random_weights.pkl
├── src
├── logs

Washington RGB-D Object Recognition

Data Preparation

The Washington RGB-D Object dataset is available here. We have tested our framework using the cropped evaluation set without extra background subtraction. Uncompress the data and place it in data/wrgbd (see the file structure above).

To convert depth maps to colorized RGB-like depth representations:

sh run_steps.sh step="COLORIZED_DEPTH_SAVE"
python main_steps.py --dataset-path "../data/wrgbd/" --data-type "depthcrop" --debug-mode 0

Note that you might need to add /src/utils to the PYTHONPATH (e.g. export PYTHONPATH=$PYTHONPATH:/home/user/path_to_project/CNN_randRNN/src/utils). Setting debug-mode to 1 runs the framework on a small proportion of the data (you can choose the size with the debug-size parameter, which sets the number of samples per instance). This will create colorized depth images under /data/wrgbd/models-features/colorized_depth_images.
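For reference, a bare-bones version of surface-normal-based depth colorization is sketched below: estimate normals from the depth gradients and map their three components to the RGB channels. This is an assumption-laden approximation; the repository's colorization may scale, smooth, and handle missing depth differently.

import numpy as np

def colorize_depth(depth):
    """Map estimated surface normals of an (H, W) depth map to an RGB image
    (illustrative sketch, not the repository's exact colorization)."""
    depth = depth.astype(np.float32)
    dzdy, dzdx = np.gradient(depth)                       # depth derivatives
    normals = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return ((normals + 1.0) * 0.5 * 255).astype(np.uint8)  # [-1, 1] -> [0, 255]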

Run Overall Pipeline

Before demonstrating how to run the program, see the explanations for command line parameters with their default values here.

To run the overall pipeline with the default parameter values:

python main.py

This will train/test an SVM for each of the 7 layers. You may want to comment out the levels other than the optimal ones. It is also possible to run the system step by step. See the details here.
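The fusion step at the end of the pipeline follows the soft voting idea from the feature highlights: per-modality SVM confidence scores (e.g. the (N, n_classes) matrices returned by scikit-learn's decision_function) are combined with a weighted sum before taking the argmax. A minimal sketch, with equal weights as an assumption (the paper derives the weighting from the SVM confidences themselves):

import numpy as np

def soft_vote(conf_rgb, conf_depth, w_rgb=0.5, w_depth=0.5):
    """Fuse two (N, n_classes) SVM confidence matrices by weighted soft voting.
    Equal weights are an illustrative assumption."""
    fused = w_rgb * conf_rgb + w_depth * conf_depth
    return np.argmax(fused, axis=1)  # predicted class index per sample

In this repository, the trained per-level SVMs correspond to the .sav estimators under svm_estimators in the file structure above.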

SUN RGB-D Scene Recognition

This codebase is presented with Washington RGB-D object recognition as the reference task. It can also be applied to the SUN RGB-D Scene dataset; please see the details here. This can also serve as a reference guide for using other datasets.

Scene Recognition Demo

A demo application using RGB images is presented. Download the trained models and RNN random weights here. Uncompress the folder and place it according to the file structure given above. There are two run modes. Run the demo application with the default parameters for each mode as below:

python demo.py --mode "image"
python demo.py --mode "camera"

image mode takes the images in the demo_images folder, while camera mode takes camera images as input.
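For camera mode, frames are grabbed from a live device. The generic OpenCV capture loop such a demo relies on is sketched below; this is the standard pattern, not the internals of the repository's demo.py:

import cv2

cap = cv2.VideoCapture(0)              # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # ... run the recognition pipeline on `frame` here ...
    cv2.imshow("demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()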

Citation

If you find this work useful in your research, please consider citing:

@article{Caglayan2022CNNrandRNN,
    title   = {When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition},
    author  = {Ali Caglayan and Nevrez Imamoglu and Ahmet Burak Can and Ryosuke Nakamura},
    journal = {Computer Vision and Image Understanding},
    volume  = {217},
    pages   = {103373},
    issn    = {1077-3142},
    doi     = {10.1016/j.cviu.2022.103373},
    year    = {2022}
}

License

This project is released under the MIT License (see the LICENSE file for details).

Acknowledgment

This paper is based on the results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).


cnn_randrnn's Issues

What does the split.mat file mean?

In the file structure you gave, I see a file called split.mat in the wrgbd directory. What does that file contain? Could you share it here? Thank you very much!

About accuracy on the Washington RGB-D dataset

Hi, thanks for your great work.
I ran

sh run_steps.sh step="FIX_EXTRACTION"
python main_steps.py
sh run_steps.sh step="FIX_RECURSIVE_NN"
python main_steps.py

and got Fusion result: 79.08% (121/153), which does not match the result in Table 1 of your paper. I have two questions:

  1. Could you tell me how to modify and run the code if I want to get the result in Table 1?
  2. Is the result in Table 1 the top-1 average accuracy or something else?

Thank you so much

About accuracy on the SUN RGB-D dataset

Hi, thanks for your great work.
I'd like to confirm whether you compute the average accuracy of all samples or the average accuracy of all categories on SUN RGB-D dataset.

Failed to load the SVM confidence scores

When I run the command python main.py --net-model "resnet101" --debug-mode 0 --run-mode 2 --save-features 1,
I get this error:

RNN forward propagation through the data..
RNN: 8
RNN: 16
RNN: 24
RNN: 32
RNN: 40
RNN: 48
RNN: 56
RNN: 64
RNN: 72
RNN: 80
RNN: 88
RNN: 96
RNN: 104
RNN: 112
RNN: 120
RNN: 128
Failed to load the SVM confidence scores: [Errno 2] Unable to open file (unable to open file: name = '../data/wrgbd/models-features[debug]/fixed_recursive_nn/svm_confidence_scores/resnet101_depthcrop_split_1.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

What should I do to solve this problem? Thanks a lot!

Training using Places365

Hi,
Thanks for sharing your work!
Can you give guidelines on how I can train the model on the Places365 dataset?

Thanks in advance
