Visual Relationship Detection with Deep Structural Ranking

The code is written in Python and PyTorch (0.2.0) [torch-0.2.0.post3].

Since I have graduated, I may not be able to respond to issues promptly. Thank you for your understanding.

Clone the repo

OR

Data Preparation

  1. Download the VRD dataset (image, annotation, backup) and put it in ~/data. Then replace ~/data/sg_dataset/sg_test_images/4392556686_44d71ff5a0_o.gif with ~/data/vrd/4392556686_44d71ff5a0_o.jpg.

  2. Download the VGG16 model pretrained on ImageNet (VGG_imagenet.npy) and put it in ~/data.

  3. Download the metadata (so_prior.pkl) [Baidu YUN] or [Google Drive] and put it in ~/data/vrd.

  4. Download the Visual Genome data (vg.zip) [Baidu YUN] or [Google Drive] and put it in ~/data/vg.

  5. Word2vec representations of the subject and object categories are provided in this project. If you want to use the model for novel categories, please refer to this blog.
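The file swap in step 1 can be scripted. The sketch below assumes that "replace" means the JPEG should take the GIF's place in sg_test_images and that the JPEG already sits in the vrd folder; `replace_test_gif` is a hypothetical helper, not part of this repo. Pass it the path of your data folder: since this README writes ~/lib/make.sh for the repo's lib directory, "~" here most likely denotes the repository root rather than your home directory.

```python
import os
import shutil

GIF_NAME = '4392556686_44d71ff5a0_o.gif'
JPG_NAME = '4392556686_44d71ff5a0_o.jpg'


def replace_test_gif(data_root):
    """Copy the JPEG from <data_root>/vrd into sg_test_images and delete the GIF.

    Hypothetical helper; returns the path of the copied JPEG.
    """
    test_dir = os.path.join(data_root, 'sg_dataset', 'sg_test_images')
    jpg_dst = os.path.join(test_dir, JPG_NAME)
    shutil.copyfile(os.path.join(data_root, 'vrd', JPG_NAME), jpg_dst)
    gif_path = os.path.join(test_dir, GIF_NAME)
    if os.path.exists(gif_path):  # tolerate running the script twice
        os.remove(gif_path)
    return jpg_dst
```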

The ~/data folder should then look like this:

├── sg_dataset
│   ├── sg_test_images
│   ├── sg_train_images
│   
├── VGG_imagenet.npy
└── vrd
    ├── gt.mat
    ├── obj.txt
    ├── params_emb.pkl
    ├── proposal.pkl
    ├── rel.txt
    ├── so_prior.pkl
    ├── test.pkl
    ├── train.pkl
    └── zeroShot.mat
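To confirm the layout before training, each entry of the tree above can be checked for existence. In this sketch, `missing_paths` is a hypothetical helper that hard-codes the listed paths; it checks existence only, not file contents.

```python
import os

# Entries from the folder tree above, relative to the data folder.
EXPECTED = [
    'sg_dataset/sg_test_images',
    'sg_dataset/sg_train_images',
    'VGG_imagenet.npy',
    'vrd/gt.mat',
    'vrd/obj.txt',
    'vrd/params_emb.pkl',
    'vrd/proposal.pkl',
    'vrd/rel.txt',
    'vrd/so_prior.pkl',
    'vrd/test.pkl',
    'vrd/train.pkl',
    'vrd/zeroShot.mat',
]


def missing_paths(data_root):
    """Return the expected entries that do not exist under data_root."""
    return [p for p in EXPECTED
            if not os.path.exists(os.path.join(data_root, *p.split('/')))]
```

Once every download is in place, calling `missing_paths` on your data folder should return an empty list.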

Data format

  • train.pkl or test.pkl

    • a Python list
    • each item is a dictionary with the following keys: {'img_path', 'classes', 'boxes', 'ix1', 'ix2', 'rel_classes'}
      • 'classes' and 'boxes' describe the objects contained in a single image.
      • 'ix1': subject index.
      • 'ix2': object index.
      • 'rel_classes': relationship for a subject-object pair.
  • proposal.pkl

         >>> import pickle
         >>> with open('proposal.pkl', 'rb') as f:
         ...     proposals = pickle.load(f)
         ...
         >>> proposals.keys()
         ['confs', 'boxes', 'cls']
         >>> proposals['confs'].shape, proposals['boxes'].shape, proposals['cls'].shape
         ((1000,), (1000,), (1000,))
         >>> proposals['confs'][0].shape, proposals['boxes'][0].shape, proposals['cls'][0].shape
         ((9, 1), (9, 4), (9, 1))
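Both kinds of pickle can be read with the standard `pickle` module. The sketch below is an illustration, not code from this repo: `load_pickle`, `iter_triples` and `image_proposals` are hypothetical helpers. It assumes that in train.pkl/test.pkl the keys 'ix1', 'ix2' and 'rel_classes' are parallel per-pair sequences (each 'rel_classes' entry being one predicate id or a list of ids), and that the proposal.pkl arrays hold one entry per image with the shapes shown above.

```python
import pickle

import numpy as np


def load_pickle(path):
    """Load train.pkl, test.pkl or proposal.pkl."""
    with open(path, 'rb') as f:
        # encoding='latin1' lets Python 3 read pickles written under Python 2,
        # which this repo targets; drop the argument when running Python 2.
        return pickle.load(f, encoding='latin1')


def iter_triples(annotations):
    """Yield (subject_class, predicate, object_class) for each annotated pair.

    Assumes ix1/ix2/rel_classes are parallel per-pair sequences and that a
    rel_classes entry is either a single predicate id or a list of ids.
    """
    for item in annotations:
        classes = np.asarray(item['classes']).ravel()
        for s, o, rels in zip(item['ix1'], item['ix2'], item['rel_classes']):
            for r in np.atleast_1d(rels).ravel():
                yield int(classes[s]), int(r), int(classes[o])


def image_proposals(proposals, i, min_conf=0.0):
    """Return boxes (n, 4), confs (n,), cls (n,) for image i, best first.

    min_conf is a hypothetical filtering threshold, not a repo parameter.
    """
    boxes = np.asarray(proposals['boxes'][i], dtype=np.float32)
    confs = np.asarray(proposals['confs'][i], dtype=np.float32).ravel()
    cls = np.asarray(proposals['cls'][i]).ravel().astype(np.int64)
    keep = confs >= min_conf
    # Stable sort on negated confidences: descending order, ties keep input order.
    order = np.argsort(-confs[keep], kind='stable')
    return boxes[keep][order], confs[keep][order], cls[keep][order]
```

For example, `iter_triples(load_pickle(path_to_train_pkl))` yields class-id triples; these ids presumably index into obj.txt and rel.txt.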

Prerequisites

  • Python 2.7
  • PyTorch 0.2.0
  • opencv-python
  • tabulate
  • CUDA 8.0 or higher

Installation

  • Edit ~/lib/make.sh to set CUDA_PATH and choose your -arch option to match your GPU.

    GPU model                     Architecture
    TitanX (Maxwell/Pascal)       sm_52
    GTX 960M                      sm_50
    GTX 1080 (Ti)                 sm_61
    Grid K520 (AWS g2.2xlarge)    sm_30
    Tesla K80 (AWS p2.xlarge)     sm_37
  • Build the Cython modules for the roi_pooling layer, choosing the -arch value that matches your GPU when compiling the CUDA code (see https://github.com/ruotianluo/pytorch-faster-rcnn).

    cd lib
    ./make.sh
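The -arch value is sm_ followed by the GPU's CUDA compute capability without the dot (compute capability 6.1 gives sm_61). The sketch below restates the table as a dictionary and derives the flag for GPUs that are not listed; `ARCH_BY_GPU` and `arch_from_capability` are hypothetical names, and nothing here edits make.sh for you.

```python
# GPU -> nvcc -arch values, copied from the table above.
ARCH_BY_GPU = {
    'TitanX (Maxwell/Pascal)': 'sm_52',
    'GTX 960M': 'sm_50',
    'GTX 1080 (Ti)': 'sm_61',
    'Grid K520 (AWS g2.2xlarge)': 'sm_30',
    'Tesla K80 (AWS p2.xlarge)': 'sm_37',
}


def arch_from_capability(major, minor):
    """Build an nvcc -arch value from a CUDA compute capability (major, minor)."""
    return 'sm_{}{}'.format(major, minor)
```

In newer PyTorch releases, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current device, which can be unpacked into `arch_from_capability`.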

Demo

Train

  • Model Structure

    [Figure: model structure]

  • Run

    cd tool
    CUDA_VISIBLE_DEVICES=0 python train.py --dataset vrd --name VRD_RANK --epochs 10 --print-freq 500 --model_type RANK_IM

    You can set the parser argument --no_so to discard the separate subject/object bounding-box visual input, and --no_obj to discard the semantic cue.

  • This project contains all the training and testing code for predicate detection. For relationship detection, our proposed pipeline has two stages; the first stage, object detection, is not included in this project. I am trying to release that code as soon as possible. In the meantime, you may refer to other projects such as pytorch-faster-rcnn and faster-rcnn.pytorch.
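Two points from this list can be illustrated with short sketches. Neither is taken from this repo's code; `fuse_pair_features` and `candidate_pairs` are hypothetical helpers. The first assumes the per-pair cues are simply concatenated (an assumption, not necessarily how this model fuses them) and shows what dropping a stream via --no_so or --no_obj amounts to. The second shows one common way to connect the two stages of relationship detection: pair every first-stage detection with every other detection, giving n(n-1) ordered subject-object pairs in the same ix1/ix2 convention as train.pkl.

```python
import numpy as np


def fuse_pair_features(union_vis, subj_vis, obj_vis, semantic,
                       no_so=False, no_obj=False):
    """Concatenate per-pair feature streams, skipping those the flags disable.

    All inputs are (num_pairs, dim) arrays. no_so drops the separate
    subject/object box visual features; no_obj drops the semantic features.
    """
    parts = [union_vis]
    if not no_so:
        parts += [subj_vis, obj_vis]
    if not no_obj:
        parts.append(semantic)
    return np.concatenate(parts, axis=1)


def candidate_pairs(num_detections):
    """All ordered (subject, object) index pairs with subject != object."""
    idx = np.arange(num_detections)
    ix1, ix2 = np.meshgrid(idx, idx, indexing='ij')
    mask = ix1 != ix2
    # Boolean masking flattens row-major, so pairs come grouped by subject.
    return ix1[mask], ix2[mask]
```

The fused width changes with the flags, so any layer consuming it has to be sized accordingly. The pair count grows quadratically with the number of detections, so it is common to keep only the top-scoring detections before pairing.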

Citation

If you use this code, please cite the following paper(s):

@inproceedings{liang2018Visual,
	title={Visual Relationship Detection with Deep Structural Ranking},
	author={Liang, Kongming and Guo, Yuhong and Chang, Hong and Chen, Xilin},
	booktitle={AAAI Conference on Artificial Intelligence},
	year={2018}
}

License

The source code and processed data may only be used for non-commercial purposes.
