DDPN

This project implements the paper Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding. The network architecture of our visual grounding model with DDPN is illustrated in Figure 1.

Figure 1: The model architecture for our visual grounding model.

Requirements

  • Python version 2.7
  • easydict
  • cv2
  • PyTorch 0.3 (optional but recommended; used to speed up data loading with multiple threads)

Pretrained Models

We release models trained on four datasets; they achieve slightly better results than those reported in the paper.

Split   Flickr30k-Entities  Referit  Refcoco  Refcoco+
val     72.78%              63.77%   76.61%   64.34%
test    73.45%              63.27%   76.23%   64.01%
testA   -                   -        79.99%   71.24%
testB   -                   -        72.11%   55.55%

  1. Download the pretrained models from BaiduYun.
  2. Unzip the model files into the directory './pretrained_model'.
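
After unzipping, it can help to confirm that the files referenced by the test commands in this README are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_model_files` is hypothetical, and the directory layout is assumed from the paths used in the Testing section ('pretrained_model/<dataset>/test.prototxt' and 'pretrained_model/<dataset>/final.caffemodel').

```python
import os

# Dataset sub-directory names, taken from the test commands in this README.
DATASETS = ["flickr30k", "referit", "refcoco", "refcoco+"]
# File names, taken from the test commands in this README.
EXPECTED_FILES = ["test.prototxt", "final.caffemodel"]


def missing_model_files(root="./pretrained_model"):
    """Return the expected model files that are not present under root.

    Hypothetical helper: the layout <root>/<dataset>/<file> is an assumption
    based on the paths used in the Testing section.
    """
    missing = []
    for dataset in DATASETS:
        for name in EXPECTED_FILES:
            path = os.path.join(root, dataset, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_model_files():
        print("missing: " + path)
```

An empty result means every file the test commands expect was found.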

Preprocess

  • Build Caffe and pycaffe

    cd ./caffe
    make all -j32
    make pycaffe
    
  • Download the images (images only)

    • flickr30k-entities
    • referit: download the Referit images.
      wget -O ./data/referit/ImageCLEF/referitdata.tar.gz http://www.eecs.berkeley.edu/~ronghang/projects/cvpr16_text_obj_retrieval/referitdata.tar.gz
      tar -xzvf ./data/referit/ImageCLEF/referitdata.tar.gz -C ./data/referit/ImageCLEF/
      
    • refcoco/refcoco+: download the mscoco train2014 images.
      • mscoco train2014.
      • Move the mscoco train2014 images to the directory './data/mscoco/image2014/train2014/'.
  • Extract DDPN image features. For a 3×h×w image, we extract the 2048-D visual feature and the 4-D spatial feature (post-processed to 5-D) as the input features for our model. We use the script below. Note that --num_bbox 100,100 extracts a fixed number of proposals (K=100) for each image.

    ./tools/extract_feat.py --gpu 0,1,2,3 --cfg experiments/cfgs/faster_rcnn_end2end_resnet_vg.yml --def models/vg/ResNet-101/faster_rcnn_end2end/test.prototxt --net /path/to/caffemodel --img_dir /path/to/images/ --out_dir /path/to/outfeat/ --num_bbox 100,100 --feat_name pool5_flat
    
    • For flickr30k or referit, the image features are written to 'data/[flickr30k, referit]/features/bottom-up-feats/' by default. For refcoco/refcoco+, they are written to 'data/mscoco/features/bottom-up-feats/train2014'.
  • Download the annotation files. We preprocessed the annotations of flickr30k-entities, referit, refcoco and refcoco+ so that all datasets share the same format. Download our processed annotations from BaiduYun, then unzip the zip files into the directory './data'. We will release the annotation preprocessing code in the directory './preprocess'.

  • Modify the paths in the config files to match your own environment: set the number of data loader threads, the image features directory and the images directory in the yaml config files in the directory './config/experiments/'.
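
The feature extraction step above mentions a 4-D spatial feature that is post-processed to 5-D, without spelling out the post-processing. One common choice, assumed here and not confirmed by this README, is to normalize the box corners by the image size and append the ratio of box area to image area. The function name `spatial_feature` is hypothetical.

```python
def spatial_feature(box, img_w, img_h):
    """Turn a 4-D box (x1, y1, x2, y2) in pixels into a 5-D feature.

    Assumed layout (not taken from the repository):
    [x1/W, y1/H, x2/W, y2/H, box_area/image_area].
    """
    x1, y1, x2, y2 = [float(v) for v in box]
    w, h = float(img_w), float(img_h)
    # Fraction of the image covered by the box.
    area_ratio = ((x2 - x1) * (y2 - y1)) / (w * h)
    return [x1 / w, y1 / h, x2 / w, y2 / h, area_ratio]


if __name__ == "__main__":
    # A 50x100 box in the top-left corner of a 100x200 image.
    print(spatial_feature((0, 0, 50, 100), 100, 200))
```

For the box in the example, the result is [0.0, 0.0, 0.5, 0.5, 0.25]. Normalizing by the image size keeps the feature comparable across images of different resolutions.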

Training

  • flickr30k-entities
    python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/flickr30k-kld-bbox_reg.yaml
    
  • referit
    python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/referit-kld-bbox_reg.yaml
    
  • refcoco
    python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/refcoco-kld-bbox_reg.yaml
    
  • refcoco+
    python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/refcoco+-kld-bbox_reg.yaml
    
  • The output model is saved in the directory './models'.
  • The validation log is written to the directory './log'.
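
The four training commands above differ only in the config file name. A small sketch that builds them from the dataset name may be convenient when training on several datasets in a row. The helper `train_command` is hypothetical; the flags and the config naming pattern are copied from the commands above.

```python
def train_command(dataset, gpu_id=0):
    """Build the train_net.py argument list for one dataset.

    Hypothetical helper; flags and paths mirror the commands in this README.
    """
    cfg = "config/experiments/%s-kld-bbox_reg.yaml" % dataset
    return [
        "python", "train_net.py",
        "--gpu_id", str(gpu_id),
        "--train_split", "train",
        "--val_split", "val",
        "--cfg", cfg,
    ]


if __name__ == "__main__":
    for name in ["flickr30k", "referit", "refcoco", "refcoco+"]:
        # Print only; pass the list to subprocess.call to actually run it.
        print(" ".join(train_command(name)))
```

Returning a list rather than a single string lets the command be passed to `subprocess.call` without shell quoting issues.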

Testing

  • flickr30k-entities
    python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/flickr30k/test.prototxt --pretrained_model pretrained_model/flickr30k/final.caffemodel --cfg config/experiments/flickr30k-kld-bbox_reg.yaml
    
  • referit
    python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/referit/test.prototxt --pretrained_model pretrained_model/referit/final.caffemodel --cfg config/experiments/referit-kld-bbox_reg.yaml
    
  • refcoco
    python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/refcoco/test.prototxt --pretrained_model pretrained_model/refcoco/final.caffemodel --cfg config/experiments/refcoco-kld-bbox_reg.yaml
    
  • refcoco+
    python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/refcoco+/test.prototxt --pretrained_model pretrained_model/refcoco+/final.caffemodel --cfg config/experiments/refcoco+-kld-bbox_reg.yaml
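
For readers interpreting the accuracies reported above: visual grounding accuracy is commonly computed by counting a prediction as correct when its intersection over union (IoU) with the ground-truth box reaches 0.5. The sketch below illustrates that convention only. Whether test_net.py uses exactly this threshold and comparison is an assumption, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return float(inter) / union if union > 0 else 0.0


def grounding_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth is >= threshold.

    The 0.5 default follows the common convention; it is an assumption here.
    """
    if not predicted:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, ground_truth)
               if iou(p, g) >= threshold)
    return float(hits) / len(predicted)
```

For example, with two predictions where one matches its ground-truth box exactly and the other does not overlap it at all, `grounding_accuracy` returns 0.5.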
    

Citation

If this code is helpful for your research, please cite:

@article{yu2018rethining,
  title={Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding},
  author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Zhao, Zhou and Tian, Qi and Tao, Dacheng},
  journal={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2018}
}
