Giter Club home page Giter Club logo

unrel's Introduction

Weakly-supervised learning of visual relations

Created by Julia Peyre at INRIA, Paris.

Introduction

This is the code for the paper :

Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic, Weakly-supervised learning of visual relations, ICCV17.

The webpage for this project is available here, with a link to the paper and UnRel dataset.

License

The code and dataset are available for research purpose. Check the LICENSE file for details.

Cite

If you find this code useful in your research, please, consider citing our paper:

@InProceedings{Peyre17, author = "Peyre, Julia and Laptev, Ivan and Schmid, Cordelia and Sivic, Josef", title = "Weakly-supervised learning of visual relations", booktitle = "ICCV", year = "2017" }

Related work

If you are interested in learning more about the optimization with discriminative clustering, check :

Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic, Learning from video and text via large-scale discriminative clustering, ICCV17 (paper,code)

Contents

  1. Dependencies
  2. Data
  3. Demo
  4. Running on new images

Dependencies

To run this code, you need to install :

  1. MOSEK : version 7
  2. CVX : version 2.1
  3. VLFEAT : version 0.9.20

Once installed, setup the paths in startup file :

startup.m

Data

To use this code with Visual Relationship Detection dataset and UnRel, follow the following steps to get the data.

  1. Download the pre-processed data
wget http://www.di.ens.fr/willow/research/unrel/release/preproc_data.zip
unzip preproc_data.zip

This repository contains the folders for the datasets. The structure of the folders is as follows :

./data/
------ models/
------------- gmm-{weak,full}.mat   # GMM model for quantized spatial configurations trained with weak/full supervision
------------- pca-{weak,full}.mat   # PCA model for dimension reduction on appearance features trained with weak/full supervision
------------- vgg16_fast_rcnn_iter_80000.caffemodel   # VGG16 object detector caffemodel
------------- test.prototxt                           # VGG16 object detector prototxt
------ classifiers/   # trained models
------ vrd-dataset/   # dataset name
------------------ test/   # split
------------------------ candidates/   # candidates name
----------------------------------- pairs.mat     # candidates pairs of boxes stored in a structure :
						  # pairs.im_id : image id for the pair of boxes
						  # pairs.sub_id : id of the subject box
						  # pairs.obj_id : id of the object box
						  # pairs.rel_id : id for the pair of boxes
						  # pairs.sub_cat : subject category
						  # pairs.obj_cat : object category
						  # pairs.rel_cat : predicate category
						  # pairs.sub_box : coordinates of subject box [xmin,ymin,xmax,ymax]
						  # pairs.obj_box : coordinates of object box [xmin,ymin,xmax,ymax]
----------------------------------- objects.mat   # candidates objects
----------------------------------- objectscores.mat   # object scores for the candidates objects computed with object detector
----------------------------------- features/
-------------------------------------------- appearance-{weak,full}/   # appearance features for the candidates objects for each image (first column indicates obj_id)  

The folder candidates/ can contain different types of candidates :

  • annotated: candidates pairs corresponding to groundtruth positive annotations
  • candidates: candidates pairs of boxes obtained with selective search
  • gt-candidates: candidates pairs of boxes built from all groundtruth object boxes (but not necessarily annotated)
  • Lu-candidates: candidates pairs of boxes provided by [Lu16]
  • densecap-candidates: candidates regions obtained with DenseCap method [Johnson16]

The folder features/ only contains the pre-processed appearance features for the objects (given by an object detector, and after applying PCA). Now, we need to compute the spatial features for the pairs of boxes.

  1. Extract the spatial features
    To compute the spatial features, run :
populate_spatialfeats.m

This step will pre-compute the spatial features for all candidates pairs of boxes in each image.

Demo

Now you can run our demo script to train and/or evaluate :

   demo.m

You can change the training/evaluation options by modifying the object opts. Refer to config.m to see the different options available for training and evaluation. We also provide scripts in folder experiments/ to reproduce the results in our paper.

You will need approximately 10G memory to re-train our weakly-supervised model on Visual Relationship Dataset.

Running on new images

You might want to test our model on new images. For this, follow these steps :

  1. Get candidate pairs of boxes for the new images (you can use different proposal methods, e.g. selective search)
  2. Run our pre-trained Fast-RCNN object detector on Caffe to get object scores and extract the appearance features
  3. Apply PCA for dimension reduction on appearance features
  4. Run compute_spatial_features.m to compute the quantized spatial features

You can contact the first author for more information regarding the pre-processing steps on new images.

unrel's People

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.