Second-order Democratic Aggregation

Created by Tsung-Yu Lin, Subhransu Maji and Piotr Koniusz.

Introduction

This repository contains the code for reproducing the results in our ECCV 2018 paper:

@inproceedings{lin2018o2dp,
    Author = {Tsung-Yu Lin and Subhransu Maji and Piotr Koniusz},
    Title = {Second-order Democratic Aggregation},
    Booktitle = {European Conference on Computer Vision (ECCV)},
    Year = {2018}
}

The paper analyzes various feature aggregators in the context of second-order features and proposes γ-democratic pooling, which generalizes sum pooling and democratic aggregation. See the project page and the paper for details. The code was tested on Ubuntu 14.04 with an NVIDIA Titan X GPU and MATLAB R2016a.
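In outline, the aggregator re-weights each local feature before second-order pooling, with γ controlling how strongly the contributions are equalized. Roughly (see the paper for the exact objective and constraints), writing k(·,·) for the similarity between second-order features, the weights α satisfy

    α_i · Σ_j α_j k(x_i, x_j) = ( Σ_j k(x_i, x_j) )^γ   for every feature x_i, with α_i ≥ 0,

so for γ = 1 the constraint is met by α = 1 (plain sum pooling), while γ = 0 forces every feature's total contribution to be the same (democratic aggregation).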

Prerequisites

  1. MatConvNet: Our code was developed on MatConvNet version 1.0-beta24.
  2. VLFEAT
  3. bcnn-package: The package includes our implementation of customized layers.

The packages are set up as git submodules. Check them out with the following commands, then follow the instructions on the MatConvNet and VLFEAT project pages to install them.

>> git submodule init
>> git submodule update

Datasets

To run the experiments, download the following datasets and edit the model_setup.m file to point the code to the dataset locations. For instance, set opts.cubDir = 'data/cub' to point to the birds dataset directory.
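As a sketch, the relevant lines in model_setup.m look like the following. Only opts.cubDir is taken from this README; the other field names are placeholders, so check model_setup.m for the exact option used by each dataset.

    % model_setup.m -- point each dataset option at its local copy
    opts.cubDir      = 'data/cub';            % Caltech-UCSD Birds (field name from this README)
    opts.carsDir     = 'data/cars';           % Stanford Cars (placeholder field name)
    opts.aircraftDir = 'data/fgvc-aircraft';  % FGVC Aircrafts (placeholder field name)
    opts.dtdDir      = 'data/dtd';            % DTD (placeholder field name)
    opts.fmdDir      = 'data/fmd';            % FMD (placeholder field name)
    opts.mitDir      = 'data/mit_indoor';     % MIT Indoor (placeholder field name)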

Fine-grained classification datasets: Caltech-UCSD Birds (CUB), Stanford Cars, and FGVC Aircrafts.

Texture and indoor scene datasets: DTD (Describable Textures), FMD (Flickr Material Database), and MIT Indoor.

Pre-trained models

  • ImageNet LSVRC 2012 pre-trained models: The vgg-verydeep-16 and resnet-101 ImageNet pre-trained models are used as our base models. Download them from the MatConvNet pre-trained models page.
  • B-CNN fine-tuned models: We also provide B-CNN models fine-tuned with vgg-verydeep-16, from which the CNN features are extracted and aggregated to construct the image descriptor. Download the models for CUB Birds, FGVC Aircrafts, or Stanford Cars to reproduce the accuracies reported in the paper.

Testing the models

Solving for the coefficients of γ-democratic aggregation involves a Sinkhorn iteration. The hyper-parameters of the Sinkhorn iteration are configurable in the entry scripts run_experiments_o2dp.m and run_experiments_sketcho2dp_resnet.m; see the comments in the code for details.
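For intuition, the sketch below shows the kind of multiplicative fixed-point update such an iteration performs. It is an illustration under assumptions rather than the repository's implementation: the function name, the squared-dot-product kernel, and the nIter/damping parameters are made up here, and the actual layers in bcnn-package handle damping and numerical details differently.

    % Illustrative sketch (not the repo code): gamma-democratic weights via a
    % dampened, Sinkhorn-style multiplicative fixed-point iteration.
    % X       : n x d matrix of local CNN features
    % gamma   : interpolation parameter (1 = sum pooling, 0 = democratic)
    % nIter, damping : iteration count and damping factor (assumed parameters)
    function alpha = gamma_democratic_weights(X, gamma, nIter, damping)
        K = (X * X').^2;               % non-negative similarities between second-order features
        target = sum(K, 2).^gamma;     % desired total contribution of each feature
        alpha = ones(size(K, 1), 1);   % alpha = 1 is already the fixed point when gamma = 1
        for t = 1:nIter
            contrib = alpha .* (K * alpha);                           % current contribution of each feature
            alpha = alpha .* (target ./ max(contrib, eps)).^damping;  % dampened multiplicative update
        end
    end

At a fixed point, α_i Σ_j α_j K_ij matches the target for every i, which is the γ-democratic constraint; the corresponding hyper-parameters used by the actual code are the ones exposed in the entry scripts.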

  • Second-order γ-democratic aggregation: Point the variable model_path to the location of the model in run_experiments_o2dp.m and run the command run_experiments_o2dp(dataset, gamma, gpuidx) in the MATLAB terminal.

    • For example:
    % gamma is the hyper-parameter gamma for gamma-democratic aggregation
    % gpuidx is the index of gpu on which you run the experiment
    run_experiments_o2dp('mit_indoor', 0.3, 1) 
    • Classification results: Sum and democratic aggregation are obtained by setting γ accordingly; the optimal γ values are given in parentheses, and γ=0.5 generally performs reasonably well. For DTD and FMD these numbers are reported on the first split. For the fine-grained recognition datasets (†) the results use the fine-tuned B-CNN models, while the texture and indoor scene datasets use the ImageNet pre-trained vgg-verydeep-16 model. The example after the table shows how the three columns map onto calls to run_experiments_o2dp.

      Dataset                Sum (γ=1)   Democratic (γ=0)   γ-democratic
      Caltech-UCSD Birds †   84.0        84.7               84.9 (0.5)
      Stanford Cars †        90.6        89.7               90.8 (0.5)
      FGVC Aircrafts †       85.7        86.7               86.7 (0.0)
      DTD                    71.2        72.2               72.3 (0.3)
      FMD                    84.6        82.8               84.8 (0.8)
      MIT Indoor             79.5        79.6               80.4 (0.3)
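
    • For instance, the three columns of the MIT Indoor row correspond to three calls to the same entry script, varying only γ:

      run_experiments_o2dp('mit_indoor', 1.0, 1)   % sum pooling (gamma = 1)
      run_experiments_o2dp('mit_indoor', 0.0, 1)   % democratic aggregation (gamma = 0)
      run_experiments_o2dp('mit_indoor', 0.3, 1)   % gamma-democratic at the optimal gamma for MIT Indoor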
  • Second-order γ-democratic aggregation in sketch space: Point the variable model_path to the location of the model in run_experiments_sketcho2dp_resnet.m and run the command run_experiments_sketcho2dp_resnet(dataset, gamma, d, gpuidx) in the MATLAB terminal.

    • For example:
    % gamma is the hyper-parameter gamma for gamma-democratic aggregation
    % d is the dimension for the sketch space
    % gpuidx is the index of gpu on which you run the experiment
    run_experiments_sketcho2dp_resnet('mit_indoor', 0.5, 8192, 1) 
    • The script aggregates second-order, ImageNet pre-trained ResNet features in an 8192-dimensional sketch space with the γ-democratic aggregator. With ResNet features the model achieves the following results; for DTD and FMD the accuracy is averaged over 10 splits.

      Dataset    DTD          FMD          MIT Indoor
      Accuracy   76.2 ± 0.7   84.3 ± 1.5   84.3
