WSDHQ: Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

1. Introduction

This repository provides the code for our paper at AAAI 2021:

Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval. Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia. [link].

We propose WSDHQ, a weakly supervised deep quantization approach for image retrieval. Instead of requiring ground-truth labels, WSDHQ uses the informal tags provided by amateur users to guide quantization learning, which reduces the reliance on manual annotation and makes industrial deployment more feasible. WSDHQ includes a correlation-based tag processing mechanism that enhances the weak semantics of such noisy tags. In addition, we learn quantized representations on the hypersphere manifold, for which we design a novel adaptive cosine margin loss for embedding learning and a supervised cosine quantization loss for quantization. Experiments on the Flickr25K and NUS-WIDE datasets demonstrate the superiority of WSDHQ.

In the following, we walk you through using this repository step by step.

2. Preparation

git clone https://github.com/gimpong/AAAI21-WSDHQ.git
cd AAAI21-WSDHQ/
tar -xvzf data.tar.gz
rm -f data.tar.gz

2.1 Requirements

  • python 3.7.8
  • numpy 1.19.1
  • scikit-learn 0.23.1
  • h5py 2.10.0
  • python-opencv 3.4.2
  • tqdm 4.51.0
  • tensorflow 1.15.0
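
To confirm that an environment matches the versions listed above, a small check like the following may help. It is only a sketch and not part of this repository: the import names `sklearn` and `cv2` are assumptions (the list above gives package names, not import names), and `find_mismatches` and `installed_version` are hypothetical helpers.

```python
import importlib

# Pinned versions copied from the requirements list above.
# Keys are import names; "sklearn" and "cv2" are assumed to be the import
# names of scikit-learn and OpenCV respectively.
REQUIRED = {
    "numpy": "1.19.1",
    "sklearn": "0.23.1",
    "h5py": "2.10.0",
    "cv2": "3.4.2",
    "tqdm": "4.51.0",
    "tensorflow": "1.15.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` returns an empty dictionary when every package matches exactly. Other versions may also work, but we have not tested them.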

2.2 Download the image datasets and pre-trained models, and organize them properly

Before running the code, make sure that everything is in place. First, the working directory is expected to be organized as below:

AAAI21-WSDHQ/
  • data/
    • flickr25k/
      • tags
        • FinalTagEmbs.txt
        • TagIdMergeMap.pkl
      • common_tags.txt
      • database_img.txt
      • database_label.txt
      • train_img.txt
      • train_tag.txt
      • test_img.txt
      • test_label.txt
    • nus-wide/
      • tags
        • FinalTagEmbs.txt
        • TagIdMergeMap.pkl
      • TagList1k.txt
      • database_img.txt
      • database_label.txt
      • train_img.txt
      • train_tag.txt
      • test_img.txt
      • test_label.txt
  • datasets/
    • GoogleNews-vectors-negative300.bin.gz
    • flickr25k/
      • mirflickr/
        • im1.jpg
        • im2.jpg
        • ...
    • nus-wide/
      • Flickr/
        • actor/
          • 0001_2124494179.jpg
          • 0002_174174086.jpg
          • ...
        • administrative_assistant/
          • ...
        • ...
  • scripts/
    • run0001.sh
    • run0002.sh
    • ...
    • tag_processing.sh
  • train.py
  • validation.py
  • net.py
  • net_val.py
  • util.py
  • dataset.py
  • alexnet.npy

Notes

  • The data/ folder contains the data splits for the Flickr25K and NUS-WIDE datasets. The raw images of both datasets must be downloaded separately and placed in datasets/flickr25k/ and datasets/nus-wide/ respectively. We provide copies of these image datasets, which you can download via Google Drive or Baidu Wangpan (Web Drive, password: ocmv).

  • The pre-trained files of AlexNet (alexnet.npy) and Word2Vec (GoogleNews-vectors-negative300.bin.gz) can be downloaded from Baidu Wangpan (Web Drive, password: ocmv).
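
Before training, it can be useful to verify that the files are where the tree above expects them. The following is a minimal sketch rather than a script shipped with this repository: `missing_paths` is a hypothetical helper, and `EXPECTED_PATHS` lists only a representative subset of the tree.

```python
import os

# Relative paths taken from the directory tree above.
# Only a representative subset is listed; extend it as needed.
EXPECTED_PATHS = [
    "data/flickr25k/tags/FinalTagEmbs.txt",
    "data/flickr25k/tags/TagIdMergeMap.pkl",
    "data/flickr25k/train_img.txt",
    "data/flickr25k/train_tag.txt",
    "data/nus-wide/tags/FinalTagEmbs.txt",
    "data/nus-wide/tags/TagIdMergeMap.pkl",
    "data/nus-wide/train_img.txt",
    "data/nus-wide/train_tag.txt",
    "datasets/GoogleNews-vectors-negative300.bin.gz",
    "datasets/flickr25k/mirflickr",
    "datasets/nus-wide/Flickr",
    "alexnet.npy",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths (relative to root) that do not exist."""
    return [
        rel_path
        for rel_path in expected
        if not os.path.exists(os.path.join(root, *rel_path.split("/")))
    ]
```

For example, `missing_paths("AAAI21-WSDHQ")` returns an empty list once the data splits, image folders, and pre-trained files have all been placed as described.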

3. Enhance the weak semantic information of tags via preprocessing (Optional)

We have provided enhanced tag embeddings in this repository. See data/flickr25k/tags/ and data/nus-wide/tags/. If you want to reproduce these files, you can remove them and execute

cd scripts/
# '0' is the id of GPU
bash tag_processing.sh 0

4. Train and then evaluate

To facilitate reproducibility, we provide a script with the configuration for each experiment. The scripts can be found under the scripts/ folder. For example, to train and evaluate an 8-bit WSDHQ model on the Flickr25K dataset, run

cd scripts/
# '0' is the id of GPU
bash run0001.sh 0

The script run0001.sh contains the following commands:

#!/bin/bash

cd ..

##8 bits
#                     dataset  lr      iter  lambda    subspace_num  loss   notes  gpu
python train.py       flickr   0.0003  800   0.0001    1             WSDQH  0001   $1
#                     dataset  model_weight                                                                 gpu
python validation.py  flickr   ./checkpoints/flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy  $1

cd -

After a script finishes, a set of files is saved under logs/ and checkpoints/. Taking run0001.sh as an example:

AAAI21-WSDHQ/
  • logs/
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log
  • checkpoints/
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001_retrieval.h5
  • ...
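
The file names above follow a regular pattern, so the expected outputs of a run can be derived from its configuration. The helper below is a sketch inferred from the run0001.sh example only. It does not show how train.py actually builds these names, and `run_artifacts` is a hypothetical function. Treating `adaMargin` and `gamma=1` as fixed parts of the name is also an assumption.

```python
def run_artifacts(dataset, nbits, lam, notes, loss="WSDQH", gamma=1):
    """Build the expected log and checkpoint paths for one run.

    `lam` is passed as a string (e.g. "0.0001") so that it appears in the
    file name exactly as written in the script.
    """
    stem = f"{dataset}_{loss}_nbits={nbits}_adaMargin_gamma={gamma}_lambda={lam}_{notes}"
    return {
        "log": f"logs/{stem}.log",
        "checkpoint": f"checkpoints/{stem}.npy",
        "retrieval": f"checkpoints/{stem}_retrieval.h5",
    }


# Configuration of run0001.sh: 8 bits on Flickr25K.
paths = run_artifacts("flickr", 8, "0.0001", "0001")
print(paths["checkpoint"])
```

A helper like this can be used to check that a run finished, for instance by testing whether `paths["retrieval"]` exists.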

Here we report the results of running the scripts on a GTX 1080 Ti. Results are shown in the following table. We have also uploaded the logs and checkpoint information for reference, which can be downloaded from Baidu Wangpan (Web Drive, password: ocmv).

Note that some values may deviate slightly from the results reported in our paper. This is caused by randomness in TensorFlow and by differences in software and hardware.

| Script | Dataset | Code length (bits) | MAP | Log |
|---|---|---|---|---|
| run0001.sh | Flickr25K | 8 | 0.766 | flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log |
| run0002.sh | Flickr25K | 16 | 0.755 | flickr_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0002.log |
| run0003.sh | Flickr25K | 24 | 0.765 | flickr_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0003.log |
| run0004.sh | Flickr25K | 32 | 0.767 | flickr_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0004.log |
| run0005.sh | NUS-WIDE | 8 | 0.717 | nuswide_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0005.log |
| run0006.sh | NUS-WIDE | 16 | 0.727 | nuswide_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0006.log |
| run0007.sh | NUS-WIDE | 24 | 0.730 | nuswide_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0007.log |
| run0008.sh | NUS-WIDE | 32 | 0.729 | nuswide_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0008.log |
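
For readers unfamiliar with the MAP metric reported above, the sketch below shows one common way to compute mean average precision for retrieval. It is a generic illustration, not the evaluation code in validation.py: the exact protocol used there (for example any top-k cut-off or the similarity measure) is not described in this README, and `mean_average_precision` is a hypothetical helper.

```python
import numpy as np


def mean_average_precision(query_scores, relevance):
    """Generic MAP over a set of queries.

    query_scores: (num_queries, num_db) similarities, higher means closer.
    relevance:    (num_queries, num_db) binary matrix, 1 if the database
                  item is relevant to the query.
    Queries without any relevant item are skipped.
    """
    query_scores = np.asarray(query_scores, dtype=float)
    relevance = np.asarray(relevance, dtype=int)
    average_precisions = []
    for scores, rel in zip(query_scores, relevance):
        # Rank database items from most to least similar.
        order = np.argsort(-scores, kind="stable")
        ranked_rel = rel[order]
        if ranked_rel.sum() == 0:
            continue
        hits = np.cumsum(ranked_rel)                # relevant items seen so far
        ranks = np.arange(1, len(ranked_rel) + 1)   # 1-based rank positions
        precision_at_hits = (hits / ranks)[ranked_rel == 1]
        average_precisions.append(precision_at_hits.mean())
    return float(np.mean(average_precisions)) if average_precisions else 0.0
```

In a multi-label setting such as NUS-WIDE, a database image is often counted as relevant when it shares at least one label with the query. This convention is an assumption here as well.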

5. References

If you find this code useful or use the toolkit in your work, please consider citing:

@inproceedings{wang2021wsdhq,
  title={Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval},
  author={Wang, Jinpeng and Chen, Bin and Zhang, Qiang and Meng, Zaiqiao and Liang, Shangsong and Xia, Shutao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={2755--2763},
  year={2021}
}

6. Acknowledgements

We use DeepHash as the code base in our implementation.

7. Contact

If you have any questions, you can raise an issue or email Jinpeng Wang ([email protected]). We will reply to you soon.
