weak-sup-visual-grounding's Introduction

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

This repository is the official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Requirements

  • TensorFlow 1.15

Training

To train the NCE model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5

To train the NCE+Distill model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5 \
  --phrase_to_label_json=phrase_to_label.json

Evaluation

To evaluate the model on Flickr30K, run:

python eval_model.py \
  --region_feat_path=region_features_test.hdf5 \
  --phrase_feat_path=phrase_features_test.hdf5 \
  --glove_path=glove.hdf5 \
  --restore_path=checkpoint.meta

Pre-trained Models

You can download pretrained models using Res101 VG features here:

You can also find the features on Flickr30K test split here.

The pretrained models achieve the following performance on Flickr30K test split:

Model Name     R@1      R@5      R@10
NCE+Distill    0.5310   0.7394   0.7875
NCE            0.5135   0.7338   0.7833
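
For readers unfamiliar with the R@k columns, the sketch below shows one common way recall at k is computed for phrase grounding: a phrase counts as correctly grounded if any of its top-k ranked boxes overlaps the ground-truth box with IoU of at least 0.5. The 0.5 threshold, the (x1, y1, x2, y2) box format, and the function names are assumptions for illustration; this is not taken from eval_model.py.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_boxes, gt_boxes, ks=(1, 5, 10), iou_threshold=0.5):
    """Fraction of phrases with a matching box among the top-k predictions.

    ranked_boxes: one list of boxes per phrase, sorted best first.
    gt_boxes: one ground-truth box per phrase.
    The IoU threshold of 0.5 is an assumed convention.
    """
    hits = {k: 0 for k in ks}
    for ranked, gt in zip(ranked_boxes, gt_boxes):
        for k in ks:
            if any(box_iou(box, gt) >= iou_threshold for box in ranked[:k]):
                hits[k] += 1
    total = len(gt_boxes)
    return {k: (hits[k] / total if total else 0.0) for k in ks}
```

With this definition, R@1 measures how often the single top-scoring region is correct, while R@5 and R@10 allow the correct region to appear anywhere among the five or ten highest-scoring regions.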

Citation

If you use our implementation in your research or wish to refer to the results published in our paper, please use the following BibTeX entry.

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Liwei and Huang, Jing and Li, Yin and Xu, Kun and Yang, Zhengyuan and Yu, Dong},
    title     = {Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14090-14100}
}

weak-sup-visual-grounding's Issues

About experiment

I couldn't reproduce the experimental results in the paper when using region features and object classification logits that I extracted myself.
Could you please provide your feature files? Thanks!

More details about the environment

I'm struggling with running the evaluation code.
Can you provide more details about the environment? (python version, library dependencies)
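
For reports like this one, a short script that prints the interpreter and package versions makes environments easier to compare. This is a minimal sketch: the package list (tensorflow, h5py, absl) is an assumption based on the stated requirement and the modules that appear in the traceback of the following issue, not a dependency list published by the authors.

```python
import importlib
import platform


def describe_environment(packages=("tensorflow", "h5py", "absl")):
    """Return a dict of the Python version and the versions of the given packages.

    The default package names are assumptions, not an official dependency list.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers a missing package as well as one that fails while importing.
            report[name] = "not installed"
            continue
        # Some packages do not expose __version__; report that explicitly.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version}")
```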

Missing key values when evaluating

I tried to evaluate the model with the test files you provided, including region_bottomup_vg_res101_features_test.hdf5, phrase_features_test.hdf5, glove.hdf5 and flickr-bottom-up-oidv2-tf1-distill.meta.
However, at the beginning of the evaluation, the following line of code showed an error.

phrase = self.f_feats['phrase'][im_name][sent_index, :, :]

Traceback (most recent call last):
  File "eval_model.py", line 107, in <module>
    tf.app.run(main=eval_main, argv=[sys.argv[0]] + unparsed)
  File "/home/ubuntu19/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/ubuntu19/anaconda3/envs/tf/lib/python3.6/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/ubuntu19/anaconda3/envs/tf/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "eval_model.py", line 71, in eval_main
    input_values = get_batch(data_loader, i)
  File "/home/ubuntu19/lsz/vg/weak-sup-visual-grounding-main/nce_distill_model.py", line 211, in get_batch
    input_values = data_loader.get_batch(batch_index, B) 
  File "/home/ubuntu19/lsz/vg/weak-sup-visual-grounding-main/dataset_utils.py", line 199, in get_batch
    self.sample_items(sample_inds, sample_size)
  File "/home/ubuntu19/lsz/vg/weak-sup-visual-grounding-main/dataset_utils.py", line 160, in sample_items
    phrase = self.f_feats['phrase'][im_name][sent_index, :, :]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu19/anaconda3/envs/tf/lib/python3.6/site-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object '1000' doesn't exist)"

I guess it is caused by the key mismatch between the two files region_bottomup_vg_res101_features_test.hdf5 and phrase_features_test.hdf5. Could you check if this problem exists? Thanks~
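
One way to check this guess is to compare the image keys stored in the two files. The sketch below is a diagnostic under stated assumptions: it assumes the image names are the keys of either the file root or a named group, which may not match the real layout of these files, and the helper names are hypothetical. The h5py calls used (h5py.File, indexing by group name, keys()) are standard.

```python
def compare_keys(keys_a, keys_b):
    """Return (keys only in a, keys only in b) as sorted lists."""
    set_a, set_b = set(keys_a), set(keys_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


def compare_hdf5_keys(path_a, path_b, group_a=None, group_b=None):
    """Compare the keys of two HDF5 files.

    group_a / group_b name the group that holds per-image entries;
    None means the file root. The actual layout is an assumption.
    """
    import h5py  # imported here so compare_keys works without h5py installed

    with h5py.File(path_a, "r") as file_a, h5py.File(path_b, "r") as file_b:
        node_a = file_a[group_a] if group_a else file_a
        node_b = file_b[group_b] if group_b else file_b
        return compare_keys(node_a.keys(), node_b.keys())
```

Calling compare_hdf5_keys on the region and phrase feature files and printing the first few entries of each returned list would show whether a key such as '1000' is present in one file but missing from the other.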
