
rva's Introduction

Recursive Visual Attention in Visual Dialog

This repository contains the code for the following paper:

  • Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen, Recursive Visual Attention in Visual Dialog. In CVPR, 2019. (PDF)
@InProceedings{niu2019recursive,
    author = {Niu, Yulei and Zhang, Hanwang and Zhang, Manli and Zhang, Jianhong and Lu, Zhiwu and Wen, Ji-Rong},
    title = {Recursive Visual Attention in Visual Dialog},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2019}
}

This code is reimplemented as a fork of batra-mlp-lab/visdial-challenge-starter-pytorch.

Setup and Dependencies

This code is implemented using PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and CuDNN 7. Anaconda/Miniconda is recommended for setting up this codebase:

Anaconda or Miniconda

  1. Install the Anaconda or Miniconda distribution (Python 3+) from their downloads site.
  2. Clone this repository and create an environment:
git clone https://www.github.com/yuleiniu/rva
conda create -n visdial-ch python=3.6

# activate the environment and install all dependencies
conda activate visdial-ch
cd rva/
pip install -r requirements.txt

# install this codebase as a package in development version
python setup.py develop

Download Data

  1. Download the VisDial v1.0 dialog JSON files from here and keep them under the $PROJECT_ROOT/data directory so that the default arguments work as expected.

  2. Get the word counts for the VisDial v1.0 train split here. They are used to build the vocabulary (see the sketch after this list).

  3. batra-mlp-lab provides pre-extracted image features of VisDial v1.0 images, using a Faster R-CNN pre-trained on Visual Genome. If you wish to extract your own image features, skip this step and download the VisDial v1.0 images from here instead. Extracted features for the v1.0 train, val, and test splits are available for download at these links. Note that these files do not contain the bounding-box information.

  4. batra-mlp-lab also provides pre-extracted FC7 features from VGG16.
  5. Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data directory.
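
Here is a minimal sketch of how a vocabulary can be built from the word-counts file mentioned in step 2, assuming the JSON maps each token to its training-split frequency (the filename, min-count threshold, and special tokens below are assumptions, not this repository's exact settings):

# Build a word -> index vocabulary from the word-counts JSON.
import json

def build_vocabulary(word_counts_path, min_count=5):
    with open(word_counts_path, "r") as f:
        word_counts = json.load(f)
    # Keep only words that appear at least min_count times.
    words = sorted(w for w, c in word_counts.items() if c >= min_count)
    # Reserve indices for special tokens before the regular words.
    vocab = {"<PAD>": 0, "<S>": 1, "</S>": 2, "<UNK>": 3}
    for index, word in enumerate(words):
        vocab[word] = index + 4
    return vocab

# vocab = build_vocabulary("data/visdial_1.0_word_counts_train.json")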

Extracting Features (Optional)

With Docker (Optional)

For Dockerfile, please refer to batra-mlp-lab/visdial-challenge-starter-pytorch.

Without Docker (Optional)

  1. Set up opencv, cocoapi and Detectron.

  2. Prepare the MSCOCO and Flickr images.

  3. Extract visual features.

python ./data/extract_features_detectron.py --image-root /path/to/MSCOCO/train2014/ /path/to/MSCOCO/val2014/ --save-path /path/to/feature --split train # Bottom-up features of 36 proposals from images of train split.
python ./data/extract_features_detectron.py --image-root /path/to/Flickr/VisualDialog_val2018 --save-path /path/to/feature --split val # Bottom-up features of 36 proposals from images of val split.
python ./data/extract_features_detectron.py --image-root /path/to/Flickr/VisualDialog_test2018 --save-path /path/to/feature --split test # Bottom-up features of 36 proposals from images of test split.
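
To sanity-check the extracted features, here is a minimal sketch, assuming the script writes an HDF5 file containing a features dataset (the file name and dataset names are assumptions):

# Inspect an extracted-features file; expected per-image shape is (36, 2048).
import h5py

with h5py.File("/path/to/feature/features_train.h5", "r") as f:
    print(list(f.keys()))              # list the datasets actually present
    features = f["features"]
    print(features.shape, features.dtype)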

Initializing GloVe Word Embeddings

Simply run

python data/init_glove.py
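
If data/init_glove.py is missing from your checkout (an issue below reports this), the following sketch illustrates the usual approach: read glove.6B.300d.txt and build an embedding matrix aligned with the vocabulary. The function name, output path, and vocabulary format are assumptions:

# Build a (vocab_size, 300) embedding matrix initialized from GloVe vectors.
import numpy as np

def load_glove(glove_path, vocab, dim=300):
    # Words missing from GloVe keep small random vectors.
    embeddings = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    with open(glove_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vector = parts[0], parts[1:]
            if word in vocab:
                embeddings[vocab[word]] = np.asarray(vector, dtype=np.float32)
    return embeddings

# embeddings = load_glove("data/glove.6B.300d.txt", vocab)
# np.save("data/glove.npy", embeddings)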

Training

Train the model provided in this repository as:

python train.py --config-yml configs/rva.yml --gpu-ids 0 # provide more ids for multi-GPU execution; other args...

Saving model checkpoints

This script will save model checkpoints at every epoch at the path specified by --save-dirpath. Refer to visdialch/utils/checkpointing.py for details on how checkpointing is managed.
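
For orientation, here is a minimal sketch of per-epoch checkpointing in PyTorch; the repository's actual logic lives in visdialch/utils/checkpointing.py, and the file naming below is an assumption:

# Save model and optimizer state once per epoch under --save-dirpath.
import os
import torch

def save_checkpoint(save_dirpath, epoch, model, optimizer):
    os.makedirs(save_dirpath, exist_ok=True)
    state = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
    torch.save(state, os.path.join(save_dirpath, f"checkpoint_{epoch}.pth"))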

Logging

We use Tensorboard for logging training progress. Recommended: execute tensorboard --logdir /path/to/save_dir --port 8008 and visit localhost:8008 in the browser.

Evaluation

Evaluation of a trained model checkpoint can be done as follows:

python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0

This will generate an EvalAI submission file and report the metrics from the Visual Dialog paper (mean reciprocal rank, R@{1, 5, 10}, mean rank), as well as Normalized Discounted Cumulative Gain (NDCG), introduced in the first Visual Dialog Challenge (2018).

The metrics reported here are the same as those reported through EvalAI when submitting in the val phase. To generate a submission file for the test-std or test-challenge phase, replace --split val with --split test.
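
For reference, the retrieval metrics above can be computed from the 1-based rank of the ground-truth answer among the 100 candidate answers; a minimal sketch (an illustration, not the repository's evaluation code):

# Compute MRR, R@{1, 5, 10}, and mean rank from ground-truth answer ranks.
import torch

def retrieval_metrics(gt_ranks):
    gt_ranks = gt_ranks.float()  # 1-based ranks, one per dialog round
    return {
        "mrr": (1.0 / gt_ranks).mean().item(),
        "r@1": (gt_ranks <= 1).float().mean().item(),
        "r@5": (gt_ranks <= 5).float().mean().item(),
        "r@10": (gt_ranks <= 10).float().mean().item(),
        "mean_rank": gt_ranks.mean().item(),
    }

print(retrieval_metrics(torch.tensor([1, 3, 12, 2])))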


rva's Issues

The file 'init_glove.py' is missing.

Hi Yulei,

Thanks for sharing this code. I am running it as a baseline. In the GloVe embedding initialization step,

python data/init_glove.py

I think the file 'init_glove.py' is missing.

Some questions about the image features

Hello, I want to ask about the image features extracted with Detectron. When I extract them, their shape is (36, 1024), but the features extracted by batra-mlp-lab have shape (36, 2048). How should I deal with this mismatch? Thanks.

the loss is NAN

Following the steps in this repository, the loss becomes NaN at epoch 5. I lowered the learning rate, but the same issue occurs.
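
A common first mitigation for NaN losses in PyTorch training loops is gradient-norm clipping before each optimizer step; a self-contained sketch of the general pattern (not a confirmed fix for this issue):

# Clip gradient norms before stepping to limit exploding updates.
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs, targets = torch.randn(4, 10), torch.randint(0, 2, (4,))

loss = torch.nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()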

Parallelization error

Hi,

Did you also face this parallelization error while training? It seems strange, since I am in fact using only one GPU ID!

  File " /home/shubham/rva/visdialch/encoders/modules.py", line 233, in forward
    ques_prob_refine = torch.bmm(ques_gs[:, i, :].view(-1, 1, 2), ques_prob).view(-1, 1, 2) # shape: [batch_size, num_rounds, 2]
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:479

Any suggestions, please?
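
The RuntimeError above typically means the two bmm operands live on different GPUs. A general PyTorch pattern, not a confirmed fix for this issue, is to move both operands to one explicit device before the call:

# Ensure both batched-matmul operands share a device.
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(8, 1, 2).to(device)
b = torch.randn(8, 2, 2).to(device)
out = torch.bmm(a, b)  # shape: (8, 1, 2)
print(out.shape)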

Extracting features

Under Extracting Features (Optional), you have this step:

Prepare the MSCOCO and Flickr images.

  • Where do I find the Flickr images? Could you confirm whether those are the val and test sets?
  • MSCOCO images: did you download the train images from here? These are raw images, right? You didn't use any keypoints/captions? So we have to download the entire train/val/test set?

Is any preprocessing step required after that?

Thanks for your help :)

Question about Gumbel Sampling

Hi,

I find that the distribution, i.e., π in Eq. (19) and Eq. (20), should have log applied to it before being added to g. However, I have checked your code, and I find that you do not apply log to π. Does it matter?
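
For context, the standard Gumbel-Max/Gumbel-Softmax formulation does apply log to the class probabilities before adding Gumbel noise; a minimal illustration (not this repository's code):

# Sample from a categorical distribution via argmax(log(pi) + g), g ~ Gumbel(0, 1).
import torch

pi = torch.tensor([0.7, 0.3])                    # categorical probabilities
g = -torch.log(-torch.log(torch.rand_like(pi)))  # Gumbel(0, 1) noise
hard_sample = torch.argmax(torch.log(pi) + g)
tau = 1.0
soft_sample = torch.softmax((torch.log(pi) + g) / tau, dim=-1)  # relaxed sample
print(hard_sample.item(), soft_sample)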
