CS-PAF

Credits

This repository is built upon visdial_conv (Agarwal et al.). We sincerely thank the researchers for providing their code, which has been instrumental in the development of this project.

Environment Configuration

conda create -n cspaf python=3.7
conda activate cspaf
pip install -r requirements.txt
python -c "import nltk; nltk.download('all')"

Data Preparation

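Dataset       File                                                 Source
Visdial v1.0  features_faster_rcnn_x101_train.h5                   visdial-challenge-starter-pytorch (Das et al.)
Visdial v1.0  features_faster_rcnn_x101_val.h5                     visdial-challenge-starter-pytorch (Das et al.)
Visdial v1.0  features_faster_rcnn_x101_test.h5                    visdial-challenge-starter-pytorch (Das et al.)
Visdial v1.0  visdial_1.0_word_counts_train.json                   visdial-challenge-starter-pytorch (Das et al.)
Visdial v1.0  glove.npy                                            visdial-principles (Qi et al.)
Visdial v1.0  visdial_1.0_train.json                               visdial official
Visdial v1.0  visdial_1.0_val.json                                 visdial official
Visdial v1.0  visdial_1.0_test.json                                visdial official
Visdial v1.0  visdial_1.0_train_dense_annotations.json             visdial official
Visdial v1.0  visdial_1.0_val_dense_annotations.json               visdial official
VisdialConv   visdial_1.0_val_crowdsourced.json                    subsets/visdialconv/ (Agarwal et al.)
VisdialConv   visdial_1.0_val_dense_annotations_crowdsourced.json  subsets/visdialconv/ (Agarwal et al.)
VisPro        visdial_1.0_val_vispro.json                          subsets/vispro/ (Agarwal et al.)
VisPro        visdial_1.0_val_dense_annotations_vispro.json        subsets/vispro/ (Agarwal et al.)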
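
Before training, it can help to confirm that every file listed above is in place. The sketch below is only an illustration: the `data` directory and the helper name `find_missing_files` are assumptions, not part of this repository, so point the path at wherever your configuration expects the files.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "features_faster_rcnn_x101_train.h5",
    "features_faster_rcnn_x101_val.h5",
    "features_faster_rcnn_x101_test.h5",
    "visdial_1.0_word_counts_train.json",
    "glove.npy",
    "visdial_1.0_train.json",
    "visdial_1.0_val.json",
    "visdial_1.0_test.json",
    "visdial_1.0_train_dense_annotations.json",
    "visdial_1.0_val_dense_annotations.json",
    "visdial_1.0_val_crowdsourced.json",
    "visdial_1.0_val_dense_annotations_crowdsourced.json",
    "visdial_1.0_val_vispro.json",
    "visdial_1.0_val_dense_annotations_vispro.json",
]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "data" is a hypothetical location; change it to match your setup.
    missing = find_missing_files("data")
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files found.")
```

The check only looks at file names in a single flat directory; if your layout nests the subsets in subfolders, pass a different list per folder.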

Train or Finetune

bash -i scripts/cap_hist_early_fusion_disc_train.sh

We train the model on RTX 3090 GPUs with a batch size of 12 per GPU. With 2 GPUs, we use a learning rate of 5e-4. Training logs and checkpoints are saved in the directory exps/exp_name.
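
For reference, the settings above correspond to a global batch size of 12 × 2 = 24. If you train with a different number of GPUs, one common heuristic is to scale the learning rate linearly with the global batch size. This rule is a general heuristic shown here as an assumption; it is not confirmed to be what the training script does, and the function name `scaled_lr` is hypothetical.

```python
# Reference settings taken from the text above.
BASE_BATCH_PER_GPU = 12
BASE_NUM_GPUS = 2
BASE_LR = 5e-4


def scaled_lr(batch_per_gpu, num_gpus):
    """Linearly scale the reference learning rate with the global batch size.

    Assumption: linear scaling is a heuristic, not this repository's policy.
    """
    base_global = BASE_BATCH_PER_GPU * BASE_NUM_GPUS  # 12 * 2 = 24
    new_global = batch_per_gpu * num_gpus
    return BASE_LR * new_global / base_global


if __name__ == "__main__":
    # One GPU with the same per-GPU batch size halves the global batch size.
    print(scaled_lr(12, 1))
```

Treat the result as a starting point and tune from there, since the best learning rate also depends on the optimizer and schedule.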

Evaluate

bash -i scripts/cap_hist_early_fusion_disc_eval.sh

The evaluation outputs, including ranks.json, are saved in the directory exps/exp_name. To obtain results from EvalAI, submit the file exps/exp_name/ranks.json.
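
Before uploading, a quick sanity check of ranks.json can catch a malformed submission. The sketch below assumes the layout used by the Visual Dialog challenge starter code: a JSON list of entries with `image_id`, `round_id`, and `ranks`, where `ranks` is a permutation of 1..100 over the 100 candidate answers. Please verify this against the current EvalAI submission guidelines; the function names `check_ranks` and `check_ranks_file` are hypothetical.

```python
import json

NUM_CANDIDATES = 100  # assumed number of candidate answers per question
REQUIRED_KEYS = {"image_id", "round_id", "ranks"}  # assumed entry layout


def check_ranks(entries, num_candidates=NUM_CANDIDATES):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    expected = list(range(1, num_candidates + 1))
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if sorted(entry["ranks"]) != expected:
            problems.append(
                f"entry {i}: ranks is not a permutation of 1..{num_candidates}"
            )
    return problems


def check_ranks_file(path):
    """Load a ranks file and run check_ranks on its contents."""
    with open(path) as f:
        return check_ranks(json.load(f))
```

For example, `check_ranks_file("exps/exp_name/ranks.json")` returns an empty list when every entry passes these checks.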

Attention Map Visualization (optional)

To generate 2048-d features, see the repository Faster-R-CNN-with-model-pretrained-on-Visual-Genome. If you only want to quickly visualize results on VisDial v1.0, you can use our fork instead [https://github.com/chenyulu2000/Faster-R-CNN-with-model-pretrained-on-Visual-Genome]. The fork fixes some bugs and can generate h5 files for the VisDial v1.0 val set, which can be used directly for visual dialog visualization.

python attention_map_vis/extract_questions.py
python attention_map_vis/visualize.py
