Giter Club home page Giter Club logo

avsd's Introduction

Audio-Visual Scene-Aware Dialog

code for the paper: AVSD Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, Chiori Hori

website: video-dialog.com

This code has been developed upon batra-mlp-lab/visdial-challenge-starter-pytorch

Setup

# create and activate environment
conda env create -n avsd -f=env.yml
conda activate avsd

Data

  • download 'split'.json data at: video-dialog.com
  • Extracted video, audio, and dialog features can be downloaded from here

Workflow

  • Build dialogs json file with otions using makejson_with_options.py (output: 'split'_options.json)

  • Adapt JSON format using convert_json_to_visdial_style.py (output: 'split'_options_2.json can be renamed after to 'split'_options.json)

  • Build tokenized captions, dialogs and image paths with prepro.py (output: dialogs.h5 and params.json)

  • Build the image features (if working with images) using prepro_img_vgg16.lua or prepro_img_resnet.lua from the batra-mlp-lab/visdial-challenge-starter-pytorch (output: data_img.h5)

  • Build video features I3D (output: data_video.h5) https://github.com/piergiaj/pytorch-i3d.git

  • Build audio features AENET (output: data_audio.h5) https://github.com/znaoya/aenet.git

  • Training: python train.py

  • evaluation: python evaluate.py --use_gt

If you find this code useful in your research, please consider citing:

@article{DBLP:journals/corr/abs-1901-09107,
  author    = {Huda Alamri and
               Vincent Cartillier and
               Abhishek Das and
               Jue Wang and
               Stefan Lee and
               Peter Anderson and
               Irfan Essa and
               Devi Parikh and
               Dhruv Batra and
               Anoop Cherian and
               Tim K. Marks and
               Chiori Hori},
  title     = {Audio-Visual Scene-Aware Dialog},
  journal   = {CoRR},
  volume    = {abs/1901.09107},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.09107},
  archivePrefix = {arXiv},
  eprint    = {1901.09107},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-09107},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License

BSD

avsd's People

Contributors

hudaalamri avatar vcartillier3 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.