
Pythia-VQA

Pythia is a modular framework for vision-and-language multimodal research from Facebook AI Research (FAIR). This repository builds on it to run several VQA baselines.

Quickstart

In this quickstart, we will train the LoRRA model on TextVQA. Follow the instructions at the bottom to train other models in Pythia.

Installation

  1. Clone the Pythia repository:

git clone https://github.com/facebookresearch/pythia ~/pythia

  2. Install dependencies and set up the package:

cd ~/pythia
python setup.py develop
.. note::

  1. If you face any issues with the setup, check the Troubleshooting/FAQ section below.
  2. You can also create and activate your own conda environment before running the
     above commands; a minimal sketch is shown below.
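
A minimal conda setup might look like the following. The environment name and Python version are illustrative assumptions, not requirements stated by this repo:

# Environment name and Python version are assumptions; adjust to your setup
conda create -n pythia python=3.6
conda activate pythia

Then run the setup commands above inside this environment.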

Getting Data

Datasets currently supported in Pythia require two parts of data: features and an ImDB. Features are pre-extracted object features from an object detector. The ImDB is the image database for a dataset, containing information such as the questions and answers in the case of TextVQA.

For TextVQA, we need to download features for the OpenImages images it uses, as well as the TextVQA 0.5 ImDB. We assume that all of the data is kept inside a data folder under the Pythia root folder. The table at the bottom lists the corresponding feature and ImDB links for the datasets supported in Pythia.

cd ~/pythia;
# Create data folder
mkdir -p data && cd data;

# Download and extract the features
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz

# Get vocabularies
wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
tar xf vocab.tar.gz

# Download detectron weights required by some models
wget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz

# Download and extract ImDB
mkdir -p imdb && cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
tar xf textvqa_0.5.tar.gz
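
After these commands, the data folder should look roughly as follows. This is only a sketch; the extracted directory names are assumptions based on the archive names and may differ:

~/pythia/data
  open_images/        # OpenImages features
  vocabs/             # vocabularies
  detectron/          # detectron weights
  imdb/
    textvqa_0.5/      # TextVQA 0.5 ImDB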

Training

Once the data is in place, we can start training by running the following command:

cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml

Inference

To run inference or generate predictions for EvalAI, download the corresponding pretrained model and then run the following commands:

cd ~/pythia/data
mkdir -p models && cd models;
wget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth
cd ../..
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type inference

To run inference on the val set, use --run_type val and keep the rest of the arguments the same, as shown below. Check the pretrained models section for more details.
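
Following the note above, only the --run_type flag changes relative to the EvalAI inference command:

python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type val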

These commands should be enough to get you started with training and performing inference using Pythia.

Troubleshooting/FAQs

  1. If setup.py causes any issues, please install fastText first directly from source and then run python setup.py develop. To install fastText, run the following commands:
git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip install -e .

Tasks and Datasets


+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| Dataset      | Key           | Task       | ImDB Link                                                                              | Features Link                                                                   | Features checksum                  | Notes                      |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| TextVQA      | textvqa       | vqa        | `TextVQA 0.5 ImDB`_                                                                    | `OpenImages`_                                                                   | `b22e80997b2580edaf08d7e3a896e324` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VQA 2.0      | vqa2          | vqa        | `VQA 2.0 ImDB`_                                                                        | `COCO`_                                                                         | `ab7947b04f3063c774b87dfbf4d0e981` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VizWiz       | vizwiz        | vqa        | `VizWiz ImDB`_                                                                         | `VizWiz`_                                                                       | `9a28d6a9892dda8519d03fba52fb899f` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VisualDialog | visdial       | dialog     | Coming soon!                                                                           | Coming soon!                                                                    | Coming soon!                       |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VisualGenome | visual_genome | vqa        | Automatically downloaded                                                               | Automatically downloaded                                                        | Coming soon!                       | Also supports scene graphs |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| CLEVR        | clevr         | vqa        | Automatically downloaded                                                               | Automatically downloaded                                                        |                                    |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| MS COCO      | coco          | captioning | `COCO Caption`_                                                                        | `COCO`_                                                                         | `ab7947b04f3063c774b87dfbf4d0e981` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+

.. _TextVQA 0.5 ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
.. _OpenImages: https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
.. _COCO: https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz
.. _VQA 2.0 ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz
.. _VizWiz: https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz
.. _VizWiz ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz
.. _COCO Caption: https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz
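
The same pattern applies to the other datasets in the table. For example, to set up VQA 2.0 (dataset key vqa2), download its features and ImDB from the links above and train with a matching config. The model name and config path below are assumptions and should be checked against the configs directory in the repository:

cd ~/pythia/data
wget https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz
tar xf coco.tar.gz
cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz
tar xf vqa.tar.gz

cd ~/pythia
# Model and config path are assumptions; check configs/vqa/vqa2/ for available configs
python tools/run.py --tasks vqa --datasets vqa2 --model pythia --config \
configs/vqa/vqa2/pythia.yml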

After downloading the features, verify the download by checking the md5sum:

echo "<checksum>  <dataset_name>.tar.gz" | md5sum -c -
