PACS: A Dataset for Physical Audiovisual Common-Sense Reasoning

This repository contains data and code for our paper PACS: A Dataset for Physical Audiovisual CommonSense Reasoning.

Setting up the Repository

It is recommended to create an Anaconda environment:

conda create --name PACS python=3.8.11
conda activate PACS
pip install -r requirements.txt

Then, install the version of PyTorch that matches your CUDA version (see the installation instructions on the PyTorch website). For example, for CUDA 11.3:

pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
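To confirm that the installed wheel matches your CUDA setup, you can query PyTorch directly. The following is a minimal sketch that is not part of the repository (`torch_build_info` is a hypothetical helper name). It only reads `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it reports cleanly if PyTorch is not installed yet.

```python
import importlib.util


def torch_build_info() -> dict:
    """Report the installed PyTorch build, or that PyTorch is missing."""
    # Check for the package without importing it, so a missing install is not an error.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch": torch.__version__,              # full version string of the installed wheel
        "built_with_cuda": torch.version.cuda,   # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # needs a GPU and a compatible driver
    }


if __name__ == "__main__":
    info = torch_build_info()
    if not info["installed"]:
        print("PyTorch is not installed in this environment.")
    else:
        print(info)
        if info["built_with_cuda"] and not info["cuda_available"]:
            print("This build expects CUDA, but no usable GPU/driver was found.")
```

For the example install above, one would expect the version string to end in +cu113; a CPU-only build reports None for built_with_cuda.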

Dataset Download

The dataset is available for download here.

Alternatively, if you want to replicate the original download steps, you can run the following commands (this will take a while):

cd dataset/scripts
python3 download.py -data_dir PATH_TO_DATA_STORAGE_HERE
python3 preprocess.py -data_dir PATH_TO_DATA_STORAGE_HERE

Baseline Models

To run the baseline models, see the experiments folder. We have benchmarked the following models so far:

Model                 With Audio (%)   Without Audio (%)   Δ (With − Without)
Fusion (I+A+V)        51.9 ± 1.1       -                   -
Fusion (Q+I)          -                51.2 ± 0.8          -
Fusion (Q+A)          50.9 ± 0.6       -                   -
Fusion (Q+V)          -                51.5 ± 0.9          -
Late Fusion           55.0 ± 1.1       52.5 ± 1.6          2.5
CLIP/AudioCLIP        60.0 ± 0.9       56.3 ± 0.7          3.7
UNITER (L)            -                60.6 ± 2.2          -
Merlot Reserve (B)    66.5 ± 1.4       64.0 ± 0.9          2.6
Merlot Reserve (L)    70.1 ± 1.0       68.4 ± 0.7          1.8
Majority              50.4             50.4                -
Human                 96.3 ± 2.1       90.5 ± 3.1          5.9
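The Δ column appears to be the accuracy gained by adding audio, i.e. With Audio minus Without Audio (the Late Fusion and CLIP/AudioCLIP rows match this exactly). The short Python sketch below recomputes it from the rounded table entries; `audio_gain` and `check_deltas` are illustrative helpers, not part of the repository. Because the table reports values to one decimal, a recomputed Δ can differ from the reported one by 0.1 point (for example, 66.5 − 64.0 = 2.5 for Merlot Reserve (B), versus 2.6 in the table), presumably because the reported Δ was computed before rounding.

```python
# Accuracy (%) copied from the table: (with audio, without audio, reported delta).
# Only rows that report both settings are included.
RESULTS = {
    "Late Fusion": (55.0, 52.5, 2.5),
    "CLIP/AudioCLIP": (60.0, 56.3, 3.7),
    "Merlot Reserve (B)": (66.5, 64.0, 2.6),
    "Merlot Reserve (L)": (70.1, 68.4, 1.8),
    "Human": (96.3, 90.5, 5.9),
}

# Table entries are rounded to one decimal, so a delta recomputed from them
# may differ from the reported delta by up to 0.1 points (plus float slack).
ROUNDING_TOLERANCE = 0.1 + 1e-9


def audio_gain(with_audio: float, without_audio: float) -> float:
    """Accuracy gained by adding audio, in percentage points (one decimal)."""
    return round(with_audio - without_audio, 1)


def check_deltas(results: dict) -> dict:
    """Return {model: (recomputed, reported, consistent)} for each row."""
    report = {}
    for model, (with_audio, without_audio, reported) in results.items():
        recomputed = audio_gain(with_audio, without_audio)
        consistent = abs(recomputed - reported) <= ROUNDING_TOLERANCE
        report[model] = (recomputed, reported, consistent)
    return report


if __name__ == "__main__":
    for model, (recomputed, reported, ok) in check_deltas(RESULTS).items():
        print(f"{model:20s} recomputed={recomputed:4.1f} reported={reported:4.1f} ok={ok}")
```

Running it prints one line per model; every row should report ok=True, since each recomputed Δ is within rounding of the reported value.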

Citation

If you use this repository or our dataset, please consider citing our paper:

@inproceedings{yu2022pacs,
  title={PACS: A Dataset for Physical Audiovisual CommonSense Reasoning},
  author={Yu, Samuel and Wu, Peter and Liang, Paul Pu and Salakhutdinov, Ruslan and Morency, Louis-Philippe},
  booktitle={European Conference on Computer Vision},
  year={2022}
}