mwafm's Introduction

Audio Question Answering (AQA)

PyTorch code accompanying our Interspeech 2023 paper:

Multi-Scale Attention for Audio Question Answering [arXiv]

Guangyao Li, Yixin Xu and Di Hu


Requirements

Python 3.6+
PyTorch 1.6.0
tensorboardX
ffmpeg

Usage

  1. Clone this repo

    git clone https://github.com/GeWu-Lab/MWAFM.git
  2. Download data

    Clotho-AQA and AQA-MUSIC-AVQA

  3. Data pre-processing

    We follow exactly the same data format as MUSIC-AVQA.

    Notice: We examined the original annotation files of Clotho-AQA and found that the official open-source annotations had not been cleaned: different annotators sometimes gave different answers to the same question. We therefore applied a simple filter, keeping a question only if at least two of its answers were identical and treating that shared answer as the correct one. This filtering yields a new and more accurate annotation file. The files in the 'metadata' folder are described as follows:

    • 'single_word_[train/val/test].csv': does not contain samples with yes/no answers.
    • 'single_word_[train/val/test]_clean.csv': does not contain samples with yes/no answers. (Cleaned data)
    • 'clotho_aqa_[train/val/test]_clean.csv': contains samples with yes/no answers. (Cleaned data)
    • 'binary_[train/val/test]_clean.csv': contains only samples with yes/no answers. (Cleaned data)
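The agreement filter described in the notice can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the files in 'metadata': the function name `filter_by_agreement`, the `(audio_file, question, answer)` row layout, and the lowercase normalisation of answers are all assumptions made for the example.

```python
from collections import Counter


def filter_by_agreement(rows, min_agree=2):
    """Keep one (audio_file, question, answer) row per question whose most
    common answer was given by at least `min_agree` annotators.

    `rows` is an iterable of (audio_file, question, answer) tuples, one per
    annotator response. This layout is hypothetical, not the repo's format.
    """
    # Collect every annotator answer under its (audio, question) pair.
    grouped = {}
    for audio_file, question, answer in rows:
        key = (audio_file, question.strip())
        # Assumption: answers are compared case-insensitively.
        grouped.setdefault(key, []).append(answer.strip().lower())

    kept = []
    for (audio_file, question), answers in grouped.items():
        # most_common(1) returns the answer with the highest count; on a tie
        # the answer seen first wins.
        answer, count = Counter(answers).most_common(1)[0]
        if count >= min_agree:
            kept.append((audio_file, question, answer))
    return kept


if __name__ == "__main__":
    sample = [
        ("a.wav", "Is it raining?", "yes"),
        ("a.wav", "Is it raining?", "Yes"),
        ("a.wav", "Is it raining?", "no"),
        ("b.wav", "What animal is heard?", "dog"),
        ("b.wav", "What animal is heard?", "cat"),
        ("b.wav", "What animal is heard?", "bird"),
    ]
    for row in filter_by_agreement(sample):
        print(row)
```

In this sample, the question about a.wav is kept with the answer "yes" because two annotators agree, while the question about b.wav is dropped because no two answers match.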
  4. Train and evaluate

    Training

    python main_MWAFM.py --mode train

    Testing

    python main_MWAFM.py --mode test

Citation

If you find this work useful, please consider citing it.


@ARTICLE{Li2023MultiScale,
  title	= {Multi-Scale Attention for Audio Question Answering},
  author	= {Guangyao Li and Yixin Xu and Di Hu},
  journal	= {Proc. INTERSPEECH},
  year	= {2023},
}

Acknowledgement

This research was supported by Public Computing Cloud, Renmin University of China.

mwafm's Issues

Meaning of metadata internal file

Thanks a lot for your great contribution, but I'm having some issues reproducing your work. For the metadata in the code base, does the source data come from the Clotho-AQA dataset or the AQA-MUSIC-AVQA dataset? Do the file name prefixes such as single_word, binary_test, and clotho_aqa have any special meaning? Looking forward to your reply, thank you again!

ClothoAQA Config

Hi, thanks a lot for your great contribution!

I was wondering whether you could provide some more information regarding the configuration used in the paper for ClothoAQA, i.e., which data and mapping (among the ones listed in the metadata folder) you exactly used to train and test your models, and obtain the results shown in Table 2.

Thank you in advance!

Questions about evaluation indicators

Thank you very much for your outstanding contribution to the open source community. I noticed that your evaluation metric differs from the one in the paper that proposed the Clotho-AQA dataset, "Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering", which claims an accuracy higher than 0.6. I don't know whether this is because this article mixes the binary "yes" and "no" labels with the other multi-dimensional labels. I hope the authors can explain further. Thanks again for your contribution!

Meta files of Audio-MUSIC-AVQA

Hi Guangyao,

Thanks for your great work! It seems that the folder metadata/ only contains annotations of ClothoAQA. Could you please share the annotation files of Audio-MUSIC-AVQA?
