mwafm's Introduction

Audio Question Answering (AQA)

PyTorch code accompanying our Interspeech 2023 paper:

Multi-Scale Attention for Audio Question Answering [arXiv]

Requirements

python 3.6+
pytorch 1.6.0
tensorboardX
ffmpeg

Usage

Clone this repo
```
git clone https://github.com/GeWu-Lab/MWAFM.git
```
Download data

Clotho-AQA and AQA-MUSIC-AVQA

Data pre-processing

We follow exactly the same data format as MUSIC-AVQA.

Notice: We examined the original annotation files of Clotho-AQA and found that the official open-source annotations had not been cleaned, so different annotators sometimes gave different answers to the same question. We therefore applied a simple filter: a question is considered to have a correct answer if at least two of its answers are identical. This filtering yields a new, more accurate annotation file. The files in the 'metadata' folder are described as follows:
- 'single_word_[train/val/test].csv': does not contain samples with yes/no answers.
- 'single_word_[train/val/test]_clean.csv': does not contain samples with yes/no answers. (Cleaned data)
- 'clotho_aqa_[train/val/test]_clean.csv': contains samples with yes/no answers. (Cleaned data)
- 'binary_[train/val/test]_clean.csv': includes only samples with yes/no answers. (Cleaned data)
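
The agreement filter described above can be sketched roughly as follows. This is an illustrative sketch, not the script used to produce the files in 'metadata': the row keys `file_name`, `question` and `answer`, the helper name `filter_by_agreement`, and the lower-casing of answers before comparison are all assumptions.

```python
from collections import Counter, defaultdict


def filter_by_agreement(rows, min_agree=2):
    """Keep one entry per (file, question) whose most common answer
    was given at least `min_agree` times.

    `rows` is a list of dicts with hypothetical keys
    "file_name", "question" and "answer".
    """
    # Collect every annotator's answer for each (audio file, question) pair.
    answers = defaultdict(list)
    for row in rows:
        key = (row["file_name"], row["question"])
        # Assumption: answers are compared case-insensitively.
        answers[key].append(row["answer"].strip().lower())

    cleaned = []
    for (file_name, question), given in answers.items():
        answer, votes = Counter(given).most_common(1)[0]
        if votes >= min_agree:
            cleaned.append(
                {"file_name": file_name, "question": question, "answer": answer}
            )
    return cleaned


# Toy annotations: three annotators per question.
rows = [
    {"file_name": "rain.wav", "question": "Is it raining?", "answer": "yes"},
    {"file_name": "rain.wav", "question": "Is it raining?", "answer": "Yes"},
    {"file_name": "rain.wav", "question": "Is it raining?", "answer": "no"},
    {"file_name": "dog.wav", "question": "What animal is heard?", "answer": "dog"},
    {"file_name": "dog.wav", "question": "What animal is heard?", "answer": "wolf"},
    {"file_name": "dog.wav", "question": "What animal is heard?", "answer": "fox"},
]
cleaned = filter_by_agreement(rows)
```

In this toy input, the first question is kept because two annotators agree, while the second is dropped because all three answers differ.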

Train and evaluate

Training

python main_MWAFM.py --mode train

Testing

python main_MWAFM.py --mode test

Citation

If you find this work useful, please consider citing it.


@ARTICLE{Li2023MultiScale,
  title   = {Multi-Scale Attention for Audio Question Answering},
  author  = {Guangyao Li and Yixin Xu and Di Hu},
  journal = {Proc. INTERSPEECH},
  year    = {2023},
}

Acknowledgement

This research was supported by Public Computing Cloud, Renmin University of China.

mwafm's Issues

Meaning of metadata internal file

Thanks a lot for your great contribution, but I'm having some issues reproducing your work. Does the data in the metadata folder of the code base come from the Clotho-AQA dataset or the AQA-MUSIC-AVQA dataset? Do the file names starting with single_word, binary_test, clotho_aqa and other prefixes have any special meaning? Looking forward to your reply, thank you again!

ClothoAQA Config

Hi, thanks a lot for your great contribution!

I was wondering whether you could provide some more information about the configuration used in the paper for ClothoAQA, i.e., exactly which data and mapping (among those listed in the metadata folder) you used to train and test your models and to obtain the results shown in Table 2.

Thank you in advance!

Questions about evaluation indicators

Thank you very much for your outstanding contribution to the open source community. I noticed that your evaluation metric differs from the one in the paper that proposed the Clotho-AQA dataset, namely "Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering". They report an accuracy higher than 0.6. I don't know if this is because this article mixes the binary "yes" and "no" labels with the other multi-class labels. I hope the authors can explain further. Thanks again for your contribution!

Meta files of Audio-MUSIC-AVQA

Hi Guangyao,

Thanks for your great work! It seems that the folder metadata/ only contains annotations of ClothoAQA. Could you please share the annotation files of Audio-MUSIC-AVQA?
