Multimodal Attentive Representation Learning for Micro-video Multi-label Classification

Abstract

As one of the representative types of user-generated content (UGC) on social platforms, micro-videos have become popular in daily life. Although micro-videos naturally exhibit multimodal features that are rich enough to support representation learning, the complex correlations across modalities make valuable information difficult to integrate. In this paper, we introduce a multimodal attentive representation network (MARNET) to learn complete and robust representations for micro-video multi-label classification. To address the common missing-modality issue, we present a multimodal information aggregation module, in which latent common representations are obtained by modeling complementarity and consistency over visual-centered modality groupings instead of single modalities. For the label correlation issue, we design an attentive graph neural network module that adaptively learns the correlation matrix and the label representations for better compatibility with the training data. In addition, a cross-modal multi-head attention module is developed to make the learned common representations label-aware for multi-label classification. Experiments conducted on two micro-video datasets demonstrate the superior performance of MARNET compared with state-of-the-art methods.

Method

[Figure: MARNet]

Results

Classification performance comparison between MARNET and several state-of-the-art methods on the MTSVRC micro-video dataset.

[Figure: MTSVRC]

Classification performance comparison between MARNET and several state-of-the-art methods on the manually labeled UCF101 dataset.

[Figure: UCF101]

Run

Add the following entries to your args config file:

    train_label_dir: path to the training labels.
    train_visual_dir: path to the training visual modality features.
    train_audio_dir: path to the training audio modality features.
    train_tra_dir: path to the training trajectory modality features.
    test_label_dir: path to the test labels.
    test_visual_dir: path to the test visual modality features.
    test_audio_dir: path to the test audio modality features.
    test_tra_dir: path to the test trajectory modality features.
    labelgcn_name: path to the labelVectorization file.
    adj: path to the adjacency matrix.
    logger_name: TensorBoard log directory.
    curve_tensorb: directory for the visualization curves.
    log_dir: log directory.
    correlation_matrix: output path for the generated correlation matrix.
    representions: path for the representations.
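
Before launching a run, it can help to check that the args file defines every entry listed above. The sketch below is only an illustration: this README does not state the args file format, so the JSON layout and the helper names `missing_keys` and `load_args` are assumptions, not part of the MARNet code. The key names are copied exactly as they appear in the list, including the spelling of `representions`.

```python
import json

# Config entries named in this README, spelled exactly as listed above.
REQUIRED_KEYS = [
    "train_label_dir", "train_visual_dir", "train_audio_dir", "train_tra_dir",
    "test_label_dir", "test_visual_dir", "test_audio_dir", "test_tra_dir",
    "labelgcn_name", "adj", "logger_name", "curve_tensorb", "log_dir",
    "correlation_matrix", "representions",
]


def missing_keys(args):
    """Return the required entries that are absent from an args dict."""
    return [key for key in REQUIRED_KEYS if key not in args]


def load_args(path):
    """Hypothetical loader: assumes the args file is JSON (not confirmed here)."""
    with open(path, "r", encoding="utf-8") as f:
        args = json.load(f)
    missing = missing_keys(args)
    if missing:
        raise KeyError(f"missing config entries: {missing}")
    return args
```

If the repository's args file uses another format, only `load_args` would need to change; the key check itself works on any dictionary.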

Train

python train.py your_args_path

Test

python train.py your_args_path

Citation

Jing P, Liu X, Zhang L, et al. Multimodal Attentive Representation Learning for Micro-video Multi-label Classification[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024.

@article{MARNet,
author = {Jing, Peiguang and Liu, Xianyi and Zhang, Lijuan and Li, Yun and Liu, Yu and Su, Yuting},
title = {Multimodal Attentive Representation Learning for Micro-video Multi-label Classification},
year = {2024},
issue_date = {June 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {20},
number = {6},
issn = {1551-6857},
url = {https://doi.org/10.1145/3643888},
doi = {10.1145/3643888},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
month = {mar},
articleno = {182},
numpages = {23},
keywords = {Micro-video, multimodal representations, multi-label, graph network}
}
