Fusical: Multimodal Fusion for Video Sentiment

Introduction

Despite cultural diversity, emotions are universal. We take on the EmotiW 2020 challenge: group-level sentiment recognition on videos from across the globe. Given a short clip, the goal is to predict whether its sentiment is positive, neutral, or negative. The problem is interesting because audio-visual sentiment analysis has implications for psychology and mental health.

Our paper was accepted at the 22nd ACM International Conference on Multimodal Interaction: https://dl.acm.org/doi/10.1145/3382507.3417966

Dataset

We worked with the EmotiW 2020 dataset.

Sample image

Architecture

We ensembled models from four modalities: overall scene, pose, audio, and facial.
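As a rough illustration of what ensembling across modalities can look like, the sketch below averages per-modality class probabilities (late fusion) and picks the highest-scoring class. This is a minimal sketch, not the fusion method used in this repository: the function name `fuse_predictions`, the label order in `LABELS`, the optional weights, and the example probabilities are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumed label order, chosen for illustration only.
LABELS = ["positive", "neutral", "negative"]


def fuse_predictions(
    probs_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Weighted average of per-modality class probabilities (late fusion)."""
    if not probs_by_modality:
        raise ValueError("need at least one modality")
    if weights is None:
        # Default to equal weight for every modality.
        weights = {name: 1.0 for name in probs_by_modality}
    total = sum(weights[name] for name in probs_by_modality)

    fused = [0.0] * len(LABELS)
    for name, probs in probs_by_modality.items():
        if len(probs) != len(LABELS):
            raise ValueError(f"{name}: expected {len(LABELS)} probabilities")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total

    # Index of the class with the highest fused probability.
    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best]


# Made-up probabilities for a single clip, one vector per modality.
example = {
    "scene": [0.5, 0.3, 0.2],
    "pose": [0.2, 0.5, 0.3],
    "audio": [0.6, 0.2, 0.2],
    "face": [0.3, 0.4, 0.3],
}
print(fuse_predictions(example))
```

Passing a `weights` dictionary lets a stronger modality count for more in the average; with no weights, every modality contributes equally.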

Ensemble Architecture

Getting Started

To start, please check out our presentation and slide deck.

Code Layout

The code is organized as follows:

  • src/ - preprocessing, generation, and classification code
  • notebooks/ - notebooks for training and prediction

Try it out

Run this notebook to see how our model works on the dataset.

Model Emporium

We provide many of the models we trained here.

Results

Final Ensemble

Best Submission Confusion Matrix

Ablation Study

Ablation Confusion Matrices

Table - Individual Modalities

Results are reported on the EmotiW 2020 validation set.

Modality          Accuracy  F1-Score
Scene             0.546     0.541
Pose              0.486     0.489
Audio             0.577     0.577
Face              0.4       0.348
Image Captioning  0.505     0.506
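For readers who want to reproduce numbers of this kind on their own predictions, the sketch below computes accuracy and an F1-score from integer class labels. The README does not state which F1 averaging was used; macro averaging over the three classes is an assumption here, and the function name `accuracy_and_macro_f1` and the sample labels are hypothetical.

```python
from typing import List, Tuple


def accuracy_and_macro_f1(
    y_true: List[int], y_pred: List[int], num_classes: int = 3
) -> Tuple[float, float]:
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that never appears in labels or predictions scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro averaging: unweighted mean of the per-class F1 scores (assumed).
    return accuracy, sum(f1_scores) / num_classes


# Made-up labels: 0 = positive, 1 = neutral, 2 = negative (assumed encoding).
acc, f1 = accuracy_and_macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```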

Table - Final Ensemble

Dataset     Accuracy
Validation  0.640
Test        0.639

"Saliency" Map

Ablation Confusion Matrices

The Team

Boyang Tom Jin, Leila Abdelrahman, Cong Kevin Chen, Amil Khanzada
CS231n - Stanford University

Citing Our Work

@misc{2020fusical,
  author =       {Boyang Tom Jin and Leila Abdelrahman and Cong Kevin Chen and
                  Amil Khanzada},
  title =        {Fusical},
  howpublished = {\url{https://github.com/kevincong95/cs231n-emotiw}},
  year =         {2020}
}
