
multimodal-emotion-recognition's Introduction

End-to-End Multimodal Emotion Recognition using Deep Neural Networks

This package provides training and evaluation code for the end-to-end multimodal emotion recognition paper. If you use this codebase in your experiments, please cite:

P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. Schuller and S. Zafeiriou, "End-to-End Multimodal Emotion Recognition using Deep Neural Networks," in IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1-1. (http://ieeexplore.ieee.org/document/8070966/)

UPDATE

A PyTorch implementation of this method (along with pretrained models) can be found in our End2You toolkit.

Requirements

The following modules are required to run the code:

  • Python <= 2.7
  • NumPy >= 1.11.1
  • TensorFlow <= 0.12
  • Menpo >= 0.6.2
  • MoviePy >= 0.2.2.11

Content

This repository contains the files:

  • model.py: contains the audio and video networks.
  • emotion_train.py: is in charge of training.
  • emotion_eval.py: is in charge of evaluating.
  • data_provider.py: provides the data.
  • data_generator.py: creates the tfrecords from the '.wav' files.
  • metrics.py: contains the concordance metric (CCC) used for evaluation.
  • losses.py: contains the loss function used for training (a sketch is given below).
  • inception_processing.py: provides functions for visual regularization.

The multimodal model can be downloaded from: https://www.doc.ic.ac.uk/~pt511/emotion_recognition_model.zip
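For reference, here is a minimal NumPy sketch of the concordance correlation coefficient (CCC) that metrics.py evaluates; the training objective in losses.py is 1 - CCC (as discussed in the issues below). This is an illustrative reimplementation, not the repository's exact code:

import numpy as np

def concordance_cc(predictions, labels):
    # Concordance Correlation Coefficient between two 1-D arrays.
    pred_mean, gold_mean = np.mean(predictions), np.mean(labels)
    pred_var, gold_var = np.var(predictions), np.var(labels)
    covariance = np.mean((predictions - pred_mean) * (labels - gold_mean))
    return 2 * covariance / (pred_var + gold_var + (pred_mean - gold_mean) ** 2)

# The training loss is then 1 - concordance_cc(predictions, labels),
# computed separately for each output dimension (arousal and valence).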

multimodal-emotion-recognition's People

Contributors

tzirakis


multimodal-emotion-recognition's Issues

Recola training data

hey,
Two questions about the training data; hope you can help. Thanks in advance.

  1. The Recola dataset that I downloaded doesn't include the same amount of data that you have. Did you include additional data that you collected yourself?

In your code, you have:
train = [25, 15, 16, 17, 18, 21, 23, 37, 39, 41, 46, 50, 51, 55, 56, 60],  # 25
valid = [14, 19, 24, 26, 28, 30, 34, 40, 42, 43, 44, 45, 52, 64, 65],
test = [54, 53, 13, 20, 22, 32, 38, 47, 48, 49, 57, 58, 59, 62, 63]  # 54, 53
What I got is:
Train = [16, 21, 23, 25, 37, 39, 41, 46, 56]
Valid = [19, 26, 28, 30, 34, 42, 45, 64, 65]
Test = [13, 20, 32, 38, 47, 49, 53, 59, 63]

Did I miss anything here?

  2. Annotated data: are you using the annotations from the Recola database, or have the annotations been post-processed by you?
    In your code, you have:
    def get_samples(subject_id):
        arousal_label_path = root_dir / 'Ratings_affective_behaviour_CCC_centred/arousal/{}.csv'.format(subject_id)
        valence_label_path = root_dir / 'Ratings_affective_behaviour_CCC_centred/valence/{}.csv'.format(subject_id)
        arousal = np.loadtxt(str(arousal_label_path), delimiter=',')[:, 1][1:]  # this only reads the annotation from the 1st external annotator

but what I got is something like /recola/AVEC_2016/ratings_individual/arousal/dev_1.csv, and the format of the csv file is as below:
time | FM1 | FM2 | FM3 | FF1 | FF2 | FF3
0 | 0 | 0 | 0 | 0 | 0 | 0
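For what it's worth, here is a minimal sketch of reading the AVEC_2016 individual-rater CSV described above and collapsing the six annotators into a single trace by averaging them. This is only an assumed workaround, not necessarily the gold standard used in the paper, and it assumes a comma-separated file with a one-line header as in the snippet above:

import numpy as np

def load_mean_rating(csv_path):
    # Assumed columns: time, FM1, FM2, FM3, FF1, FF2, FF3.
    data = np.loadtxt(csv_path, delimiter=',', skiprows=1)
    return data[:, 1:].mean(axis=1)  # average the six annotators per time step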

about Fig.2

Greetings. I am very interested in your research. Could you please explain how Figure 2 is drawn and how it should be analyzed? Thank you.

Dropout as the first layer of the audio model?

Hi,
I was trying to understand the code. If I am not wrong, in models.py the 'audio model' has dropout as its first layer? I have only seen dropout used on dense layers, so I would like to know why it is done this way here.

Thanks,
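For illustration only, the pattern the question describes (dropout applied to the raw waveform before the first convolution) looks roughly like the sketch below. It is written against a TF 1.x-style graph API and is not the repository's model.py:

import tensorflow as tf

def audio_model(raw_audio, keep_prob=0.5, num_filters=40):
    # raw_audio: [batch, samples, 1] waveform chunks.
    # Dropout is applied directly to the raw input, before the first convolution.
    net = tf.nn.dropout(raw_audio, keep_prob)
    conv_filter = tf.get_variable('conv1_filter', shape=[20, 1, num_filters])
    net = tf.nn.conv1d(net, conv_filter, stride=1, padding='SAME')
    return tf.nn.relu(net)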

Question about loss function

Hi, I read about your work and it's great! I have one question related to the loss function that you used in your code, which is 1-ccc.
I have previous experience using 1-ccc as a loss function as well, but it turned out not to be a very good one: the loss on the training set drops gradually, while the loss on the validation set keeps fluctuating and never drops. In the end, I had to use MSE instead.
So I am wondering: have you ever encountered this kind of problem when training with the (1-ccc) loss? If so, how did you solve it?
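For context, the 1-ccc loss discussed here can be written directly as tensor ops. A minimal sketch for a single output dimension, mirroring the NumPy metric in the Content section above (not the repository's losses.py):

import tensorflow as tf

def concordance_loss(predictions, labels):
    # 1 - CCC, computed over the batch for one output dimension.
    pred_mean, pred_var = tf.nn.moments(predictions, axes=[0])
    gold_mean, gold_var = tf.nn.moments(labels, axes=[0])
    covariance = tf.reduce_mean((predictions - pred_mean) * (labels - gold_mean))
    ccc = 2. * covariance / (pred_var + gold_var + tf.square(pred_mean - gold_mean))
    return 1. - ccc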

Method get_jpg_string() usage.

Hello, thanks for your work.

In data_provider.py we prepare the tfrecords files, but only the audio data is preprocessed:
features=tf.train.Features(feature={
    'sample_id': _int_feauture(i),
    'subject_id': _int_feauture(subject_id),
    'label': _bytes_feauture(label.tobytes()),
    'raw_audio': _bytes_feauture(audio.tobytes()),
}))

But in data_generator.py we load the tfrecords expecting frame data:
features={
    'raw_audio': tf.FixedLenFeature([], tf.string),
    'label': tf.FixedLenFeature([], tf.string),
    'subject_id': tf.FixedLenFeature([], tf.int64),
    'frame': tf.FixedLenFeature([], tf.string),
}

The question is: when and where do we encode the frame data?
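For illustration, one way the missing 'frame' feature could be written when generating the tfrecords is to serialize each frame as JPEG bytes. The helper below is hypothetical (it is not the repository's get_jpg_string()) and assumes frames are available as HxWx3 uint8 arrays:

import io
from PIL import Image

def encode_jpg(frame):
    # frame: HxWx3 uint8 array -> JPEG-encoded bytes for a bytes feature.
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format='JPEG')
    return buf.getvalue()

# e.g. added alongside the audio feature when writing the record:
# 'frame': _bytes_feauture(encode_jpg(frame)),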

Where is the checkpoint file?

Hi,
I am a beginner in deep learning. I downloaded the pretrained model linked in the README, but I can't find a checkpoint file.
So when I use saver.restore(sess, './emotion_recognition_model/'),
the error below occurred:
ValueError: The passed save_path is not a valid checkpoint: ./emotion_recognition_model/

I hope this message finds you well. Your answer will be of great help.
Thanks
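For reference, a tf.train.Saver restores from a checkpoint prefix rather than a bare directory. A minimal sketch, assuming the downloaded zip is extracted to ./emotion_recognition_model/ and actually contains checkpoint files, and that the model graph has already been built:

import tensorflow as tf

saver = tf.train.Saver()  # assumes the model variables are already in the graph
with tf.Session() as sess:
    # Resolve the latest checkpoint prefix inside the directory, then restore.
    ckpt = tf.train.latest_checkpoint('./emotion_recognition_model/')
    if ckpt is None:
        raise IOError('No checkpoint found in ./emotion_recognition_model/')
    saver.restore(sess, ckpt)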

How to preprocess a subject's video?

According to the paper, the cropped faces from a subject's video are taken as input to the video network. My question is about the preprocessing pipeline. Which method or tool do you use to crop the faces? How are these faces aligned? And how do you handle the case where facial landmark detection fails?
In data_provider.py, a sample is composed of the [frame, audio_sample, label, subject_id] tensors, but in data_generator.py there is no way to compute the frame tensor.
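Not the authors' pipeline, but for illustration: frames can be pulled from a video with MoviePy (a listed requirement) and faces cropped with an off-the-shelf detector such as an OpenCV Haar cascade. Alignment and landmark-failure handling are left out, and the cascade path is an assumption:

import cv2
from moviepy.editor import VideoFileClip

CASCADE_PATH = 'haarcascade_frontalface_default.xml'  # assumed path to an OpenCV cascade file
detector = cv2.CascadeClassifier(CASCADE_PATH)

def cropped_faces(video_path, fps=25, size=(96, 96)):
    # Yields resized face crops, one per sampled frame; frames with no detection are skipped.
    clip = VideoFileClip(video_path)
    for frame in clip.iter_frames(fps=fps):  # RGB uint8 frames
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) == 0:
            continue  # a real pipeline needs a fallback here (e.g. reuse the previous box)
        x, y, w, h = boxes[0]
        yield cv2.resize(frame[y:y + h, x:x + w], size)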
