
multimodal-emotion-recognition's Introduction

End-to-End Multimodal Emotion Recognition using Deep Neural Networks

This package provides training and evaluation code for the end-to-end multimodal emotion recognition paper. If you use this codebase in your experiments, please cite:

P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. Schuller and S. Zafeiriou, "End-to-End Multimodal Emotion Recognition using Deep Neural Networks," in IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1-1. (http://ieeexplore.ieee.org/document/8070966/)

UPDATE

A PyTorch implementation of this method (along with pretrained models) can be found in our End2You toolkit.

Requirements

The following modules are required to run the code:

  • Python <= 2.7
  • NumPy >= 1.11.1
  • TensorFlow <= 0.12
  • Menpo >= 0.6.2
  • MoviePy >= 0.2.2.11

Content

This repository contains the files:

  • model.py: contains the audio and video networks.
  • emotion_train.py: is in charge of training.
  • emotion_eval.py: is in charge of evaluating.
  • data_provider.py: provides the data.
  • data_generator.py: creates the tfrecords from the '.wav' files.
  • metrics.py: contains the concordance metric (CCC) used for evaluation.
  • losses.py: contains the loss function used for training (a sketch is given below).
  • inception_processing.py: provides functions for visual regularization.

The multimodal model can be downloaded from: https://www.doc.ic.ac.uk/~pt511/emotion_recognition_model.zip
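For reference, here is a minimal NumPy sketch of the concordance correlation coefficient (CCC) that metrics.py evaluates; the training objective in losses.py is 1 - CCC (as discussed in the issues below). This is an illustrative reimplementation, not the repository's exact code:

import numpy as np

def concordance_cc(predictions, labels):
    # Concordance Correlation Coefficient between two 1-D arrays.
    pred_mean, gold_mean = np.mean(predictions), np.mean(labels)
    pred_var, gold_var = np.var(predictions), np.var(labels)
    covariance = np.mean((predictions - pred_mean) * (labels - gold_mean))
    return 2 * covariance / (pred_var + gold_var + (pred_mean - gold_mean) ** 2)

# The training loss is then 1 - concordance_cc(predictions, labels),
# computed separately for each output dimension (arousal and valence).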

multimodal-emotion-recognition's People

Contributors

tzirakis


multimodal-emotion-recognition's Issues

Recola training data

hey,
Two questions about the training data; hope you can help. Thanks in advance.

  1. The Recola dataset that I downloaded doesn't include the same amount of data that you have. Did you include additional data that you collected yourself?

In your code, you have:
train = [25, 15, 16, 17, 18, 21, 23, 37, 39, 41, 46, 50, 51, 55, 56, 60],  # 25
valid = [14, 19, 24, 26, 28, 30, 34, 40, 42, 43, 44, 45, 52, 64, 65],
test = [54, 53, 13, 20, 22, 32, 38, 47, 48, 49, 57, 58, 59, 62, 63]  # 54, 53
What I got is:
Train = [16, 21, 23, 25, 37, 39, 41, 46, 56]
Valid = [19, 26, 28, 30, 34, 42, 45, 64, 65]
Test = [13, 20, 32, 38, 47, 49, 53, 59, 63]

Did I miss anything here?

  2. Annotated data: are you using the annotations from the Recola database, or have the annotations been post-processed by you?
    In your code, you have:
    def get_samples(subject_id):
        arousal_label_path = root_dir / 'Ratings_affective_behaviour_CCC_centred/arousal/{}.csv'.format(subject_id)
        valence_label_path = root_dir / 'Ratings_affective_behaviour_CCC_centred/valence/{}.csv'.format(subject_id)
        arousal = np.loadtxt(str(arousal_label_path), delimiter=',')[:, 1][1:]  # this only reads the annotation from the 1st external annotator

but what I got is something like /recola/AVEC_2016/ratings_individual/arousal/dev_1.csv, and the format of the csv file is as below:
time | FM1 | FM2 | FM3 | FF1 | FF2 | FF3
0 | 0 | 0 | 0 | 0 | 0 | 0
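For what it's worth, here is a minimal sketch of reading the AVEC_2016 individual-rater CSV described above and collapsing the six annotators into a single trace by averaging them. This is only an assumed workaround, not necessarily the gold standard used in the paper, and it assumes a comma-separated file with a one-line header as in the snippet above:

import numpy as np

def load_mean_rating(csv_path):
    # Assumed columns: time, FM1, FM2, FM3, FF1, FF2, FF3.
    data = np.loadtxt(csv_path, delimiter=',', skiprows=1)
    return data[:, 1:].mean(axis=1)  # average the six annotators per time step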

about Fig.2

Greetings. I am very interested in your research. Could you please explain how Figure 2 is drawn and how it should be analyzed? Thank you.

Dropout as the first layer of the audio model?

Hi,
I was trying to understand the code. If I am not wrong, in models.py the 'audio model' has dropout as its first layer? I have only seen dropout used on dense layers, so I would like to know why it is done this way here.

Thanks,
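For illustration only, the pattern the question describes (dropout applied to the raw waveform before the first convolution) looks roughly like the sketch below. It is written against a TF 1.x-style graph API and is not the repository's model.py:

import tensorflow as tf

def audio_model(raw_audio, keep_prob=0.5, num_filters=40):
    # raw_audio: [batch, samples, 1] waveform chunks.
    # Dropout is applied directly to the raw input, before the first convolution.
    net = tf.nn.dropout(raw_audio, keep_prob)
    conv_filter = tf.get_variable('conv1_filter', shape=[20, 1, num_filters])
    net = tf.nn.conv1d(net, conv_filter, stride=1, padding='SAME')
    return tf.nn.relu(net)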

Question about loss function

Hi, I read about your work and it's great! I have one question related to the loss function that you used in your code, which is 1-ccc.
I have previous experience using 1-ccc as a loss function as well, but it turned out not to be a very good one: the loss on the training set drops gradually, while the loss on the validation set keeps fluctuating and never drops. In the end, I had to use MSE instead.
So I am wondering: have you ever encountered this kind of problem when training with the (1-ccc) loss? If so, how did you solve it?
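For context, the 1-ccc loss discussed here can be written directly as tensor ops. A minimal sketch for a single output dimension, mirroring the NumPy metric in the Content section above (not the repository's losses.py):

import tensorflow as tf

def concordance_loss(predictions, labels):
    # 1 - CCC, computed over the batch for one output dimension.
    pred_mean, pred_var = tf.nn.moments(predictions, axes=[0])
    gold_mean, gold_var = tf.nn.moments(labels, axes=[0])
    covariance = tf.reduce_mean((predictions - pred_mean) * (labels - gold_mean))
    ccc = 2. * covariance / (pred_var + gold_var + tf.square(pred_mean - gold_mean))
    return 1. - ccc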

Method get_jpg_string() usage.

Hello, thanks for your work.

In data_provider.py we prepare the tfrecords files, but only the audio data is preprocessed:
features=tf.train.Features(feature={
    'sample_id': _int_feauture(i),
    'subject_id': _int_feauture(subject_id),
    'label': _bytes_feauture(label.tobytes()),
    'raw_audio': _bytes_feauture(audio.tobytes()),
}))

But in data_generator.py we load the tfrecords expecting frame data:
features={
    'raw_audio': tf.FixedLenFeature([], tf.string),
    'label': tf.FixedLenFeature([], tf.string),
    'subject_id': tf.FixedLenFeature([], tf.int64),
    'frame': tf.FixedLenFeature([], tf.string),
}

The question is: when and where do we encode the frame data?
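For illustration, one way the missing 'frame' feature could be written when generating the tfrecords is to serialize each frame as JPEG bytes. The helper below is hypothetical (it is not the repository's get_jpg_string()) and assumes frames are available as HxWx3 uint8 arrays:

import io
from PIL import Image

def encode_jpg(frame):
    # frame: HxWx3 uint8 array -> JPEG-encoded bytes for a bytes feature.
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format='JPEG')
    return buf.getvalue()

# e.g. added alongside the audio feature when writing the record:
# 'frame': _bytes_feauture(encode_jpg(frame)),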

Where is the checkpoint file?

Hi,
I am a beginner in deep learning. I downloaded the pretrained model linked in the README, but I can't find a checkpoint file.
So when I use saver.restore(sess, './emotion_recognition_model/'),
the error below occurred:
ValueError: The passed save_path is not a valid checkpoint: ./emotion_recognition_model/

I hope this message finds you well. Your answer will be of great help.
Thanks
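For reference, a tf.train.Saver restores from a checkpoint prefix rather than a bare directory. A minimal sketch, assuming the downloaded zip is extracted to ./emotion_recognition_model/ and actually contains checkpoint files, and that the model graph has already been built:

import tensorflow as tf

saver = tf.train.Saver()  # assumes the model variables are already in the graph
with tf.Session() as sess:
    # Resolve the latest checkpoint prefix inside the directory, then restore.
    ckpt = tf.train.latest_checkpoint('./emotion_recognition_model/')
    if ckpt is None:
        raise IOError('No checkpoint found in ./emotion_recognition_model/')
    saver.restore(sess, ckpt)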

How to preprocess a subject's video?

According to the paper, the cropped faces from a subject's video are taken as input to the video network. My question is about the preprocessing pipeline. Which method or tool do you use to crop the faces? How are these faces aligned? And how do you handle the case where facial landmark detection fails?
In data_provider.py, a sample is composed of the [frame, audio_sample, label, subject_id] tensors, but in data_generator.py there is no way to compute the frame tensor.
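Not the authors' pipeline, but for illustration: frames can be pulled from a video with MoviePy (a listed requirement) and faces cropped with an off-the-shelf detector such as an OpenCV Haar cascade. Alignment and landmark-failure handling are left out, and the cascade path is an assumption:

import cv2
from moviepy.editor import VideoFileClip

CASCADE_PATH = 'haarcascade_frontalface_default.xml'  # assumed path to an OpenCV cascade file
detector = cv2.CascadeClassifier(CASCADE_PATH)

def cropped_faces(video_path, fps=25, size=(96, 96)):
    # Yields resized face crops, one per sampled frame; frames with no detection are skipped.
    clip = VideoFileClip(video_path)
    for frame in clip.iter_frames(fps=fps):  # RGB uint8 frames
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) == 0:
            continue  # a real pipeline needs a fallback here (e.g. reuse the previous box)
        x, y, w, h = boxes[0]
        yield cv2.resize(frame[y:y + h, x:x + w], size)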
