Learning Speech Emotion Representations in the Quaternion Domain

This repository supports the paper "Learning Speech Emotion Representations in the Quaternion Domain", submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. It contains instructions for downloading the required data and our pre-trained weights, for training RH-emo from scratch on IEMOCAP, and for applying our approach to a generic speech emotion recognition dataset.

Installation

Our code is based on Python 3.7.

To install all required dependencies run:

pip install -r requirements.txt

Data download and preprocessing

  • Follow these instructions to download the IEMOCAP dataset: https://sail.usc.edu/iemocap/
  • Set the input_iemocap_folder variable in the preprocessing_config.ini file to the path of the downloaded dataset.
  • Run the following command to preprocess the dataset:
python3 preprocessing.py
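
Before launching the preprocessing, it can help to confirm that the configured path actually exists. The following is a minimal sketch, not part of the repository: the helper names find_option and check_dataset_path are hypothetical, and it assumes that preprocessing_config.ini is a standard INI file with at least one section header. Since the section name is not stated here, the sketch searches every section for input_iemocap_folder.

```python
import configparser
from pathlib import Path


def find_option(config_path, option="input_iemocap_folder"):
    """Return (section, value) for the first section defining `option`, else None.

    Hypothetical helper: the section name is not assumed, every section is searched.
    """
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files it cannot open, so check its return value.
    if not parser.read(config_path):
        raise FileNotFoundError(config_path)
    for section in parser.sections():
        if parser.has_option(section, option):
            # Assumption: the value may be wrapped in quotes, so strip them.
            return section, parser.get(section, option).strip("\"'")
    return None


def check_dataset_path(config_path="preprocessing_config.ini"):
    """Return the configured dataset folder, or raise if it is missing."""
    found = find_option(config_path)
    if found is None:
        raise KeyError(f"input_iemocap_folder not found in {config_path}")
    section, value = found
    folder = Path(value).expanduser()
    if not folder.is_dir():
        raise NotADirectoryError(
            f"[{section}] input_iemocap_folder points to a missing folder: {folder}"
        )
    return folder
```

Calling check_dataset_path() from the repository root either returns the dataset folder or raises an error that names the problem, before any preprocessing time is spent.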

To download our pre-trained RH-emo weights, run:

python3 download_weights.py

These weights are also available for manual download here.

If you use our pre-trained weights, skip the following section.

RH-emo pretraining

Once IEMOCAP has been downloaded and preprocessed, you can run the RH-emo pretraining from scratch with this command:

python3 exp_instance.py --ids [1] --last 2 --gpu_id 0

This script runs the training script training_RHemo.py with our best hyperparameters, which are specified in the configuration file experiments/1_RHemo_train_onlyrecon.txt. It launches two consecutive trainings: the first without and the second with the emotion classification term in the loss function, as explained in the paper. When the trainings finish, a metrics spreadsheet is saved in the results folder. The results will match the ones reported in the original paper.

Method application

With a pretrained RH-emo network, you can use quaternion-valued networks for speech emotion recognition starting from monaural spectrograms. It is sufficient to call the function get_embeddings() on a pretrained RH-emo as a preprocessing step before the forward propagation. We provide quaternion implementations of the AlexNet, ResNet50 and VGG16 networks. An example in pseudocode:

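import torch
from models import *  # quaternion networks and the RH-emo autoencoder

# Placeholders to define for your own setup: pretrained_dict_r2he (the RH-emo
# state dict, e.g. loaded with torch.load), dataloader, optimizer,
# loss_function and num_epochs.

quaternion_processing = True

model = resnet50(quat=quaternion_processing)
if quaternion_processing:
    # Pretrained RH-emo, used only to produce the quaternion embeddings
    r2he = simple_autoencoder_2_vad()
    r2he.load_state_dict(pretrained_dict_r2he, strict=False)

for epoch in range(num_epochs):
    for i, (sounds, truth) in enumerate(dataloader):
        optimizer.zero_grad()
        if quaternion_processing:
            # No gradients flow through RH-emo; only the first returned value is used
            with torch.no_grad():
                sounds, _, _, _, _ = r2he.get_embeddings(sounds)
        pred = model(sounds)
        loss = loss_function(pred, truth)
        loss.backward()
        optimizer.step()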

You can run our speech emotion recognition training on IEMOCAP with this command:

python3 exp_instance.py --ids [2] --last 3 --gpu_id 0

The script launches 3 consecutive trainings on IEMOCAP, using the quaternion AlexNet, ResNet50 and VGG16, and saves a metrics spreadsheet in the results folder. The results will match the ones reported in the original paper.
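
The exp_instance.py commands in this README can also be launched from Python. The sketch below is an illustration and not part of the repository: the helper names build_command and run_experiments are hypothetical, and the flag values simply mirror the commands shown in this README without assuming anything further about what --ids and --last mean. Passing the arguments as a list bypasses the shell, so the brackets in "[1]" are never treated as a glob pattern, which some shells do when the argument is left unquoted.

```python
import subprocess
import sys


def build_command(experiment_id, last, gpu_id=0, python=sys.executable):
    """Build the argument list for one exp_instance.py run.

    Hypothetical helper: the flags mirror the commands shown in this README.
    """
    return [
        python, "exp_instance.py",
        "--ids", f"[{experiment_id}]",
        "--last", str(last),
        "--gpu_id", str(gpu_id),
    ]


def run_experiments(runs, gpu_id=0):
    """Run (experiment_id, last) pairs in order, stopping at the first failure."""
    for experiment_id, last in runs:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_command(experiment_id, last, gpu_id), check=True)


# Example (run from the repository root):
# run_experiments([(1, 2), (2, 3)])  # RH-emo pretraining, then emotion recognition
```

If you use the downloaded pre-trained weights, pass only the second pair, (2, 3), to skip the pretraining run.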

Contributors

ericguizzo, dcomminiello
