
emo-stargan's Introduction

Emo-StarGAN

This repository contains the source code for the paper Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion, accepted at Interspeech 2023. An overview of the method and the results can be found here.

Concept of our method. For details we refer to our paper at .....

Highlights:

Samples

Samples can be found here.

Demo

The demo can be found at Demo/EmoStarGAN Demo.ipynb.

Pre-requisites:

  1. Python >= 3.9
  2. Install the python dependencies mentioned in the requirements.txt
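As a quick sanity check before installing the dependencies, the interpreter version can be verified programmatically. This is a minimal sketch (the helper name is illustrative, not part of the repository); the version floor matches the prerequisite above:

```python
import sys

def python_ok(min_version=(3, 9)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

if not python_ok():
    raise SystemExit("Emo-StarGAN requires Python >= 3.9")
```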

Training:

Before Training

  1. Before starting the training, specify the number of target speakers in num_speaker_domains, along with other details such as the training and validation data, in the config file.
  2. Download the VCTK and ESD datasets. For the VCTK dataset, preprocessing is needed, which can be carried out using Preprocess/getdata.py. The dataset paths need to be adjusted in the training (train_list.txt) and validation (val_list.txt) lists present in Data/.
  3. Download and copy the emotion embedding weights to the folder Utils/emotion_encoder.
  4. Download and copy the vocoder weights to the folder Utils/Vocoder.
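The exact format of the data lists is defined by the repository; as a hedged illustration, StarGANv2-VC-style lists use one `<wav_path>|<speaker_index>` entry per line, which could be parsed as follows (the separator and helper name are assumptions, not the repository's API):

```python
def parse_data_list(lines):
    """Parse 'path|speaker_index' entries, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, speaker = line.rsplit("|", 1)  # split on the last separator
        entries.append((path, int(speaker)))
    return entries
```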

Train

python train.py --config_path ./Configs/speaker_domain_config.yml

Model Weights

The Emo-StarGAN model weights can be downloaded from here.

Common Errors

When a speaker index in train_list.txt or val_list.txt is greater than or equal to the number of speakers (the hyperparameter num_speaker_domains in speaker_domain_config.yml), the following error is encountered:

[train]:   0%| | 0/66 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

Also note that the speaker index starts with 0 (not with 1!) in the training and validation lists.
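A hypothetical pre-flight check (illustrative names only, not part of the repository) can catch this misconfiguration before the CUDA assertion fires: with 0-based indexing, every speaker index must lie in [0, num_speaker_domains - 1].

```python
def validate_speaker_indices(indices, num_speaker_domains):
    """Raise if any speaker index falls outside [0, num_speaker_domains - 1]."""
    bad = sorted({i for i in indices if not 0 <= i < num_speaker_domains})
    if bad:
        raise ValueError(
            f"Speaker indices {bad} are out of range "
            f"[0, {num_speaker_domains - 1}]; increase num_speaker_domains "
            "in speaker_domain_config.yml or fix the data lists."
        )
```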

References and Acknowledgements

emo-stargan's People

Contributors

arnabdas8901, suhitaghosh10


emo-stargan's Issues

Readme

Hello,
Thank you for your great work!

Is it possible to add some guidelines for running the experiments?

Best regards,

config.yml

Dear suhitaghosh10, thank you very much for sharing your research and publishing your results!
I'm trying to launch the Demo/EmoStarGAN Demo.ipynb notebook, and it seems the config.yml file is missing. Could you please provide the config file?
Thank you very much once again

Speaker and Emotional domain training

There are two config files: a speaker-domain config for training on the VCTK dataset and an emotional-domain config for training on ESD, yet the training instructions mention only speaker-domain training.

Would you explain, please?

Task Conversion

Can this model be used for voice conversion tasks that preserve accents?

Is splitting into 5-second segments necessary?

I understand you took it from StarGAN-VC2, but how does a variable segment length (3-10 seconds) influence training compared to a fixed 5-second length?

GRU and Transformer Encoder Layer

Hello, thank you for your research, very interesting results.
I noticed you have some lines of code in the generator that you don't actually use, but published.
Could you please elaborate a bit on the GRU and Transformer encoder methods and how you used them?

Evaluation index issues

I would like to ask how the evaluation metrics MAE and Acc in the paper are calculated, and whether there is any corresponding code? Thank you very much for your answer.

Emotion transfer inference

Good day, thank you very much for your results!

I have a question concerning emotion inference: it would be nice to have a brief illustration of emotion transfer inference in your Demo code. Do you plan to include it as well?

Kind regards,
V.D.

emotional transfer for target speakers with only neutral data

Hello!

First of all, congrats for your awesome work!

I am playing with your demo and trying to convert an emotional source utterance (from ESD) to a target speaker that only has neutral data (from VCTK) while preserving the source's emotion, but unfortunately I found the outputs to be always neutral.

I was wondering if you tried any experiment in this direction, since from the paper, as far as I understood, for a target neutral speaker you only tried with a neutral source utterance (VCTK -> VCTK).

(When the output is from a speaker that has already seen emotional data, the conversion is indeed neat!)

Thanks!

Hello, I would like to inquire about the experimental results.

Hello, thank you very much for sharing the code. I noticed that you have two papers on the same task; the other one is StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings. However, in the experimental results shown in Table 2, the Acc results are different, with some being slightly worse. What is the reason?

How well does it generalise for A2M?

Looking at the results of the subjective evaluation, there are no SMOS (similarity) results. How well does your model actually generalise for A2M tasks?
