Transformer-TTS

Implementation of "Neural Speech Synthesis with Transformer Network".
This implementation was written for use with FastSpeech.

Training

1. Download and extract the LJ Speech dataset.
2. Create a preprocessed folder inside the LJSpeech directory, then create char_seq, phone_seq, and melspectrogram folders inside it.
3. Set data_path in hparams.py to the LJSpeech folder.
4. Run prepare_data.ipynb to prepare the melspectrogram and text (converted into indices) tensors.
5. Run python train.py.
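
The folder layout described in the steps above can also be created with a few lines of Python instead of by hand. This is only a minimal sketch: the helper name make_preprocessed_dirs is hypothetical and not part of this repository, and the path you pass should be the same one you set as data_path in hparams.py.

```python
from pathlib import Path


def make_preprocessed_dirs(data_path):
    """Create preprocessed/{char_seq,phone_seq,melspectrogram} under data_path.

    Hypothetical helper, not part of the repository.
    """
    root = Path(data_path) / "preprocessed"
    created = []
    for name in ("char_seq", "phone_seq", "melspectrogram"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Usage (the path is a placeholder for your extracted LJSpeech folder):
# make_preprocessed_dirs("/path/to/LJSpeech")
```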

Training curve (Orange: character / Blue: phoneme)

Stop prediction loss (train / val)
Guided attention loss (train / val)
L1 loss (train / val)
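
For readers unfamiliar with the guided attention loss plotted above: it penalizes attention mass that lies far from the diagonal of the text/mel alignment. The sketch below uses the commonly cited form W = 1 - exp(-(n/N - t/T)^2 / (2 g^2)) in plain Python. The exact formula and the value of g used in this repository are not stated here, so both are assumptions, and the function names are hypothetical.

```python
import math


def guided_attention_weights(n_text, n_mel, g=0.2):
    """Penalty matrix with n_mel rows and n_text columns.

    Entries are 0 on the diagonal and approach 1 far away from it.
    g = 0.2 is an assumed default, not a value taken from this repository.
    """
    return [
        [1.0 - math.exp(-((n / n_text - t / n_mel) ** 2) / (2.0 * g * g))
         for n in range(n_text)]
        for t in range(n_mel)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of attention * penalty; attention is a list of rows (mel frames x text positions)."""
    n_mel, n_text = len(attention), len(attention[0])
    weights = guided_attention_weights(n_text, n_mel, g)
    total = sum(
        a * w
        for a_row, w_row in zip(attention, weights)
        for a, w in zip(a_row, w_row)
    )
    return total / (n_mel * n_text)
```

A perfectly diagonal attention matrix gives a loss of zero under this definition, while an off-diagonal one is penalized.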

Alignments (Left: character / Right: phoneme)

- Encoder Alignments

- Decoder Alignments

- Encoder-Decoder Alignments

- Melspectrogram (target / before / after POSTNET)

- Stop prediction

Audio Samples

You can hear the audio samples here

Notice

Unlike the original paper, I did not use the encoder prenet, following espnet.
I apply an additional "guided attention loss" to two heads of the last two layers.
Batch size is important, so I use gradient accumulation.
You can also use DataParallel. Change n_gpus, batch_size, and accumulation accordingly.
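
To illustrate why gradient accumulation stands in for a larger batch, here is a framework-free sketch on a one-parameter linear model. It is not the code in train.py: the functions mse_grad and accumulated_step are hypothetical, and only the name accumulation mirrors the setting mentioned above. In PyTorch the same idea is usually expressed by calling backward() on loss / accumulation for several micro-batches before a single optimizer step.

```python
def mse_grad(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, data, micro_batch_size, accumulation, lr=0.01):
    """Perform one parameter update built from `accumulation` micro-batches.

    Returns the updated weight and the accumulated gradient.
    """
    grad = 0.0
    for i in range(accumulation):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        # Scale each micro-batch gradient so the sum equals the large-batch mean.
        grad += mse_grad(w, micro) / accumulation
    return w - lr * grad, grad
```

With equally sized micro-batches, the accumulated gradient equals the gradient of one batch of size micro_batch_size * accumulation, so the effective batch size grows without extra memory.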

TODO

Dynamic batch

FastSpeech

For FastSpeech, the generated melspectrograms and attention matrices should be saved for later use.
1-1. Set teacher_path in hparams.py and create alignments and targets directories there.
1-2. Using prepare_fastspeech.ipynb, prepare the alignments and targets.
To draw attention plots for each head, I changed the return values of "torch.nn.functional.multi_head_attention_forward()":

# before: attention weights are averaged over the heads
return attn_output, attn_output_weights.sum(dim=1) / num_heads

# after: per-head attention weights are returned
return attn_output, attn_output_weights

Among num_layers*num_heads attention matrices, the one with the highest focus rate is saved.
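
The selection step can be sketched as follows, assuming the focus rate is defined as in the FastSpeech paper: the average, over mel frames, of the largest attention weight in each frame. The function names are hypothetical and the matrices are plain nested lists (rows are mel frames, columns are text positions); the notebook in this repository may compute it differently.

```python
def focus_rate(attention):
    """Average over rows (mel frames) of the largest attention weight in each row."""
    return sum(max(row) for row in attention) / len(attention)


def most_focused(attentions):
    """Return (index, matrix) of the attention matrix with the highest focus rate.

    `attentions` is a flat list holding all num_layers * num_heads matrices.
    """
    best = max(range(len(attentions)), key=lambda i: focus_rate(attentions[i]))
    return best, attentions[best]
```

A sharply peaked alignment has a focus rate close to 1, while a diffuse one scores much lower, which is why the highest-scoring head is the one kept.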

Reference

1. NVIDIA/tacotron2: https://github.com/NVIDIA/tacotron2
2. espnet/espnet: https://github.com/espnet/espnet
3. soobinseo/Transformer-TTS: https://github.com/soobinseo/Transformer-TTS
