Tacotron2

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions: https://arxiv.org/pdf/1712.05884.pdf

WaveNet: A Generative Model for Raw Audio: https://arxiv.org/abs/1609.03499

Contents

  • Simple LJ Speech DataLoader
  • Mel Spectrogram Prediction network (text to Spectrogram)
  • [TODO] WaveNet Vocoder (Spectrogram to raw audio)

Status

  • The spectrogram network is functional but not fully trained. Training takes ~3 hours per epoch on an M6000 GPU.

Setup

  1. Install PyTorch and torchvision:
conda install pytorch torchvision -c pytorch
  2. Install the other requirements:
pip install -r requirements.txt
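
To confirm the environment is ready before training, a small check like the following can help. It is only a sketch: `missing_packages` is a hypothetical helper, not part of this repository, and the package list is an assumption based on the steps above (the real dependencies are whatever requirements.txt lists).

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list; adjust it to match requirements.txt.
print(missing_packages(["torch", "torchvision"]))
```

An empty list means every listed package can be imported from the current environment.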

Usage

Train the spectrogram prediction network

python train.py

View logs in TensorBoard

tensorboard --logdir runs

WaveNet Resources

  • https://r9y9.github.io/wavenet_vocoder/
  • https://twitter.com/heiga_zen/status/832145314559750145
  • http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
  • https://www.slideshare.net/danilosoba1/generative-model-based-texttospeech

tacotron2's People

Contributors

a-jacobson

tacotron2's Issues

Information sharing

Hello @A-Jacobson.

Great work with your implementation and, more importantly, with your clear representation of the model in your README (100% better than the one presented in the paper x) ).

I am also working on a Tacotron 2 implementation (in TensorFlow), and there are a few things I wanted to check with you; maybe we could help each other out. (implementation here)

  • Does your attention mechanism work? Mine (based on the original tacotron) doesn't seem to capture the alignment correctly.
  • Does your loss decrease insanely fast?

Again, impressive work.

RuntimeError: expand(torch.cuda.FloatTensor{[12, 1]}, size=[12]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

Hi,
I am encountering this error in the decoding-helpers file. When I searched for it, the suggestion was to install PyTorch from source, so I did.
I am still getting the error.
Can you help me?
stop_tokens[t] = stop_token
RuntimeError: expand(torch.cuda.FloatTensor{[12, 1]}, size=[12]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

Thanks!
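
A possible fix, sketched under assumptions: the message says a `[12, 1]` tensor is being written into a slot of shape `[12]`, so dropping the trailing singleton dimension before the assignment should make the shapes agree. The buffer layout, the sizes, and the random stand-in for the decoder output below are hypothetical and are not taken from this repository's code.

```python
import torch

# Hypothetical sizes; 12 matches the batch size seen in the error message.
batch_size, max_steps = 12, 5

# Assumed buffer: one stop-token value per batch item at every decoder step.
stop_tokens = torch.zeros(max_steps, batch_size)

for t in range(max_steps):
    # Stand-in for the decoder's stop-token output, which has shape [12, 1].
    stop_token = torch.rand(batch_size, 1)
    # Remove the trailing singleton dimension so the value has shape [12],
    # the same shape as the slot stop_tokens[t].
    stop_tokens[t] = stop_token.squeeze(-1)

print(stop_tokens.shape)
```

In the actual helper, the equivalent change would be `stop_tokens[t] = stop_token.squeeze(-1)`, assuming `stop_token` really has shape `[batch, 1]` at that point.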
