chenchy / speechsynthesis

This project forked from lifefeel/speechsynthesis

A collection of speech synthesis resources

License: Apache License 2.0

Text-to-Speech Synthesis

A collection of resources on speech synthesis using deep learning

Lectures & Seminars

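Deep Learning That Reads Books Aloud (책 읽어주는 딥러닝; Taehoon Kim, 2017.11)
- A DEVIEW 2017 talk video that explains Tacotron in an accessible way
Modulabs (모두의 연구소) WaveNet study video (Seungil Kim, 2017.10)
- A video in which the presenter explains his understanding of WaveNet, including the accompanying online discussion
Generative Model-Based Text-to-Speech Synthesis (Heiga Zen, 2017.02)
- A video by Heiga Zen, one of the authors of the WaveNet paper, giving an overview of TTS technology and an introduction to WaveNet
Deep Learning Speaks in the Voice of a Loved One (딥러닝, 사랑하는 사람의 목소리로 말하다) - 팝톡 blog, 2018.03.27.
- A blog post about AIA Life's campaign video "Last Goodbye" ('마지막 인사') and speech synthesis technology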

Dataset

CMU_ARCTIC (en)
- A US English dataset built for speech synthesis research by the Language Technologies Institute at CMU
The LJ Speech Dataset (en)
- Hosted on Keith Ito's website; no information was found on where or why it was created
Blizzard 2012 (en)
- The dataset used in Blizzard Challenge 2012, a corpus-based speech synthesis challenge
CSTR VCTK Corpus (en)
- English Multi-speaker Corpus for CSTR Voice Cloning Toolkit

Tools

Festival Speech Synthesis System
- An open-source text-to-speech system developed at the University of Edinburgh. The latest version is Festival 2.5, released on 2017.12.25. Each voice can be auditioned in the online demo.

Korean Corpus

KSS Dataset: Korean Single speaker Speech Dataset

WaveNet

Paper

WaveNet: A Generative Model for Raw Audio (2016.09)

Articles

WaveNet: A Generative Model for Raw Audio (DeepMind Blog)

Source Code

Multi-GPU

WaveNet training takes so long that it seems impractical without multiple GPUs. Links to related code are collected below.

https://github.com/nakosung/tensorflow-wavenet/tree/multigpu (Tensorflow)
- Multi-GPU implementation of WaveNet
https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel (Tensorflow)
- Model-parallel implementation of WaveNet

Fast WaveNet

Paper

Fast Wavenet Generation Algorithm (2016.11)

Articles

Source Code

Parallel WaveNet

Paper

Parallel WaveNet: Fast High-Fidelity Speech Synthesis (2017.11)

Articles

High-fidelity speech synthesis with WaveNet (DeepMind Blog)

Source Code

https://github.com/kensun0/Parallel-Wavenet (not a complete implementation)

WaveRNN

Paper

Efficient Neural Audio Synthesis (2018.02)

Deep Voice

Paper

Deep Voice: Real-time Neural Text-to-Speech (2017.02)

Deep Voice 2

Paper

Deep Voice 2: Multi-Speaker Neural Text-to-Speech (2017.05)

Deep Voice 3

Paper

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning (2017.10)

Source Code

Tacotron

Paper

Tacotron: Towards End-to-End Speech Synthesis (2017.05)

Source Code

https://github.com/keithito/tacotron
https://github.com/Kyubyong/tacotron
https://github.com/barronalex/Tacotron
https://carpedm20.github.io/tacotron/ (Multi-speaker Tacotron in TensorFlow)
- A project implementing the multi-speaker models of Tacotron 1 and Deep Voice 2

Tacotron 2

Paper

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (2017.12)

Articles

Tacotron 2: Generating Human-like Speech from Text (Google Research Blog)

Source Code

https://github.com/riverphoenix/tacotron2 (implemented)
https://github.com/Rayhane-mamah/Tacotron-2 (in progress)
https://github.com/selap91/Tacotron2 (in progress)
https://github.com/CapstoneInha/Tacotron2-rehearsal
https://github.com/A-Jacobson/tacotron2 (PyTorch)
https://github.com/maozhiqiang/tacotron_cn (implementation needs verification; Chinese)
https://github.com/LGizkde/Tacotron2_Tao_Shujie (needs checking)
https://github.com/ruclion/tacotron_with_style_control (Style Control)

HybridNet

HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models (2018.02) - Yanqi Zhou et al.
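- Reportedly uses WaveNet to extract audio context and then an LSTM to generate the following samples faster from that context. The authors report a higher MOS than WaveNet and audio generation 2 to 4 times faster at the same quality level (e.g., 40-layer WaveNet vs. 20-layer WaveNet + 1 LSTM).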

ClariNet

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech (2018.07) - Wei Ping et al.
- Reportedly uses a Gaussian autoregressive WaveNet as the teacher net and a Gaussian inverse autoregressive flow as the student net, minimizing a regularized KL divergence for highly peaked distributions.
- Proposes a text-to-wave architecture that generates speech end to end.
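
Because both the teacher and the student output Gaussian distributions, the KL divergence between them has a closed form. The sketch below evaluates the standard closed-form KL divergence between two univariate Gaussians and adds a penalty on the gap between their log standard deviations. The penalty form and its weight `lam` are assumptions made for illustration and should be checked against the paper; the function names are hypothetical.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for univariate Gaussians q and p."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )


def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam=4.0):
    """KL(q || p) plus an assumed penalty on the log-scale mismatch."""
    penalty = lam * (math.log(sigma_p) - math.log(sigma_q)) ** 2
    return gaussian_kl(mu_q, sigma_q, mu_p, sigma_p) + penalty


# q plays the role of the student, p the teacher. With a highly peaked
# teacher (small sigma), even a small mean offset gives a large KL value.
print(gaussian_kl(0.0, 1.0, 1.0, 1.0))
print(gaussian_kl(0.0, 0.01, 0.05, 0.01))
print(regularized_kl(0.0, 0.02, 0.0, 0.01))
```

The second call shows why peaked distributions are numerically awkward: the squared mean offset is divided by a very small variance, so the plain KL term grows quickly.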

Articles

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech - Baidu Research, 2018.07.20.

Demo

Sound demos for "ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech"

Audio Synthesis

NSynth

Voice Cloning

ISPEECH VOICE CLONING DEMOS
- Demos of voice cloning applied to the voices of famous people

Paper

Neural Voice Cloning with a Few Samples (2018.02)

API Service

Google Cloud Text-to-Speech API - WaveNet
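- Offers WaveNet-based speech synthesis as an API. On paper, WaveNet voices cost about 4 times as much as non-WaveNet voices. Because the free usage quotas also differ, the real gap appears to be more than 4x, and the pricing alone suggests that the hardware overhead is large.
- Offers a variety of voices, but Korean currently has only one voice, and there is no Korean WaveNet voice yet (as of 2018.07.31).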
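
The point that the effective gap exceeds 4x once free quotas are counted can be checked with a short calculation. The sketch below is illustrative only: the per-character prices and free quotas are assumed placeholder values chosen to reproduce a 4x list-price ratio, not figures quoted from the price list, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(chars, price_per_million, free_chars):
    """Cost in USD for `chars` characters after subtracting the free quota."""
    billable = max(0, chars - free_chars)
    return billable * price_per_million / 1_000_000


# Assumed placeholder pricing (USD per 1M characters) and free quotas.
STANDARD = {"price_per_million": 4.0, "free_chars": 4_000_000}
WAVENET = {"price_per_million": 16.0, "free_chars": 1_000_000}

usage = 10_000_000  # characters synthesized in one month (assumed)
standard_cost = monthly_cost(usage, **STANDARD)
wavenet_cost = monthly_cost(usage, **WAVENET)

print(f"standard: ${standard_cost:.2f}")
print(f"WaveNet:  ${wavenet_cost:.2f}")
print(f"ratio:    {wavenet_cost / standard_cost:.1f}x")
```

Under these assumed numbers the ratio is above 4x for any usage that exceeds the larger free quota, and it approaches 4x only as usage grows far beyond both quotas.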

SSML

Speech Synthesis Markup Language (SSML)
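- A speech synthesis markup language defined by the W3C. It lets you specify pronunciation, volume, pitch, speaking rate, and similar properties for the text to be synthesized. The Google TTS API also supports SSML.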
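
As a minimal sketch of what such markup looks like, the snippet below builds a small SSML document with Python's standard library. The `speak`, `prosody`, and `break` elements come from the W3C specification; the sample sentence, the attribute values, and the helper name `build_ssml` are assumptions made for illustration, and an individual TTS service may support only a subset of SSML.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate="slow", pitch="+2st", volume="loud", pause="500ms"):
    """Wrap `text` in SSML that sets speaking rate, pitch, and volume.

    Hypothetical helper for illustration; the default values are arbitrary.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", rate=rate, pitch=pitch, volume=volume)
    prosody.text = text
    # Add a pause after the prosody-controlled text.
    ET.SubElement(speak, "break", time=pause)
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello, this sentence is read slowly and loudly.")
print(ssml)
```

The resulting string is what would be passed to a synthesizer in place of plain text when the service accepts SSML input.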

Speed-Up Strategies

Fast Generation for Convolutional Autoregressive Models (2017.04) - Prajit Ramachandran et al.
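- Reportedly, applying this technique to WaveNet and PixelCNN++ yielded speedups of up to 21x and 183x, respectively. These are best-case figures for specific settings, so note that the speedup in a real environment may be smaller than expected.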
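
The core idea of this kind of fast generation is to cache intermediate activations so that each new sample needs only one new computation per layer, instead of recomputing the network over the whole history. The toy sketch below illustrates that idea with scalar channels, kernel size 2, and fixed made-up weights. It is not the paper's implementation, and every name and value in it is an assumption.

```python
import math
from collections import deque

DILATIONS = [1, 2, 4, 8]
# Arbitrary fixed weights (w_past, w_current) per layer, for illustration only.
WEIGHTS = [(0.5, -0.3), (0.8, 0.2), (-0.6, 0.7), (0.4, 0.9)]


def forward_full(x):
    """Naive pass: recompute every layer over the whole history."""
    h = list(x)
    for d, (w_past, w_cur) in zip(DILATIONS, WEIGHTS):
        h = [
            math.tanh(w_past * (h[t - d] if t >= d else 0.0) + w_cur * h[t])
            for t in range(len(h))
        ]
    return h[-1]  # output at the latest time step


def generate_naive(first_sample, steps):
    """Autoregressive generation that reruns the full network each step."""
    x = [first_sample]
    for _ in range(steps):
        x.append(forward_full(x))
    return x


def generate_cached(first_sample, steps):
    """Cached generation: each layer keeps a queue of its last `d` inputs."""
    queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]
    x = [first_sample]
    for _ in range(steps):
        h = x[-1]
        for q, (w_past, w_cur) in zip(queues, WEIGHTS):
            past = q[0]  # this layer's input from d steps ago
            q.append(h)  # maxlen drops the oldest entry automatically
            h = math.tanh(w_past * past + w_cur * h)
        x.append(h)
    return x


naive = generate_naive(0.5, 40)
cached = generate_cached(0.5, 40)
print(max(abs(a - b) for a, b in zip(naive, cached)))
```

Both functions produce the same sequence, but the naive version does work proportional to the history length at every step, while the cached version does a constant amount of work per layer. How much this saves in practice depends on the model and hardware, which is consistent with the caution above about best-case figures.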
