appleholic / multiband_melgan

An unofficial implementation of https://arxiv.org/abs/2005.05106

License: MIT License

Python 100.00%

multiband-melgan pytorch audio neural-vocoder

Multi-Band MelGAN

This is a naive implementation of Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech.

Goals

Quality comparable to other vocoders
Mobile Inference Example

TODOs

Make inference code & example.
Enhance vocoder quality.

Prerequisite

Install pytorch_sound
- More details about installation are in that repository.

git clone -b v0.0.4 https://github.com/appleholic/pytorch_sound
cd pytorch_sound
pip install -e .

Preprocess VCTK
- After running it, you will find a 'meta' directory in [OUT DIR].
- Download Link

python pytorch_sound/scripts/preprocess.py vctk [VCTK DIR] [OUT DIR] [[Sample Rate: default 22.05k]]

Install multiband melgan

pip install -e .

Environment

Machine
- PyTorch 1.5.0
- RTX Titan (1 GPU) / Ryzen 3900X / 64GB
Dataset
- VCTK

Train

python multiband_melgan/train_mb.py run [META DIR] [SAVE DIR] [SAVE PREFIX] [[other arguments...]]

Example

import torch
import librosa
from multiband_melgan.inferencer import Inferencer

# make inferencer
inf = Inferencer()

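# load sample audio, resampled to 22.05 kHz
sample_path = ''  # set this to the path of an audio file
wav, sr = librosa.load(sample_path, sr=22050)
# add a batch dimension and move to the GPU (requires CUDA)
wav_tensor = torch.FloatTensor(wav).unsqueeze(0).cuda()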

# convert to mel
mel = inf.encode(wav_tensor)  # (N, Cm, Tm)

# convert back to wav
pred_wav = inf.decode(mel, is_denoise=True)  # (N, Tw)
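
To listen to the result, the decoded waveform can be written to disk. The following is a minimal sketch using only the Python standard library, not part of this repository: `write_wav` is a hypothetical helper, and it assumes the waveform has already been converted to a flat list of floats in [-1, 1].

```python
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write mono float samples in [-1, 1] as 16-bit PCM (hypothetical helper)."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)  # 22.05 kHz, matching the example above
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

Assuming `pred_wav` is a torch tensor of shape (N, Tw) as in the example, the first item in the batch could be saved with `write_wav('pred.wav', pred_wav[0].detach().cpu().tolist())`.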

Reference

Others

Model
- multiband generator (22.05k) :
  - checkpoint file size : 6.6M
  - number of parameters : 1,710,008
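
As a rough sanity check on the two figures above, the parameter count can be converted into an approximate weight size. This sketch assumes the weights are stored as 32-bit floats, which the README does not state.

```python
NUM_PARAMS = 1_710_008   # parameter count listed above
BYTES_PER_PARAM = 4      # assumption: float32 weights

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
weight_mib = weight_bytes / 1024 ** 2

print(f"{weight_bytes} bytes = {weight_mib:.2f} MiB")
```

Under that assumption the raw weights take about 6.52 MiB, which is in the same range as the listed 6.6M checkpoint if that figure is read in binary units.
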
Audio Parameters
- Sample rate : 22.05 kHz
- Window length & FFT size : 1024
- Hop length : 256
- Mel dim : 80
- Mel min/max frequency : 80 / 7600 Hz
- Crop size (in training) : 8192 samples
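
The parameters above fix the time resolution of the mel spectrogram and the length of each training crop. The arithmetic can be sketched as follows. It assumes one mel frame per hop with no extra padding frames; the constant names are illustrative and not taken from the repository.

```python
SAMPLE_RATE = 22050   # Hz
WIN_LENGTH = 1024     # samples
HOP_LENGTH = 256      # samples
CROP_SIZE = 8192      # samples per training crop

hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE   # time step between mel frames
win_ms = 1000 * WIN_LENGTH / SAMPLE_RATE   # duration covered by one window
crop_frames = CROP_SIZE // HOP_LENGTH      # mel frames per training crop
crop_seconds = CROP_SIZE / SAMPLE_RATE     # duration of one training crop

print(f"hop: {hop_ms:.2f} ms, window: {win_ms:.2f} ms")
print(f"crop: {crop_frames} frames, {crop_seconds:.3f} s")
```

With these values each mel frame advances by roughly 11.6 ms, and a training crop of 8192 samples corresponds to 32 mel frames, or about 0.37 seconds of audio.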

Author

Ilji Choi @appleholic

multiband_melgan's Issues

Add Speech Enhancement Pipeline

Enhance Result

Change mel spectrogram parameters
Use other parameters for stft losses

Have you got any generated samples?

Hi, I wonder if you have any generated samples from your models.

BTW, what kind of mobile devices do you want to run inference on?

Thank you!

Add inference code and evaluate it

Add Evaluation Process and README.md
