flowavenet's People

Contributors

1ytic, ksw0306, l0sg

flowavenet's Issues

DataParallel takes too long

Hello,

I am trying to run training on multiple GPUs (4 Tesla V100), using the command

python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --block_per_split 4 --num_gpu 4

It runs without any errors and outputs

num_gpu > 1 detected. converting the model to DataParallel...

It has been frozen with this output for more than an hour. I checked the GPU usage and all of them were in use, but I didn't see any change. I have several questions: is there a problem with the code, or do I just have to wait longer for training to start? Will decreasing batch_size speed up the conversion to DataParallel?
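For reference, wrapping a model in nn.DataParallel is itself a cheap operation; the replication across GPUs happens lazily on the first forward pass, which is also where slow data loading or an over-large batch usually shows up. A minimal sketch of the standard idiom (illustration only, not the repo's exact train.py code; wrap_data_parallel is a hypothetical helper name):

import torch.nn as nn

# Standard PyTorch multi-GPU wrapping; illustration only, not the repo's code.
def wrap_data_parallel(model, num_gpu):
    if num_gpu > 1:
        print('num_gpu > 1 detected. converting the model to DataParallel...')
        model = nn.DataParallel(model, device_ids=list(range(num_gpu)))
    return model.cuda()  # the wrap is fast; work is deferred to the first forward pass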

Note: I am training on the LJ Speech dataset.

Also, could you provide download links for the pretrained models? That would be very helpful.

BrokenPipeError: [Errno 32] Broken pipe in train.py

Hi, I'm on a Windows 10 system with Anaconda and Python 3.6. I ran the preprocessing successfully, but when calling train.py I get the following error. This issue is definitely related to Windows (see https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing), but I don't know how to solve it.

(FloWaveNet) C:\Users\admin\FloWaveNet>python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --causal no
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\admin\FloWaveNet\train.py", line 228, in <module>
    training_epoch_loss = train(epoch, model, optimizer)
  File "C:\Users\admin\FloWaveNet\train.py", line 91, in train
    for batch_idx, (x, c) in enumerate(train_loader):
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    w.start()
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 322, in _Popen
Traceback (most recent call last):
  File "train.py", line 228, in <module>
    return Popen(process_obj)
    training_epoch_loss = train(epoch, model, optimizer)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
  File "train.py", line 91, in train
    prep_data = spawn.get_preparation_data(process_obj._name)
    for batch_idx, (x, c) in enumerate(train_loader):
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    return _DataLoaderIter(self)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    _check_not_importing_main()
    w.start()
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\process.py", line 105, in start
    is not going to be frozen to produce an executable.''')
    self._popen = self._Popen(self)
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 223, in _Popen

    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
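For what it's worth, the usual fix on Windows (which spawns rather than forks worker processes) is exactly the idiom named in the RuntimeError above: everything that creates DataLoader workers has to sit under an if __name__ == '__main__': guard, so that child processes can re-import train.py without re-running the training loop. A rough sketch of how the bottom of train.py could be guarded (train, model, and optimizer are taken from the traceback; the loop bounds are placeholders, and the repo's actual code may differ):

if __name__ == '__main__':
    # On Windows, DataLoader worker processes re-import this module; the guard
    # keeps them from re-executing the training loop and spawning recursively.
    for epoch in range(start_epoch, total_epochs):   # placeholder loop bounds
        training_epoch_loss = train(epoch, model, optimizer)

Another common workaround is to construct the DataLoader with num_workers=0, so that no worker processes are spawned at all, at the cost of slower data loading.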

What is the param (block_per_split) about?

I haven't seen this parameter in the previous version.
Can anyone explain the effect of the parameter (block_per_split)?
The author proposed a new version with a multi-scale architecture on Feb 1.
I am very curious about the new feature.

Since I have already trained the model on the previous version,
I am hesitant about whether I need to retrain it.
(Training time is so long...)
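Not an authoritative answer, but for readers unfamiliar with the idea: a Glow-style multi-scale flow factors out half of the channels as latent variables every few blocks instead of carrying all channels through the whole network. If block_per_split works the way its name suggests, it would control how many blocks are processed between such splits. A purely illustrative sketch of that pattern (my reading, not the repo's code; multi_scale_forward is a hypothetical function):

# Purely illustrative sketch of Glow-style multi-scale splitting.
# Assumption: a parameter like block_per_split decides after how many
# blocks half of the channels are factored out as latent variables.
def multi_scale_forward(x, blocks, block_per_split):
    latents = []
    for i, block in enumerate(blocks, start=1):
        x = block(x)
        if i % block_per_split == 0 and i < len(blocks):
            x, z = x.chunk(2, dim=1)   # keep half, emit half as a latent
            latents.append(z)
    latents.append(x)                  # remaining channels after the last block
    return latents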

RuntimeError: CUDA error: out of memory in train.py

Hi, I get an out of memory error almost instantly after training starts. I use a GeForce GTX 1080 Ti with 11 GB of memory and CUDA 9.0. My questions are:

  • What setup did you use for training?
  • How can I reduce the memory load during training?

Thanks for your help.

Edit: When I reduce the batch size to 6, about 10.4 GB of my GPU memory is used, which is fine. So I guess that is the solution.

Stack trace is:

(flowavenet) C:\Users\admin\FloWaveNet>python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --causal no
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    training_epoch_loss = train(epoch, model, optimizer)
  File "train.py", line 108, in train
    log_p, logdet = model(x, c)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\model.py", line 195, in forward
    out, c, logdet_new = block(out, c)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\model.py", line 150, in forward
    out, c, det = flow(out, c)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\model.py", line 114, in forward
    out, det = self.coupling(out, c)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\model.py", line 74, in forward
    log_s, t = self.net(in_a, c_a).chunk(2, 1)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\modules.py", line 117, in forward
    h, s = f(h, c)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\modules.py", line 71, in forward
    h_gate = self.gate_conv(tensor)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\FloWaveNet\modules.py", line 21, in forward
    out = self.conv(tensor)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
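As a small aside for anyone tuning --batch_size against an 11 GB card: PyTorch can report the peak memory it has allocated, which makes the trial-and-error above a bit more systematic. A minimal, repo-independent snippet (report_peak_memory is just an illustrative helper name):

import torch

# Print the peak GPU memory allocated by PyTorch so far (standard API,
# independent of this repository). Call it after a forward/backward pass.
def report_peak_memory(device=0):
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    print('peak allocated on cuda:%d: %.2f GB' % (device, peak_gb))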

Do not understand the ZeroConv1d

Thanks for the nice work!
I cannot really understand the ZeroConv1d in modules.py.
Since the weight and bias are initialized to exactly zero, will the output of ZeroConv1d always be zero? Or am I misunderstanding something?
Could you please clarify this a little bit? Thank you in advance.
Best.
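For context, a zero-initialized convolution (as used in Glow, where the last convolution of each NN block is initialized with zeros) does output exactly zeros on the first forward pass, but its weights are ordinary learnable parameters, so they move away from zero as soon as gradients flow. A minimal sketch of the idea (the actual ZeroConv1d in modules.py may include extra details such as a learnable output scale):

import torch.nn as nn

# Minimal sketch of a zero-initialized 1x1 convolution in the Glow style.
# The output is zero at initialization, but weight and bias are trainable,
# so the layer learns a useful mapping during training.
class ZeroConv1dSketch(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, x):
        return self.conv(x)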

Command to generate synthesis from Tacotron-2 mel spectrograms?

Hi

I am unable to figure out how to generate synthesis with these parameters using the mel spectrograms generated by Tacotron-2:

parser.add_argument('--data_path', type=str, default='./DATASETS/ljspeech/', help='Dataset Path')
parser.add_argument('--sample_path', type=str, default='./samples', help='Sample Path')
parser.add_argument('--model_name', type=str, default='flowavenet', help='Model Name')
parser.add_argument('--num_samples', type=int, default=10, help='# of audio samples')
parser.add_argument('--load_step', type=int, default=0, help='Load Step')
parser.add_argument('--temp', type=float, default=0.8, help='Temperature')
parser.add_argument('--load', '-l', type=str, default='./params', help='Checkpoint path to resume / test.')
parser.add_argument('--n_layer', type=int, default=2, help='Number of layers')
parser.add_argument('--n_flow', type=int, default=6, help='Number of layers')
parser.add_argument('--n_block', type=int, default=8, help='Number of layers')
parser.add_argument('--cin_channels', type=int, default=80, help='Cin Channels')
parser.add_argument('--block_per_split', type=int, default=4, help='Block per split')
parser.add_argument('--num_workers', type=int, default=0, help='Number of workers')
parser.add_argument('--log', type=str, default='./log', help='Log folder.')

Where is the parameter for the MELS?

Does anyone have an example commandline?

Many Thanks
Joshua

Equations (3) and (5) in the paper and implementation

Hi

Very nice work and thanks for sharing the code.

I have some questions about Equations (3) and (5) in your paper at https://arxiv.org/pdf/1811.02155.pdf.

The forward transformation (Equation (3)) and reverse transformation (Equation (5)) are different from the ones in the Glow paper (Row 3, Table 1, https://arxiv.org/pdf/1807.03039.pdf).

Their reverse function, such as x = (y_a - t)/s, is your forward function (implementation: https://github.com/ksw0306/FloWaveNet/blob/master/model.py#L76) and Equation (3).
Their forward function, such as y = s*x_a + t, matches your reverse function in the implementation at https://github.com/ksw0306/FloWaveNet/blob/master/model.py#L90, but not Equation (5) in your paper.

Did I miss anything? Thanks
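For reference, here is the affine coupling as it appears in the Glow paper (Table 1, row 3), written out in LaTeX; the question above is essentially whether FloWaveNet's Equations (3) and (5) attach the labels "forward" and "reverse" to these two maps the same way as the implementation does:

\text{forward:}\quad y_a = x_a,\qquad y_b = s \odot x_b + t
\text{reverse:}\quad x_a = y_a,\qquad x_b = (y_b - t)\,/\,s
\log\left|\det\tfrac{\partial y}{\partial x}\right| = \sum_i \log|s_i|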

How long does it take to finish training?

I've trained this project on 10000 speech sentences of about 2 secs/sentence, namely 2.78 hours of training data, for about 18 hours, reaching 43 epochs and 98860 steps so far. But the loss has decreased to -4.7929 while no clear voice can be heard in the generated wavs.

Periodic noise in flow based model?

Hi,
Thank you for your nice work. But I found that both your FloWaveNet and NVIDIA's WaveGlow have a problem with periodic noise.

Maybe this is caused by the squeeze operations, because I found that the frequencies of the periodic noise in WaveGlow are multiples of sample_rate // squeeze_factor (for example, 16 kHz audio with squeeze_factor 8 may have periodic noise at 2 kHz, 4 kHz, 6 kHz, and so on).
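A quick way to sanity-check the harmonic pattern described above, using just the numbers from the example (pure arithmetic, not tied to either codebase):

# Predicted periodic-noise frequencies as harmonics of sample_rate // squeeze_factor.
sample_rate = 16000
squeeze_factor = 8
base = sample_rate // squeeze_factor            # 2000 Hz
print([base * k for k in range(1, 4)])          # [2000, 4000, 6000]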

So do you have any idea how to solve this problem?

RuntimeError: Unknown error -1

Hi, I ran the following command, but got an error during training. I don't know why the training stopped.

python train.py --model_name flowavenet --batch_size 1 --n_block 8 --n_flow 6 --n_layer 2 --causal no

Global Step : 47160, [4, 100] [Log pdf, Log p(z), Log Det] : [-4.071 -1.3882 5.4592]
Global Step : 47160, [4, 200] [Log pdf, Log p(z), Log Det] : [-3.8072 -1.3883 5.1956]
Global Step : 47160, [4, 300] [Log pdf, Log p(z), Log Det] : [-3.833 -1.3862 5.2192]
Global Step : 47160, [4, 400] [Log pdf, Log p(z), Log Det] : [-3.8035 -1.386 5.1895]
Global Step : 47160, [4, 500] [Log pdf, Log p(z), Log Det] : [-3.9065 -1.3802 5.2868]
Global Step : 47160, [4, 600] [Log pdf, Log p(z), Log Det] : [-3.7542 -1.3834 5.1376]
Global Step : 47160, [4, 700] [Log pdf, Log p(z), Log Det] : [-3.7565 -1.3889 5.1454]
Global Step : 47160, [4, 800] [Log pdf, Log p(z), Log Det] : [-3.9123 -1.3848 5.2971]
Global Step : 47160, [4, 900] [Log pdf, Log p(z), Log Det] : [-3.8986 -1.3856 5.2842]
Global Step : 47160, [4, 1000] [Log pdf, Log p(z), Log Det] : [-4.0164 -1.3854 5.4018]
Global Step : 47160, [4, 1100] [Log pdf, Log p(z), Log Det] : [-3.9558 -1.3868 5.3426]
Global Step : 47160, [4, 1200] [Log pdf, Log p(z), Log Det] : [-4.0773 -1.3795 5.4568]
Global Step : 47160, [4, 1300] [Log pdf, Log p(z), Log Det] : [-3.8799 -1.383 5.2629]
Evaluation Loss : -3.8945
Traceback (most recent call last):
  File "train.py", line 241, in <module>
    save_checkpoint(model, optimizer, global_step, epoch)
  File "train.py", line 185, in save_checkpoint
    "global_epoch": global_epoch}, checkpoint_path)
  File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 209, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 134, in _with_file_like
    return body(f)
  File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 209, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 288, in _save
    serialized_storages[key]._write_file(f, _should_read_directly(f))
RuntimeError: Unknown error -1

CUDA State error

Global Step : 1300, [1, 1300] [Log pdf, Log p(z), Log Det] : [-3.2562 -1.4938  4.75  ]
Global Step : 1400, [1, 1400] [Log pdf, Log p(z), Log Det] : [-3.154  -1.4319  4.5859]
Global Step : 1500, [1, 1500] [Log pdf, Log p(z), Log Det] : [-3.3218 -1.4467  4.7684]
Global Step : 1600, [1, 1600] [Log pdf, Log p(z), Log Det] : [-3.3319 -1.4162  4.7481]
Global Step : 1700, [1, 1700] [Log pdf, Log p(z), Log Det] : [-3.532  -1.4159  4.9479]
Global Step : 1800, [1, 1800] [Log pdf, Log p(z), Log Det] : [-3.0115 -1.549   4.5605]
Global Step : 1900, [1, 1900] [Log pdf, Log p(z), Log Det] : [-3.572  -1.4092  4.9812]
Global Step : 2000, [1, 2000] [Log pdf, Log p(z), Log Det] : [-3.595  -1.4123  5.0073]
Global Step : 2100, [1, 2100] [Log pdf, Log p(z), Log Det] : [-3.3661 -1.4437  4.8097]
Global Step : 2200, [1, 2200] [Log pdf, Log p(z), Log Det] : [-3.43  -1.416  4.846]
1 Epoch Training Loss : -2.8819
Global Step : 2250, [1, 100] [Log pdf, Log p(z), Log Det] : [-3.1498 -1.4728  4.6226]
Global Step : 2250, [1, 200] [Log pdf, Log p(z), Log Det] : [-3.1846 -1.4745  4.6591]
Evaluation Loss : -3.1582
Epoch 1 Model Saved! Loss : -3.1582
Traceback (most recent call last):
  File "H:/workspace/FloWaveNet/train.py", line 244, in <module>
    synthesize(model)
  File "H:/workspace/FloWaveNet/train.py", line 170, in synthesize
    y_gen = model.reverse(z, c).squeeze()
  File "H:\workspace\FloWaveNet\model.py", line 204, in reverse
    c = self.upsample(c)
  File "H:\workspace\FloWaveNet\model.py", line 230, in upsample
    c = f(c)
  File "C:\Python35\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Python35\lib\site-packages\torch\nn\modules\conv.py", line 691, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Problem when synthesizing

Thanks for this nice work.

  1. I have some problems when synthesizing: there is always reverberation in the synthesized audio compared with the raw audio. Does anyone have the same problem (batch size = 4, 1000k steps)? I guess the "change order" module may be causing this problem?

  2. After 200k steps, I found the loss remains almost unchanged within (-3.4, -3.7), and the synthesis results are similar from 200k to 1000k steps. So I want to ask whether this is reasonable, and if not, what range of loss is reasonable.

Pretrained model

Hi, can you please share pre-trained models that give good results, if any?
