enhuiz / vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

License: MIT License

Python 99.89% Shell 0.11%

vall-e valle text-to-speech pytorch tts audio-lm

vall-e's Introduction

VALL-E

An unofficial PyTorch implementation of VALL-E, based on the EnCodec tokenizer.

Get Started

A toy Google Colab example is available. Please note that this example overfits a single utterance under data/test and is not usable. The pretrained model is yet to come.

Requirements

Since the trainer is based on DeepSpeed, you will need a GPU that DeepSpeed has been developed and tested against, as well as a pre-installed CUDA or ROCm compiler, to install this package.

Install

pip install git+https://github.com/enhuiz/vall-e

Or clone the repository with:

git clone --recurse-submodules https://github.com/enhuiz/vall-e.git

Note that the code is only tested under Python 3.10.7.

Train

Put your data into a folder, e.g. data/your_data. Audio files should be named with the suffix .wav and text files with .normalized.txt.
Quantize the data:

python -m vall_e.emb.qnt data/your_data

Generate phonemes based on the text:

python -m vall_e.emb.g2p data/your_data

Customize your configuration by creating config/your_data/ar.yml and config/your_data/nar.yml. Refer to the example configs in config/test and vall_e/config.py for details. To choose a different model preset, see vall_e/vall_e/__init__.py.
Train the AR or NAR model using the following command:

python -m vall_e.train yaml=config/your_data/ar_or_nar.yml

You can stop training at any time by typing quit in the terminal. The latest checkpoint will be saved automatically.
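The per-dataset config files described in this section can also be generated with a short script. This is a minimal sketch and is not part of the repository. The keys mirror the example ar.yml quoted in the issues below (data_dirs, model, batch_size, eval_batch_size, save_ckpt_every, eval_every, max_iter), and `write_configs` and `render_yaml` are hypothetical helpers. The NAR preset name `nar-quarter` is an assumption inferred from the config/LibriTTS file names; check vall_e/config.py and the preset list before relying on it.

```python
from pathlib import Path

# Values copied from the example ar.yml in this document; tune them for real training.
BASE_CONFIG = {
    "batch_size": 1,
    "eval_batch_size": 1,
    "save_ckpt_every": 500,
    "eval_every": 500,
    "max_iter": 1000,
}


def render_yaml(cfg: dict) -> str:
    """Render a flat dict as simple YAML; lists become inline [a, b] sequences."""
    lines = []
    for key, value in cfg.items():
        if isinstance(value, list):
            value = "[" + ", ".join(str(v) for v in value) + "]"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def write_configs(name: str, config_root: str = "config") -> list:
    """Write <config_root>/<name>/ar.yml and nar.yml pointing at data/<name>.

    Hypothetical helper: 'nar-quarter' is an assumed preset name.
    """
    out_dir = Path(config_root) / name
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for kind, preset in (("ar", "ar-quarter"), ("nar", "nar-quarter")):
        cfg = {"data_dirs": [f"data/{name}"], "model": preset, **BASE_CONFIG}
        path = out_dir / f"{kind}.yml"
        path.write_text(render_yaml(cfg))
        written.append(path)
    return written
```

After calling `write_configs("your_data")`, training would be started as above with `python -m vall_e.train yaml=config/your_data/ar.yml`.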

Export

Both trained models need to be exported to a path (e.g. zoo/ar.pt and zoo/nar.pt) before synthesis. To export either of them, run:

python -m vall_e.export zoo/ar_or_nar.pt yaml=config/your_data/ar_or_nar.yml

This will export the latest checkpoint.

Synthesis

python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt

TODO

AR model for the first quantizer
Audio decoding from tokens
NAR model for the remaining quantizers
Trainers for both models
Implement AdaLN for the NAR model.
Sample-wise quantization level sampling for NAR training.
Pre-trained checkpoint and demos on LibriTTS
Synthesis CLI

Notice

EnCodec is licensed under CC-BY-NC 4.0. If you use this code for audio quantization or decoding, make sure you comply with the terms of that license.

Citations

@article{wang2023neural,
  title={Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers},
  author={Wang, Chengyi and Chen, Sanyuan and Wu, Yu and Zhang, Ziqiang and Zhou, Long and Liu, Shujie and Chen, Zhuo and Liu, Yanqing and Wang, Huaming and Li, Jinyu and others},
  journal={arXiv preprint arXiv:2301.02111},
  year={2023}
}

@article{defossez2022highfi,
  title={High Fidelity Neural Audio Compression},
  author={Défossez, Alexandre and Copet, Jade and Synnaeve, Gabriel and Adi, Yossi},
  journal={arXiv preprint arXiv:2210.13438},
  year={2022}
}

vall-e's Issues

Audio preprocessing needed

I think before quantization, there should be an audio preprocessing step: normalize the volume, trim the silence, and split long audio.
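The proposed preprocessing step could look roughly like the following sketch. It uses only the standard library, trims leading and trailing silence, and normalizes the peak level of a 16-bit mono WAV. The function name, the 0.9 peak target and the 2% silence threshold are illustrative assumptions. Splitting long recordings, the third step, is not covered.

```python
import array
import sys
import wave


def preprocess_wav(src: str, dst: str, peak: float = 0.9, rel_threshold: float = 0.02) -> int:
    """Trim leading/trailing silence and peak-normalize a 16-bit mono WAV.

    Samples at the start and end whose magnitude is at most
    rel_threshold * (loudest sample) count as silence.
    Returns the number of samples written to dst.
    """
    with wave.open(src, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = f.getframerate()
        samples = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian

    loudest = max((abs(s) for s in samples), default=0)
    out = array.array("h")
    if loudest > 0:
        limit = rel_threshold * loudest
        voiced = [i for i, s in enumerate(samples) if abs(s) > limit]
        trimmed = samples[voiced[0] : voiced[-1] + 1]
        gain = peak * 32767 / loudest
        # Scale to the target peak and clamp to the int16 range.
        out.extend(max(-32768, min(32767, round(s * gain))) for s in trimmed)

    count = len(out)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as g:
        g.setnchannels(1)
        g.setsampwidth(2)
        g.setframerate(rate)
        g.writeframes(out.tobytes())
    return count
```

Running it over every .wav before `python -m vall_e.emb.qnt` would keep the quantizer's input consistent in level and length. Stereo or 24-bit input would first need conversion with an external tool.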

I tried following the VALL-E demo flow at https://blog.paperspace.com/training-vall-e-from-scratch-on-your-own-voice-samples/ and sampled the model to generate an audio file with my own voice, but the result was noise. I noticed and changed the following:

In the original code, max_phones in config.py was set to 50. It filtered out all but one of the training files, so I changed it to 250.
I ran a few training rounds and experimented with the values in config/libri/ar.yml and config/libri/nar.yml. For example, I extended max_iter to 2048 and set batch_size to 4, 8, 16 and even 32, but model.loss remained quite high (~3 for AR and ~5 for NAR).
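A small, hypothetical filter shows the effect of max_phones. Any utterance with more phonemes than the limit is dropped, so a limit of 50 can discard most sentence-length clips. This sketch is not the repository's code. The function name and the inclusive `<=` comparison are assumptions.

```python
def filter_by_max_phones(utterances: dict, max_phones: int) -> list:
    """Return ids of utterances whose phoneme count is within max_phones.

    utterances maps an utterance id to its phoneme sequence (a list of symbols).
    Hypothetical stand-in for the repository's filter; the <= comparison is assumed.
    """
    return [uid for uid, phones in utterances.items() if len(phones) <= max_phones]


if __name__ == "__main__":
    # Hypothetical lengths: a short clip and a typical sentence-length clip.
    utts = {"short": ["HH"] * 40, "sentence": ["HH"] * 120}
    print(filter_by_max_phones(utts, 50))   # ['short']
    print(filter_by_max_phones(utts, 250))  # ['short', 'sentence']
```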

Did anyone manage to get decent results? If so, did you apply any changes to the original Paperspace demo project at https://console.paperspace.com/github/gradient-ai/vall-e?machine=Free-GPU?

Thanks!
Ilan

Several questions of solution

Hi, I'm very impressed by your work on VALL-E, and I have several questions about your system.

First, in your training setup, the AR and NAR models are trained independently. Why not train them alternately?

Second, and this may be because I didn't understand the paper properly: do the prompt and the input sequence share the layer that converts tokens to embeddings?

Third, I understand that in the AR model the prompt and the input sequence form one combined sequence, but is the same true for NAR? Also, the two sequences seem to be separated by an EOS token; does this apply to NAR as well?

torch.load() returned a dict when inference

When I run the inference code, torch.load() returns a dict instead of a model, and I get this error:

  File "vall-e/vall_e/__main__.py", line 30, in main
    ar = torch.load(args.ar_ckpt).to(args.device)
AttributeError: 'dict' object has no attribute 'to'

what kind of GPU needs for this model?

I got the following error on a machine with a 1060 (6 GB):

RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
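This error means PyTorch could not see any CUDA device, and the NCCL backend requires one. The training log further below shows DeepSpeed initializing with backend nccl. A quick check before training is sketched below. `choose_backend` is a hypothetical helper and is not part of the repository. Actually switching the trainer to gloo would require changing its code.

```python
import sys


def choose_backend(cuda_available: bool, platform: str = sys.platform) -> str:
    """Pick a torch.distributed backend for the current machine.

    NCCL needs at least one CUDA GPU and is not included in Windows builds of
    PyTorch; gloo also runs on CPU-only and Windows machines.
    """
    if cuda_available and not platform.startswith("win"):
        return "nccl"
    return "gloo"


if __name__ == "__main__":
    try:
        import torch

        has_cuda = torch.cuda.is_available()
    except ImportError:  # PyTorch is not installed
        has_cuda = False
    print("CUDA visible:", has_cuda, "-> backend:", choose_backend(has_cuda))
```

If this prints `CUDA visible: False` on a GPU machine, the installed PyTorch build or the driver is the first thing to check.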

I get this error on Colab:

!python -m vall_e 'hello world this is getting interesting' data/test/test.wav toy.wav
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/vall-e/vall_e/__main__.py", line 43, in <module>
    main()
  File "/content/vall-e/vall_e/__main__.py", line 29, in main
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
  File "/content/vall-e/vall_e/__main__.py", line 29, in <listcomp>
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
KeyError: 'DH'
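The KeyError comes from looking up a phoneme that is not in the symbol map built from the training data. The toy Colab trains only on data/test. Its symbol map, printed in a training log further below, contains just the phonemes of "hello world", so a word like "this" (ARPAbet DH IH1 S) cannot be encoded. The sketch below is a small, hypothetical check that lists the unknown phonemes before synthesis. The phoneme list is written out by hand and is an assumption about the g2p output.

```python
def missing_symbols(phonemes, symmap):
    """Return phonemes absent from symmap, in first-seen order, without duplicates."""
    missing = []
    for p in phonemes:
        if p not in symmap and p not in missing:
            missing.append(p)
    return missing


# Symbol map printed by the data/test training log in this document.
TEST_SYMMAP = {"</s>": 1, "<s>": 2, "AH0": 3, "D": 4, "ER1": 5, "HH": 6,
               "L": 7, "OW1": 8, "W": 9, "_": 10}

if __name__ == "__main__":
    # ARPAbet for "hello this", written out by hand (assumed g2p output).
    phns = ["HH", "AH0", "L", "OW1", "DH", "IH1", "S"]
    print(missing_symbols(phns, TEST_SYMMAP))  # ['DH', 'IH1', 'S']
```

The same cause explains the KeyError: 'HH' and KeyError: 'AY1' reports further below. Synthesizing new text needs a model trained on data that covers those phonemes.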

Any advice on solving multi-GPU training failure?

Hi, thanks for kindly sharing your code!

I tried training on the LibriSpeech dataset and, following other issues, ran:
CUDA_VISIBLE_DEVICES=1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node 5 -m vall_e.train yaml=config/LibriSpeech/ar.yml

At first, the code hung for several minutes with this output:

2023-02-19 14:44:33 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
2023-02-19 14:44:33 - vall_e.utils.trainer - INFO - GR=3;LR=3 - 
New epoch starts.
2023-02-19 14:44:33 - vall_e.utils.trainer - INFO - GR=4;LR=4 - 
New epoch starts.
2023-02-19 14:44:33 - vall_e.utils.trainer - INFO - GR=2;LR=2 - 
New epoch starts.
2023-02-19 14:44:33 - vall_e.utils.trainer - INFO - GR=1;LR=1 - 
New epoch starts.

Then training failed with the following errors:

[E ProcessGroupNCCL.cpp:821] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801567 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801540 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801537 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801594 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801537 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801540 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801594 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=139, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801567 milliseconds before timing out.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2038811 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 2038810) of binary: /data3/gx/anaconda3/envs/vall-e/bin/python
Traceback (most recent call last):
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data3/gx/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
========================================================
vall_e.train FAILED
--------------------------------------------------------
Failures:
[1]:
  time      : 2023-02-19_15:14:41
  host      : amax
  rank      : 2 (local_rank: 2)
  exitcode  : -6 (pid: 2038812)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2038812
[2]:
  time      : 2023-02-19_15:14:41
  host      : amax
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 2038813)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2038813
[3]:
  time      : 2023-02-19_15:14:41
  host      : amax
  rank      : 4 (local_rank: 4)
  exitcode  : -6 (pid: 2038814)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2038814
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-02-19_15:14:41
  host      : amax
  rank      : 0 (local_rank: 0)
  exitcode  : -6 (pid: 2038810)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2038810
========================================================

I am running this code on five RTX 3090 cards. Could you tell me how to handle these errors, or what might cause them? Thank you!

have someone ever tried this repo on other languages and got good performance？

50 hours of toy data didn't seem to produce intelligible speech.

Getting error while quantizing data

I downloaded Obama's speech and its transcription, and put that into the data folder.
I used this data: https://drive.google.com/drive/folders/17mHURkxigU5cbmPkBOCWaxfaoNBn9hU0?usp=share_link

When I run the command

!python -m vall_e.emb.qnt data/your_data

Instead of generating the other files, it stops running and shows this output:

0it [00:00, ?it/s]

What could be the reason? Is my data not in the correct format? Please help!
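An output of 0it means the progress loop ran zero iterations, so no input files were picked up. The sketch below is a quick way to check a folder against the naming convention from the Train section: audio as <name>.wav, text as <name>.normalized.txt. `find_unpaired` is a hypothetical helper and is not part of the repository. The recursive search is an assumption about how the tools scan folders.

```python
from pathlib import Path

TEXT_SUFFIX = ".normalized.txt"


def find_unpaired(data_dir):
    """Return (wavs_without_text, texts_without_wav) as extension-less stems.

    Assumes <name>.wav pairs with <name>.normalized.txt in the same folder.
    """
    root = Path(data_dir)
    wav_stems = {p.with_suffix("") for p in root.rglob("*.wav")}
    txt_stems = {p.parent / p.name[: -len(TEXT_SUFFIX)] for p in root.rglob("*" + TEXT_SUFFIX)}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/your_data")
    if not target.is_dir():
        print(f"not a directory: {target}")
    else:
        wavs, texts = find_unpaired(target)
        for stem in wavs:
            print(f"missing transcript: {stem}{TEXT_SUFFIX}")
        for stem in texts:
            print(f"missing audio: {stem}.wav")
```

An empty report together with 0it suggests the command was pointed at the wrong folder, for example the literal placeholder data/your_data.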

colab

Hi @enhuiz 👋 I am trying to make a simple colab please help me https://github.com/camenduru/vall-e-colab

Unable to quantize the data

After I run python -m vall_e.emb.qnt data/your_data, I get the error: Error while finding module specification for 'vall_e.emb.qnt' (ModuleNotFoundError: No module named 'vall_e.emb')

I followed the instructions and put my data into data/sky, so the data folder contains two folders (test and sky). The audio file in data/sky is named 1.wav and the text file is named 1.normalized.txt. I am not sure what I am missing.

Error "No valid path is found for training"

(vall-e) loong@Loong-Surface:~/Codes/vall-e$ python -m vall_e.train yaml=config/test/ar.yml
1it [00:00, 3407.23it/s]
Traceback (most recent call last):
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/site-packages/vall_e/train.py", line 128, in <module>
    main()
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/site-packages/vall_e/train.py", line 33, in main
    train_dl, subtrain_dl, val_dl = create_train_val_dataloader()
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/site-packages/vall_e/data.py", line 266, in create_train_val_dataloader
    train_dataset, val_dataset = create_datasets()
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/site-packages/vall_e/data.py", line 247, in create_datasets
    train_dataset = VALLEDatset(
  File "/home/loong/miniconda3/envs/vall-e/lib/python3.10/site-packages/vall_e/data.py", line 105, in __init__
    raise ValueError("No valid path is found for training.")
ValueError: No valid path is found for training.

Env:
Python 3.10
WSL2 (Ubuntu 22.04) on Win 11 Pro

ValueError: No valid path is found for training., Colab error

I am getting this error in Colab.

I have created a new folder under data called custom.

After that, I created new YAML files in a new folder.

Here is the ar.yml

I am getting this error now. Any suggestions?

Support for VALL-E X

https://vallex-demo.github.io/

Title

Loading data too slow

How long does data loading usually take for you? I have approximately 3000 h of data, and loading takes more than an hour; I have no idea which part causes this. Any ideas to speed it up? Would a larger nj help?
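Whether a larger nj helps depends on where the time goes. If loading is dominated by reading many small files, for example from network storage, overlapping the reads with worker threads can help. CPU-bound parsing in Python threads usually will not speed up. The sketch below reads transcript files with nj threads. It is hypothetical: the function name and the nj parameter mirror the question and are not the repository's API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths, nj=8):
    """Read text files concurrently with nj worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=nj) as pool:
        return list(pool.map(lambda p: Path(p).read_text(encoding="utf-8"), paths))
```

Timing one pass over a sample of files with nj=1 and nj=8 is a cheap way to see whether I/O is the bottleneck before changing the loader.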

Training Problem

Hello,
I have tested training with several kinds of data.
But even when I change the data size and config values (batch_size, eval_every, etc.), the generated ar.pt and nar.pt files are always the same size (24,192 KB and 27,406 KB).
Please let me know exactly how to change the settings.
Thanks,
Petar
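Checkpoint size is determined mainly by the model architecture, that is, the preset chosen in the config. It does not depend on the amount of data or on settings such as batch_size and eval_every, because the file stores one value per parameter plus some overhead. The sketch below is a rough back-of-the-envelope estimate. It assumes 32-bit weights, treats KB as 1024 bytes, and ignores serialization overhead.

```python
def implied_param_count(size_kb: int, bytes_per_param: int = 4) -> int:
    """Approximate parameter count stored in a checkpoint of size_kb kilobytes.

    Assumes KB = 1024 bytes and ignores serialization overhead.
    """
    return size_kb * 1024 // bytes_per_param


if __name__ == "__main__":
    for name, kb in (("ar.pt", 24192), ("nar.pt", 27406)):
        print(name, f"~{implied_param_count(kb) / 1e6:.1f}M parameters")
```

Getting a larger or smaller file therefore means choosing a different model preset, not changing the training settings.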

Training with my own audios, does it works?

Discussed in #28

Originally posted by bgondell on January 24, 2023
I cloned the Colab and uploaded a pair of audio files with my voice, but when I tried to generate "hello world", it did not work.

I'm not sure whether it's a very specific demo or whether it can be trained on more examples.

Thanks!!

Command:
!python -m vall_e 'hello world' data/bruno/file_391.wav toy.wav

Output:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/vall-e/vall_e/__main__.py", line 43, in <module>
    main()
  File "/content/vall-e/vall_e/__main__.py", line 29, in main
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
  File "/content/vall-e/vall_e/__main__.py", line 29, in <listcomp>
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
KeyError: 'HH'

Failed to find any .qnt.pt file in [PosixPath('data/put')]

mkulas@debian:~/valle/data$ find . | sed -e "s/[^-][^\/]*\// |/g" -e "s/|\([^ ]\)/|-\1/"
.
 |-put
 | |-put.txt
 | |-ar.yml
 | |-put.wav
 | |-nar.yml
 |-.normalized.txt
 |-put.txt
 |-test
 | |-ar.yml
 | |-nar.yml
 |-put.wav
 |-audio.wav
 |-LibriTTS
 | |-nar-quarter.yml
 | |-ar.yml
 | |-nar.yml
 | |-ar-quarter.yml

$ python -m vall_e.emb.qnt data/put/put.wav
0it [00:00, ?it/s]

$ python -m vall_e.emb.g2p data/put/put.txt
0it [00:00, ?it/s]

$ python -m vall_e.train yaml=data/put/ar.yml

0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/mkulas/valle/lib/python3.10/site-packages/vall_e/train.py", line 130, in <module>
    main()
  File "/home/mkulas/valle/lib/python3.10/site-packages/vall_e/train.py", line 33, in main
    train_dl, train_for_val_dl, val_dl, test_dl = create_train_val_dataloader()
  File "/home/mkulas/valle/lib/python3.10/site-packages/vall_e/data.py", line 280, in create_train_val_dataloader
    train_dataset, val_dataset, test_dataset = create_datasets()
  File "/home/mkulas/valle/lib/python3.10/site-packages/diskcache/core.py", line 1877, in wrapper
    result = func(*args, **kwargs)
  File "/home/mkulas/valle/lib/python3.10/site-packages/vall_e/data.py", line 254, in create_datasets
    train_paths, val_paths = _load_train_val_paths()
  File "/home/mkulas/valle/lib/python3.10/site-packages/vall_e/data.py", line 226, in _load_train_val_paths
    raise RuntimeError(f"Failed to find any .qnt.pt file in {cfg.data_dirs}.")
RuntimeError: Failed to find any .qnt.pt file in [PosixPath('data/put')].

About performing distributed training

Hello, and thanks for sharing this great code. Is it possible to use this trainer on multiple GPUs? I see that it is based on DeepSpeed, but I can't find any configuration files for distributed training. Could you help me with this? Thanks!

Error when training

I cannot get this to work on Windows.

Running the following command: python -m vall_e.train yaml=config/test/ar.yml

First, I was getting the error RuntimeError: Distributed package doesn't have NCCL built in

It seems the NCCL backend of PyTorch's distributed package does not work on Windows.

I found a workaround using the gloo backend and added the following code to data.py:

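import os
import socket

import torch.distributed


def get_free_port():
    # Bind to port 0 so the OS picks an unused TCP port, then release it.
    with socket.socket() as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


# Single-process settings for the default env:// rendezvous.
# LOCAL_RANK is not read by init_process_group itself but is commonly read by training code.
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = str(get_free_port())
os.environ["LOCAL_RANK"] = "0"

# gloo is available in Windows builds of PyTorch, unlike NCCL.
torch.distributed.init_process_group(backend="gloo", rank=0, world_size=1)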

Then it returns the following error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

This is where I hit a brick wall.

Platform: windows 11
Python: 3.10.9
torch: 1.11.0+cu113

Getting "No valid path is found for training". Using WSL + followed the Collab

python3 -m vall_e.train yaml=config/test/ar.yml
1it [00:00, 207.85it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/varunmayya/.local/lib/python3.10/site-packages/vall_e/train.py", line 128, in <module>
    main()
  File "/home/varunmayya/.local/lib/python3.10/site-packages/vall_e/train.py", line 33, in main
    train_dl, subtrain_dl, val_dl = create_train_val_dataloader()
  File "/home/varunmayya/.local/lib/python3.10/site-packages/vall_e/data.py", line 266, in create_train_val_dataloader
    train_dataset, val_dataset = create_datasets()
  File "/home/varunmayya/.local/lib/python3.10/site-packages/vall_e/data.py", line 247, in create_datasets
    train_dataset = VALLEDatset(
  File "/home/varunmayya/.local/lib/python3.10/site-packages/vall_e/data.py", line 105, in __init__
    raise ValueError("No valid path is found for training.")
ValueError: No valid path is found for training.

The AR model's attention mask is a lower triangular matrix: is this a bug in the code?

In vall_e/vall_e/base.py, the mask used in each transformer block of the AR model should not be just a lower triangular matrix, because the text and the prompt in tensor v can attend to each other. Therefore, the zero (unmasked) region of the mask should be shaped like a trapezoid rather than a triangle.
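The masking scheme proposed in this issue is a prefix-LM style mask, and it can be written down concretely. In the minimal sketch below, True means "may attend". The first prefix_len positions (text plus prompt) see each other fully. Every later position sees everything up to and including itself. The allowed region therefore forms the trapezoid the issue describes. The function is illustrative and is not taken from vall_e/vall_e/base.py.

```python
def prefix_lm_mask(seq_len: int, prefix_len: int) -> list:
    """Boolean attention mask; mask[i][j] is True if query i may attend to key j.

    Positions < prefix_len (text + acoustic prompt) attend to the whole prefix;
    positions >= prefix_len attend causally to all earlier positions and themselves.
    prefix_len=0 gives the plain lower-triangular causal mask.
    """
    mask = []
    for i in range(seq_len):
        limit = prefix_len if i < prefix_len else i + 1
        mask.append([j < limit for j in range(seq_len)])
    return mask


if __name__ == "__main__":
    # Print the mask with 1 = may attend, "." = masked.
    for row in prefix_lm_mask(5, 2):
        print("".join("1" if allowed else "." for allowed in row))
```

Boolean mask conventions differ between attention APIs: some treat True as "masked out", others as "may attend". The mask may need to be negated before it is passed to a particular layer.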

Cannot create "hello hello hello" wav file.

I tried the Colab example as-is and changed the output text from "hello world" to "hello hello hello".
But the generated wav file says "hello world", not "hello hello hello".

How can I generate other sentences, like "hello", "world hello" or "world world world"?

Issue Getting Samples

When running

python3 -m vall_e.train yaml=data/lies/ar.yml

I am getting the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vall_e/train.py", line 130, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/vall_e/train.py", line 33, in main
    train_dl, train_for_val_dl, val_dl, test_dl = create_train_val_dataloader()
  File "/usr/local/lib/python3.10/dist-packages/vall_e/data.py", line 282, in create_train_val_dataloader
    train_dl = _create_dl(train_dataset, training=True)
  File "/usr/local/lib/python3.10/dist-packages/vall_e/data.py", line 205, in _create_dl
    return DataLoader(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 344, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

These are the files in my data/lies dir:

ar.yml  lies.normalized.txt  lies.phn.txt  lies.qnt.pt  lies.wav  nar.yml

And this is my ar.yml file:

data_dirs: [data/lies]

model: ar-quarter
batch_size: 1
eval_batch_size: 1
save_ckpt_every: 500
eval_every: 500
max_iter: 1000

Out of memory with batch size = 1

It was my mistake: I was using the wrong GPU config. Closed.

split training of the AR and NAR model

Thanks for your time.
It seems that in the code, the input codebook of the NAR model is the ground truth rather than the output of the AR model. Is that right?

Doesn't generate any sentences other than hello world

Hi,

I've followed all the steps in the Colab notebook. In the end, I wanted to generate my own sentence, like this:

!python -m vall_e 'why this is not working' data/test/test.wav toy.wav

But it always returns errors like this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/vall-e/vall_e/__main__.py", line 43, in <module>
    main()
  File "/content/vall-e/vall_e/__main__.py", line 29, in main
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
  File "/content/vall-e/vall_e/__main__.py", line 29, in <listcomp>
    phns = torch.tensor([symmap[p] for p in g2p.encode(args.text)])
KeyError: 'AY1'

What could be wrong?

how to fine-tune model

I've trained a model from scratch, and the inference results are decent, but I'd like to fine-tune the model further.
I've prepared another dataset, but I'm unable to resume training from the checkpoint with the new dataset: the training process still refers to the old dataset.
Is there a way to fine-tune from an existing model?

Trainer Error ( vall_e.train generating following errors)

Could you please check it out and help me resolve this error?

How to use the absolute path of the data for training

I want to use an absolute path to my data for training, but when I do, the following error occurs. How can I solve it?

0%|          | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "XXX/anaconda3/envs/valle/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "XXX/anaconda3/envs/valle/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "XXX/project/2023/230118-valle/vall_e/train.py", line 128, in <module>
    main()
  File "XXX/project/2023/230118-valle/vall_e/train.py", line 119, in main
    trainer.train(
  File "XXX/project/2023/230118-valle/vall_e/utils/trainer.py", line 205, in train
    eval_fn(engines=engines)
  File "XXX/project/2023/230118-valle/vall_e/utils/distributed.py", line 69, in wrapped
    return fn(*args, **kwargs)
  File "XXX/project/2023/230118-valle/vall_e/train.py", line 116, in eval_fn
    run_eval(engines, "subtrain", subtrain_dl)
  File "XXX/anaconda3/envs/valle/lib/python3.10/site-packages/torch-1.13.1-py3.10-linux-x86_64.egg/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "XXX/project/2023/230118-valle/vall_e/train.py", line 97, in run_eval
    relpath = path.relative_to(cfg.data_root)
  File "XXX/anaconda3/envs/valle/lib/python3.10/pathlib.py", line 818, in relative_to
    raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: '/data/public/libriTTS/LibriTTS/train-clean-100/103/1241/103_1241_000000_000001.qnt.pt' is not in the subpath of 'data' OR one path is relative and the other is absolute.
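The error comes from pathlib. Path.relative_to raises ValueError when the path is not located under the given root, and an absolute data path can never be relative to the relative root 'data'. One possible workaround for the evaluation code is a helper that falls back to the path with its root anchor stripped. The result stays relative and can still be joined under an output directory. `safe_relpath` is a hypothetical helper and is not part of the repository. If the config allows it, pointing cfg.data_root at the absolute data location may be the simpler fix.

```python
from pathlib import Path


def safe_relpath(path, root) -> Path:
    """Return path relative to root, or an anchor-stripped relative path if it is not under root."""
    path, root = Path(path), Path(root)
    try:
        return path.relative_to(root)
    except ValueError:
        # Path("out") / Path("/abs/x") would discard "out", so drop the anchor
        # ("/" or a drive) to keep the result relative and joinable.
        return Path(*path.parts[1:]) if path.is_absolute() else path
```

With this helper, `relpath = safe_relpath(path, cfg.data_root)` would map '/data/public/libriTTS/...' to 'data/public/libriTTS/...' instead of raising.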

WorkNCCL timeout when running with torchrun

How to Train Vall-e for a Singers Voice

@enhuiz Can you please outline a process for training the model on a particular singer's voice?

Number of songs required (singer's vocals)
Length of the vocal files
Ideal training parameters for ar.yml and nar.yml

Training stuck at "new epoch starts"

Hi, and thanks for the great work! I have finished all the preliminary steps and used python -m vall_e.train yaml=config/test/ar.yml to train. It outputs something like this:

{'data_dirs': ['data/test'], 'model': 'ar-quarter', 'batch_size': 1, 'eval_batch_size': 1, 'save_ckpt_every': 500, 'eval_every': 500, 'max_iter': 1000, 'cfg_name': PosixPath('test/ar')} {}
2it [00:00, 1906.94it/s]
2023-02-28 00:43:47 - vall_e.data - INFO - GR=0;LR=0 - 
{'</s>': 1, '<s>': 2, 'AH0': 3, 'D': 4, 'ER1': 5, 'HH': 6, 'L': 7, 'OW1': 8, 'W': 9, '_': 10}
2023-02-28 00:43:47 - vall_e.data - INFO - GR=0;LR=0 - 
{'test': 0}
2023-02-28 00:43:47 - vall_e.data - INFO - GR=0;LR=0 - 
#samples (train): 2.
2023-02-28 00:43:47 - vall_e.data - INFO - GR=0;LR=0 - 
#samples (val): 0.
[2023-02-28 00:43:47,269] [INFO] [comm.py:657:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-02-28 00:43:47 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Added key: store_based_barrier_key:1 to store for rank: 0
2023-02-28 00:43:47 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-02-28 00:43:51 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Added key: store_based_barrier_key:2 to store for rank: 0
2023-02-28 00:43:51 - torch.distributed.distributed_c10d - INFO - GR=0;LR=0 - 
Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
[2023-02-28 00:43:51,787] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /mnt/lustre/sjtu/home/ywg12/.cache/torch_extensions/py310_cu102 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/lustre/sjtu/home/ywg12/.cache/torch_extensions/py310_cu102/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.10433101654052734 seconds
[2023-02-28 00:43:52,152] [INFO] [logging.py:75:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adam as basic optimizer
[2023-02-28 00:43:52,155] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-02-28 00:43:52,155] [INFO] [logging.py:75:log_dist] [Rank 0] Creating fp16 optimizer with dynamic loss scale
[2023-02-28 00:43:52,165] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed Final Optimizer = adam
[2023-02-28 00:43:52,166] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR
[2023-02-28 00:43:52,166] [INFO] [logging.py:75:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupDecayLR object at 0x7fa2e7319ed0>
[2023-02-28 00:43:52,166] [INFO] [logging.py:75:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001], mom=[(0.9, 0.999)]
[2023-02-28 00:43:52,166] [INFO] [config.py:1009:print] DeepSpeedEngine configuration:
[2023-02-28 00:43:52,166] [INFO] [config.py:1013:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-02-28 00:43:52,166] [INFO] [config.py:1013:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   amp_enabled .................. False
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   amp_params ................... False
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   bfloat16_enabled ............. False
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   checkpoint_parallel_write_pipeline  False
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   checkpoint_tag_validation_enabled  True
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   checkpoint_tag_validation_fail  False
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fa2e7319ae0>
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   communication_data_type ...... None
[2023-02-28 00:43:52,167] [INFO] [config.py:1013:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   curriculum_enabled_legacy .... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   curriculum_params_legacy ..... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   data_efficiency_enabled ...... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   dataloader_drop_last ......... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   disable_allgather ............ False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   dump_state ................... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   dynamic_loss_scale_args ...... None
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_enabled ........... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_gas_boundary_resolution  1
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_layer_num ......... 0
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_max_iter .......... 100
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_stability ......... 1e-06
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_tol ............... 0.01
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   eigenvalue_verbose ........... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   elasticity_enabled ........... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   flops_profiler_config ........ {
    "enabled": false, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   fp16_auto_cast ............... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   fp16_enabled ................. True
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   fp16_master_weights_and_gradients  False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   global_rank .................. 0
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   grad_accum_dtype ............. None
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   gradient_accumulation_steps .. 1
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   gradient_clipping ............ 100.0
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   gradient_predivide_factor .... 1.0
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   initial_dynamic_scale ........ 65536
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   load_universal_checkpoint .... False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   loss_scale ................... 0
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   memory_breakdown ............. False
[2023-02-28 00:43:52,168] [INFO] [config.py:1013:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   optimizer_legacy_fusion ...... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   optimizer_name ............... adam
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   optimizer_params ............. None
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   pld_enabled .................. False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   pld_params ................... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   prescale_gradients ........... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   scheduler_name ............... WarmupDecayLR
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   scheduler_params ............. {'warmup_min_lr': 1e-06, 'warmup_max_lr': 0.0002, 'warmup_num_steps': 1000, 'total_num_steps': 1000, 'warmup_type': 'linear'}
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   sparse_attention ............. None
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   sparse_gradients_enabled ..... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   steps_per_print .............. 10
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   train_batch_size ............. 1
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   train_micro_batch_size_per_gpu  1
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   use_node_local_storage ....... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   wall_clock_breakdown ......... False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   world_size ................... 1
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   zero_allow_untested_optimizer  False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   zero_enabled ................. False
[2023-02-28 00:43:52,169] [INFO] [config.py:1013:print]   zero_optimization_stage ...... 0
[2023-02-28 00:43:52,169] [INFO] [config.py:998:print_user_config]   json = {
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_accumulation_steps": 1, 
    "optimizer": {
        "type": "Adam", 
        "lr": 1e-06
    }, 
    "scheduler": {
        "type": "WarmupDecayLR", 
        "params": {
            "warmup_min_lr": 1e-06, 
            "warmup_max_lr": 0.0002, 
            "warmup_num_steps": 1000, 
            "total_num_steps": 1000, 
            "warmup_type": "linear"
        }
    }, 
    "gradient_clipping": 100.0, 
    "fp16": {
        "enabled": true
    }
}
Using /mnt/lustre/sjtu/home/ywg12/.cache/torch_extensions/py310_cu102 as PyTorch extensions root...
Emitting ninja build file /mnt/lustre/sjtu/home/ywg12/.cache/torch_extensions/py310_cu102/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.12616920471191406 seconds
[2023-02-28 00:43:52,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ckpts/test/ar/model/default/mp_rank_00_model_states.pt...
[2023-02-28 00:43:52,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ckpts/test/ar/model/default/mp_rank_00_model_states.pt.
[2023-02-28 00:43:52,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ckpts/test/ar/model/default/mp_rank_00_model_states.pt...
[2023-02-28 00:43:52,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ckpts/test/ar/model/default/mp_rank_00_model_states.pt.
fatal: Not a git repository (or any parent up to mount point /mnt/lustre)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /mnt/lustre)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-02-28 00:43:52 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
{
  "batch_size": 1,
  "cache_dataloader": false,
  "cache_dir": ".cache/test/ar",
  "cfg_name": "test/ar",
  "cfg_relpath": null,
  "ckpt_dir": "ckpts/test/ar",
  "ckpt_root": "ckpts",
  "data_dirs": "[PosixPath('data/test')]",
  "data_root": "data",
  "device": "cuda",
  "dis_warmup_max_lr": 0.0004,
  "ds_cfg": {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "optimizer": {
      "type": "Adam",
      "lr": 1e-06
    },
    "scheduler": {
      "type": "WarmupDecayLR",
      "params": {
        "warmup_min_lr": 1e-06,
        "warmup_max_lr": 0.0002,
        "warmup_num_steps": 1000,
        "total_num_steps": 1000,
        "warmup_type": "linear"
      }
    },
    "gradient_clipping": 100.0,
    "fp16": {
      "enabled": true
    }
  },
  "eval_batch_size": 1,
  "eval_every": 500,
  "fp16_cfg": {
    "enabled": true
  },
  "git_commit": "",
  "git_status": "",
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 100.0,
  "log_dir": "logs/test/ar/1677516227",
  "log_root": "logs",
  "max_grad_norm": null,
  "max_iter": 1000,
  "max_num_val": 20,
  "max_phones": 50,
  "max_prompts": 3,
  "max_val_ar_steps": 300,
  "min_phones": 10,
  "model": "ar-quarter",
  "nj": 8,
  "num_tokens": 1024,
  "p_additional_prompt": 0.8,
  "relpath": "test/ar",
  "sample_rate": 24000,
  "sampling_temperature": 1.0,
  "save_artifacts_every": 100,
  "save_ckpt_every": 500,
  "save_on_oom": true,
  "save_on_quit": true,
  "spkr_name_getter": "lambda p: p.parts[-2]",
  "start_time": 1677516227,
  "token_dim": 256,
  "use_fp16": true,
  "warmup_max_lr": 0.0002,
  "warmup_min_lr": 1e-06,
  "warmup_num_steps": 1000
}
2023-02-28 00:43:52 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.

Then it got stuck there indefinitely. It stayed stuck no matter what I pressed. If I press Ctrl-C, the program just quits without any error message. This is strange, because I have no way of knowing where the program halts or how long it will keep me waiting.
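A generic Python way to find out where a silent hang like this is stuck is the standard-library `faulthandler` module, which can periodically dump the traceback of every thread. The sketch below is an assumption, not part of vall-e: the helper names and the 300-second interval are illustrative, and the helper would need to be called near the top of the training entry point.

```python
import faulthandler
import sys


def enable_hang_diagnostics(interval_s: float = 300.0) -> None:
    """Dump every thread's traceback to stderr every `interval_s` seconds.

    Hypothetical helper: call it once before training starts. If the process
    stalls (for example in a DataLoader worker wait or a collective op), the
    periodic dumps show the line each thread is blocked on.
    """
    # Also print tracebacks on fatal signals such as SIGSEGV.
    faulthandler.enable(file=sys.stderr, all_threads=True)
    # repeat=True re-arms the timer, so a long hang is reported again and again.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=sys.stderr)


def disable_hang_diagnostics() -> None:
    """Cancel the periodic dump, e.g. once training finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

On Linux, `faulthandler.register(signal.SIGUSR1)` is an alternative that dumps tracebacks only when you send `kill -USR1 <pid>` to the stuck process.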

Colab, please?

Is there any way to get a simple Colab notebook for running this?

Training New Language, Problem

I cloned the Colab notebook and changed config.py, G2P.py, and the dataset. After training finished, every time I synthesize with a text prompt and reference audio, the output toy.wav sounds like test.wav, although the first part of toy.wav sounds somewhat like my text prompt. Why does this happen, and how can I solve it?
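According to the README's Train section, audio files should use the `.wav` suffix and transcripts the `.normalized.txt` suffix before running `vall_e.emb.qnt` and `vall_e.emb.g2p`. When swapping in a new dataset, a quick sanity check can rule out unpaired files. The helper `find_unpaired` below is hypothetical, and it assumes each transcript shares its audio file's stem in the same folder (e.g. `a.wav` with `a.normalized.txt`).

```python
from pathlib import Path

SUFFIX = ".normalized.txt"  # transcript suffix named in the README


def find_unpaired(data_dir: str) -> tuple[list[Path], list[Path]]:
    """Return (wavs_without_text, texts_without_wav), searching recursively.

    Assumes the pairing convention <stem>.wav <-> <stem>.normalized.txt
    inside the same directory.
    """
    root = Path(data_dir)
    wavs = sorted(root.rglob("*.wav"))
    texts = sorted(root.rglob("*" + SUFFIX))

    # A wav is paired if a transcript with the same stem sits next to it.
    wavs_without_text = [w for w in wavs if not w.with_name(w.stem + SUFFIX).exists()]
    # Strip the full ".normalized.txt" suffix to recover the audio stem.
    texts_without_wav = [
        t for t in texts if not t.with_name(t.name[: -len(SUFFIX)] + ".wav").exists()
    ]
    return wavs_without_text, texts_without_wav
```

Running it on `data/your_data` before quantization should return two empty lists if every clip has a transcript.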

How do I use multiple GPUs for training?

I found the solution in a closed issue:
python -m torch.distributed.launch --nproc_per_node 2 -m vall_e.train yaml=config/your_data/ar.yml
This command does use both GPUs, but training is no faster than with a single GPU.
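One likely reason two GPUs do not look faster: DeepSpeed sizes the global batch as `train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. With per-GPU settings like the `ds_cfg` printed in the training log earlier on this page (micro batch 1, no accumulation), two GPUs process two samples per step instead of one, so time per step may stay roughly the same while an epoch needs fewer steps. The arithmetic is sketched below; the helper names are illustrative, the per-epoch figure assumes each rank reads a disjoint shard of the data (not confirmed for vall-e), and real speedups also depend on communication and data-loading overhead.

```python
import math


def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """DeepSpeed's relation: train_batch_size = micro * grad_accum * world_size."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


def optimizer_steps_per_epoch(num_samples: int, micro_batch_per_gpu: int,
                              grad_accum_steps: int, world_size: int) -> int:
    """Steps to see every sample once, assuming disjoint per-rank data shards."""
    batch = global_batch_size(micro_batch_per_gpu, grad_accum_steps, world_size)
    return math.ceil(num_samples / batch)


# Same per-GPU settings on 1 vs 2 GPUs, for a hypothetical 1000-sample dataset.
for gpus in (1, 2):
    print(gpus, global_batch_size(1, 1, gpus), optimizer_steps_per_epoch(1000, 1, 1, gpus))
# 1 1 1000
# 2 2 500
```

So a fairer comparison is wall-clock time per epoch (or samples per second) rather than time per step.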

can't quantize data

When I tried to quantize the data by running the command

python -m vall_e.emb.qnt data/your_data

I got this error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/vall-e/vall_e/emb/qnt.py", line 3, in <module>
    from functools import cache
ImportError: cannot import name 'cache' from 'functools' (/usr/lib/python3.8/functools.py)
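`functools.cache` was added in Python 3.9, so this import fails on the Python 3.8 interpreter shown in the traceback. The README notes the code is only tested under Python 3.10.7, so upgrading Python is the most direct route. If you must stay on 3.8, a local patch along these lines (an assumption, not an upstream change) provides an equivalent decorator, since the Python docs describe `cache` as returning the same as `lru_cache(maxsize=None)`:

```python
try:
    from functools import cache  # Python 3.9+
except ImportError:  # Python 3.8 and older
    from functools import lru_cache

    # functools.cache is documented as equivalent to lru_cache(maxsize=None).
    cache = lru_cache(maxsize=None)


@cache
def square(n: int) -> int:
    """Toy function showing that the decorator memoizes results."""
    return n * n
```

Other parts of the code base may rely on further 3.9+ features, so this shim alone may not be enough on 3.8.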

Training Error

Thanks to the authors for the amazing work.
I got this error when running training:

2023-01-28 06:14:22 - vall_e.utils.trainer - INFO - GR=0;LR=0 - 
New epoch starts.
/opt/conda/envs/valle/lib/python3.10/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/opt/conda/envs/valle/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/valle/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vall-e/vall_e/train.py", line 128, in <module>
    main()
  File "/workspace/vall-e/vall_e/train.py", line 119, in main
    trainer.train(
  File "/workspace/vall-e/vall_e/utils/trainer.py", line 155, in train
    stats = engines.step(feeder=train_feeder, batch=batch)
  File "/workspace/vall-e/vall_e/utils/engines.py", line 171, in step
    raise e
  File "/workspace/vall-e/vall_e/utils/engines.py", line 133, in step
    maybe_loss_and_engine_stats = feeder(
  File "/workspace/vall-e/vall_e/train.py", line 39, in train_feeder
    _ = model(
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1836, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/vall-e/vall_e/vall_e/ar.py", line 49, in forward
    return super().forward(
  File "/workspace/vall-e/vall_e/vall_e/base.py", line 428, in forward
    self.proms_emb(proms_list),
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/vall-e/vall_e/vall_e/base.py", line 269, in forward
    x = einsum("l k d, n l k -> n d", w, x)
  File "/opt/conda/envs/valle/lib/python3.10/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

My torch version is 1.13.1, and here is my GPU information:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   23C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Do you know how to fix this error?
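The failing call is the einsum `"l k d, n l k -> n d"` at `vall_e/vall_e/base.py`, line 269, and the cuBLAS arguments (`CUDA_R_16F`) show it ran in fp16. That equation contracts over `l` and `k`, so it is an ordinary matrix product once those two axes are flattened. The sketch below checks the equivalence with NumPy standing in for torch; the shapes are made-up examples. Rewriting the line as `x.reshape(n, -1) @ w.reshape(-1, d)`, or temporarily disabling fp16, are things one could try to see whether cuBLAS's strided-batched fp16 path is the trigger; neither is a confirmed fix.

```python
import numpy as np


def contract_einsum(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Same contraction as einsum("l k d, n l k -> n d", w, x)."""
    return np.einsum("lkd,nlk->nd", w, x)


def contract_matmul(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Equivalent plain matmul: flatten (l, k) in both operands, then multiply."""
    l, k, d = w.shape
    n = x.shape[0]
    # C-order reshapes flatten (l, k) identically on both sides.
    return x.reshape(n, l * k) @ w.reshape(l * k, d)


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 32, 16))  # (l, k, d), example sizes only
x = rng.standard_normal((5, 8, 32))   # (n, l, k)
assert np.allclose(contract_einsum(w, x), contract_matmul(w, x))
```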

How to Pretrain on LibriTTS

Hi,

It's great to see an implementation of this recent work; it is much appreciated. I was able to set up training with custom data for a single speaker. Here are some questions:

What sample rate is supported for the training set? The synthesized audio seems to be 24 kHz and single-channel. In emb/qnt.py, only the first channel is used, but there is no check on the sample rate.
For pretraining on multiple speakers with LibriTTS data, what number of epochs and batch size do you recommend?
Can we pretrain on a single-speaker dataset, such as LJSpeech?
Is directly fine-tuning on limited single-speaker data recommended? Any suggestions here would help.
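On the sample-rate question: the training config printed earlier on this page lists `sample_rate: 24000`. Whether qnt.py resamples internally is not settled here, but the standard-library `wave` module can at least show what your files contain. The sketch below is hypothetical; the 24 kHz mono target is an assumption drawn from that printed config and from the audio being 24 kHz mono, and `wave` only reads PCM WAV headers (other encodings may raise `wave.Error`).

```python
import wave
from pathlib import Path
from typing import Union

EXPECTED_RATE = 24_000  # `sample_rate` in the printed vall-e config
EXPECTED_CHANNELS = 1   # assumption: mono input


def wav_format(path: Union[str, Path]) -> tuple[int, int]:
    """Return (sample_rate, num_channels) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as f:
        return f.getframerate(), f.getnchannels()


def mismatched_wavs(data_dir: Union[str, Path]) -> list[tuple[Path, int, int]]:
    """List WAV files whose rate or channel count differs from the target."""
    bad = []
    for p in sorted(Path(data_dir).rglob("*.wav")):
        rate, channels = wav_format(p)
        if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
            bad.append((p, rate, channels))
    return bad
```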

Thanks in advance
Sagar

Synthesizer query

python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt

Could you please explain these arguments?
<ref_path>: what should the reference path be here?
<out_path>: the path where the output will be saved.

It would be great if you could provide a complete example of running synthesis.
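Going only by the README, the command takes the text to speak, a reference path, and an output path. In VALL-E the reference is an acoustic prompt, so `<ref_path>` is presumably a short recording of the target speaker and `<out_path>` the audio file to write; that reading is an assumption, not something the README states. The sketch below assembles the command from Python; `prompt.wav` and `output.wav` are placeholder names.

```python
import sys


def build_synth_cmd(text: str, ref_path: str, out_path: str,
                    ar_ckpt: str = "zoo/ar.pt", nar_ckpt: str = "zoo/nar.pt") -> list[str]:
    """Assemble the README's synthesis command as an argv list.

    Passing a list (not a shell string) keeps a multi-word `text` as one argument.
    """
    return [
        sys.executable, "-m", "vall_e",
        text, ref_path, out_path,
        "--ar-ckpt", ar_ckpt,
        "--nar-ckpt", nar_ckpt,
    ]


# Placeholder paths: a reference clip of the target voice and the output file.
cmd = build_synth_cmd("Hello world.", "prompt.wav", "output.wav")
```

`subprocess.run(cmd, check=True)` would then run it. The shell equivalent is `python -m vall_e "Hello world." prompt.wav output.wav --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt`, with the text quoted so it stays a single argument.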

There is an error when installing Deepspeed.

I am learning to code. I'm not very experienced yet, but I'm trying. Thank you for sharing this work. Everything else installed fine; only DeepSpeed fails. My operating system is Windows 10, and I have an integrated graphics card. I went to the DeepSpeed homepage and tried all sorts of methods, but nothing worked. I also searched online, but I have no idea how to fix it. Please help me.

anaconda powershell

(base) PS C:\python\vall-e> pip install deepspeed
Collecting deepspeed
Using cached deepspeed-0.7.7.tar.gz (712 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [14 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-kejson7y\deepspeed_22029de4adec4e19bdd6eb3d6bf7b01a\setup.py", line 164, in
ext_modules.append(builder.builder())
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-kejson7y\deepspeed_22029de4adec4e19bdd6eb3d6bf7b01a\op_builder\builder.py", line 599, in builder
assert_no_cuda_mismatch()
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-kejson7y\deepspeed_22029de4adec4e19bdd6eb3d6bf7b01a\op_builder\builder.py", line 89, in assert_no_cuda_mismatch
cuda_major, cuda_minor = installed_cuda_version()
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-kejson7y\deepspeed_22029de4adec4e19bdd6eb3d6bf7b01a\op_builder\builder.py", line 41, in installed_cuda_version
assert cuda_home is not None, "CUDA_HOME does not exist, unable to compile CUDA op(s)"
AssertionError: CUDA_HOME does not exist, unable to compile CUDA op(s)
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(base) PS C:\python\vall-e>

git bash

user@DESKTOP-NF5O47M MINGW64 /c/python
$ DS_BUILD_OPS=1 pip install deepspeed
Collecting deepspeed
Using cached deepspeed-0.7.7.tar.gz (712 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [14 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-5thu072j\deepspeed_1ff8ac6fbc8a4942a69e6ab2cf187056\setup.py", line 164, in
ext_modules.append(builder.builder())
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-5thu072j\deepspeed_1ff8ac6fbc8a4942a69e6ab2cf187056\op_builder\builder.py", line 599, in builder
assert_no_cuda_mismatch()
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-5thu072j\deepspeed_1ff8ac6fbc8a4942a69e6ab2cf187056\op_builder\builder.py", line 89, in assert_no_cuda_mismatch
cuda_major, cuda_minor = installed_cuda_version()
File "C:\Users\Public\Documents\ESTsoft\CreatorTemp\pip-install-5thu072j\deepspeed_1ff8ac6fbc8a4942a69e6ab2cf187056\op_builder\builder.py", line 41, in installed_cuda_version
assert cuda_home is not None, "CUDA_HOME does not exist, unable to compile CUDA op(s)"
AssertionError: CUDA_HOME does not exist, unable to compile CUDA op(s)
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

user@DESKTOP-NF5O47M MINGW64 /c/python
$
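Both attempts stop at the same assertion: DeepSpeed's op builder needs a CUDA toolkit (`CUDA_HOME`) to compile its CUDA ops, and the warning shows Torch found no CUDA device. That matches the README's requirement of a GPU DeepSpeed supports plus a pre-installed CUDA or ROCm compiler, which an integrated graphics card does not provide. Before retrying, a quick check like the hypothetical helper below (standard library only; it does not call DeepSpeed) shows whether `CUDA_HOME` (or `CUDA_PATH`, which the Windows CUDA installer sets) is defined and whether `nvcc` is reachable on `PATH`.

```python
import os
import shutil
from typing import Mapping, Optional


def cuda_toolchain_report(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Summarize what a CUDA extension build would likely find.

    Checks CUDA_HOME, then CUDA_PATH, and looks for `nvcc` on the PATH taken
    from `env` (defaults to the current process environment).
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    # shutil.which returns None when PATH is empty or nvcc is not found.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    return {"CUDA_HOME": cuda_home, "nvcc": nvcc}


print(cuda_toolchain_report())
```

If both values are `None`, the `CUDA_HOME does not exist` assertion above is the expected outcome, and installing a CUDA toolkit on a machine with a supported NVIDIA GPU would be the prerequisite.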

Missing `test_data_dirs` value from config

The latest commit, 2e9f503, requires the VALLEDatset class constructor to receive at least one path that exists.
However, I think this makes the constructor throw the following error at the start of training, because no path is provided by cfg.test_data_dirs.

Traceback (most recent call last):
  File "/home/xxxxxx/miniconda3/envs/valle/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/xxxxxx/miniconda3/envs/valle/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/xxxxxx/dev/repos/vall-e/vall_e/train.py", line 130, in <module>
    main()
  File "/home/xxxxxx/dev/repos/vall-e/vall_e/train.py", line 33, in main
    train_dl, train_for_val_dl, val_dl, test_dl = create_train_val_dataloader()
  File "/home/xxxxxx/dev/repos/vall-e/vall_e/data.py", line 283, in create_train_val_dataloader
    train_dataset, val_dataset, test_dataset = create_datasets()
  File "/home/xxxxxx/dev/repos/vall-e/vall_e/data.py", line 272, in create_datasets
    test_dataset = VALLEDatset(
  File "/home/xxxxxx/dev/repos/vall-e/vall_e/data.py", line 109, in __init__
    raise ValueError("No valid path is found. ")
ValueError: No valid path is found.

What does test_data_dirs represent in the config class?
Is it another argument we should pass on the command line?
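The traceback suggests the constructor keeps only paths that exist and raises when none remain, so an unset or empty `test_data_dirs` would trigger it. The sketch below reproduces that behaviour in isolation; `filter_existing` is a hypothetical stand-in written for illustration, not vall-e's code. If the guess is right, pointing `test_data_dirs` at an existing folder in the YAML config, presumably the same way `data_dirs` is set, should get past the check, but that key placement is an assumption.

```python
from pathlib import Path


def filter_existing(paths: list[Path]) -> list[Path]:
    """Keep only existing paths; raise the traceback's message if none remain.

    Hypothetical stand-in for the dataset constructor's guard.
    """
    kept = [p for p in paths if p.exists()]
    if not kept:
        # An empty input list (e.g. no test data dirs configured) ends up here too.
        raise ValueError("No valid path is found. ")
    return kept
```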

Loss value

Hi.

What loss value counts as good? Both models use the quarter preset, and each model's total batch_size is 240. For the AR model, the loss at step 220k is about 0.4-0.5; for the NAR model at step 72k, it is about 0.7-1. I have about 72 hours of data from one speaker. When I run inference, the output is just the same audio as the prompt audio (same speaker), but a bit noisier at the beginning. Does anyone have ideas, or has anyone already tested the model on a single speaker?
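One way to put these numbers in context, assuming the logged loss is a mean token-level cross-entropy in nats (the usual PyTorch `cross_entropy` convention; the page does not confirm this), is to convert it to perplexity, `exp(loss)`, and compare with a uniform guess over the codebook (`num_tokens: 1024` in the printed config), whose loss would be ln 1024 ≈ 6.93. The values below are the midpoints of the reported ranges.

```python
import math


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity corresponding to a mean cross-entropy measured in nats."""
    return math.exp(mean_ce_nats)


uniform_loss = math.log(1024)  # loss of a uniform guess over 1024 codec tokens
for name, loss in [("AR ~0.45", 0.45), ("NAR ~0.85", 0.85), ("uniform", uniform_loss)]:
    print(f"{name}: loss={loss:.2f}, perplexity={perplexity(loss):.2f}")
# AR ~0.45: loss=0.45, perplexity=1.57
# NAR ~0.85: loss=0.85, perplexity=2.34
# uniform: loss=6.93, perplexity=1024.00
```

A perplexity near 1.6 means the model is, on average, about as uncertain as choosing among roughly 1.6 equally likely tokens per step; it says nothing by itself about whether the output follows the text.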

Unable to pre-compile async_io

pip install git+https://github.com/enhuiz/vall-e
Collecting git+https://github.com/enhuiz/vall-e
Cloning https://github.com/enhuiz/vall-e to c:\users\iac\appdata\local\temp\pip-req-build-4ilqvm3r
Running command git clone --filter=blob:none --quiet https://github.com/enhuiz/vall-e 'C:\Users\IAC\AppData\Local\Temp\pip-req-build-4ilqvm3r'
Resolved https://github.com/enhuiz/vall-e to commit 2e9f503
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Collecting coloredlogs>=15.0.1
Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting deepspeed>=0.7.7
Using cached deepspeed-0.8.0.tar.gz (749 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [14 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\IAC\AppData\Local\Temp\pip-install-t11ikfm8\deepspeed_018d2278b6c348838129b17a55a708a3\setup.py", line 156, in
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\IAC\AppData\Local\Temp\pip-install-t11ikfm8\deepspeed_018d2278b6c348838129b17a55a708a3\setup.py", line 48, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
[end of output]
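The build log itself names the switch: `async_io` needs the libaio headers and can be skipped with `DS_BUILD_AIO=0`, while `DS_BUILD_OPS=1` in the output shows that pre-compiling ops was requested. A sketch of retrying the install with that variable set is below; the helpers are hypothetical, and on a Windows machine where Torch finds no CUDA (as the warning says), the build may still stop at a later CUDA-dependent op.

```python
import os
import subprocess
import sys
from typing import Optional


def deepspeed_install_env(base: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with async_io excluded from the DeepSpeed build."""
    env = dict(os.environ if base is None else base)
    env["DS_BUILD_AIO"] = "0"  # switch suggested by the DeepSpeed warning above
    return env


def install_deepspeed(spec: str = "deepspeed>=0.7.7") -> None:
    """Run pip with the modified environment, using this same interpreter."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", spec],
        env=deepspeed_install_env(),
        check=True,
    )
```

From a shell, the equivalent is `DS_BUILD_AIO=0 pip install deepspeed` in Git Bash, or `$env:DS_BUILD_AIO = "0"` followed by `pip install deepspeed` in PowerShell.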

Is it possible to use this for conversion rather than TTS?

Any info would be appreciated.

update colab to allow user to upload their own voice

It would be great if users could upload their own WAV files, type any sentence, and have it spoken in their voice.
Could you update the current Colab to include these features, or at least point out how this can be done with the current Colab?
@enhuiz

Great work; I really appreciate all the effort.
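A rough sketch of the requested flow, under stated assumptions: in Colab, `google.colab.files.upload()` returns a dict mapping each uploaded file name to its bytes. The hypothetical helpers below write the first uploaded `.wav` to disk and build the README's synthesis command with it as `<ref_path>`. Note the README's caveat that the current Colab example overfits a single utterance and the pretrained model is not yet available, so this would only produce a meaningful voice once usable checkpoints exist in `zoo/`.

```python
import sys
from pathlib import Path


def prepare_voice_prompt(uploaded: dict[str, bytes], out_dir: str = "uploads") -> Path:
    """Save the first uploaded .wav file and return its path.

    `uploaded` has the shape returned by google.colab.files.upload().
    """
    wavs = {n: b for n, b in uploaded.items() if n.lower().endswith(".wav")}
    if not wavs:
        raise ValueError("Please upload a .wav file.")
    name, data = next(iter(wavs.items()))
    dest = Path(out_dir)
    dest.mkdir(parents=True, exist_ok=True)
    path = dest / Path(name).name  # drop any directory components from the name
    path.write_bytes(data)
    return path


def synth_cmd(text: str, ref_path: Path, out_path: str = "output.wav") -> list[str]:
    """README command: python -m vall_e <text> <ref_path> <out_path> --ar-ckpt ... --nar-ckpt ..."""
    return [sys.executable, "-m", "vall_e", text, str(ref_path), out_path,
            "--ar-ckpt", "zoo/ar.pt", "--nar-ckpt", "zoo/nar.pt"]


# In a Colab cell (not run here):
#   from google.colab import files
#   import subprocess
#   ref = prepare_voice_prompt(files.upload())
#   subprocess.run(synth_cmd("Any sentence you like.", ref), check=True)
```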