
lora-svc's Introduction

Singing Voice Conversion based on Whisper & neural source-filter BigVGAN

Cutting-edge technology built on work from three giants of artificial intelligence:

OpenAI's Whisper, trained on 680,000 hours of multilingual speech

NVIDIA's BigVGAN, anti-aliased waveform generation for speech

Microsoft's adapters, high-efficiency fine-tuning

LoRA is not fully implemented in this project yet; a full implementation can be found here: LoRA TTS & paper

Use the pretrained model to fine-tune

lora-svc-baker.mp4

Dataset preparation

Necessary pre-processing:

  • 1 separate the accompaniment from the vocals with UVR
  • 2 cut the audio into clips shorter than 30 seconds, Whisper's limit, with slicer (a naive cutting sketch follows the file tree below)

Then put the dataset into the data_raw directory according to the following file structure:

data_raw
├───speaker0
│   ├───000001.wav
│   ├───...
│   └───000xxx.wav
└───speaker1
    ├───000001.wav
    ├───...
    └───000xxx.wav
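
If you prefer a scriptable alternative to slicer, below is a minimal sketch of naive fixed-length cutting (an illustration only; slicer cuts on silence boundaries, which avoids splitting mid-phrase and sounds better). The file names and paths are assumptions.

    # Hypothetical helper, not part of this repo: cut one vocal track into
    # fixed-length chunks safely under Whisper's 30-second limit.
    import os
    import soundfile as sf

    def cut_wav(path, out_dir, max_sec=29.0):
        audio, sr = sf.read(path)
        chunk = int(max_sec * sr)
        os.makedirs(out_dir, exist_ok=True)
        for i in range(0, len(audio), chunk):
            name = f"{i // chunk + 1:06d}.wav"
            sf.write(os.path.join(out_dir, name), audio[i:i + chunk], sr)

    cut_wav("vocal.wav", "data_raw/speaker0")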

Install dependencies

  • 1 install the software dependencies

    pip install -r requirements.txt

  • 2 download the Timbre Encoder: Speaker-Encoder by @mueller91, and put best_model.pth.tar into speaker_pretrain/

  • 3 download the multilingual medium Whisper model; make sure to download medium.pt and put it into whisper_pretrain/

    Tip: Whisper is bundled with this repository; do not install it separately, or the two copies will conflict and raise an error

  • 4 download the pretrained model maxgan_pretrain_32K.pth and run a test:

    python svc_inference.py --config configs/maxgan.yaml --model maxgan_pretrain_32K.pth --spk ./configs/singers/singer0001.npy --wave test.wav

Data preprocessing

use this command if you want to automate all of the steps below:

python3 prepare/easyprocess.py

or step by step, as follows:

  • 1, re-sampling (see the resampling sketch after this list)

    generate audio with a sampling rate of 16000 Hz:

    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-16k -s 16000

    generate audio with a sampling rate of 32000 Hz:

    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-32k -s 32000

  • 2, use 16K audio to extract pitch

    python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch

  • 3, use 16K audio to extract ppg

    python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper

  • 4, use 16k audio to extract timbre code

    python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker

  • 5, extract the singer code for inference (an averaging sketch follows the directory tree below)

    python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer

  • 6, use 32k audio to generate training index

    python prepare/preprocess_train.py

  • 7, training file debugging

    python prepare/preprocess_zzz.py -c configs/maxgan.yaml
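
For intuition, here is a minimal sketch of what the re-sampling in step 1 amounts to. This is an assumption about prepare/preprocess_a.py, which may differ in details such as loudness normalization.

    # A hedged sketch of step 1: resample every speaker folder to a target
    # rate. Assumes librosa and soundfile are installed.
    import os
    import librosa
    import soundfile as sf

    def resample_tree(in_dir, out_dir, sr):
        for speaker in os.listdir(in_dir):
            os.makedirs(os.path.join(out_dir, speaker), exist_ok=True)
            for name in os.listdir(os.path.join(in_dir, speaker)):
                wav, _ = librosa.load(os.path.join(in_dir, speaker, name), sr=sr)
                sf.write(os.path.join(out_dir, speaker, name), wav, sr)

    resample_tree("./data_raw", "./data_svc/waves-16k", 16000)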

data_svc/
├── waves-16k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── waves-32k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── pitch
│   ├── speaker0
│   │   ├── 000001.pit.npy
│   │   └── 000xxx.pit.npy
│   └── speaker1
│       ├── 000001.pit.npy
│       └── 000xxx.pit.npy
├── whisper
│   ├── speaker0
│   │   ├── 000001.ppg.npy
│   │   └── 000xxx.ppg.npy
│   └── speaker1
│       ├── 000001.ppg.npy
│       └── 000xxx.ppg.npy
├── speaker
│   ├── speaker0
│   │   ├── 000001.spk.npy
│   │   └── 000xxx.spk.npy
│   └── speaker1
│       ├── 000001.spk.npy
│       └── 000xxx.spk.npy
└── singer
    ├── speaker0.spk.npy
    └── speaker1.spk.npy
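
As referenced in step 5, here is a hedged sketch of how the singer code can be derived: average the per-utterance .spk.npy embeddings into one vector per speaker. This is an assumption about preprocess_speaker_ave.py, which may additionally normalize or filter outliers.

    import os
    import numpy as np

    def average_speaker(spk_dir, singer_dir):
        os.makedirs(singer_dir, exist_ok=True)
        for speaker in os.listdir(spk_dir):
            folder = os.path.join(spk_dir, speaker)
            embs = [np.load(os.path.join(folder, f))
                    for f in os.listdir(folder) if f.endswith(".spk.npy")]
            # one averaged timbre vector per speaker, e.g. speaker0.spk.npy
            np.save(os.path.join(singer_dir, speaker + ".spk"),
                    np.mean(embs, axis=0))

    average_speaker("data_svc/speaker", "data_svc/singer")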

Train

  • 0, if fine-tuning from the pre-trained model, first download it: maxgan_pretrain_32K.pth

    set pretrain: "./maxgan_pretrain_32K.pth" in configs/maxgan.yaml, and lower the learning rate appropriately, e.g. 1e-5

  • 1, start training

    python svc_trainer.py -c configs/maxgan.yaml -n svc

  • 2, resume training

    python svc_trainer.py -c configs/maxgan.yaml -n svc -p chkpt/svc/***.pth

  • 3, view log

    tensorboard --logdir logs/

[figure: final model loss curve]

Inference

use this command if you want a GUI that runs all of the steps below:

python3 svc_gui.py

or step by step, as follows:

  • 1, export inference model

    python svc_export.py --config configs/maxgan.yaml --checkpoint_path chkpt/svc/***.pt

  • 2, use Whisper to extract the content encoding; running it as a separate step instead of one-click inference reduces GPU memory usage

    python whisper/inference.py -w test.wav -p test.ppg.npy

  • 3, extract the F0 parameter to CSV text format

    python pitch/inference.py -w test.wav -p test.csv

  • 4, specify the parameters and run inference (an F0-shift sketch follows this list)

    python svc_inference.py --config configs/maxgan.yaml --model maxgan_g.pth --spk ./data_svc/singer/your_singer.npy --wave test.wav --ppg test.ppg.npy --pit test.csv

    when --ppg is specified and the same audio is inferred multiple times, repeated extraction of the content encoding is avoided; if it is not specified, it is extracted automatically;

    when --pit is specified, a manually tuned F0 file can be loaded; if it is not specified, it is extracted automatically;

    the output is written to the current directory as svc_out.wav

    args: --config  config path
          --model   model path
          --spk     speaker
          --wave    input wave
          --ppg     wave ppg
          --pit     wave pitch
          --shift   pitch shift
  • 5, post-process with VAD

    python svc_inference_post.py --ref test.wav --svc svc_out.wav --out svc_post.wav
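
As referenced in step 4, below is a minimal sketch of manually tuning the F0 file before passing it with --pit. The CSV layout assumed here (one F0 value in Hz per frame, 0 for unvoiced frames) is an assumption; inspect your own test.csv before relying on it.

    import numpy as np

    def shift_f0(in_csv, out_csv, semitones):
        # assumption: one F0 value (Hz) per line, 0 marks unvoiced frames
        f0 = np.loadtxt(in_csv, delimiter=",")
        voiced = f0 > 0                          # leave unvoiced frames at 0
        f0[voiced] *= 2.0 ** (semitones / 12.0)  # equal-tempered pitch shift
        np.savetxt(out_csv, f0, delimiter=",", fmt="%.3f")

    shift_f0("test.csv", "test_up3.csv", semitones=3)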

Code sources and references

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

AdaSpeech: Adaptive Text to Speech for Custom Voice

https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf

https://github.com/mindslab-ai/univnet [paper]

https://github.com/openai/whisper/ [paper]

https://github.com/NVIDIA/BigVGAN [paper]

lora-svc's People

Contributors: 0mis, dcvalish, dlseed, kakaruhayate, maxmax2016

lora-svc's Issues

High notes break down

I trained on a 1,000-sentence speech dataset for 10 epochs. When converting to singing I used pitch shift, and the high notes broke down. Is this because the base model has never seen high pitches?
[image]

Could you provide a dataset-preparation tutorial and training tips?

Thanks for taking SVC in the LoRA direction! Two suggestions:
1. Could you provide a simple tutorial on dataset preparation, i.e. how to process raw collected audio, step by step, into the multiple npy files that training requires?
2. Could you share some training tips and experience? For example: how much training data is needed, data-quality requirements, how long to train, how to tell when training is finished, and whether training can continue on top of the 56-singer model?

About the dataset

If, in my use case, the inference input will only ever be singing, do I still need to add various speech datasets to the big-data base model? (Assume the small fine-tuning dataset for the target timbre contains only speech data.)

Help!

What does this mean?

[image]

about the current demo

Thank you for your great work and for sharing it.
I have some questions:

  1. For the demos on Bilibili, did you use the code you released, or did you generate them with a more complicated model that uses both HuBERT and PPG?
  2. Have you done any experiments on cross-domain SVC, i.e. conversion between speakers who only have speech data and speakers who only have singing data?
  3. You said "Big data [more and more wave] make things to be interesting!". What exactly did you mean? Cross-language?
  4. Could you share some results without background music?
    Thank you again!

Severe performance problem when running frequency-band extension

I hit a severe performance problem while running frequency-band extension; the resource usage is shown below:
[image]
It consumed more than 20 GB of RAM, maxed out the CPU, read and wrote the C: drive constantly, and had not finished after more than 15 minutes.

  • Machine configuration and environment:
    • CPU: 13th Gen Intel(R) Core(TM) i5-13600KF 3.50 GHz
    • RAM: 32.0 GB (31.8 GB usable)
    • Edition: Windows 11 Pro
    • Version: 22H2
    • OS build: 22621.1555
    • PyCharm + Anaconda + Python 3.9

Maybe hifi-gan-bwe could be chained on for upsampling?

As the title says: the current sampling rate is a bit low (16 kHz). I see many effects bundled in the repository, but applying them directly does not give great results.

Tried upsampling with hifi-gan-bwe

I used three pretrained models for upsampling.
The input source is 2018.wav from opencpop.
Direct inference output:
svc_out.zip
hifi-gan-bwe-10-42890e3-vctk-48kHz:
svc_out_bwe.zip
hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz:
svc_out_bwe16_48.zip
hifi-gan-bwe-05-cd9f4ca-vctk-48kHz:
svc_out_bwevctk48.zip

Perhaps the pretrained-vocoder enhancement used in DDSP-SVC is also worth a look?

cannot reproduce your results

It's really great work, but I cannot reproduce the results; the output does not sound good. Here is some information:
[image]

individualAudio2.mp4
individualAudio1.mp4

so, what could be the problem?

Sound-quality enhancement reports: HifiGAN model file is not found!

[image]
As shown above, I placed the relevant files as instructed, but execution fails with the error below:

(lora-svc) PS G:\AI\lora-svc> python svc_val_nsf_hifigan.py
| Hparams chains:  ['nsf_hifigan/configs/basics/base.yaml', 'nsf_hifigan/configs/basics/fs2.yaml', 'nsf_hifigan/configs/acoustic/nomidi.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, base_config: ['nsf_hifigan/configs/basics/fs2.yaml'], 
binarization_args: {'shuffle': True, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': False, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.acoustic.AcousticBinarizer, binary_data_dir: data/opencpop/binary, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 50000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l2, 
dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 2, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, f0_embed_type: continuous, ffn_act: gelu, ffn_padding: SAME, 
fft_size: 2048, fmax: 16000, fmin: 40, g2p_dictionary: nsf_hifigan/na.txt, gamma: 0.5, 
gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1, hidden_size: 256, hop_size: 512, 
infer: False, keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 0.0, 
lambda_ph_dur: 0.0, lambda_sent_dur: 0.0, lambda_uv: 0.0, lambda_word_dur: 0.0, load_ckpt: , 
log_interval: 100, loud_norm: False, lr: 0.0004, max_beta: 0.02, max_epochs: 1000, 
max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 8000, max_input_tokens: 1550, max_sentences: 48, 
max_tokens: 80000, max_updates: 320000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, 
min_level_db: -120, norm_type: gn, num_ckpt_keep: 3, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
original_g2p_dictionary: nsf_hifigan/na.txt, out_wav_norm: False, permanent_ckpt_interval: 40000, permanent_ckpt_start: 120000, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'use_tone': True, 'forced_align': 'mfa', 'use_sox': False, 'txt_processor': 'en', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: , predictor_dropout: 0.5, predictor_gpredictor_predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: , profile_infer: False, raw_data_dir: data/opencpop/raw, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, spec_max: [0], spec_min: [-5], spk_cond_steps: [],
stop_token_weight: 5.0, task_cls: src.naive_task.NaiveTask, test_ids: [], test_input_dir: , test_num: 0,
test_prefixes: ['2044', '2086', '2092', '2093', '2100'], test_set_name: test, timesteps: 1000, train_set_name: train, use_denoise: False,
use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_key_shift_embed: False, use_midi: False,
use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_speed_embed: False, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, val_check_interval: 2000,
valid_num: 0, valid_set_name: valid, validate: False, vocoder: NsfHifiGAN, vocoder_ckpt: nsf_hifigan_pretrain/nsf_hifigan/model,
warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048, work_dir: ,

Traceback (most recent call last):
  File "G:\AI\lora-svc\svc_val_nsf_hifigan.py", line 56, in <module>
    vocoder = NsfHifiGAN()
  File "G:\AI\lora-svc\nsf_hifigan\src\vocoders\nsf_hifigan.py", line 18, in __init__
    assert os.path.exists(model_path), 'HifiGAN model file is not found!'
AssertionError: HifiGAN model file is not found!

HuBERT feature-extraction error

When extracting audio features with the HuBERT base model, I hit the following error:

File "5_compute_hubert_dis.py", line 56, in compute_hubert
models, save_cfg, task = checkpoint_utils.load_model_ensemble_and_task([model_path], suffix="")
File "/mnt/NFS1/speech/hegang1/anc3/envs/vc2/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 279, in load_model_ensemble_and_task
state = load_checkpoint_to_cpu(filename, arg_overrides)
File "/mnt/NFS1/speech/hegang1/anc3/envs/vc2/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
state = _upgrade_state_dict(state)
File "/mnt/NFS1/speech/hegang1/anc3/envs/vc2/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 420, in _upgrade_state_dict
state["args"].task = "translation"
AttributeError: 'NoneType' object has no attribute 'task'

How can I solve this?

Question about the training set

Songs with accompaniment are generally easy to find online, but vocal separation always leaves the vocals more or less unclean. Have you tried training on vocals obtained through vocal separation?

Some questions

1. Have you trained multi-speaker models?
When I first read the vits code, I wondered why the speaker information (an id in the original vits, a speaker vector here) is injected into the posterior encoder and the decoder, until I later saw that voice conversion is built on top of this.
But since we feed the source semantics in through a pretrained HuBERT/PPG, we do not need vits' many-to-many speaker-conversion trick, so shouldn't the speaker input to g be removed? After all, the posterior-encoder + decoder branch should act as a generic vocoder and should not depend on the speaker.
Also, for a single-speaker model, the speaker vector may not be needed at all.
2. DistributedBucketSampler is raised to at most 20 s, and I am not sure how GPU memory behaves during training. The batch size cannot be too large, or it runs out of memory more easily than with shorter clips, since slicing only happens right before the decoder. Would cutting more finely and lowering the 20 s limit be better?
3. It looks like the pitch input for the NSF decoder is still under development; looking forward to it.

Question about timbre fusion

Hello, what is the idea behind timbre fusion? Is it a linear combination of speaker embeddings?
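
One plausible reading, sketched below, is linear interpolation of singer embeddings (an assumption for illustration, not a confirmed description of this repo's method); the result can be passed as --spk.

    import numpy as np

    a = np.load("data_svc/singer/speaker0.spk.npy")
    b = np.load("data_svc/singer/speaker1.spk.npy")
    alpha = 0.5                    # 0.0 = pure speaker0, 1.0 = pure speaker1
    np.save("fused.spk", (1 - alpha) * a + alpha * b)  # writes fused.spk.npy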

Question about singing conversion

Hello. The model I trained converts normal speech very well, but for singing, any instrumental sound gets suppressed. Do I need to separate the vocals first and mix the accompaniment back in afterwards?

SpeakerAdapter

May I ask how much the SpeakerAdapter module contributes, especially during fine-tuning?

Can someone help me through discord?

I apologize, but some of the instructions are vague, and I need help working through them. If someone could help me through Discord or here, that would be appreciated.

Missing speaker file

For inference on the lora-svc-for-pretrain branch, we need the singer's speaker file:

python svc_inference.py --config config/maxgan.yaml --model maxgan_g.pth --spk ./config/singers/singer0001.npy --wave test.wav

How can we generate this file? (./config/singers/singer0001.npy)

The cursed spec cache

data_utils caches the spec into a pt file and reads it back directly afterwards.
If I run different config/spec settings on the same wav folder and train them at the same time, they clash outright; the shapes never match. Even worse, because the phone length is clipped, training runs without raising any error.
Only after training did I find that inference lengths were wrong and the semantics were wrong too.
I questioned my sanity for a while; after digging for ages I found that the spec-to-phone length ratio was exactly the ratio of the new and old hop_size I had changed, which finally pinpointed the problem.

So, two questions:
1. Is caching the spec necessary at all? Is computing the spec with PyTorch on the GPU really that slow?
2. If it must be cached, please make it a preprocessing step that can write to different folders, with the path given in the filelist. Otherwise every config change requires editing the cache path inside data_utils.py (and PyTorch training may re-read data_utils.py after every epoch, which is a real trap).
Caching inside the dataloader while training is too dirty, and GPU memory usage fluctuates wildly at the start. If you start a run before going to bed and wake up to an out-of-memory error, that is a big loss.
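
A sketch of suggestion 2 above (hypothetical names, not code from this repo): key the cache path on every config value that affects the spec shape, so that different configs can never collide.

    import os

    def spec_cache_path(wav_path, hop_size, win_size, sampling_rate):
        # e.g. spec_cache/hop320_win1024_sr32000/000001.spec.pt
        tag = f"hop{hop_size}_win{win_size}_sr{sampling_rate}"
        base = os.path.splitext(os.path.basename(wav_path))[0]
        return os.path.join("spec_cache", tag, base + ".spec.pt")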

Language PPGs

Hi, I saw that you list many datasets and say that more data makes things more interesting.
I have some questions; perhaps you have more experience:
1. The HuBERT in the code is Chinese. If my target speaker is English, should it be swapped for an English pretrained HuBERT?
2. During training, the HuBERT/VQ-VAE/Speaker weights in your framework diagram are frozen, right? Features are extracted directly with the fixed pretrained models (the code seems to read them out in the dataloader).
3. If the training data contains both Chinese and English, do I need a matching HuBERT per language, and must the HuBERT feature dims be the same?
4. Can HuBERT handle song data, or does it need fine-tuning on songs?
5. Do both speech data and song data improve cross-domain or in-domain conversion?

Is this caused by insufficient GPU memory?

Traceback (most recent call last):
  File "svc_trainer.py", line 46, in
    train(0, args, args.checkpoint_path, hp, hp_str)
  File "/home/yango/code/Thin-Plate-Spline-Motion-Model/third_party/DINet/third_party/lora-svc/utils/train.py", line 109, in train
    validate(hp, args, model_g, model_d, valloader, stft, writer, step, device)
  File "/home/yango/code/Thin-Plate-Spline-Motion-Model/third_party/DINet/third_party/lora-svc/utils/validation.py", line 22, in validate
    mel_fake = stft.mel_spectrogram(fake_audio.squeeze(1))
  File "/home/yango/code/Thin-Plate-Spline-Motion-Model/third_party/DINet/third_party/lora-svc/utils/stft.py", line 90, in mel_spectrogram
    center=self.center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
  File "/home/yango/.conda/envs/dinet/lib/python3.7/site-packages/torch/functional.py", line 607, in stft
    normalized, onesided, return_complex)
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

ZeroDivisionError: float division by zero

!python svc_trainer.py -c config/maxgan.yaml -n lora -p model_pretrain/maxgan_pretrain.pth

Batch size per GPU : 4
2023-04-01 21:56:46.421916: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-01 21:56:47.377551: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-01 21:56:48,342 - INFO - NumExpr defaulting to 2 threads.
----------0----------
2023-04-01 21:56:48,782 - INFO - Resuming from checkpoint: model_pretrain/maxgan_pretrain.pth
----------25----------
Validation loop: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/content/lora-svc/svc_trainer.py", line 44, in
    train(0, args, args.checkpoint_path, hp, hp_str)
  File "/content/lora-svc/utils/train.py", line 109, in train
    validate(hp, args, model_g, model_d, valloader, stft, writer, step, device)
  File "/content/lora-svc/utils/validation.py", line 37, in validate
    mel_loss = mel_loss / len(valloader.dataset)
ZeroDivisionError: float division by zero

I think the lora-svc/filelists/eval.txt file is the problem.
When I copy some samples from lora-svc/filelists/train.txt to lora-svc/filelists/eval.txt, training starts.
Is that OK?
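
Copying training samples over works as a stopgap, although a held-out eval set is better practice. A guard like the hypothetical helper below would fail loudly instead of dividing by zero:

    def mean_loss(total_loss, n_items):
        # hypothetical guard around validation's mel_loss / len(valloader.dataset)
        if n_items == 0:
            raise RuntimeError("eval set is empty: add lines to filelists/eval.txt")
        return total_loss / n_items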

Voice cloning

If I want to use this project for speaker voice cloning, i.e. VC rather than SVC, is there anything I should pay attention to or modify?

ValueError: zero-size array to reduction operation maximum which has no identity

svc_preprocess_speaker_lora.py:37: RuntimeWarning: Mean of empty slice.
  speaker_ave = speaker_ave + pitch.mean()
C:\Users\phill\miniconda3\envs\lora-svc\lib\site-packages\numpy\core\_methods.py:190: RuntimeWarning: invalid value encountered in divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "svc_preprocess_speaker_lora.py", line 39, in <module>
    if (speaker_max < pitch.max()):
  File "C:\Users\phill\miniconda3\envs\lora-svc\lib\site-packages\numpy\core\_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

Please help.

This can be empty here; when it is empty, it needs to be skipped:

if (speaker_max < pitch.max()):

/content/lora-svc/svc_preprocess_speaker_lora.py:37: RuntimeWarning: Mean of empty slice.
  speaker_ave = speaker_ave + pitch.mean()
/usr/local/lib/python3.9/dist-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in true_divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/content/lora-svc/svc_preprocess_speaker_lora.py", line 39, in
    if (speaker_max < pitch.max()):
  File "/usr/local/lib/python3.9/dist-packages/numpy/core/_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
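
Below is a self-contained sketch of the suggested fix (the surrounding loop in svc_preprocess_speaker_lora.py is paraphrased, not quoted): skip empty pitch arrays before calling mean() or max().

    import numpy as np

    def accumulate_pitch_stats(pitch, speaker_ave, speaker_max):
        # skip files whose extracted pitch is empty (fully unvoiced / too short)
        if pitch.size == 0:
            return speaker_ave, speaker_max
        return speaker_ave + pitch.mean(), max(speaker_max, pitch.max())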
