
so-vits-svc-fork's Introduction

SoftVC VITS Singing Voice Conversion Fork

Simplified Chinese


A fork of so-vits-svc with realtime support and a greatly improved interface. Based on branch 4.0 (v1) (or 4.1); the models are compatible.

Updates to this repository have been limited to maintenance since Spring 2023. It is difficult to narrow down the list of alternatives here, but please consider trying other projects if you are looking for a voice changer with even better performance (especially in terms of latency rather than quality). However, this project may still be ideal for those who just want to try out voice conversion, because it is easy to install.

Features not available in the original repo

  • Realtime voice conversion (enhanced in v1.1.0)
  • Integrates QuickVC
  • Fixed misuse of ContentVec in the original repository.1
  • More accurate pitch estimation using CREPE.
  • GUI and unified CLI available
  • ~2x faster training
  • Ready to use just by installing with pip.
  • Automatically downloads pretrained models. No need to install fairseq.
  • Code completely formatted with black, isort, autoflake, etc.

Installation

Option 1. One-click easy installation

Download .bat

This BAT file will automatically perform the steps described below.

Option 2. Manual installation (using pipx, experimental)

1. Installing pipx

Windows (development version required due to pypa/pipx#940):

py -3 -m pip install --user git+https://github.com/pypa/pipx.git
py -3 -m pipx ensurepath

Linux/MacOS:

python -m pip install --user pipx
python -m pipx ensurepath

2. Installing so-vits-svc-fork

pipx install so-vits-svc-fork --python=3.10
pipx inject so-vits-svc-fork torch torchaudio --pip-args="--upgrade" --index-url=https://download.pytorch.org/whl/cu121 # https://download.pytorch.org/whl/nightly/cu121

Option 3. Manual installation

Creating a virtual environment

Windows:

py -3.10 -m venv venv
venv\Scripts\activate

Linux/MacOS:

python3.10 -m venv venv
source venv/bin/activate

Anaconda:

conda create -n so-vits-svc-fork python=3.10 pip
conda activate so-vits-svc-fork

Installing without creating a virtual environment may cause a PermissionError if Python is installed in Program Files, etc.

Install this via pip (or your favourite package manager that uses pip):

python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121 # https://download.pytorch.org/whl/nightly/cu121
pip install -U so-vits-svc-fork
Notes
  • If no GPU is available, or if you are using macOS, simply skip pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121. MPS is probably supported.
  • If you are using an AMD GPU on Linux, replace --index-url https://download.pytorch.org/whl/cu121 with --index-url https://download.pytorch.org/whl/nightly/rocm5.7. AMD GPUs are not supported on Windows (#120).
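
After installing, you can confirm that the GPU build of PyTorch is actually being used with a standard PyTorch check (nothing specific to this project):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"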

Update

Please update this package regularly to get the latest features and bug fixes.

pip install -U so-vits-svc-fork
# pipx upgrade so-vits-svc-fork

Usage

Inference

GUI

GUI

The GUI launches with the following command:

svcg

CLI

  • Realtime (from microphone)
svc vc
  • File
svc infer source.wav

Pretrained models are available on Hugging Face or CIVITAI.
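
If the recommended folder structure (configs/44k/config.json, logs/44k/G_XXXX.pth) is not followed, the model, config, and speaker can be passed explicitly. The flags below are the ones used in the Colab examples later on this page (G_XXXX.pth is a placeholder for your checkpoint); confirm them with svc infer -h:

svc infer source.wav -s speaker -m logs/44k/G_XXXX.pth -c configs/44k/config.json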

Notes

  • If using WSL, please note that WSL requires additional setup to handle audio and the GUI will not work without finding an audio device.
  • In real-time inference, if there is noise on the inputs, the HuBERT model will react to those as well. Consider using realtime noise reduction applications such as RTX Voice in this case.
  • Models other than for 4.0v1 or this repository are not supported.
  • GPU inference requires at least 4 GB of VRAM. If it does not work, try CPU inference as it is fast enough. 2

Training

Before training

  • If your dataset has BGM, please remove the BGM using software such as Ultimate Vocal Remover. 3_HP-Vocal-UVR.pth or UVR-MDX-NET Main is recommended. 3
  • If your dataset is a long audio file with a single speaker, use svc pre-split to split the dataset into multiple files (using librosa).
  • If your dataset is a long audio file with multiple speakers, use svc pre-sd to split the dataset into multiple files (using pyannote.audio). Further manual classification may be necessary due to accuracy issues. If speakers speak with a variety of speech styles, set --min-speakers larger than the actual number of speakers. Due to unresolved dependencies, please install pyannote.audio manually: pip install pyannote-audio. A sketch of this flow follows this list.
  • To manually classify audio files, svc pre-classify is available. Up and down arrow keys can be used to change the playback speed.
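
A hedged sketch of the multi-speaker flow described above; input/output options are omitted here, so check svc pre-sd -h and svc pre-classify -h for the actual arguments:

pip install pyannote-audio
svc pre-sd --min-speakers 4   # set higher than the true number of speakers if speech styles vary
svc pre-classify              # optional manual pass to fix misclassified files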

Cloud

Open In Colab Open In Paperspace Paperspace Referral4

If you do not have access to a GPU with more than 10 GB of VRAM, the free plan of Google Colab is recommended for light users and the Pro/Growth plan of Paperspace is recommended for heavy users. Conversely, if you have access to a high-end GPU, the use of cloud services is not recommended.

Local

Place your dataset like dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders and non-ASCII filenames are acceptable) and run:

svc pre-resample
svc pre-config
svc pre-hubert
svc train -t
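
For example, a two-speaker dataset might be laid out like this before preprocessing (speaker and file names are purely illustrative):

dataset_raw/
├── alice/
│   ├── song1.wav
│   └── live/song2.flac
└── bob/
    └── demo.mp3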

Notes

  • Dataset audio duration per file should be <~ 10s.
  • Need at least 4GB of VRAM. 5
  • It is recommended to increase batch_size as much as possible in config.json before running the train command, to match the VRAM capacity. Setting batch_size to auto-{init_batch_size}-{max_n_trials} (or simply auto) will automatically increase batch_size until an OOM error occurs, but this may not be useful in some cases (see the example after this list).
  • To use CREPE, replace svc pre-hubert with svc pre-hubert -fm crepe.
  • To use ContentVec correctly, replace svc pre-config with svc pre-config -t so-vits-svc-4.0v1. Training may take slightly longer because some weights are reset due to reusing legacy initial generator weights.
  • To use MS-iSTFT Decoder, replace svc pre-config with svc pre-config -t quickvc.
  • Silence removal and volume normalization are automatically performed (as in the upstream repo) and are not required.
  • If you have trained on a large, copyright-free dataset, consider releasing it as an initial model.
  • For further details (e.g. parameters, etc.), you can see the Wiki or Discussions.
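
For example, to let the trainer pick a batch size automatically, set the value inside the existing train block of config.json, leaving the other keys unchanged (the numbers are illustrative; the format is auto-{init_batch_size}-{max_n_trials} as described above):

"batch_size": "auto-4-20"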

Further help

For more details, run svc -h or svc <subcommand> -h.

> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...

  so-vits-svc allows any folder structure for training data.
  However, the following folder structure is recommended.
      When training: dataset_raw/{speaker_name}/**/{wav_name}.{any_format}
      When inference: configs/44k/config.json, logs/44k/G_XXXX.pth
  If the folder structure is followed, you DO NOT NEED TO SPECIFY model path, config path, etc.
  (The latest model will be automatically loaded.)
  To train a model, run pre-resample, pre-config, pre-hubert, train.
  To infer a model, run infer.

Options:
  -h, --help  Show this message and exit.

Commands:
  clean          Clean up files, only useful if you are using the default file structure
  infer          Inference
  onnx           Export model to onnx (currently not working)
  pre-classify   Classify multiple audio files into multiple files
  pre-config     Preprocessing part 2: config
  pre-hubert     Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
  pre-resample   Preprocessing part 1: resample
  pre-sd         Speech diarization using pyannote.audio
  pre-split      Split audio files into multiple files
  train          Train model If D_0.pth or G_0.pth not found, automatically download from hub.
  train-cluster  Train k-means clustering
  vc             Realtime inference from microphone

External Links

Video Tutorial

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • 34j: 💻 🤔 📖 💡 🚇 🚧 👀 ⚠️ 📣 🐛
  • GarrettConway: 💻 🐛 📖 👀
  • BlueAmulet: 🤔 💬 💻 🚧
  • ThrowawayAccount01: 🐛
  • 緋: 📖 🐛
  • Lordmau5: 🐛 💻 🤔 🚧 💬 📓
  • DL909: 🐛
  • Satisfy256: 🐛
  • Pierluigi Zagaria: 📓
  • ruckusmattster: 🐛
  • Desuka-art: 🐛
  • heyfixit: 📖
  • Nerdy Rodent: 📹
  • 谢宇: 📖
  • ColdCawfee: 🐛
  • sbersier: 🤔 📓 🐛
  • Meldoner: 🐛 🤔 💻
  • mmodeusher: 🐛
  • AlonDan: 🐛
  • Likkkez: 🐛
  • Duct Tape Games: 🐛
  • Xianglong He: 🐛
  • 75aosu: 🐛
  • tonyco82: 🐛
  • yxlllc: 🤔 💻
  • outhipped: 🐛
  • escoolioinglesias: 🐛 📓 📹
  • Blacksingh: 🐛
  • Mgs. M. Thoyib Antarnusa: 🐛
  • Exosfeer: 🐛 💻
  • guranon: 🐛 🤔 💻
  • Alexander Koumis: 💻
  • acekagami: 🌍
  • Highupech: 🐛
  • Scorpi: 💻
  • Maximxls: 💻
  • Star3Lord: 🐛 💻
  • Forkoz: 🐛 💻
  • Zerui Chen: 💻 🤔
  • Roee Shenberg: 📓 🤔 💻
  • Justas: 🐛 💻
  • Onako2: 📖
  • 4ll0w3v1l: 💻
  • j5y0V6b: 🛡️
  • marcellocirelli: 🐛
  • Priyanshu Patel: 💻
  • Anna Gorshunova: 🐛 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Footnotes

  1. #206

  2. #469

  3. https://ytpmv.info/how-to-use-uvr/

  4. If you register with a referral code and then add a payment method, you may save about $5 on your first month's bill. Note that both referral rewards are Paperspace credits, not cash. It was a tough decision, but the referral was included because debugging and training the initial model require a large amount of computing power and the developer is a student.

  5. #456


so-vits-svc-fork's Issues

Not installing properly

I can't seem to get it to install.
I ran "pip install so-vits-svc-fork", but when I run 'svcg' or any other svc command, I get "'svcg' is not recognized as an internal or external command, operable program or batch file.". I've tried Anaconda as well, but I get the same thing.
Usually this means it's not on the PATH, but "pip list" doesn't show it either, so I assume it's not installing properly?

Any help would be appreciated.
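
A quick sanity check with standard pip/Python commands (nothing specific to this project): confirm which interpreter the package was installed into, and whether that interpreter's scripts directory is on PATH, since the svc and svcg entry points are placed there.

python -m pip show so-vits-svc-fork
python -c "import sysconfig; print(sysconfig.get_path('scripts'))"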

error in training on colab

The tensorboard extension is already loaded. To reload it, use:

  %reload_ext tensorboard
Reusing TensorBoard on port 6006 (pid 4018), started 0:07:57 ago. (Use '!kill 4018' to kill it.)
[09:58:13] Version: 1.4.1
[09:58:16] Version: 1.4.1
[09:58:18] {'train': {'log_interval': 200, 'eval_interval': 800, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 10240, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 512, 'port': '8001', 'keep_ckpts': 3}, 'data': {'training_files': 'filelists/44k/train.txt', 'validation_files': 'filelists/44k/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': 22050}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256, 'ssl_dim': 256, 'n_speakers': 200}, 'spk': {'matt': 0}, 'model_dir': 'drive/MyDrive/so-vits-svc-fork/logs/44k'}
2023-03-26 09:58:18.101134: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-26 09:58:19.110665: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-26 09:58:19.110804: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-26 09:58:19.110825: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[09:58:19] NumExpr defaulting to 2 threads.
[09:58:20] Added key: store_based_barrier_key:1 to store for rank: 0
[09:58:20] Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
[09:58:24] 'emb_g.weight'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/utils.py", line 413, in load_checkpoint
    new_state_dict[k] = saved_state_dict[k]
KeyError: 'emb_g.weight'
[09:58:24] emb_g.weight is not in the checkpoint
[09:58:24] Loaded checkpoint 'drive/MyDrive/so-vits-svc-fork/logs/44k/G_0.pth' (iteration 0)
[09:58:24] Loaded checkpoint 'drive/MyDrive/so-vits-svc-fork/logs/44k/D_0.pth' (iteration 0)
[09:58:24] Start training
  0% 0/10000 [00:00<?, ?it/s][09:58:27] Version: 1.4.1
[09:58:27] Version: 1.4.1
[09:58:35] Reducer buckets have been rebuilt in this iteration.
[09:58:36] /usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py:197: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [32, 1, 4], strides() = [4, 1, 1]
bucket_view.sizes() = [32, 1, 4], strides() = [4, 4, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

[09:58:37] Train Epoch: 1 [0%]
[09:58:37] Losses: [2.5402114391326904, 2.7362656593322754, 9.689486503601074, 52.52246856689453, 7.239556312561035], step: 0, lr: 0.0001
[09:58:37] Saving checkpoints...
[09:58:38] Version: 1.4.1
[09:58:42] Saving model and optimizer state at iteration 1 to drive/MyDrive/so-vits-svc-fork/logs/44k/G_0.pth
[09:58:44] Saving model and optimizer state at iteration 1 to drive/MyDrive/so-vits-svc-fork/logs/44k/D_0.pth
[09:58:47] Cleaning old checkpoints...
[09:58:47] Reducer buckets have been rebuilt in this iteration.
  0% 0/10000 [00:28<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 128, in train
    train(config_path=config_path, model_path=model_path)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 49, in train
    mp.spawn(
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 158, in run
    train_and_evaluate(
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 200, in train_and_evaluate
    for batch_idx, items in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 543, in reraise
    raise exception
EOFError: Caught EOFError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/data_utils.py", line 102, in __getitem__
    return self.get_audio(self.audiopaths[index][0])
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/data_utils.py", line 51, in get_audio
    spec = torch.load(spec_filename)
  File "/usr/local/lib/python3.9/dist-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.9/dist-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Newest version UI keeps trying to start and end ASIO4All (1.2.9+)

Describe the bug
Audio is very choppy. It appears that it keeps opening and closing Asio4All (which I remember was installed with FL Studio)
Seems to come from some change in the GUI code for the speakers and devices not updating in 1.2.9
https://lord.moe/1679589472-54728.mp4

To Reproduce
Have Asio4All installed and try to launch the GUI

Additional context
This wasn't happening in 1.2.8, so whatever was changed in 1.2.9 broke this.
I managed to install 1.2.8 and it works fine there.

How to inference to all voice files in the directory


"svc infer" This command can only be one file at a time. Is there a way to infer to multiple files at once?

Error when trying to train cluster

Using svc train-cluster gives the following error:

C:\Users\LXC PC\Desktop\sovits\venv\Scripts>svc train-cluster
51it [00:00, 3288.89it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\so_vits_svc_fork\cluster\train_cluster.py", line 57, in train_cluster_
    return input_path.stem, train_cluster(input_path, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\so_vits_svc_fork\cluster\train_cluster.py", line 35, in train_cluster
    kmeans = MiniBatchKMeans(
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\sklearn\cluster\_kmeans.py", line 2028, in fit
    self._check_params_vs_input(X)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\sklearn\cluster\_kmeans.py", line 1872, in _check_params_vs_input
    super()._check_params_vs_input(X, default_n_init=3)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\sklearn\cluster\_kmeans.py", line 859, in _check_params_vs_input
    raise ValueError(
ValueError: n_samples=8562 should be >= n_clusters=10000.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\LXC PC\Desktop\sovits\venv\Scripts\svc.exe\__main__.py", line 7, in <module>
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\so_vits_svc_fork\__main__.py", line 579, in train_cluster
    main(input_dir=input_dir, output_path=output_path, n_clusters=n_clusters)
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\so_vits_svc_fork\cluster\train_cluster.py", line 59, in main
    parallel_result = Parallel(n_jobs=-1)(
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\LXC PC\Desktop\sovits\venv\lib\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 446, in result
    return self.__get_result()
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
ValueError: n_samples=8562 should be >= n_clusters=10000.
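
The error means the dataset produced only 8562 feature samples, fewer than the default 10000 clusters. Reducing the cluster count below the sample count (or using more data) avoids it; the option name below is an assumption, so confirm it with svc train-cluster -h:

svc train-cluster --n-clusters 1000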

ONNX not working

Describe the bug
Onnx code is dead and not working

To Reproduce
svc onnx

Additional context
Add any other context about the problem here.

Pre-resample not running?

Trying to run svc pre-resample but it doesn't start:

(venv) C:\Users\LXC PC\Desktop\sovits\venv\Scripts>svc pre-resample
Preprocessing: 0it [00:00, ?it/s]

svc pre-resample resamples to 22 kHz and then to 44 kHz

I have wav files in dataset_raw in mono at a 44.1 kHz sample rate. When I run svc pre-resample, the files are saved in the dataset folder, and when I played them back I noticed that they sound lower quality.
When I checked with a spectrogram, I saw that the samples contained only half the frequency range.

Workaround: copy the wavs from dataset_raw to dataset folder.
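
A sketch of that workaround on Linux/macOS; the destination path is an assumption based on the 44k folder names used elsewhere in this project, so adjust it to match what svc pre-resample would have produced:

mkdir -p dataset/44k
cp -r dataset_raw/. dataset/44k/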

docs(notebook)-issue: Some possible issues in colab notebook of #84.

Describe the bug

Some possible issues in Colab Notebook of #84. Step Training Cluster model and Use trained model

To Reproduce

Issue of Training Cluster model

Directly run Step Training Cluster model:

!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k

This raises RuntimeError: File drive/MyDrive/so-vits-svc-fork/logs/44k cannot be opened., which means the output path does not include a file name for the cluster file.

May need to be modified to
!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt

Issue of Use trained model (Use trained model (with cluster))

Run

!svc infer-cluster {NAME}.wav -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt
display(Audio(f"{NAME}.out.wav", autoplay=True))

This returns

Error: No such command 'infer-cluster'.

This code may need to be changed to:

!svc infer -cluster {NAME}.wav -s speaker -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt
display(Audio(f"{NAME}.out.wav", autoplay=True))

The -cluster parameter needs to be separated from the previous command, and the -s parameter may need to be specified; otherwise it may raise ValueError: Speaker None not in ['speaker'].

Additional context

Log

Minibatch step 51/478: mean batch inertia: 65.0876693725586, ewa inertia: 65.07667463195396
Minibatch step 52/478: mean batch inertia: 64.66484069824219, ewa inertia: 64.9390382259093
Minibatch step 53/478: mean batch inertia: 65.48490142822266, ewa inertia: 65.12146770344484
Minibatch step 54/478: mean batch inertia: 64.38145446777344, ewa inertia: 64.87415257507425
Converged (lack of improvement in inertia) at step 54/478
Training clusters: 100% 1/1 [00:29<00:00, 29.71s/it]
Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 623, in train_cluster
    main(
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/cluster/train_cluster.py", line 86, in main
    torch.save(checkpoint, output_path)
  File "/usr/local/lib/python3.9/dist-packages/torch/serialization.py", line 422, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/usr/local/lib/python3.9/dist-packages/torch/serialization.py", line 309, in _open_zipfile_writer
    return container(name_or_buffer)
  File "/usr/local/lib/python3.9/dist-packages/torch/serialization.py", line 287, in __init__
    super(_open_zipfile_writer_file, self).__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: File drive/MyDrive/so-vits-svc-fork/logs/44k cannot be opened.
[02:42:08] Version: 1.3.0
Usage: svc [OPTIONS] COMMAND [ARGS]...
Try 'svc -h' for help.

Error: No such command 'infer-cluster'.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-18-347b9ce145ed>](https://localhost:8080/#) in <module>
      1 ##@title Use trained model (with cluster)
      2 get_ipython().system('svc infer-cluster {NAME}.wav -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt')
----> 3 display(Audio(f"{NAME}.out.wav", autoplay=True))

[/usr/local/lib/python3.9/dist-packages/IPython/lib/display.py](https://localhost:8080/#) in __init__(self, data, filename, url, embed, rate, autoplay, normalize, element_id)
    112     def __init__(self, data=None, filename=None, url=None, embed=None, rate=None, autoplay=False, normalize=True, *,
    113                  element_id=None):
--> 114         if filename is None and url is None and data is None:
    115             raise ValueError("No audio data found. Expecting filename, url, or data.")
    116         if embed is False and url is None:

ValueError: rate must be specified when data is a numpy array or list of audio samples.

Running Environment

Fri Mar 24 02:06:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

AMD GPUs

Is it possible to train the model using an AMD GPU?

"keep_ckpts" not working

In the config file, the default "keep_ckpts": 3 does not work. When training, all checkpoints are saved. Changing the value to any other number does not work either.

Crash with '.DS_Store': Format not recognised.

I have solved this problem and am writing this issue for anyone who has the same question.
I'm a non-English user on a MacBook. I ran 'svc pre-resample' after putting in the files I want to use to train my model; preprocessing started and soon crashed. Here is the report:

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/librosa/core/audio.py", line 176, in load
y, sr_native = __soundfile_load(path, offset, duration, dtype)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/librosa/core/audio.py", line 209, in __soundfile_load
context = sf.SoundFile(path)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/soundfile.py", line 658, in init
self._file = self._open(file, mode_int, closefd)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'dataset_raw/**/.DS_Store': Format not recognised.
(I hide some information to protect my privacy)

And here is the solution; run these in turn:
cd dataset_raw (go to the folder where you put the audio files)
find . -name '.DS_Store' -type f -delete (delete the .DS_Store files)

I think this issue is caused by the program mistakenly treating the .DS_Store file as an audio file. I believe this would be a general issue for macOS users. Hopefully the developers will fix it in time.

AlsaOpen error

Describe the bug
Audio does not play using svcg

To Reproduce
Steps to reproduce the behaviour:

  1. svcg
  2. select a .wav file under "input audio path"
  3. press the "play" button

Additional context
The following error appears when trying to play:

Expression 'AlsaOpen( &alsaApi->baseHostApiRep, params, streamDir, &self->pcm )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1904
Expression 'PaAlsaStreamComponent_Initialize( &self->playback, alsaApi, outParams, StreamDirection_Out, NULL != callback )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2175
Expression 'PaAlsaStream_Initialize( stream, alsaHostApi, inputParameters, outputParameters, sampleRate, framesPerBuffer, callback, streamFlags, userData )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2839

the .out.wav file is produced when clicking "infer" which can then be played normally.

Inference Bug after upgrading from 0.3.0

Hi! The GUI was working perfectly until I upgraded from 0.3.0 to 0.6.2. Now anytime I try to start an inference, it errors out and closes the interface. Here is the error I get:

(colab0) C:\Users\Haven>svcg

[12:51:04] INFO [12:51:04] Event infer, values {'model_path': gui.py:285
'C:/Users/Haven/Desktop/UI_for_So-Vits-SVC-main/folderDumpText/offlinesovit/so-vits-svc
/logs/44k/G_192500.pth', 'model_path_browse':
'C:/Users/Haven/Desktop/UI_for_So-Vits-SVC-main/folderDumpText/offlinesovit/so-vits-svc
/logs/44k/G_192500.pth', 'config_path':
'C:/Users/Haven/Desktop/UI_for_So-Vits-SVC-main/folderDumpText/offlinesovit/so-vits-svc
/configs/config.json', 'config_path_browse':
'C:/Users/Haven/Desktop/UI_for_So-Vits-SVC-main/folderDumpText/offlinesovit/so-vits-svc
/configs/config.json', 'cluster_model_path': '', 'cluster_model_path_browse': '',
'speaker': 'sp', 'silence_threshold': -30.0, 'transpose': 0.0, 'auto_predict_f0': True,
'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5,
'absolute_thresh': False, 'input_path': 'C:/Users/Haven/Desktop/aii.wav', 'Browse':
'C:/Users/Haven/Desktop/aii.wav', 'auto_play': True, 'crossfade_seconds': 0.1,
'block_seconds': 1.0, 'realtime_algorithm': '2 (Divide by speech)', 'input_device':
'Microsoft Sound Mapper - Input', 'output_device': 'Microsoft Sound Mapper - Output',
'use_gpu': True}
Traceback (most recent call last):
File "C:\Users\Haven\anaconda3\envs\colab0\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Haven\anaconda3\envs\colab0\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users\Haven\anaconda3\envs\colab0\Scripts\svcg.exe_main
.py", line 7, in
File "C:\Users\Haven\anaconda3\envs\colab0\lib\site-packages\so_vits_svc_fork\gui.py", line 301, in main
from .inference_main import infer
File "C:\Users\Haven\anaconda3\envs\colab0\lib\site-packages\so_vits_svc_fork\inference_main.py", line 12, in
from .inference.infer_tool import RealtimeVC, RealtimeVC2, Svc
ImportError: cannot import name 'RealtimeVC' from 'so_vits_svc_fork.inference.infer_tool' (C:\Users\Haven\anaconda3\envs\colab0\lib\site-packages\so_vits_svc_fork\inference\infer_tool.py)

I've confirmed by pip uninstall and forcing the 0.3.0 version that the older version still works for me though!

cli() not called in __main__


fp16 save not working

Describe the bug
the model does not seem to save when "fp16_run": true


Prepare model for inference

Is your feature request related to a problem? Please describe.
The first inference takes the longest, whereas subsequent inferences are faster. Since the first inference normally happens in the sounddevice callback, it's likely that audio will not be processed in time and will end up delayed.

Describe the solution you'd like
After loading the model, run an initial inference with some dummy data, perhaps torch.zeros of appropriate sizes.

Additional context
On my computer with a RTX 3050, the first time inference takes about 3 seconds to complete. Otherwise I get a Realtime coef of ~28
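
A minimal sketch of the proposed warm-up, assuming a generic callable model; the names and shapes below are illustrative and not this repository's actual API:

import torch

def warm_up(model, device="cuda", seconds=1.0, sample_rate=16000):
    # Run one throwaway inference so CUDA kernel compilation and other lazy
    # initialization happen before the first real sounddevice callback.
    dummy = torch.zeros(1, int(seconds * sample_rate), device=device)
    with torch.no_grad():
        model(dummy)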

sizes number mismatch

Describe the bug
Training fails at step 0

Error:
           INFO     [17:01:47] Start training                                                               train.py:154
  0%|                                                                                        | 0/10000 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "d:\sw\miniconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\sw\miniconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "d:\sw\miniconda3\Scripts\svc.exe\__main__.py", line 7, in <module>
  File "d:\sw\miniconda3\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "d:\sw\miniconda3\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "d:\sw\miniconda3\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\sw\miniconda3\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\sw\miniconda3\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "d:\sw\miniconda3\lib\site-packages\so_vits_svc_fork\__main__.py", line 96, in train
    train(config_path=config_path, model_path=model_path)
  File "d:\sw\miniconda3\lib\site-packages\so_vits_svc_fork\train.py", line 49, in train
    mp.spawn(
  File "d:\sw\miniconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "d:\sw\miniconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "d:\sw\miniconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "d:\sw\miniconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "d:\sw\miniconda3\lib\site-packages\so_vits_svc_fork\train.py", line 158, in run
    train_and_evaluate(
  File "d:\sw\miniconda3\lib\site-packages\so_vits_svc_fork\train.py", line 200, in train_and_evaluate
    for batch_idx, items in enumerate(train_loader):
  File "d:\sw\miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 634, in __next__
    data = self._next_data()
  File "d:\sw\miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "d:\sw\miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
    data.reraise()
  File "d:\sw\miniconda3\lib\site-packages\torch\_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "d:\sw\miniconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "d:\sw\miniconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "d:\sw\miniconda3\lib\site-packages\so_vits_svc_fork\data_utils.py", line 145, in __call__
    spec_padded[i, :, : spec.size(1)] = spec
RuntimeError: expand(torch.FloatTensor{[2, 790, 844]}, size=[2, 790]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)

To Reproduce

svc train

Additional context
all previous steps ran smoothly

Failed install

Describe the bug
Building wheels for collected packages: fairseq
Building wheel for fairseq (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for fairseq (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [1423 lines of output]

To Reproduce
Python version 3.10.6
Windows 10, running cmd as administrator

Some gui parameters not passed properly

After version 1.3.3, using old trained models results in audio distortions.

I use the following settings:

[14:17:52] INFO     [14:17:52] Event infer, values {'model_path': 'C:/Users/User/Desktop/hapi/G_21600.pth',   gui.py:509
                    'model_path_browse': 'C:/Users/User/Desktop/hapi/G_21600.pth', 'config_path':
                    'C:/Users/User/Desktop/hapi/config.json', 'config_path_browse':
                    'C:/Users/User/Desktop/hapi/config.json', 'cluster_model_path':
                    'C:/Users/User/Desktop/hapi/kmeans.pt', 'cluster_model_path_browse':
                    'C:/Users/User/Desktop/hapi/kmeans.pt', 'speaker': 'hapiraw', 'silence_threshold': -35.0,
                    'transpose': 8.0, 'auto_predict_f0': False, 'f0_method': 'crepe', 'cluster_infer_ratio':
                    0.5, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh':
                    True, 'input_path': 'C:/Users/User/Desktop/oblivion/kai.wav', 'input_path_browse':
                    'C:/Users/User/Desktop/oblivion/kai.wav', 'auto_play': False, 'crossfade_seconds': 0.05,
                    'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'Microsoft Sound Mapper - Input (MME)', 'output_device': 'SteelSeries
                    Sonar - Media (Stee (MME)', 'passthrough_original': False, 'presets': 'Default VC (GPU,
                    GTX 1060)', 'preset_name': '', 'use_gpu': True}
[14:17:53] INFO     [14:17:53] current directory is C:\Users\User\Desktop\sovits\venv\Scripts  hubert_pretraining.py:116
           INFO     [14:17:53] HubertPretrainingTask Config {'_name': 'hubert_pretraining',    hubert_pretraining.py:117
                    'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir':
                    'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False,
                    'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000,
                    'min_sample_size': 32000, 'single_target': False, 'random_crop': True,
                    'pad_audio': False}
           INFO     [14:17:53] HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0,                 hubert.py:250
                    'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768,
                    'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu,
                    'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1,
                    'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1,
                    'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True,
                    'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 +
                    [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False,
                    'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection':
                    static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1,
                    'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static,
                    'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space':
                    1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995],
                    'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False,
                    'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '',
                    'pos_enc_type': 'abs', 'fp16': False}

I believe it has something to do with the commit that changed the defaults to dio?

Here are some samples to show the distortions. I used the exact same settings for both inferences:

1.3.5: https://voca.ro/1eR1lyRN585P

1.3.3: https://voca.ro/1kniA71h8KMa

As you can hear, 1.3.5 has some weird pitch shifts and vocals.

Using all RAM at once, then erroring

Describe the bug
When running svc pre-resample with the newest version (1.4.2), it now uses up all my RAM and then errors out with the following error:

E:\Development\so-vits-svc-4.0\Robbie Rotten>svc pre-resample
[14:47:25] INFO     [14:47:25] Version: 1.4.2                                                             __main__.py:20
Preprocessing: 100%|███████████████████████████████████████████████████████████████████| 36/36 [00:30<00:00,  1.19it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\so_vits_svc_fork\preprocess_resample.py", line 63, in _preprocess_one
    audio, _ = librosa.effects.trim(audio, top_db=20)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\librosa\util\decorators.py", line 88, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\librosa\effects.py", line 495, in trim
    non_silent = _signal_to_frame_nonsilent(
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\librosa\effects.py", line 440, in _signal_to_frame_nonsilent
    mse = feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\librosa\util\decorators.py", line 88, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\librosa\feature\spectral.py", line 955, in rms
    power = np.mean(np.abs(x) ** 2, axis=-2, keepdims=True)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 2.32 GiB for an array with shape (304586, 2048, 1) and data type float32
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\Scripts\svc.exe\__main__.py", line 7, in <module>
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\so_vits_svc_fork\__main__.py", line 420, in pre_resample
    preprocess_resample(
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\so_vits_svc_fork\preprocess_resample.py", line 111, in preprocess_resample
    Parallel(n_jobs=n_jobs)(
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\Dustin\AppData\Roaming\Python\Python310\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Program Files\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Program Files\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 2.32 GiB for an array with shape (304586, 2048, 1) and data type float32

To Reproduce
Run svc pre-resample with a bunch of files in a sub-folder of dataset/

Additional context
Rolled back to 1.4.1 and it works perfectly fine on those same clips; it only used a bit of my RAM.

Changing Model and Config path in GUI does not update Speaker name.

In the GUI, changing the Model Path and Config Path does not change the Speaker name. The name remains as the initial loaded model's name.

Stack Trace:

C:\Users\LXC PC\Desktop\sovits\venv\Scripts>svcg
[13:35:50] INFO     [13:35:50] Loaded config from C:/Users/LXC                                                gui.py:404
                    PC/Desktop/sovits/venv/Scripts/configs/44k/config.json
[13:36:01] INFO     [13:36:01] Event model_path, values {'model_path': 'C:/Users/LXC                          gui.py:432
                    PC/Desktop/sovits/venv/Scripts/logs/21/G_13600.pth', 'model_path_browse': 'C:/Users/LXC
                    PC/Desktop/sovits/venv/Scripts/logs/21/G_13600.pth', 'config_path': 'C:/Users/LXC
                    PC/Desktop/sovits/venv/Scripts/configs/44k/config.json', 'config_path_browse': '',
                    'cluster_model_path': '', 'cluster_model_path_browse': '', 'speaker': 'hapiraw',
                    'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False, 'f0_method':
                    'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Microsoft Sound Mapper - Output', 'passthrough_original':
                    False, 'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [13:36:01] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EC40> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\logs\21
           INFO     [13:36:01] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EDC0> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\logs\21
           INFO     [13:36:01] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EF40> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\logs\21
           INFO     [13:36:01] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF73A30> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\logs\21
[13:36:09] INFO     [13:36:09] Event config_path, values {'model_path': 'C:/Users/LXC                         gui.py:432
                    PC/Desktop/sovits/venv/Scripts/logs/21/G_13600.pth', 'model_path_browse': 'C:/Users/LXC
                    PC/Desktop/sovits/venv/Scripts/logs/21/G_13600.pth', 'config_path': 'C:/Users/LXC
                    PC/Desktop/sovits/venv/Scripts/configs/21/config.json', 'config_path_browse':
                    'C:/Users/LXC PC/Desktop/sovits/venv/Scripts/configs/21/config.json',
                    'cluster_model_path': '', 'cluster_model_path_browse': '', 'speaker': 'hapiraw',
                    'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False, 'f0_method':
                    'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Microsoft Sound Mapper - Output', 'passthrough_original':
                    False, 'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [13:36:09] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EC40> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\configs\21
           INFO     [13:36:09] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EDC0> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\configs\21
           INFO     [13:36:09] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF5EF40> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\configs\21
           INFO     [13:36:09] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000028C5EF73A30> gui.py:438
                    to C:\Users\LXC PC\Desktop\sovits\venv\Scripts\configs\21

Realtime voice conversion not working when "Use GPU" is ticked

Describe the bug
Whenever I start the live voice changer with "Use GPU" enabled, it doesn't load and hangs with this in the console:
2023-03-22 02:21:56 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-03-22 02:21:56 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}

To Reproduce
Steps to reproduce the behavior: Load a model path and config path, tick "Use GPU", press "(Re)Start Voice Changer"

Additional context
I'm running the program on Windows 10 through Anaconda3 and have a GTX 1070. I've tested different models with the same results. Using the CPU works. There's never any usage shown on my GPU, nor does GPU memory use rise when attempting to load with the GPU.

colab doesn't seem to have created the so-vits-svc-fork directory

I gave it a quick test and think your fork is very interesting.
On Colab, the git clone step seems to fail; please take a look when you can, thank you.

Also, do you have a Tsukuyomi-chan model PTH file you could share? I'd like to test it.
Sorry, can you read Chinese? It was late and I didn't have time to use a translation tool; my apologies!

Failed to load Model + Config on a fresh venv install

Created a fresh venv, ran:

pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install -U so-vits-svc-fork

In the GUI, specified Model path and Config file to a pre-trained model.

Trying to infer or start real-time inference gives the following error:

(venv) C:\Users\LXC PC\Desktop\testvits\venv\Scripts>svcg
[09:07:13] INFO     [09:07:13] Version: 1.3.0                                                             __main__.py:47
[09:07:19] INFO     [09:07:19] Event model_path, values {'model_path': 'C:/Users/LXC                          gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path': '',
                    'config_path_browse': '', 'cluster_model_path': '', 'cluster_model_path_browse': '',
                    'speaker': '', 'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False,
                    'f0_method': 'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000020621B09760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFCA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:07:22] INFO     [09:07:22] Event config_path, values {'model_path': 'C:/Users/LXC                         gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': '', 'silence_threshold': -35.0, 'transpose':
                    12.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio': 0.0,
                    'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh': True,
                    'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds': 0.05,
                    'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'Microsoft Sound Mapper - Input', 'output_device': 'Primary Sound
                    Driver', 'passthrough_original': False, 'presets': 'Default VC (GPU, GTX 1060)',
                    'preset_name': '', 'use_gpu': True}
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000020621B09760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFCA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:07:50] INFO     [09:07:50] Event start_vc, values {'model_path': 'C:/Users/LXC                            gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': 'hapiraw', 'silence_threshold': -35.0,
                    'transpose': 0.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio':
                    0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh':
                    True, 'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds':
                    0.05, 'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'CABLE Output (VB-Audio Virtual ', 'output_device': 'Realtek HD Audio 2nd
                    output (Realtek(R) Audio)', 'passthrough_original': False, 'presets': 'Default VC (GPU,
                    GTX 1060)', 'preset_name': '', 'use_gpu': True}
[09:07:54] ERROR    [09:07:54] Error in realtime:                                                             gui.py:598
           ERROR    [09:07:54] [WinError 6] The handle is invalid                                             gui.py:602
                    pebble.common.RemoteTraceback: Traceback (most recent call last):
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\pebble\common.py", line
                    174, in process_execute
                        return function(*args, **kwargs)
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference_main.py", line 109,
                    in realtime
                        svc_model = Svc(
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference\infer_tool.py",
                    line 108, in __init__
                        self.hubert_model = utils.get_hubert_model().to(self.dev)
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 333, in
                    get_hubert_model
                        vec_path = ensure_hubert_model()
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 328, in
                    ensure_hubert_model
                        download_file(url, vec_path, desc="Downloading Hubert model")
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 295, in
                    download_file
                        with temppath.open("wb") as f, tqdm(
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1095,
                    in __init__
                        self.refresh(lock_args=self.lock_args)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1344,
                    in refresh
                        self.display()
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1492,
                    in display
                        self.sp(self.__str__() if msg is None else msg)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 347,
                    in print_status
                        fp_write('\r' + s + (' ' * max(last_len[0] - len_s, 0)))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 340,
                    in fp_write
                        fp.write(str(s))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\utils.py", line 127,
                    in inner
                        return func(*args, **kwargs)
                    OSError: [WinError 6] The handle is invalid


                    The above exception was the direct cause of the following exception:

                    Traceback (most recent call last):
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\gui.py",
                    line 600, in main
                        future.result()
                      File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 439, in result
                        return self.__get_result()
                      File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 391, in
                    __get_result
                        raise self._exception
                    OSError: [WinError 6] The handle is invalid
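
A hedged reading of the traceback above: the pebble worker process spawned by the GUI has no usable console handle, so tqdm crashes as soon as it tries to draw the Hubert download progress bar. Below is a minimal sketch of the kind of guard that avoids the write; it is hypothetical, not the project's actual code.

import sys
from tqdm import tqdm

# If there is no real console attached (e.g. inside a GUI worker process),
# skip rendering the progress bar instead of writing to an invalid handle.
disable_bar = not (sys.stdout and sys.stdout.isatty())
for _ in tqdm(range(10), desc="Downloading Hubert model", disable=disable_bar):
    pass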

TerminatedWorkerError on "svc pre-hubert"

Traceback (most recent call last):
  File "/home/ubuntu/so-vits-svc/venv/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/so_vits_svc_fork/__main__.py", line 523, in pre_hubert
    preprocess_hubert_f0(
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/so_vits_svc_fork/preprocess_hubert_f0.py", line 102, in preprocess_hubert_f0
    Parallel(n_jobs=n_jobs)(
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ubuntu/so-vits-svc/venv/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

I'm trying to train a model using WSL on Windows with ROCm for an AMD GPU.
I'm getting this error when running the svc pre-hubert command.
How can I limit the RAM used?
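
A minimal sketch of how the memory footprint relates to the worker count, assuming the relevant knob is joblib's n_jobs as the traceback above suggests (the helper and file list below are hypothetical): each joblib worker process loads its own copy of the models, so peak RAM grows roughly with n_jobs, and capping it trades speed for memory.

from joblib import Parallel, delayed

def preprocess_one(path: str) -> str:
    # placeholder for the per-file HuBERT/F0 feature extraction
    return path

paths = ["dataset/44k/speaker0/000.wav"]  # hypothetical file list
# n_jobs=1 keeps a single worker alive, so only one model copy sits in RAM at a time.
results = Parallel(n_jobs=1)(delayed(preprocess_one)(p) for p in paths)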

Support AMD GPUs on Windows

Is your feature request related to a problem? Please describe.
AMD GPUs are not supported on Windows.

Describe the solution you'd like
Add support for AMD GPUs on Windows.

Additional context

LOG.info not printing at all? (v1.3.2)

Describe the bug
LOG.info calls aren't printing anything to the console
Example: LOG.info(f"====> Epoch: {epoch}, cost {duration} s")

To Reproduce
Running svc train with the specified model path should reproduce it.

Additional context
I can see LOG.warning calls (including ones I added myself), and I couldn't find anything setting the log level to warnings only instead of info. I do remember seeing info logs in the Google Colab I used, though that may have been with the code from the main repository.
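
A hedged sketch, assuming the handler attached to the console ends up at WARNING: explicitly configuring the root logger to INFO should make LOG.info records visible again (the epoch and duration values are just examples).

import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger(__name__)

epoch, duration = 1, 12.3  # example values
LOG.info(f"====> Epoch: {epoch}, cost {duration} s")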

svcg infer only uses dio f0 prediction method

When using svcg to infer on an audio file, it seems to only use dio, and changing the f0 prediction method makes no difference.
I worked around it by manually editing the .py scripts, changing f0_method: Literal["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"] = "dio" to "crepe" (a sketch of that edit is below).
Also, I think it would be better if the pitch were set to 0 rather than 12 when svcg starts.
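
A hedged illustration of the workaround described above, with infer_file as a hypothetical stand-in for the real inference function: the edit simply changes the default of the f0_method parameter from "dio" to "crepe".

from typing import Literal

def infer_file(
    f0_method: Literal["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"] = "crepe",  # was "dio"
) -> None:
    print(f"f0 method: {f0_method}")

infer_file()  # prints "f0 method: crepe"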

More F0 Inference Methods

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

Error when running?

Hi, I ran these in Windows cmd:

pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install so-vits-svc-fork

then I ran svcg

but it gave me: 'svcg' is not recognized as an internal or external command, operable program or batch file.

Please advise on how to fix this.

Invalid Start Byte error

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\Star Guard 719\Documents\sovits\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "C:\Users\Star Guard 719\Documents\sovits\venv\lib\site-packages\so_vits_svc_fork\train.py", line 77, in run
    train_dataset = TextAudioSpeakerLoader(hps.data.training_files, hps)
  File "C:\Users\Star Guard 719\Documents\sovits\venv\lib\site-packages\so_vits_svc_fork\data_utils.py", line 26, in __init__
    self.audiopaths = load_filepaths_and_text(audiopaths)
  File "C:\Users\Star Guard 719\Documents\sovits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 551, in load_filepaths_and_text
    filepaths_and_text = [line.strip().split(split) for line in f]
  File "C:\Users\Star Guard 719\Documents\sovits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 551, in <listcomp>
    filepaths_and_text = [line.strip().split(split) for line in f]
  File "C:\Program Files\Python310\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 370: invalid start byte

Getting this error when running svc train; every other step went through without issue. I reinstalled Python and tried both 3.8 and 3.10.
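
A hedged sketch of one way to repair this: byte 0x92 is the Windows-1252 "right single quote", which suggests a non-UTF-8 character (often from a folder or file name) ended up in the training filelist, and load_filepaths_and_text then fails to decode it as UTF-8. Re-saving the filelist as UTF-8 should clear the error; the path below is hypothetical.

from pathlib import Path

filelist = Path("filelists/44k/train.txt")   # hypothetical location of the filelist
raw = filelist.read_bytes()
text = raw.decode("cp1252")                  # decode with the legacy Windows codepage
filelist.write_text(text, encoding="utf-8")  # write it back as UTF-8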

svcg is not recognized as an internal or external command, operable program or batch file

Describe the bug

I get this error when running pip install -U so-vits-svc-fork,
and svcg doesn't work either.

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\include\yvals.h(17): fatal error C1083: Cannot open include file: 'crtdbg.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for fairseq
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects

(so-vits-fork-new) C:\Users\Genesis>svcg
'svcg' is not recognized as an internal or external command,
operable program or batch file.

(so-vits-fork-new) C:\Users\Genesis>

To Reproduce
Steps to reproduce the behavior:
conda activate so-vits-fork
pip install -U so-vits-svc-fork
svcg

Additional context
Add any other context about the problem here.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: Use matchDepNames instead of matchPackageNames
  • WARN: Detected empty commit - aborting git push

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • chore(deps): update pre-commit hook python-poetry/poetry to v1.8.3
  • fix(deps): update dependency librosa to v0.10.2
  • fix(deps): update dependency lightning to v2.2.4
  • fix(deps): update dependency pysimplegui to v4.60.5
  • fix(deps): update dependency tqdm to v4.66.4
  • chore(deps): update dependency pipdeptree to v2.19.1
  • chore(deps): update dependency pytest to v8.2.0
  • chore(deps): update pre-commit hook commitizen-tools/commitizen to v3.25.0
  • chore(deps): update pre-commit hook pre-commit/pre-commit-hooks to v4.6.0
  • fix(deps): update dependency matplotlib to v3.8.4
  • fix(deps): update dependency numpy to v1.26.4
  • fix(deps): update dependency scipy to v1.13.0
  • fix(deps): update dependency tensorboard to v2.16.2
  • fix(deps): update dependency torch to v2.3.0
  • fix(deps): update dependency torchaudio to v2.3.0
  • fix(deps): update dependency transformers to v4.40.1
  • chore(deps): update dependency myst-parser to v3
  • fix(deps): update dependency pysimplegui to v5
  • 🔐 Create all rate-limited PRs at once 🔐

Edited/Blocked

These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile
  • pytorch/pytorch 2.0.1-cuda11.7-cudnn8-runtime
github-actions
.github/workflows/ci.yml
  • actions/checkout v3
  • actions/setup-python v5
  • pre-commit/action v3.0.1
  • actions/checkout v3
  • wagoid/commitlint-github-action v5.5.1
  • actions/checkout v3
  • actions/setup-python v5
  • snok/install-poetry v1.3.4
  • codecov/codecov-action v3
  • actions/checkout v3
  • relekang/python-semantic-release v7.34.6
.github/workflows/hacktoberfest.yml
  • browniebroke/hacktoberfest-labeler-action v2.3.0
.github/workflows/issue-manager.yml
  • tiangolo/issue-manager 0.5.0
.github/workflows/labels.yml
  • actions/checkout v3
  • actions/setup-python v5
.github/workflows/poetry-upgrade.yml
  • browniebroke/github-actions v1
pep621
pyproject.toml
  • poetry-core >=1.0.0
poetry
pyproject.toml
  • python >=3.8,<3.13
  • librosa *
  • numpy ^1.23
  • pyworld *
  • requests *
  • scipy *
  • sounddevice *
  • SoundFile *
  • tqdm *
  • praat-parselmouth *
  • onnx *
  • onnxsim *
  • onnxoptimizer *
  • torch *
  • torchaudio *
  • tensorboard *
  • rich *
  • tqdm-joblib *
  • cm-time >=0.1.2
  • pysimplegui >=4.6,<5
  • pebble >=5.0
  • torchcrepe >=0.0.17
  • lightning ^2.0.1
  • fastapi ==0.110.1
  • transformers ^4.28.1
  • matplotlib ^3.7.1
  • pre-commit >=3
  • pytest ^8.0.0
  • pytest-cov ^4.0.0
  • pipdeptree ^2.7.0
  • pip-licenses ^4.3.1
  • myst-parser >=0.16
  • sphinx >=4.0
  • sphinx-rtd-theme >=1.0
pre-commit
.pre-commit-config.yaml
  • commitizen-tools/commitizen v3.21.3
  • pre-commit/pre-commit-hooks v4.5.0
  • python-poetry/poetry 1.8.2
  • pre-commit/mirrors-prettier v3.1.0
  • asottile/pyupgrade v3.15.2
  • PyCQA/autoflake v2.3.1
  • PyCQA/isort 5.13.2
  • psf/black 23.12.1
  • codespell-project/codespell v2.2.6
  • PyCQA/flake8 6.1.0
  • srstevenson/nb-clean 3.2.0

  • Check this box to trigger a request for Renovate to run again on this repository
