rvc-project / retrieval-based-voice-conversion
in preparation...
License: MIT License
Hi, is there any reason why it's restricted to python 3.11.2? Is there anything I could do so it could run in python 3.8, 3.9 or 3.10?
When I set the index rate to 0.75 or 0.7 (I haven't tried with other values) this message appears.
Traceback (most recent call last):
File "D:\Biblioteca\Documentos\RVC Project\Retrieval-based-Voice-Conversion-develop\rvc\modules\vc\pipeline.py", line 307, in pipeline
index = faiss.read_index(file_index)
File "C:\Users\Guilherme\AppData\Local\Programs\Python\Python310\lib\site-packages\faiss\swigfaiss_avx2.py", line 10409, in read_index
return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
Possible C/C++ prototypes are:
faiss::read_index(char const *,int)
faiss::read_index(char const *)
faiss::read_index(FILE *,int)
faiss::read_index(FILE *)
faiss::read_index(faiss::IOReader *,int)
faiss::read_index(faiss::IOReader *)
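Every prototype listed in the error takes a C string, a FILE * or an IOReader, so one plausible cause is that file_index reached faiss.read_index as a pathlib.Path (or as None) instead of a plain str. The following is a minimal sketch under that assumption; as_index_path and load_faiss_index are hypothetical helper names, not part of rvc or faiss.

```python
import os
from pathlib import Path


def as_index_path(file_index) -> str:
    """Convert a str, bytes or path-like object into a plain str path."""
    if file_index is None:
        raise ValueError("no index file was given")
    return os.fsdecode(file_index)


def load_faiss_index(file_index):
    """Read a FAISS index from a str or pathlib.Path (hypothetical helper)."""
    import faiss  # imported lazily so the path helper works without faiss installed

    return faiss.read_index(as_index_path(file_index))
```

If the error goes away when the path is passed through as_index_path, the argument type was the problem rather than the index rate itself.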
When I tried to run rvc infer -m {model.pth} -i {input.wav} -o {output.wav} to convert my audio, I found that the background music in the audio would also be converted.
Hello! Wondering if this library will end up being the code without any of the gui stuff or other optional features? I was considering a fork of the GUI library just to support inferencing. But if a smaller repo is coming here maybe I shouldn't, and I can contribute to this.
Hi there,
I believe I almost have this all figured out and it's working great. One issue I'm having is that after inferring one time using the API, memory usage stays very high (12.7 GB out of 16), even though no processing is happening. This is happening on a MacBook Air M1 with 16 GB. Is there a way to force rvc to release that memory after each API call? Thanks in advance!
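One generic thing to try is to run Python's garbage collector and ask torch to drop its cached allocations after each call. This is only a sketch: release_memory is a hypothetical helper, it assumes a torch version that provides torch.mps.empty_cache on Apple silicon, and it cannot free memory that is still referenced (for example a loaded model kept alive by the API).

```python
import gc


def release_memory() -> int:
    """Run the garbage collector and, if torch is installed, empty its device caches.

    Returns the number of unreachable objects found by gc.collect().
    """
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        return collected

    # Free cached CUDA blocks that are no longer in use.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    # Same idea for Apple's Metal backend, when this torch build exposes it.
    mps = getattr(torch, "mps", None)
    if mps is not None and hasattr(mps, "empty_cache") and torch.backends.mps.is_available():
        mps.empty_cache()
    return collected
```

Calling release_memory() after each request is cheap; if usage stays high afterwards, the memory is most likely held by the model weights themselves.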
When I try to run inference with a file name as the output, I get this error.
Traceback (most recent call last):
File "/opt/conda/bin/rvc", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/rvc/utils/cli/cli.py", line 29, in main
cli()
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/rvc/utils/cli/handler/infer.py", line 130, in infer
wavfile.write(outputpath, tgt_sr, audio_opt)
File "/opt/conda/lib/python3.10/site-packages/scipy/io/wavfile.py", line 771, in write
dkind = data.dtype.kind
AttributeError: 'NoneType' object has no attribute 'dtype'
And when I pass a folder to "-o", it gives an error saying it is a directory.
Sorry for the bad English, I used Google Translate
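The AttributeError shows that audio_opt was None by the time it reached wavfile.write, which suggests the conversion itself failed earlier and the write only surfaced it. A small guard like the one below makes both failure modes from this report explicit. It is a sketch, and check_output is a hypothetical helper, not part of the rvc CLI.

```python
from pathlib import Path


def check_output(output_path, audio) -> Path:
    """Validate the output path and the audio data before writing a WAV file."""
    path = Path(output_path)
    if path.is_dir():
        raise IsADirectoryError(
            f"{path} is a directory; pass a file name such as {path / 'output.wav'}"
        )
    if audio is None:
        raise ValueError(
            "inference returned no audio; check the earlier log output for the real error"
        )
    return path
```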
I have this error when using vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth").
This is the traceback:
Traceback (most recent call last):
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server_all.py", line 1, in <module>
import ai_server
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 721, in <module>
start_server()
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 701, in start_server
cb.LoadAllModels()
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/chatbot_all.py", line 130, in LoadAllModels
rvc.LoadModel()
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/Inference/RVC_inference.py", line 27, in LoadModel
vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth")
File "/home/alcoft/Projects/Multilang/TAO_I4.0/I4.0_ENV/lib/python3.11/site-packages/rvc/modules/vc/modules.py", line 84, in get_vc
index = get_index_path_from_model(sid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alcoft/Projects/Multilang/TAO_I4.0/I4.0_ENV/lib/python3.11/site-packages/rvc/modules/vc/utils.py", line 12, in get_index_path_from_model
for root, _, files in os.walk(os.getenv("index_root"), topdown=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen os>", line 343, in walk
TypeError: expected str, bytes or os.PathLike object, not NoneType
My code is:
from pathlib import Path
from scipy.io import wavfile
from rvc.modules.vc.modules import VC
import os
import torch
import json
import ai_config as cfg

device: str = "cpu"
vc: VC = VC()

def LoadModel() -> None:
    global device, vc

    if (not cfg.current_data.prompt_order.__contains__("rvc")):
        raise Exception("Model is not in 'prompt_order'.")

    if (cfg.current_data.rvc_method != "rmvpe" and cfg.current_data.rvc_method != "pm" and cfg.current_data.rvc_method != "harvest" and cfg.current_data.rvc_method != "crepe"):
        raise Exception("RVC method must be 'rmvpe', 'pm', 'harvest' or 'crepe'.")

    if (len(cfg.current_data.rvc_model_path) == 0):
        return

    device = "cuda" if (torch.cuda.is_available() and cfg.current_data.use_gpu_if_available and cfg.current_data.move_to_gpu.count("rvc") > 0) else "cpu"
    vc.config.device = device
    vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth")

def __make_rvc__(audio_name: str, protect: float = 0.33, filter_radius: int = 3) -> bytes:
    LoadModel()
    tgt_sr, audio_opt, times, _ = vc.vc_single(1, Path(audio_name), f0_method = cfg.current_data.rvc_method, index_file = Path(cfg.current_data.rvc_index_path), filter_radius = filter_radius, protect = protect)

    # Pick the first temporary file name that does not exist yet.
    output_file = "tmp_rvc_output_"
    output_file_id = 0
    output_file_path = output_file + str(output_file_id) + ".wav"

    while (os.path.exists(output_file_path)):
        output_file_id += 1
        output_file_path = output_file + str(output_file_id) + ".wav"

    wavfile.write(output_file_path, tgt_sr, audio_opt)

    # Read the file back in binary read mode ("wb" would truncate it and fail on read).
    with open(output_file_path, "rb") as f:
        audio_bytes = f.read()

    os.remove(output_file_path)
    return audio_bytes

def MakeRVC(data: str | dict[str]) -> bytes:
    if (type(data) == str):
        try:
            data = json.loads(data)
        except Exception as ex:
            raise Exception("[RVC] Data must be a dictionary or a JSON code. ERROR: " + str(ex))

    ddata = {
        "input": "",
        "protect": 0.33,
        "filter_radius": 3
    }

    try:
        ddata["input"] = data["input"]
    except Exception:
        raise Exception("Unable to get audio path.")

    # Optional values: keep the defaults when a key is missing or invalid.
    try:
        ddata["protect"] = float(data["protect"])
    except Exception:
        pass

    try:
        ddata["filter_radius"] = int(data["filter_radius"])
    except Exception:
        pass

    # __make_rvc__ takes three arguments, and ddata has no "method" key.
    return __make_rvc__(ddata["input"], ddata["protect"], ddata["filter_radius"])
My Python version is Python 3.11.6 and the model path is /home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth.
Can someone help me fix this?
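The traceback ends in os.walk(os.getenv("index_root")), so the TypeError most likely means the index_root environment variable is not set when get_vc runs. A minimal sketch of a workaround follows; ensure_index_root is a hypothetical helper, and the directory passed to it is whatever folder holds your .index files.

```python
import os
from pathlib import Path


def ensure_index_root(default_dir) -> str:
    """Set the index_root environment variable to default_dir if it is not set yet."""
    current = os.environ.get("index_root")
    if not current:
        # Create the folder so os.walk has something to scan.
        Path(default_dir).mkdir(parents=True, exist_ok=True)
        os.environ["index_root"] = str(default_dir)
    return os.environ["index_root"]
```

Call it once before vc.get_vc(...), for example ensure_index_root("indexes"), where "indexes" is a placeholder directory name.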
It turns out that the code in the package is not the same as the code in this repo.
This is the command I used to install the package:
pip install rvc
**Comparison below:**
In this repo, the function vc_infer is used to do the inference.
In the package, however, the method is called vc_single instead of vc_infer.
This raises an AttributeError for those who follow the guide in the README, as there is nothing called vc_infer there.
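Until the package and the repo agree, calling code can look up whichever method name the installed version provides. This is a sketch only: first_available is a hypothetical helper, and it assumes the two methods accept compatible arguments, which this report does not confirm.

```python
def first_available(obj, names):
    """Return the first callable attribute of obj found among names."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method
    raise AttributeError(f"{type(obj).__name__} has none of: {', '.join(names)}")
```

For example, infer = first_available(vc, ("vc_infer", "vc_single")) picks vc_infer when it exists and falls back to vc_single otherwise.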
I have an NVIDIA graphics card, but I don't want to use the GPU, I just want to use my CPU.
When I run my code, RVC automatically runs on my GPU.
How can I change the device?
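One library-independent way to force CPU execution is to hide the GPU from the process before torch is imported. The sketch below relies only on the standard CUDA_VISIBLE_DEVICES environment variable; cuda_is_hidden is a hypothetical helper added for checking the setting. Another post in this thread sets vc.config.device directly, which may also work but is not verified here.

```python
import os

# Hide every CUDA device from this process. This must run before torch
# (or anything that imports torch, such as rvc) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def cuda_is_hidden() -> bool:
    """Report whether CUDA devices are hidden from this process."""
    return os.environ.get("CUDA_VISIBLE_DEVICES") == ""
```

With the variable set to an empty string, torch.cuda.is_available() returns False, so code that picks "cuda" only when it is available falls back to the CPU.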
Hi, posting this here since I encountered the issue on my end and managed to solve it.
When installing RVC as a Python package using pip install git+https://github.com/RVC-Project/Retrieval-based-Voice-Conversion, the version of torch that was installed was the CPU version instead of the CUDA version, meaning the process was slowed by at least 10x.
To fix, re-install torch using pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 for NVIDIA GPUs.
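To check which build ended up installed, torch exposes the CUDA version it was compiled against. The helper below is a small sketch; describe_torch_build is a hypothetical name.

```python
def describe_torch_build() -> str:
    """Summarize whether the installed torch build has CUDA support."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    # torch.version.cuda is None for builds compiled without CUDA.
    if torch.version.cuda is None:
        return f"torch {torch.__version__} is a build without CUDA support"

    state = "available" if torch.cuda.is_available() else "not available"
    return f"torch {torch.__version__} built for CUDA {torch.version.cuda}; GPU {state}"
```

Running print(describe_torch_build()) before and after the re-install shows whether the CUDA wheel replaced the CPU one.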
C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
...
Traceback (most recent call last):
File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 35, in <module>
for i in result:
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 77, in uvr_wrapper
AudioSegment.from_file(process_path).export(
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\pydub\audio_segment.py", line 963, in export
p = subprocess.Popen(conversion_command, stdin=devnull, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "C:\Users\jeje9\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\jeje9\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1456, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
If you have this error then fear not, it can be fixed pretty easily:
import os
import pydub
pydub.AudioSegment.converter = os.getcwd() + "\\ffmpeg.exe" # or any other path to ffmpeg, as long as it is absolute and not relative.
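If ffmpeg is installed somewhere other than the working directory, the path can be looked up instead of hard-coded. This is a sketch: find_ffmpeg is a hypothetical helper that checks PATH first and then any extra folders passed in.

```python
import shutil
from pathlib import Path


def find_ffmpeg(extra_dirs=()):
    """Return an absolute path to ffmpeg from PATH or extra_dirs, or None if not found."""
    found = shutil.which("ffmpeg")
    if found:
        return str(Path(found).resolve())
    for directory in extra_dirs:
        for name in ("ffmpeg", "ffmpeg.exe"):
            candidate = Path(directory) / name
            if candidate.is_file():
                return str(candidate.resolve())
    return None
```

The result can then be assigned to pydub.AudioSegment.converter when it is not None, for example with find_ffmpeg([os.getcwd()]).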
Is this available in this version of the project yet? If so, are there any examples? I'm trying to migrate from the old repo, where it was working.
thanks,
I tried to follow the readme but the init is not implemented. Could you share the .env file?
Thanks
Hello again! I have one issue with UVR.
My code is:
from pathlib import Path
from scipy.io import wavfile
from rvc.modules.uvr5.vr import AudioPreprocess
import os
import sys
import platform
currentCWD = os.getcwd()
path = sys.prefix
system = platform.system().lower()
if (system == "windows"):
    path = path + "\\Lib\\site-packages"
else:
    pythonVersion = sys.version_info
    path = path + "/lib/python" + str(pythonVersion[0]) + "." + str(pythonVersion[1]) + "/site-packages"
os.chdir(path)
os.environ["TEMP"] = currentCWD
os.environ["weight_uvr5_root"] = currentCWD + "/uvr_assets"
model_path: str = "9_HP2-UVR.pth"
audio_path: str = "audio.wav"
agg: int = 10
uvr: AudioPreprocess = AudioPreprocess(model_path, agg, False)
uvr.config.use_cuda()
uvr.model.to("cuda")
print("Model loaded!")
inst, vocals, sr, _ = uvr.process(music_file = currentCWD + "/" + audio_path)
os.chdir(currentCWD)
wavfile.write("vocals.wav", sr, vocals)
wavfile.write("inst.wav", sr, inst)
print("Done!")
The output is:
Model loaded!
0%| | 0/19 [00:00<?, ?it/s]/home/alcoft/Projects/Tests_I4.0/LibI4/Python_AI/.env/lib/python3.12/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv2d(input, weight, bias, self.stride,
100%|███████████████████████████████████████████| 19/19 [00:01<00:00, 10.89it/s]
Done!
And the audio outputs (both vocal and instrumental) are the same as the input.
Running python -c "import torch; print(torch.backends.cudnn.is_available())" prints True.
Also, when trying to use the CPU for the inference the code freezes here:
Model loaded!
0%| | 0/19 [00:00<?, ?it/s]
When this happens, the code does not use my CPU at all. The code does not print any error message.
My CPU is not a very good CPU, but it should be enough for inference.
My GPU is an NVIDIA RTX 3050 and my OS is Arch Linux.
I have cuda, cudnn and nvidia drivers installed on my OS.
My Python version is Python 3.12.3
The UVR model I'm using is 9_HP2-UVR.pth.
If the problem is related with the UVR model I'm using, please recommend one that works.
I was able to get both the API and CLI options working on an Apple silicon MacBook Air M1 - very cool! However, the API seems exceptionally slow, and it asked for a lot of permissions for data, etc. (through VS Code). Is this what you would expect? Inference with vocals on a single song took approx. 2:30 with the CLI and 5:30 through the API. The API hangs on:
DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
for over three minutes and 30 seconds before proceeding, so I think that's where the time is lost. Is there any way to speed this up?
Thank you in advance!
Getting this error message when using UVR-De-Echo-Aggressive.pth or UVR-De-Echo-Normal.pth with UVR.uvr_wrapper()
Traceback (most recent call last):
File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 62, in <module>
for item in generator:
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 49, in uvr_wrapper
pre_fun = func(
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 34, in __init__
model.load_state_dict(cpk)
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CascadedASPPNet:
Missing key(s) in state_dict: "stg1_low_band_net.enc1.conv1.conv.0.weight", "stg1_low_band_net.enc1.conv1.conv.1.weight", "stg1_low_band_net.enc1.conv1.conv.1.bias", "stg1_low_band_net.enc1.conv1.conv.1.running_mean", "stg1_low_band_net.enc1.conv1.conv.1.running_var", "stg1_low_band_net.enc1.conv2.conv.0.weight", "stg1_low_band_net.enc1.conv2.conv.1.weight", "stg1_low_band_net.enc1.conv2.conv.1.bias", "stg1_low_band_net.enc1.conv2.conv.1.running_mean", "stg1_low_band_net.enc1.conv2.conv.1.running_var", "stg1_low_band_net.enc2.conv1.conv.0.weight", "stg1_low_band_net.enc2.conv1.conv.1.weight", "stg1_low_band_net.enc2.conv1.conv.1.bias", "stg1_low_band_net.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.enc2.conv1.conv.1.running_var", "stg1_low_band_net.enc2.conv2.conv.0.weight", "stg1_low_band_net.enc2.conv2.conv.1.weight", "stg1_low_band_net.enc2.conv2.conv.1.bias", "stg1_low_band_net.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.enc2.conv2.conv.1.running_var", "stg1_low_band_net.enc3.conv1.conv.0.weight", "stg1_low_band_net.enc3.conv1.conv.1.weight", "stg1_low_band_net.enc3.conv1.conv.1.bias", "stg1_low_band_net.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.enc3.conv1.conv.1.running_var", "stg1_low_band_net.enc3.conv2.conv.0.weight", "stg1_low_band_net.enc3.conv2.conv.1.weight", "stg1_low_band_net.enc3.conv2.conv.1.bias", "stg1_low_band_net.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.enc3.conv2.conv.1.running_var", "stg1_low_band_net.enc4.conv1.conv.0.weight", "stg1_low_band_net.enc4.conv1.conv.1.weight", "stg1_low_band_net.enc4.conv1.conv.1.bias", "stg1_low_band_net.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.enc4.conv1.conv.1.running_var", "stg1_low_band_net.enc4.conv2.conv.0.weight", "stg1_low_band_net.enc4.conv2.conv.1.weight", "stg1_low_band_net.enc4.conv2.conv.1.bias", "stg1_low_band_net.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.enc4.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv1.1.conv.0.weight", 
"stg1_low_band_net.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.aspp.conv2.conv.0.weight", "stg1_low_band_net.aspp.conv2.conv.1.weight", "stg1_low_band_net.aspp.conv2.conv.1.bias", "stg1_low_band_net.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.aspp.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv3.conv.0.weight", "stg1_low_band_net.aspp.conv3.conv.1.weight", "stg1_low_band_net.aspp.conv3.conv.2.weight", "stg1_low_band_net.aspp.conv3.conv.2.bias", "stg1_low_band_net.aspp.conv3.conv.2.running_mean", "stg1_low_band_net.aspp.conv3.conv.2.running_var", "stg1_low_band_net.aspp.conv4.conv.0.weight", "stg1_low_band_net.aspp.conv4.conv.1.weight", "stg1_low_band_net.aspp.conv4.conv.2.weight", "stg1_low_band_net.aspp.conv4.conv.2.bias", "stg1_low_band_net.aspp.conv4.conv.2.running_mean", "stg1_low_band_net.aspp.conv4.conv.2.running_var", "stg1_low_band_net.aspp.conv5.conv.0.weight", "stg1_low_band_net.aspp.conv5.conv.1.weight", "stg1_low_band_net.aspp.conv5.conv.2.weight", "stg1_low_band_net.aspp.conv5.conv.2.bias", "stg1_low_band_net.aspp.conv5.conv.2.running_mean", "stg1_low_band_net.aspp.conv5.conv.2.running_var", "stg1_low_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_low_band_net.dec4.conv.conv.0.weight", "stg1_low_band_net.dec4.conv.conv.1.weight", "stg1_low_band_net.dec4.conv.conv.1.bias", "stg1_low_band_net.dec4.conv.conv.1.running_mean", "stg1_low_band_net.dec4.conv.conv.1.running_var", "stg1_low_band_net.dec3.conv.conv.0.weight", "stg1_low_band_net.dec3.conv.conv.1.weight", "stg1_low_band_net.dec3.conv.conv.1.bias", "stg1_low_band_net.dec3.conv.conv.1.running_mean", 
"stg1_low_band_net.dec3.conv.conv.1.running_var", "stg1_low_band_net.dec2.conv.conv.0.weight", "stg1_low_band_net.dec2.conv.conv.1.weight", "stg1_low_band_net.dec2.conv.conv.1.bias", "stg1_low_band_net.dec2.conv.conv.1.running_mean", "stg1_low_band_net.dec2.conv.conv.1.running_var", "stg1_low_band_net.dec1.conv.conv.0.weight", "stg1_low_band_net.dec1.conv.conv.1.weight", "stg1_low_band_net.dec1.conv.conv.1.bias", "stg1_low_band_net.dec1.conv.conv.1.running_mean", "stg1_low_band_net.dec1.conv.conv.1.running_var", "stg1_high_band_net.enc1.conv1.conv.0.weight", "stg1_high_band_net.enc1.conv1.conv.1.weight", "stg1_high_band_net.enc1.conv1.conv.1.bias", "stg1_high_band_net.enc1.conv1.conv.1.running_mean", "stg1_high_band_net.enc1.conv1.conv.1.running_var", "stg1_high_band_net.enc1.conv2.conv.0.weight", "stg1_high_band_net.enc1.conv2.conv.1.weight", "stg1_high_band_net.enc1.conv2.conv.1.bias", "stg1_high_band_net.enc1.conv2.conv.1.running_mean", "stg1_high_band_net.enc1.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.2.weight", "stg1_high_band_net.aspp.conv3.conv.2.bias", "stg1_high_band_net.aspp.conv3.conv.2.running_mean", "stg1_high_band_net.aspp.conv3.conv.2.running_var", "stg1_high_band_net.aspp.conv4.conv.2.weight", "stg1_high_band_net.aspp.conv4.conv.2.bias", "stg1_high_band_net.aspp.conv4.conv.2.running_mean", "stg1_high_band_net.aspp.conv4.conv.2.running_var", "stg1_high_band_net.aspp.conv5.conv.2.weight", "stg1_high_band_net.aspp.conv5.conv.2.bias", "stg1_high_band_net.aspp.conv5.conv.2.running_mean", "stg1_high_band_net.aspp.conv5.conv.2.running_var", "stg1_high_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_high_band_net.dec4.conv.conv.0.weight", "stg1_high_band_net.dec4.conv.conv.1.weight", 
"stg1_high_band_net.dec4.conv.conv.1.bias", "stg1_high_band_net.dec4.conv.conv.1.running_mean", "stg1_high_band_net.dec4.conv.conv.1.running_var", "stg1_high_band_net.dec3.conv.conv.0.weight", "stg1_high_band_net.dec3.conv.conv.1.weight", "stg1_high_band_net.dec3.conv.conv.1.bias", "stg1_high_band_net.dec3.conv.conv.1.running_mean", "stg1_high_band_net.dec3.conv.conv.1.running_var", "stg1_high_band_net.dec2.conv.conv.0.weight", "stg1_high_band_net.dec2.conv.conv.1.weight", "stg1_high_band_net.dec2.conv.conv.1.bias", "stg1_high_band_net.dec2.conv.conv.1.running_mean", "stg1_high_band_net.dec2.conv.conv.1.running_var", "stg1_high_band_net.dec1.conv.conv.0.weight", "stg1_high_band_net.dec1.conv.conv.1.weight", "stg1_high_band_net.dec1.conv.conv.1.bias", "stg1_high_band_net.dec1.conv.conv.1.running_mean", "stg1_high_band_net.dec1.conv.conv.1.running_var", "stg2_bridge.conv.0.weight", "stg2_bridge.conv.1.weight", "stg2_bridge.conv.1.bias", "stg2_bridge.conv.1.running_mean", "stg2_bridge.conv.1.running_var", "stg2_full_band_net.enc1.conv1.conv.0.weight", "stg2_full_band_net.enc1.conv1.conv.1.weight", "stg2_full_band_net.enc1.conv1.conv.1.bias", "stg2_full_band_net.enc1.conv1.conv.1.running_mean", "stg2_full_band_net.enc1.conv1.conv.1.running_var", "stg2_full_band_net.enc1.conv2.conv.0.weight", "stg2_full_band_net.enc1.conv2.conv.1.weight", "stg2_full_band_net.enc1.conv2.conv.1.bias", "stg2_full_band_net.enc1.conv2.conv.1.running_mean", "stg2_full_band_net.enc1.conv2.conv.1.running_var", "stg2_full_band_net.enc2.conv1.conv.0.weight", "stg2_full_band_net.enc2.conv1.conv.1.weight", "stg2_full_band_net.enc2.conv1.conv.1.bias", "stg2_full_band_net.enc2.conv1.conv.1.running_mean", "stg2_full_band_net.enc2.conv1.conv.1.running_var", "stg2_full_band_net.enc2.conv2.conv.0.weight", "stg2_full_band_net.enc2.conv2.conv.1.weight", "stg2_full_band_net.enc2.conv2.conv.1.bias", "stg2_full_band_net.enc2.conv2.conv.1.running_mean", "stg2_full_band_net.enc2.conv2.conv.1.running_var", 
"stg2_full_band_net.enc3.conv1.conv.0.weight", "stg2_full_band_net.enc3.conv1.conv.1.weight", "stg2_full_band_net.enc3.conv1.conv.1.bias", "stg2_full_band_net.enc3.conv1.conv.1.running_mean", "stg2_full_band_net.enc3.conv1.conv.1.running_var", "stg2_full_band_net.enc3.conv2.conv.0.weight", "stg2_full_band_net.enc3.conv2.conv.1.weight", "stg2_full_band_net.enc3.conv2.conv.1.bias", "stg2_full_band_net.enc3.conv2.conv.1.running_mean", "stg2_full_band_net.enc3.conv2.conv.1.running_var", "stg2_full_band_net.enc4.conv1.conv.0.weight", "stg2_full_band_net.enc4.conv1.conv.1.weight", "stg2_full_band_net.enc4.conv1.conv.1.bias", "stg2_full_band_net.enc4.conv1.conv.1.running_mean", "stg2_full_band_net.enc4.conv1.conv.1.running_var", "stg2_full_band_net.enc4.conv2.conv.0.weight", "stg2_full_band_net.enc4.conv2.conv.1.weight", "stg2_full_band_net.enc4.conv2.conv.1.bias", "stg2_full_band_net.enc4.conv2.conv.1.running_mean", "stg2_full_band_net.enc4.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv1.1.conv.0.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.bias", "stg2_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_full_band_net.aspp.conv1.1.conv.1.running_var", "stg2_full_band_net.aspp.conv2.conv.0.weight", "stg2_full_band_net.aspp.conv2.conv.1.weight", "stg2_full_band_net.aspp.conv2.conv.1.bias", "stg2_full_band_net.aspp.conv2.conv.1.running_mean", "stg2_full_band_net.aspp.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv3.conv.0.weight", "stg2_full_band_net.aspp.conv3.conv.1.weight", "stg2_full_band_net.aspp.conv3.conv.2.weight", "stg2_full_band_net.aspp.conv3.conv.2.bias", "stg2_full_band_net.aspp.conv3.conv.2.running_mean", "stg2_full_band_net.aspp.conv3.conv.2.running_var", "stg2_full_band_net.aspp.conv4.conv.0.weight", "stg2_full_band_net.aspp.conv4.conv.1.weight", "stg2_full_band_net.aspp.conv4.conv.2.weight", "stg2_full_band_net.aspp.conv4.conv.2.bias", 
"stg2_full_band_net.aspp.conv4.conv.2.running_mean", "stg2_full_band_net.aspp.conv4.conv.2.running_var", "stg2_full_band_net.aspp.conv5.conv.0.weight", "stg2_full_band_net.aspp.conv5.conv.1.weight", "stg2_full_band_net.aspp.conv5.conv.2.weight", "stg2_full_band_net.aspp.conv5.conv.2.bias", "stg2_full_band_net.aspp.conv5.conv.2.running_mean", "stg2_full_band_net.aspp.conv5.conv.2.running_var", "stg2_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg2_full_band_net.dec4.conv.conv.0.weight", "stg2_full_band_net.dec4.conv.conv.1.weight", "stg2_full_band_net.dec4.conv.conv.1.bias", "stg2_full_band_net.dec4.conv.conv.1.running_mean", "stg2_full_band_net.dec4.conv.conv.1.running_var", "stg2_full_band_net.dec3.conv.conv.0.weight", "stg2_full_band_net.dec3.conv.conv.1.weight", "stg2_full_band_net.dec3.conv.conv.1.bias", "stg2_full_band_net.dec3.conv.conv.1.running_mean", "stg2_full_band_net.dec3.conv.conv.1.running_var", "stg2_full_band_net.dec2.conv.conv.0.weight", "stg2_full_band_net.dec2.conv.conv.1.weight", "stg2_full_band_net.dec2.conv.conv.1.bias", "stg2_full_band_net.dec2.conv.conv.1.running_mean", "stg2_full_band_net.dec2.conv.conv.1.running_var", "stg2_full_band_net.dec1.conv.conv.0.weight", "stg2_full_band_net.dec1.conv.conv.1.weight", "stg2_full_band_net.dec1.conv.conv.1.bias", "stg2_full_band_net.dec1.conv.conv.1.running_mean", "stg2_full_band_net.dec1.conv.conv.1.running_var", "stg3_bridge.conv.0.weight", "stg3_bridge.conv.1.weight", "stg3_bridge.conv.1.bias", "stg3_bridge.conv.1.running_mean", "stg3_bridge.conv.1.running_var", "stg3_full_band_net.enc1.conv1.conv.0.weight", "stg3_full_band_net.enc1.conv1.conv.1.weight", "stg3_full_band_net.enc1.conv1.conv.1.bias", "stg3_full_band_net.enc1.conv1.conv.1.running_mean", 
"stg3_full_band_net.enc1.conv1.conv.1.running_var", "stg3_full_band_net.enc1.conv2.conv.0.weight", "stg3_full_band_net.enc1.conv2.conv.1.weight", "stg3_full_band_net.enc1.conv2.conv.1.bias", "stg3_full_band_net.enc1.conv2.conv.1.running_mean", "stg3_full_band_net.enc1.conv2.conv.1.running_var", "stg3_full_band_net.aspp.conv3.conv.2.weight", "stg3_full_band_net.aspp.conv3.conv.2.bias", "stg3_full_band_net.aspp.conv3.conv.2.running_mean", "stg3_full_band_net.aspp.conv3.conv.2.running_var", "stg3_full_band_net.aspp.conv4.conv.2.weight", "stg3_full_band_net.aspp.conv4.conv.2.bias", "stg3_full_band_net.aspp.conv4.conv.2.running_mean", "stg3_full_band_net.aspp.conv4.conv.2.running_var", "stg3_full_band_net.aspp.conv5.conv.2.weight", "stg3_full_band_net.aspp.conv5.conv.2.bias", "stg3_full_band_net.aspp.conv5.conv.2.running_mean", "stg3_full_band_net.aspp.conv5.conv.2.running_var", "stg3_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg3_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg3_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg3_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg3_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg3_full_band_net.dec4.conv.conv.0.weight", "stg3_full_band_net.dec4.conv.conv.1.weight", "stg3_full_band_net.dec4.conv.conv.1.bias", "stg3_full_band_net.dec4.conv.conv.1.running_mean", "stg3_full_band_net.dec4.conv.conv.1.running_var", "stg3_full_band_net.dec3.conv.conv.0.weight", "stg3_full_band_net.dec3.conv.conv.1.weight", "stg3_full_band_net.dec3.conv.conv.1.bias", "stg3_full_band_net.dec3.conv.conv.1.running_mean", "stg3_full_band_net.dec3.conv.conv.1.running_var", "stg3_full_band_net.dec2.conv.conv.0.weight", "stg3_full_band_net.dec2.conv.conv.1.weight", "stg3_full_band_net.dec2.conv.conv.1.bias", "stg3_full_band_net.dec2.conv.conv.1.running_mean", "stg3_full_band_net.dec2.conv.conv.1.running_var", "stg3_full_band_net.dec1.conv.conv.0.weight", "stg3_full_band_net.dec1.conv.conv.1.weight", 
"stg3_full_band_net.dec1.conv.conv.1.bias", "stg3_full_band_net.dec1.conv.conv.1.running_mean", "stg3_full_band_net.dec1.conv.conv.1.running_var", "aux1_out.weight", "aux2_out.weight".
Unexpected key(s) in state_dict: "stg2_low_band_net.0.enc1.conv.0.weight", "stg2_low_band_net.0.enc1.conv.1.weight", "stg2_low_band_net.0.enc1.conv.1.bias", "stg2_low_band_net.0.enc1.conv.1.running_mean", "stg2_low_band_net.0.enc1.conv.1.running_var", "stg2_low_band_net.0.enc1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc2.conv1.conv.0.weight", "stg2_low_band_net.0.enc2.conv1.conv.1.weight", "stg2_low_band_net.0.enc2.conv1.conv.1.bias", "stg2_low_band_net.0.enc2.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc2.conv1.conv.1.running_var", "stg2_low_band_net.0.enc2.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc2.conv2.conv.0.weight", "stg2_low_band_net.0.enc2.conv2.conv.1.weight", "stg2_low_band_net.0.enc2.conv2.conv.1.bias", "stg2_low_band_net.0.enc2.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc2.conv2.conv.1.running_var", "stg2_low_band_net.0.enc2.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc3.conv1.conv.0.weight", "stg2_low_band_net.0.enc3.conv1.conv.1.weight", "stg2_low_band_net.0.enc3.conv1.conv.1.bias", "stg2_low_band_net.0.enc3.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc3.conv1.conv.1.running_var", "stg2_low_band_net.0.enc3.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc3.conv2.conv.0.weight", "stg2_low_band_net.0.enc3.conv2.conv.1.weight", "stg2_low_band_net.0.enc3.conv2.conv.1.bias", "stg2_low_band_net.0.enc3.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc3.conv2.conv.1.running_var", "stg2_low_band_net.0.enc3.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc4.conv1.conv.0.weight", "stg2_low_band_net.0.enc4.conv1.conv.1.weight", "stg2_low_band_net.0.enc4.conv1.conv.1.bias", "stg2_low_band_net.0.enc4.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc4.conv1.conv.1.running_var", "stg2_low_band_net.0.enc4.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc4.conv2.conv.0.weight", "stg2_low_band_net.0.enc4.conv2.conv.1.weight", "stg2_low_band_net.0.enc4.conv2.conv.1.bias", 
"stg2_low_band_net.0.enc4.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc4.conv2.conv.1.running_var", "stg2_low_band_net.0.enc4.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc5.conv1.conv.0.weight", "stg2_low_band_net.0.enc5.conv1.conv.1.weight", "stg2_low_band_net.0.enc5.conv1.conv.1.bias", "stg2_low_band_net.0.enc5.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc5.conv1.conv.1.running_var", "stg2_low_band_net.0.enc5.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc5.conv2.conv.0.weight", "stg2_low_band_net.0.enc5.conv2.conv.1.weight", "stg2_low_band_net.0.enc5.conv2.conv.1.bias", "stg2_low_band_net.0.enc5.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc5.conv2.conv.1.running_var", "stg2_low_band_net.0.enc5.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv1.1.conv.0.weight", "stg2_low_band_net.0.aspp.conv1.1.conv.1.weight", "stg2_low_band_net.0.aspp.conv1.1.conv.1.bias", "stg2_low_band_net.0.aspp.conv1.1.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv1.1.conv.1.running_var", "stg2_low_band_net.0.aspp.conv1.1.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv2.conv.0.weight", "stg2_low_band_net.0.aspp.conv2.conv.1.weight", "stg2_low_band_net.0.aspp.conv2.conv.1.bias", "stg2_low_band_net.0.aspp.conv2.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv2.conv.1.running_var", "stg2_low_band_net.0.aspp.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv3.conv.0.weight", "stg2_low_band_net.0.aspp.conv3.conv.1.weight", "stg2_low_band_net.0.aspp.conv3.conv.1.bias", "stg2_low_band_net.0.aspp.conv3.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv3.conv.1.running_var", "stg2_low_band_net.0.aspp.conv3.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv4.conv.0.weight", "stg2_low_band_net.0.aspp.conv4.conv.1.weight", "stg2_low_band_net.0.aspp.conv4.conv.1.bias", "stg2_low_band_net.0.aspp.conv4.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv4.conv.1.running_var", 
"stg2_low_band_net.0.aspp.conv4.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv5.conv.0.weight", "stg2_low_band_net.0.aspp.conv5.conv.1.weight", "stg2_low_band_net.0.aspp.conv5.conv.1.bias", "stg2_low_band_net.0.aspp.conv5.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv5.conv.1.running_var", "stg2_low_band_net.0.aspp.conv5.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.bottleneck.conv.0.weight", "stg2_low_band_net.0.aspp.bottleneck.conv.1.weight", "stg2_low_band_net.0.aspp.bottleneck.conv.1.bias", "stg2_low_band_net.0.aspp.bottleneck.conv.1.running_mean", "stg2_low_band_net.0.aspp.bottleneck.conv.1.running_var", "stg2_low_band_net.0.aspp.bottleneck.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec4.conv1.conv.0.weight", "stg2_low_band_net.0.dec4.conv1.conv.1.weight", "stg2_low_band_net.0.dec4.conv1.conv.1.bias", "stg2_low_band_net.0.dec4.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec4.conv1.conv.1.running_var", "stg2_low_band_net.0.dec4.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec3.conv1.conv.0.weight", "stg2_low_band_net.0.dec3.conv1.conv.1.weight", "stg2_low_band_net.0.dec3.conv1.conv.1.bias", "stg2_low_band_net.0.dec3.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec3.conv1.conv.1.running_var", "stg2_low_band_net.0.dec3.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec2.conv1.conv.0.weight", "stg2_low_band_net.0.dec2.conv1.conv.1.weight", "stg2_low_band_net.0.dec2.conv1.conv.1.bias", "stg2_low_band_net.0.dec2.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec2.conv1.conv.1.running_var", "stg2_low_band_net.0.dec2.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.lstm_dec2.conv.conv.0.weight", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.weight", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.bias", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.running_mean", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.running_var", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.num_batches_tracked", 
"stg2_low_band_net.0.lstm_dec2.lstm.weight_ih_l0", "stg2_low_band_net.0.lstm_dec2.lstm.weight_hh_l0", "stg2_low_band_net.0.lstm_dec2.lstm.bias_ih_l0", "stg2_low_band_net.0.lstm_dec2.lstm.bias_hh_l0", "stg2_low_band_net.0.lstm_dec2.lstm.weight_ih_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.weight_hh_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.bias_ih_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.bias_hh_l0_reverse", "stg2_low_band_net.0.lstm_dec2.dense.0.weight", "stg2_low_band_net.0.lstm_dec2.dense.0.bias", "stg2_low_band_net.0.lstm_dec2.dense.1.weight", "stg2_low_band_net.0.lstm_dec2.dense.1.bias", "stg2_low_band_net.0.lstm_dec2.dense.1.running_mean", "stg2_low_band_net.0.lstm_dec2.dense.1.running_var", "stg2_low_band_net.0.lstm_dec2.dense.1.num_batches_tracked", "stg2_low_band_net.0.dec1.conv1.conv.0.weight", "stg2_low_band_net.0.dec1.conv1.conv.1.weight", "stg2_low_band_net.0.dec1.conv1.conv.1.bias", "stg2_low_band_net.0.dec1.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec1.conv1.conv.1.running_var", "stg2_low_band_net.0.dec1.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.1.conv.0.weight", "stg2_low_band_net.1.conv.1.weight", "stg2_low_band_net.1.conv.1.bias", "stg2_low_band_net.1.conv.1.running_mean", "stg2_low_band_net.1.conv.1.running_var", "stg2_low_band_net.1.conv.1.num_batches_tracked", "stg2_high_band_net.enc1.conv.0.weight", "stg2_high_band_net.enc1.conv.1.weight", "stg2_high_band_net.enc1.conv.1.bias", "stg2_high_band_net.enc1.conv.1.running_mean", "stg2_high_band_net.enc1.conv.1.running_var", "stg2_high_band_net.enc1.conv.1.num_batches_tracked", "stg2_high_band_net.enc2.conv1.conv.0.weight", "stg2_high_band_net.enc2.conv1.conv.1.weight", "stg2_high_band_net.enc2.conv1.conv.1.bias", "stg2_high_band_net.enc2.conv1.conv.1.running_mean", "stg2_high_band_net.enc2.conv1.conv.1.running_var", "stg2_high_band_net.enc2.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc2.conv2.conv.0.weight", 
"stg2_high_band_net.enc2.conv2.conv.1.weight", "stg2_high_band_net.enc2.conv2.conv.1.bias", "stg2_high_band_net.enc2.conv2.conv.1.running_mean", "stg2_high_band_net.enc2.conv2.conv.1.running_var", "stg2_high_band_net.enc2.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc3.conv1.conv.0.weight", "stg2_high_band_net.enc3.conv1.conv.1.weight", "stg2_high_band_net.enc3.conv1.conv.1.bias", "stg2_high_band_net.enc3.conv1.conv.1.running_mean", "stg2_high_band_net.enc3.conv1.conv.1.running_var", "stg2_high_band_net.enc3.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc3.conv2.conv.0.weight", "stg2_high_band_net.enc3.conv2.conv.1.weight", "stg2_high_band_net.enc3.conv2.conv.1.bias", "stg2_high_band_net.enc3.conv2.conv.1.running_mean", "stg2_high_band_net.enc3.conv2.conv.1.running_var", "stg2_high_band_net.enc3.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc4.conv1.conv.0.weight", "stg2_high_band_net.enc4.conv1.conv.1.weight", "stg2_high_band_net.enc4.conv1.conv.1.bias", "stg2_high_band_net.enc4.conv1.conv.1.running_mean", "stg2_high_band_net.enc4.conv1.conv.1.running_var", "stg2_high_band_net.enc4.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc4.conv2.conv.0.weight", "stg2_high_band_net.enc4.conv2.conv.1.weight", "stg2_high_band_net.enc4.conv2.conv.1.bias", "stg2_high_band_net.enc4.conv2.conv.1.running_mean", "stg2_high_band_net.enc4.conv2.conv.1.running_var", "stg2_high_band_net.enc4.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc5.conv1.conv.0.weight", "stg2_high_band_net.enc5.conv1.conv.1.weight", "stg2_high_band_net.enc5.conv1.conv.1.bias", "stg2_high_band_net.enc5.conv1.conv.1.running_mean", "stg2_high_band_net.enc5.conv1.conv.1.running_var", "stg2_high_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc5.conv2.conv.0.weight", "stg2_high_band_net.enc5.conv2.conv.1.weight", "stg2_high_band_net.enc5.conv2.conv.1.bias", "stg2_high_band_net.enc5.conv2.conv.1.running_mean", 
"stg2_high_band_net.enc5.conv2.conv.1.running_var", "stg2_high_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv1.1.conv.0.weight", "stg2_high_band_net.aspp.conv1.1.conv.1.weight", "stg2_high_band_net.aspp.conv1.1.conv.1.bias", "stg2_high_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_high_band_net.aspp.conv1.1.conv.1.running_var", "stg2_high_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv2.conv.0.weight", "stg2_high_band_net.aspp.conv2.conv.1.weight", "stg2_high_band_net.aspp.conv2.conv.1.bias", "stg2_high_band_net.aspp.conv2.conv.1.running_mean", "stg2_high_band_net.aspp.conv2.conv.1.running_var", "stg2_high_band_net.aspp.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv3.conv.0.weight", "stg2_high_band_net.aspp.conv3.conv.1.weight", "stg2_high_band_net.aspp.conv3.conv.1.bias", "stg2_high_band_net.aspp.conv3.conv.1.running_mean", "stg2_high_band_net.aspp.conv3.conv.1.running_var", "stg2_high_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv4.conv.0.weight", "stg2_high_band_net.aspp.conv4.conv.1.weight", "stg2_high_band_net.aspp.conv4.conv.1.bias", "stg2_high_band_net.aspp.conv4.conv.1.running_mean", "stg2_high_band_net.aspp.conv4.conv.1.running_var", "stg2_high_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv5.conv.0.weight", "stg2_high_band_net.aspp.conv5.conv.1.weight", "stg2_high_band_net.aspp.conv5.conv.1.bias", "stg2_high_band_net.aspp.conv5.conv.1.running_mean", "stg2_high_band_net.aspp.conv5.conv.1.running_var", "stg2_high_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.bottleneck.conv.0.weight", "stg2_high_band_net.aspp.bottleneck.conv.1.weight", "stg2_high_band_net.aspp.bottleneck.conv.1.bias", "stg2_high_band_net.aspp.bottleneck.conv.1.running_mean", "stg2_high_band_net.aspp.bottleneck.conv.1.running_var", "stg2_high_band_net.aspp.bottleneck.conv.1.num_batches_tracked", 
"stg2_high_band_net.dec4.conv1.conv.0.weight", "stg2_high_band_net.dec4.conv1.conv.1.weight", "stg2_high_band_net.dec4.conv1.conv.1.bias", "stg2_high_band_net.dec4.conv1.conv.1.running_mean", "stg2_high_band_net.dec4.conv1.conv.1.running_var", "stg2_high_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.dec3.conv1.conv.0.weight", "stg2_high_band_net.dec3.conv1.conv.1.weight", "stg2_high_band_net.dec3.conv1.conv.1.bias", "stg2_high_band_net.dec3.conv1.conv.1.running_mean", "stg2_high_band_net.dec3.conv1.conv.1.running_var", "stg2_high_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.dec2.conv1.conv.0.weight", "stg2_high_band_net.dec2.conv1.conv.1.weight", "stg2_high_band_net.dec2.conv1.conv.1.bias", "stg2_high_band_net.dec2.conv1.conv.1.running_mean", "stg2_high_band_net.dec2.conv1.conv.1.running_var", "stg2_high_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.lstm_dec2.conv.conv.0.weight", "stg2_high_band_net.lstm_dec2.conv.conv.1.weight", "stg2_high_band_net.lstm_dec2.conv.conv.1.bias", "stg2_high_band_net.lstm_dec2.conv.conv.1.running_mean", "stg2_high_band_net.lstm_dec2.conv.conv.1.running_var", "stg2_high_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg2_high_band_net.lstm_dec2.lstm.weight_ih_l0", "stg2_high_band_net.lstm_dec2.lstm.weight_hh_l0", "stg2_high_band_net.lstm_dec2.lstm.bias_ih_l0", "stg2_high_band_net.lstm_dec2.lstm.bias_hh_l0", "stg2_high_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg2_high_band_net.lstm_dec2.dense.0.weight", "stg2_high_band_net.lstm_dec2.dense.0.bias", "stg2_high_band_net.lstm_dec2.dense.1.weight", "stg2_high_band_net.lstm_dec2.dense.1.bias", "stg2_high_band_net.lstm_dec2.dense.1.running_mean", "stg2_high_band_net.lstm_dec2.dense.1.running_var", 
"stg2_high_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg2_high_band_net.dec1.conv1.conv.0.weight", "stg2_high_band_net.dec1.conv1.conv.1.weight", "stg2_high_band_net.dec1.conv1.conv.1.bias", "stg2_high_band_net.dec1.conv1.conv.1.running_mean", "stg2_high_band_net.dec1.conv1.conv.1.running_var", "stg2_high_band_net.dec1.conv1.conv.1.num_batches_tracked", "aux_out.weight", "stg1_low_band_net.0.enc1.conv.0.weight", "stg1_low_band_net.0.enc1.conv.1.weight", "stg1_low_band_net.0.enc1.conv.1.bias", "stg1_low_band_net.0.enc1.conv.1.running_mean", "stg1_low_band_net.0.enc1.conv.1.running_var", "stg1_low_band_net.0.enc1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc2.conv1.conv.0.weight", "stg1_low_band_net.0.enc2.conv1.conv.1.weight", "stg1_low_band_net.0.enc2.conv1.conv.1.bias", "stg1_low_band_net.0.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc2.conv1.conv.1.running_var", "stg1_low_band_net.0.enc2.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc2.conv2.conv.0.weight", "stg1_low_band_net.0.enc2.conv2.conv.1.weight", "stg1_low_band_net.0.enc2.conv2.conv.1.bias", "stg1_low_band_net.0.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc2.conv2.conv.1.running_var", "stg1_low_band_net.0.enc2.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc3.conv1.conv.0.weight", "stg1_low_band_net.0.enc3.conv1.conv.1.weight", "stg1_low_band_net.0.enc3.conv1.conv.1.bias", "stg1_low_band_net.0.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc3.conv1.conv.1.running_var", "stg1_low_band_net.0.enc3.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc3.conv2.conv.0.weight", "stg1_low_band_net.0.enc3.conv2.conv.1.weight", "stg1_low_band_net.0.enc3.conv2.conv.1.bias", "stg1_low_band_net.0.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc3.conv2.conv.1.running_var", "stg1_low_band_net.0.enc3.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc4.conv1.conv.0.weight", "stg1_low_band_net.0.enc4.conv1.conv.1.weight", 
"stg1_low_band_net.0.enc4.conv1.conv.1.bias", "stg1_low_band_net.0.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc4.conv1.conv.1.running_var", "stg1_low_band_net.0.enc4.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc4.conv2.conv.0.weight", "stg1_low_band_net.0.enc4.conv2.conv.1.weight", "stg1_low_band_net.0.enc4.conv2.conv.1.bias", "stg1_low_band_net.0.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc4.conv2.conv.1.running_var", "stg1_low_band_net.0.enc4.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc5.conv1.conv.0.weight", "stg1_low_band_net.0.enc5.conv1.conv.1.weight", "stg1_low_band_net.0.enc5.conv1.conv.1.bias", "stg1_low_band_net.0.enc5.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc5.conv1.conv.1.running_var", "stg1_low_band_net.0.enc5.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc5.conv2.conv.0.weight", "stg1_low_band_net.0.enc5.conv2.conv.1.weight", "stg1_low_band_net.0.enc5.conv2.conv.1.bias", "stg1_low_band_net.0.enc5.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc5.conv2.conv.1.running_var", "stg1_low_band_net.0.enc5.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv1.1.conv.0.weight", "stg1_low_band_net.0.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.0.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.0.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.0.aspp.conv1.1.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv2.conv.0.weight", "stg1_low_band_net.0.aspp.conv2.conv.1.weight", "stg1_low_band_net.0.aspp.conv2.conv.1.bias", "stg1_low_band_net.0.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv2.conv.1.running_var", "stg1_low_band_net.0.aspp.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv3.conv.0.weight", "stg1_low_band_net.0.aspp.conv3.conv.1.weight", "stg1_low_band_net.0.aspp.conv3.conv.1.bias", "stg1_low_band_net.0.aspp.conv3.conv.1.running_mean", 
"stg1_low_band_net.0.aspp.conv3.conv.1.running_var", "stg1_low_band_net.0.aspp.conv3.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv4.conv.0.weight", "stg1_low_band_net.0.aspp.conv4.conv.1.weight", "stg1_low_band_net.0.aspp.conv4.conv.1.bias", "stg1_low_band_net.0.aspp.conv4.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv4.conv.1.running_var", "stg1_low_band_net.0.aspp.conv4.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv5.conv.0.weight", "stg1_low_band_net.0.aspp.conv5.conv.1.weight", "stg1_low_band_net.0.aspp.conv5.conv.1.bias", "stg1_low_band_net.0.aspp.conv5.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv5.conv.1.running_var", "stg1_low_band_net.0.aspp.conv5.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.bottleneck.conv.0.weight", "stg1_low_band_net.0.aspp.bottleneck.conv.1.weight", "stg1_low_band_net.0.aspp.bottleneck.conv.1.bias", "stg1_low_band_net.0.aspp.bottleneck.conv.1.running_mean", "stg1_low_band_net.0.aspp.bottleneck.conv.1.running_var", "stg1_low_band_net.0.aspp.bottleneck.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec4.conv1.conv.0.weight", "stg1_low_band_net.0.dec4.conv1.conv.1.weight", "stg1_low_band_net.0.dec4.conv1.conv.1.bias", "stg1_low_band_net.0.dec4.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec4.conv1.conv.1.running_var", "stg1_low_band_net.0.dec4.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec3.conv1.conv.0.weight", "stg1_low_band_net.0.dec3.conv1.conv.1.weight", "stg1_low_band_net.0.dec3.conv1.conv.1.bias", "stg1_low_band_net.0.dec3.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec3.conv1.conv.1.running_var", "stg1_low_band_net.0.dec3.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec2.conv1.conv.0.weight", "stg1_low_band_net.0.dec2.conv1.conv.1.weight", "stg1_low_band_net.0.dec2.conv1.conv.1.bias", "stg1_low_band_net.0.dec2.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec2.conv1.conv.1.running_var", 
"stg1_low_band_net.0.dec2.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.lstm_dec2.conv.conv.0.weight", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.weight", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.bias", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.running_mean", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.running_var", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.num_batches_tracked", "stg1_low_band_net.0.lstm_dec2.lstm.weight_ih_l0", "stg1_low_band_net.0.lstm_dec2.lstm.weight_hh_l0", "stg1_low_band_net.0.lstm_dec2.lstm.bias_ih_l0", "stg1_low_band_net.0.lstm_dec2.lstm.bias_hh_l0", "stg1_low_band_net.0.lstm_dec2.lstm.weight_ih_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.weight_hh_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.bias_ih_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.bias_hh_l0_reverse", "stg1_low_band_net.0.lstm_dec2.dense.0.weight", "stg1_low_band_net.0.lstm_dec2.dense.0.bias", "stg1_low_band_net.0.lstm_dec2.dense.1.weight", "stg1_low_band_net.0.lstm_dec2.dense.1.bias", "stg1_low_band_net.0.lstm_dec2.dense.1.running_mean", "stg1_low_band_net.0.lstm_dec2.dense.1.running_var", "stg1_low_band_net.0.lstm_dec2.dense.1.num_batches_tracked", "stg1_low_band_net.0.dec1.conv1.conv.0.weight", "stg1_low_band_net.0.dec1.conv1.conv.1.weight", "stg1_low_band_net.0.dec1.conv1.conv.1.bias", "stg1_low_band_net.0.dec1.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec1.conv1.conv.1.running_var", "stg1_low_band_net.0.dec1.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.1.conv.0.weight", "stg1_low_band_net.1.conv.1.weight", "stg1_low_band_net.1.conv.1.bias", "stg1_low_band_net.1.conv.1.running_mean", "stg1_low_band_net.1.conv.1.running_var", "stg1_low_band_net.1.conv.1.num_batches_tracked", "stg1_high_band_net.enc5.conv1.conv.0.weight", "stg1_high_band_net.enc5.conv1.conv.1.weight", "stg1_high_band_net.enc5.conv1.conv.1.bias", "stg1_high_band_net.enc5.conv1.conv.1.running_mean", "stg1_high_band_net.enc5.conv1.conv.1.running_var", 
"stg1_high_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.enc5.conv2.conv.0.weight", "stg1_high_band_net.enc5.conv2.conv.1.weight", "stg1_high_band_net.enc5.conv2.conv.1.bias", "stg1_high_band_net.enc5.conv2.conv.1.running_mean", "stg1_high_band_net.enc5.conv2.conv.1.running_var", "stg1_high_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg1_high_band_net.lstm_dec2.conv.conv.0.weight", "stg1_high_band_net.lstm_dec2.conv.conv.1.weight", "stg1_high_band_net.lstm_dec2.conv.conv.1.bias", "stg1_high_band_net.lstm_dec2.conv.conv.1.running_mean", "stg1_high_band_net.lstm_dec2.conv.conv.1.running_var", "stg1_high_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg1_high_band_net.lstm_dec2.lstm.weight_ih_l0", "stg1_high_band_net.lstm_dec2.lstm.weight_hh_l0", "stg1_high_band_net.lstm_dec2.lstm.bias_ih_l0", "stg1_high_band_net.lstm_dec2.lstm.bias_hh_l0", "stg1_high_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg1_high_band_net.lstm_dec2.dense.0.weight", "stg1_high_band_net.lstm_dec2.dense.0.bias", "stg1_high_band_net.lstm_dec2.dense.1.weight", "stg1_high_band_net.lstm_dec2.dense.1.bias", "stg1_high_band_net.lstm_dec2.dense.1.running_mean", "stg1_high_band_net.lstm_dec2.dense.1.running_var", "stg1_high_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg1_high_band_net.enc1.conv.0.weight", "stg1_high_band_net.enc1.conv.1.weight", "stg1_high_band_net.enc1.conv.1.bias", "stg1_high_band_net.enc1.conv.1.running_mean", "stg1_high_band_net.enc1.conv.1.running_var", "stg1_high_band_net.enc1.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv3.conv.1.bias", "stg1_high_band_net.aspp.conv3.conv.1.running_mean", "stg1_high_band_net.aspp.conv3.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv4.conv.1.bias", 
"stg1_high_band_net.aspp.conv4.conv.1.running_mean", "stg1_high_band_net.aspp.conv4.conv.1.running_var", "stg1_high_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv5.conv.1.bias", "stg1_high_band_net.aspp.conv5.conv.1.running_mean", "stg1_high_band_net.aspp.conv5.conv.1.running_var", "stg1_high_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.bottleneck.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.conv.1.weight", "stg1_high_band_net.aspp.bottleneck.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.conv.1.running_var", "stg1_high_band_net.aspp.bottleneck.conv.1.num_batches_tracked", "stg1_high_band_net.dec4.conv1.conv.0.weight", "stg1_high_band_net.dec4.conv1.conv.1.weight", "stg1_high_band_net.dec4.conv1.conv.1.bias", "stg1_high_band_net.dec4.conv1.conv.1.running_mean", "stg1_high_band_net.dec4.conv1.conv.1.running_var", "stg1_high_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec3.conv1.conv.0.weight", "stg1_high_band_net.dec3.conv1.conv.1.weight", "stg1_high_band_net.dec3.conv1.conv.1.bias", "stg1_high_band_net.dec3.conv1.conv.1.running_mean", "stg1_high_band_net.dec3.conv1.conv.1.running_var", "stg1_high_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec2.conv1.conv.0.weight", "stg1_high_band_net.dec2.conv1.conv.1.weight", "stg1_high_band_net.dec2.conv1.conv.1.bias", "stg1_high_band_net.dec2.conv1.conv.1.running_mean", "stg1_high_band_net.dec2.conv1.conv.1.running_var", "stg1_high_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec1.conv1.conv.0.weight", "stg1_high_band_net.dec1.conv1.conv.1.weight", "stg1_high_band_net.dec1.conv1.conv.1.bias", "stg1_high_band_net.dec1.conv1.conv.1.running_mean", "stg1_high_band_net.dec1.conv1.conv.1.running_var", "stg1_high_band_net.dec1.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.enc5.conv1.conv.0.weight", 
"stg3_full_band_net.enc5.conv1.conv.1.weight", "stg3_full_band_net.enc5.conv1.conv.1.bias", "stg3_full_band_net.enc5.conv1.conv.1.running_mean", "stg3_full_band_net.enc5.conv1.conv.1.running_var", "stg3_full_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.enc5.conv2.conv.0.weight", "stg3_full_band_net.enc5.conv2.conv.1.weight", "stg3_full_band_net.enc5.conv2.conv.1.bias", "stg3_full_band_net.enc5.conv2.conv.1.running_mean", "stg3_full_band_net.enc5.conv2.conv.1.running_var", "stg3_full_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg3_full_band_net.lstm_dec2.conv.conv.0.weight", "stg3_full_band_net.lstm_dec2.conv.conv.1.weight", "stg3_full_band_net.lstm_dec2.conv.conv.1.bias", "stg3_full_band_net.lstm_dec2.conv.conv.1.running_mean", "stg3_full_band_net.lstm_dec2.conv.conv.1.running_var", "stg3_full_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg3_full_band_net.lstm_dec2.lstm.weight_ih_l0", "stg3_full_band_net.lstm_dec2.lstm.weight_hh_l0", "stg3_full_band_net.lstm_dec2.lstm.bias_ih_l0", "stg3_full_band_net.lstm_dec2.lstm.bias_hh_l0", "stg3_full_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg3_full_band_net.lstm_dec2.dense.0.weight", "stg3_full_band_net.lstm_dec2.dense.0.bias", "stg3_full_band_net.lstm_dec2.dense.1.weight", "stg3_full_band_net.lstm_dec2.dense.1.bias", "stg3_full_band_net.lstm_dec2.dense.1.running_mean", "stg3_full_band_net.lstm_dec2.dense.1.running_var", "stg3_full_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg3_full_band_net.enc1.conv.0.weight", "stg3_full_band_net.enc1.conv.1.weight", "stg3_full_band_net.enc1.conv.1.bias", "stg3_full_band_net.enc1.conv.1.running_mean", "stg3_full_band_net.enc1.conv.1.running_var", "stg3_full_band_net.enc1.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv3.conv.1.bias", 
"stg3_full_band_net.aspp.conv3.conv.1.running_mean", "stg3_full_band_net.aspp.conv3.conv.1.running_var", "stg3_full_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv4.conv.1.bias", "stg3_full_band_net.aspp.conv4.conv.1.running_mean", "stg3_full_band_net.aspp.conv4.conv.1.running_var", "stg3_full_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv5.conv.1.bias", "stg3_full_band_net.aspp.conv5.conv.1.running_mean", "stg3_full_band_net.aspp.conv5.conv.1.running_var", "stg3_full_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.bottleneck.conv.0.weight", "stg3_full_band_net.aspp.bottleneck.conv.1.weight", "stg3_full_band_net.aspp.bottleneck.conv.1.bias", "stg3_full_band_net.aspp.bottleneck.conv.1.running_mean", "stg3_full_band_net.aspp.bottleneck.conv.1.running_var", "stg3_full_band_net.aspp.bottleneck.conv.1.num_batches_tracked", "stg3_full_band_net.dec4.conv1.conv.0.weight", "stg3_full_band_net.dec4.conv1.conv.1.weight", "stg3_full_band_net.dec4.conv1.conv.1.bias", "stg3_full_band_net.dec4.conv1.conv.1.running_mean", "stg3_full_band_net.dec4.conv1.conv.1.running_var", "stg3_full_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec3.conv1.conv.0.weight", "stg3_full_band_net.dec3.conv1.conv.1.weight", "stg3_full_band_net.dec3.conv1.conv.1.bias", "stg3_full_band_net.dec3.conv1.conv.1.running_mean", "stg3_full_band_net.dec3.conv1.conv.1.running_var", "stg3_full_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec2.conv1.conv.0.weight", "stg3_full_band_net.dec2.conv1.conv.1.weight", "stg3_full_band_net.dec2.conv1.conv.1.bias", "stg3_full_band_net.dec2.conv1.conv.1.running_mean", "stg3_full_band_net.dec2.conv1.conv.1.running_var", "stg3_full_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec1.conv1.conv.0.weight", "stg3_full_band_net.dec1.conv1.conv.1.weight", "stg3_full_band_net.dec1.conv1.conv.1.bias", 
"stg3_full_band_net.dec1.conv1.conv.1.running_mean", "stg3_full_band_net.dec1.conv1.conv.1.running_var", "stg3_full_band_net.dec1.conv1.conv.1.num_batches_tracked".
size mismatch for stg1_high_band_net.enc2.conv1.conv.0.weight: copying a param with shape torch.Size([24, 12, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
size mismatch for stg1_high_band_net.enc2.conv1.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv1.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv1.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv1.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv2.conv.0.weight: copying a param with shape torch.Size([24, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for stg1_high_band_net.enc2.conv2.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv2.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv2.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc2.conv2.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for stg1_high_band_net.enc3.conv1.conv.0.weight: copying a param with shape torch.Size([48, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for stg1_high_band_net.enc3.conv1.conv.1.weight: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv1.conv.1.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv1.conv.1.running_mean: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv1.conv.1.running_var: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv2.conv.0.weight: copying a param with shape torch.Size([48, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for stg1_high_band_net.enc3.conv2.conv.1.weight: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv2.conv.1.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv2.conv.1.running_mean: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc3.conv2.conv.1.running_var: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg1_high_band_net.enc4.conv1.conv.0.weight: copying a param with shape torch.Size([72, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
size mismatch for stg1_high_band_net.enc4.conv1.conv.1.weight: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv1.conv.1.bias: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv1.conv.1.running_mean: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv1.conv.1.running_var: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv2.conv.0.weight: copying a param with shape torch.Size([72, 72, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for stg1_high_band_net.enc4.conv2.conv.1.weight: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv2.conv.1.bias: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv2.conv.1.running_mean: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.enc4.conv2.conv.1.running_var: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv1.1.conv.0.weight: copying a param with shape torch.Size([96, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv2.conv.0.weight: copying a param with shape torch.Size([96, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for stg1_high_band_net.aspp.conv2.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv2.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv2.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv2.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg1_high_band_net.aspp.conv3.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
size mismatch for stg1_high_band_net.aspp.conv3.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for stg1_high_band_net.aspp.conv4.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
size mismatch for stg1_high_band_net.aspp.conv4.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for stg1_high_band_net.aspp.conv5.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
size mismatch for stg1_high_band_net.aspp.conv5.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for stg3_full_band_net.enc2.conv1.conv.0.weight: copying a param with shape torch.Size([96, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for stg3_full_band_net.enc2.conv1.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv1.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv1.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv1.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv2.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
size mismatch for stg3_full_band_net.enc2.conv2.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv2.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv2.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc2.conv2.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for stg3_full_band_net.enc3.conv1.conv.0.weight: copying a param with shape torch.Size([192, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
size mismatch for stg3_full_band_net.enc3.conv1.conv.1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv1.conv.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv1.conv.1.running_mean: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv1.conv.1.running_var: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv2.conv.0.weight: copying a param with shape torch.Size([192, 192, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for stg3_full_band_net.enc3.conv2.conv.1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv2.conv.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv2.conv.1.running_mean: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc3.conv2.conv.1.running_var: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for stg3_full_band_net.enc4.conv1.conv.0.weight: copying a param with shape torch.Size([288, 192, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
size mismatch for stg3_full_band_net.enc4.conv1.conv.1.weight: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv1.conv.1.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv1.conv.1.running_mean: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv1.conv.1.running_var: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv2.conv.0.weight: copying a param with shape torch.Size([288, 288, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for stg3_full_band_net.enc4.conv2.conv.1.weight: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv2.conv.1.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv2.conv.1.running_mean: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.enc4.conv2.conv.1.running_var: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv1.1.conv.0.weight: copying a param with shape torch.Size([384, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.running_mean: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.running_var: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv2.conv.0.weight: copying a param with shape torch.Size([384, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
size mismatch for stg3_full_band_net.aspp.conv2.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv2.conv.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv2.conv.1.running_mean: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv2.conv.1.running_var: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for stg3_full_band_net.aspp.conv3.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
size mismatch for stg3_full_band_net.aspp.conv3.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
size mismatch for stg3_full_band_net.aspp.conv4.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
size mismatch for stg3_full_band_net.aspp.conv4.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
size mismatch for stg3_full_band_net.aspp.conv5.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
size mismatch for stg3_full_band_net.aspp.conv5.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
size mismatch for out.weight: copying a param with shape torch.Size([2, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 64, 1, 1]).
Here is my code:
import os
import pydub
cwd = os.getcwd()
ffmpeg_exec = cwd + "\\ffmpeg.exe" # or any other path to ffmpeg, as long as it is absolute and not relative.
pydub.AudioSegment.converter = ffmpeg_exec
from dotenv import load_dotenv
from rvc.modules.uvr5.modules import UVR
load_dotenv(".env")
print("Loading UVR")
uvr = UVR()
print("Extracting vocals...")
os.chdir(cwd + "\\Lib\\site-packages")
# downloaded model from:
# https://github.com/TRvlvr/model_repo/releases/
generator = uvr.uvr_wrapper(
    model_name="2_HP-UVR.pth",
    audio_path=cwd + "\\audio.wav",
    save_vocal_path=cwd + "\\vocal",
    save_ins_path=cwd + "\\inst",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")
for item in generator:
    print(item)
voc_file = cwd + "\\inst\\vocal_audio.wav_5.wav"
generator = uvr.uvr_wrapper(
    model_name="5_HP-Karaoke-UVR.pth",
    audio_path=voc_file,
    save_vocal_path=cwd + "\\main",
    save_ins_path=cwd + "\\other",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")
for item in generator:
    print(item)
main_voc_file = cwd + "\\other\\vocal_vocal_audio.wav_5.wav_5.wav"
generator = uvr.uvr_wrapper(
    model_name="UVR-De-Echo-Aggressive.pth",
    audio_path=main_voc_file,
    save_vocal_path=cwd + "\\noecho",
    save_ins_path=cwd + "\\echo",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")
for item in generator:
    print(item)
Note that both 2_HP-UVR.pth and 5_HP-Karaoke-UVR.pth work just fine.
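The size mismatch messages above suggest that the checkpoint was saved from a network with different channel widths than the model class it is being loaded into (for example, out.weight is [2, 48, 1, 1] in the checkpoint but [2, 64, 1, 1] in the current model). A minimal sketch for summarising such differences before loading, assuming the shapes have already been collected into plain dicts of tuples (with PyTorch this could be {k: tuple(v.shape) for k, v in state_dict.items()}); the helper name find_shape_mismatches is hypothetical and not part of rvc:

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters
    present in both dicts whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from two of the error lines above.
checkpoint = {
    "out.weight": (2, 48, 1, 1),
    "stg3_full_band_net.enc2.conv1.conv.1.bias": (96,),
}
model = {
    "out.weight": (2, 64, 1, 1),
    "stg3_full_band_net.enc2.conv1.conv.1.bias": (128,),
}
for name, (ckpt, cur) in find_shape_mismatches(checkpoint, model).items():
    print(f"{name}: checkpoint {ckpt} vs model {cur}")
```

If every mismatch shows the same ratio of widths, that usually points to a different architecture variant rather than a corrupted file.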
I also ended up trying out the VR-DeEchoNormal.pth from my own RVC WebUI install and ended up with another error:
Traceback (most recent call last):
File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 62, in <module>
for item in generator:
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 85, in uvr_wrapper
pre_fun._path_audio_(
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 239, in _path_audio_
) = librosa.core.load(
TypeError: load() takes 1 positional argument but 3 positional arguments (and 2 keyword-only arguments) were given
code:
generator = uvr.uvr_wrapper(
    model_name="VR-DeEchoNormal.pth",
    audio_path=main_voc_file,
    save_vocal_path=cwd + "\\noecho",
    save_ins_path=cwd + "\\echo",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")
for item in generator:
    print(item)
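The TypeError in the traceback above is what Python raises when a function whose extra parameters are keyword-only is called with positional arguments. That probably means the installed librosa is newer than the one vr.py was written against, although the thread does not confirm the versions involved. A minimal sketch of the mechanism, using a stand-in function load_stub (hypothetical, not librosa itself):

```python
def load_stub(path, *, sr=22050, mono=True):
    """Stand-in loader: everything after `path` is keyword-only."""
    return {"path": path, "sr": sr, "mono": mono}


# Positional call, as in the failing traceback: rejected with TypeError.
try:
    load_stub("audio.wav", 44100, False)
    positional_ok = True
except TypeError:
    positional_ok = False

# Keyword call: accepted by keyword-only and positional signatures alike.
result = load_stub("audio.wav", sr=44100, mono=False)
print(positional_ok, result["sr"], result["mono"])
```

If that is the cause, the two options to try would be installing the librosa version the project expects, or changing the call in vr.py to pass its arguments by keyword.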
I am trying to access the RVC API method infer_convert using the Gradio client, but I am encountering issues with the F0 curve file. I have tried using None and empty string and other files, but it still does not work. Could you please provide an example or guidance on how to properly use the F0 curve file with this API?
MacBook Pro Intel i9 8-Core / AMD Radeon Pro 5300M / 32GB DDR4 RAM / macOS Sonoma 14.2
Python 3.10.13
Poetry 1.7.1
CLI command:
PYTORCH_ENABLE_MPS_FALLBACK=1 rvc infer -rmr 1 -p 0 -ir 0.75 -m weights/Peter/model.pth -if weights/Peter/index.index -i input.mp3 -o output1.mp3
command output:
INFO:rvc.configs.config:No supported Nvidia GPU found
INFO:rvc.configs.config:overwrite configs.json
INFO:rvc.configs.config:Use mps instead
INFO:rvc.configs.config:is_half:False, device:mps
UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
INFO:rvc.modules.vc.modules:Select index:
INFO:fairseq.tasks.hubert_pretraining:current directory is /Retrieval-based-Voice-Conversion
INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
File "/Retrieval-based-Voice-Conversion/rvc/modules/vc/pipeline.py", line 307, in pipeline
index = faiss.read_index(file_index)
File "/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 9924, in read_index
return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
Possible C/C++ prototypes are:
faiss::read_index(char const *,int)
faiss::read_index(char const *)
faiss::read_index(FILE *,int)
faiss::read_index(FILE *)
faiss::read_index(faiss::IOReader *,int)
faiss::read_index(faiss::IOReader *)
INFO:rvc.modules.vc.pipeline:Loading rmvpe model,assets/rmvpe/rmvpe.pt
/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: The operator 'aten::_fft_r2c' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/Retrieval-based-Voice-Conversion/rvc/lib/infer_pack/attentions.py:334: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
x = F.pad(
{'npy': 6.011045217514038, 'f0': 135.6644949913025, 'infer': 27.060052633285522}
Finish inference. Check output1.mp3
Although I get an output file, the sound has lots of artefacts/noise and is not smooth at all. I see some warnings and errors in the console output. Are they the cause, or is it the models I am using?
Also, how can I get the output combined with the instrumental when the input is music audio?
Thanks,
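On combining the converted vocal with the instrumental: one approach is to separate the track first (for example with the UVR wrapper shown in other reports on this page), convert only the vocal stem, and then mix the two stems again. A minimal sketch of the mixing step on 16-bit PCM samples, assuming both stems are already decoded to mono sample sequences at the same sample rate; mix_int16 and its parameters are hypothetical names, not part of rvc:

```python
def mix_int16(vocal, instrumental, vocal_gain=1.0, inst_gain=1.0):
    """Sum two 16-bit sample sequences, padding the shorter one with
    silence and clipping the result to the int16 range."""
    length = max(len(vocal), len(instrumental))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0
        m = instrumental[i] if i < len(instrumental) else 0
        sample = int(round(v * vocal_gain + m * inst_gain))
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Lowering both gains slightly before summing is a simple way to reduce clipping when both stems are loud.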
I have an old version of RVC. Where can I download the latest release?
Hi there,
I've installed this in a virtual environment but in trying to run rvc init I get:
zsh: permission denied: rvc
I'm kind of stuck at this point. Do you have any insight into how I could get this working? Thank you in advance!
Hello, what will be the difference between this project and https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI ?
Thank you
I have this error when trying to execute my code.
My code is:
from pathlib import Path
from dotenv import load_dotenv
from scipy.io import wavfile
from rvc.modules.vc.modules import VC
import os
import torch
import json
import ai_config as cfg

vc: VC = VC()

def __load_model__(model_path: str, device: str) -> None:
    vc.config.device = device
    vc.get_vc(model_path)

def LoadModel() -> None:
    if (not cfg.current_data.prompt_order.__contains__("rvc")):
        raise Exception("Model is not in 'prompt_order'.")
    # NOTE: vc is created at import time, so `vc != None` is always true and this returns before loading.
    if (vc != None or len(cfg.current_data.rvc_model_path.strip()) == 0):
        return
    device = "cuda" if (torch.cuda.is_available() and cfg.current_data.use_gpu_if_available and cfg.current_data.move_to_gpu.count("rvc") > 0) else "cpu"
    load_dotenv("rvc_env")
    __load_model__(cfg.current_data.rvc_model_path, device)

def __make_rvc__(audio_name: str | Path, protect: float = 0.33, filter_radius: int = 3, method: str = "rmvpe") -> bytes:
    LoadModel()
    if (type(audio_name) == str):
        audio_name = Path(audio_name)
    if (len(cfg.current_data.rvc_index_path.strip()) == 0):
        index_file = None
    else:
        index_file = Path(cfg.current_data.rvc_index_path)
    if (method != "rmvpe" and method != "pm" and method != "harvest" and method != "crepe"):
        raise Exception("RMV method must be 'rmvpe', 'pm', 'harvest' or 'crepe'.")
    tgt_sr, audio_opt, _, _ = vc.vc_single(sid = 0, input_audio_path = audio_name, f0_method = method, index_file = index_file, filter_radius = filter_radius, protect = protect)
    # Pick the first unused temporary file name.
    output_file = "tmp_rvc_audio_"
    output_file_id = 0
    while (os.path.exists(output_file + str(output_file_id) + ".wav")):
        output_file_id += 1
    wavfile.write(output_file + str(output_file_id) + ".wav", tgt_sr, audio_opt)
    audio_bytes = b""
    # Open for reading ("rb"); the original "wb" mode truncates the file and cannot be read from.
    with open(output_file + str(output_file_id) + ".wav", "rb") as f:
        audio_bytes = f.read()
    os.remove(output_file + str(output_file_id) + ".wav")
    return audio_bytes

def MakeRVC(data: str | dict[str]) -> bytes:
    if (type(data) == str):
        try:
            data = json.loads(data)
        except Exception as ex:
            raise Exception("[RVC] Data must be a dictionary or a JSON code. ERROR: " + str(ex))
    # Defaults, overridden below by any valid values found in `data`.
    ddata = {
        "input": "",
        "protect": 0.33,
        "filter_radius": 3,
        "method": "rmvpe"
    }
    try:
        ddata["input"] = data["input"]
    except:
        raise Exception("Unable to get audio path.")
    try:
        ddata["protect"] = float(data["protect"])
    except:
        pass
    try:
        ddata["filter_radius"] = int(data["filter_radius"])
    except:
        pass
    try:
        ddata["method"] = data["method"]
    except:
        pass
    return __make_rvc__(ddata["input"], ddata["protect"], ddata["filter_radius"], ddata["method"])
The traceback is:
Traceback (most recent call last):
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server_all.py", line 1, in <module>
import ai_server
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 9, in <module>
import chatbot_all as cb
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/chatbot_all.py", line 14, in <module>
import Inference.RVC_inference as rvc
File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/Inference/RVC_inference.py", line 4, in <module>
from rvc.modules.vc.modules import VC
File "/home/alcoft/.local/lib/python3.11/site-packages/rvc/modules/vc/modules.py", line 21, in <module>
from rvc.modules.vc.utils import *
File "/home/alcoft/.local/lib/python3.11/site-packages/rvc/modules/vc/utils.py", line 3, in <module>
from fairseq import checkpoint_utils
File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/__init__.py", line 20, in <module>
from fairseq.distributed import utils as distributed_utils
File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/distributed/__init__.py", line 7, in <module>
from .fully_sharded_data_parallel import (
File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
from fairseq.dataclass.configs import DistributedTrainingConfig
File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/dataclass/__init__.py", line 6, in <module>
from .configs import FairseqDataclass
File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/dataclass/configs.py", line 1104, in <module>
@dataclass
^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 1230, in dataclass
return wrap(cls)
^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 1220, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 958, in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 815, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
Can anyone help me fix this error?
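The ValueError at the end of this traceback comes from a Python 3.11 change: dataclasses now reject any unhashable default value, not only list, dict, and set. The traceback shows fairseq's configs.py using a dataclass instance as a field default, which trips that check. A minimal reproduction with hypothetical class names (this is not fairseq's code):

```python
import sys
from dataclasses import dataclass, field


@dataclass
class CommonConfig:
    seed: int = 1


def define_broken():
    """Define a dataclass whose field default is a dataclass instance."""
    @dataclass
    class BrokenConfig:
        common: CommonConfig = CommonConfig()  # shared, unhashable default
    return BrokenConfig


# On Python 3.11+ the class definition itself raises ValueError;
# on earlier versions it is accepted.
try:
    define_broken()
    broken_raises = False
except ValueError:
    broken_raises = True


@dataclass
class FixedConfig:
    # default_factory builds a fresh CommonConfig for every instance.
    common: CommonConfig = field(default_factory=CommonConfig)


print(sys.version_info[:2], broken_raises, FixedConfig())
```

In practice this points to either running on Python 3.10 or earlier, or using a fairseq build that has switched these fields to default_factory; which fairseq versions include that change is not stated in this thread.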
Hi, getting this error when using the UVR.uvr_wrapper() function:
Loading UVR
Extracting vocals...
Traceback (most recent call last):
File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 27, in <module>
for i in result:
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 49, in uvr_wrapper
pre_fun = func(
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 31, in __init__
mp = ModelParameters("rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json")
File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\lib\uvr5_pack\lib_v5\model_param_init.py", line 55, in __init__
with open(config_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json'
Here is my code:
import os
from dotenv import load_dotenv
from rvc.modules.uvr5.modules import UVR
# downloaded uvr model from:
# https://github.com/TRvlvr/model_repo/releases/
cwd = os.getcwd()
load_dotenv(".env")
print("Loading UVR")
uvr = UVR()
print("Extracting vocals...")
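result = uvr.uvr_wrapper(
    model_name="2_HP-UVR.pth",
    # os.path.join adds the path separator that plain cwd + "audio.wav" omits
    audio_path=os.path.join(cwd, "audio.wav"),
    save_vocal_path=os.path.join(cwd, "vocal.wav"),
    save_ins_path=os.path.join(cwd, "inst.wav"),
    agg=10,
    export_format="wav",
    temp_path=os.path.join(cwd, "tmp.wav"))
for i in result:
    print(i)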
I made sure to look at the path, and the file does exist, so I'm thinking it might be an issue with the expected CWD since the path is relative.
For more info: I'm running python from a venv and using this command to run: ./Scripts/python.exe rvc_test.py from the C:\Users\jeje9\Desktop\rvc_test directory where the rvc_test.py file is
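The FileNotFoundError above is consistent with that guess: 'rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json' is resolved against the current working directory, so it is only found when the process runs from the directory that contains the rvc package. A minimal sketch for locating that directory, assuming a regular installed package; package_parent_dir is a hypothetical helper, not part of rvc:

```python
import importlib.util
from pathlib import Path


def package_parent_dir(package_name):
    """Return the directory that contains the top-level package,
    e.g. .../site-packages for an installed package."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package_name!r}")
    # For a regular package, spec.origin is <parent>/<package>/__init__.py
    return Path(spec.origin).resolve().parent.parent


# Demonstrated with a standard-library package; for this report the
# argument would be "rvc".
print(package_parent_dir("json"))
```

One possible workaround is to call os.chdir(package_parent_dir("rvc")) before uvr.uvr_wrapper(...) and to pass absolute paths for the audio and output locations, since changing the working directory also changes how every other relative path is resolved.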
Maybe we can do this together.
I always get this message:
INFO:xx:Train Epoch: 478 [92%]
INFO:xx:[110200, 9.421142503636453e-05]
INFO:xx:loss_disc=3.598, loss_gen=3.174, loss_fm=8.765,loss_mel=19.367, loss_kl=1.520
Is this normal?
I just cloned this repository with git and then ran rvc init in my terminal. Can anybody tell me what I should do before I run this command?
I want to host an RVC model on AWS and expose an API that accepts some basic details and a voice sample, processes the sample according to the data sent, and returns a converted voice sample.
Please help, this is quite urgent for me. Even if you don't have a complete solution, tell me whatever you can and point me in the right direction.
#25 & #14
Looks like RVC won't properly read an index_file.
/root/.cache/pypoetry/virtualenvs/rvc-9TtSrW0h-py3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-05-28 19:13:17 | INFO | rvc.modules.vc.modules | Select index:
index_file: /rvc_models/added_IVF632_Flat_nprobe_1_IvanaAlawi_v2.index
2024-05-28 19:13:18 | INFO | fairseq.tasks.hubert_pretraining | current directory is /app
2024-05-28 19:13:18 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-05-28 19:13:18 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
File "/app/rvc/modules/vc/pipeline.py", line 307, in pipeline
index = faiss.read_index(file_index)
File "/root/.cache/pypoetry/virtualenvs/rvc-9TtSrW0h-py3.10/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 10538, in read_index
return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
Possible C/C++ prototypes are:
faiss::read_index(char const *,int)
faiss::read_index(char const *)
faiss::read_index(FILE *,int)
faiss::read_index(FILE *)
faiss::read_index(faiss::IOReader *,int)
faiss::read_index(faiss::IOReader *)
I can confirm the .index file is also in the correct location.
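Since the file exists, the TypeError is more likely about the argument's type than its location: every prototype listed in the traceback takes a char const *, a FILE *, or an IOReader *, so the Python wrapper needs a plain str path. One possible cause (an assumption, not confirmed in this thread) is that the index path reaches faiss.read_index as a pathlib.Path or as None. A minimal sketch of normalising the value first; as_faiss_path is a hypothetical helper:

```python
import os
from pathlib import Path


def as_faiss_path(index_file):
    """Return the index path as a plain str, rejecting None and empty values."""
    if index_file is None or str(index_file).strip() == "":
        raise ValueError("no index file given")
    # os.fspath turns a Path into the str that the SWIG wrapper expects.
    return os.fspath(Path(index_file))


print(type(as_faiss_path(Path("added_IVF632_Flat.index"))).__name__)
```

With faiss installed, the call in pipeline.py could then be written as faiss.read_index(as_faiss_path(file_index)).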