
fullsubnet's Introduction

FullSubNet

Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement


Guides

The documentation is hosted on Read the Docs. Check the documentation for how to train and test models.

Citation

If you use this code for your research, please consider citing:

@INPROCEEDINGS{hao2020fullsubnet,
    author={Hao, Xiang and Su, Xiangdong and Horaud, Radu and Li, Xiaofei},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={Fullsubnet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement},
    year={2021},
    pages={6633-6637},
    doi={10.1109/ICASSP39728.2021.9414177}
}

License

This repository is released under the MIT license.

fullsubnet's People

Contributors

200987299, deepsourcebot, haoxiangsnr, vvasily


fullsubnet's Issues

Training and Preprocessing related

Hi,

I prepared 50 hours of DNS sample data by running noisyspeech_synthesizer_singleprocess.py, which produced clean, noise, and noisy folders containing the audio data.

According to config/common/fullsubnet_train.toml, the text files below are required. Is there any code available to generate these files from the audio data? (See the sketch after the config excerpt.)

[train_dataset]
path = "dataset.DNS_INTERSPEECH_train.Dataset"
clean_dataset = "/Datasets/DNS-Challenge-INTERSPEECH/datasets/clean_0.6.txt"
noise_dataset = "/Datasets/DNS-Challenge-INTERSPEECH/datasets/noise.txt"
rir_dataset = "~/Datasets/DNS-Challenge-INTERSPEECH/datasets/rir.txt"
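
A minimal sketch of one way to generate such lists, assuming (not confirmed by the repo) that each .txt file simply contains one absolute audio path per line; the directory names are illustrative:

# sketch: write one absolute .wav path per line, the format the
# dataset .txt lists above appear to use (an assumption)
from pathlib import Path

def write_file_list(audio_dir: str, output_txt: str) -> None:
    paths = sorted(Path(audio_dir).rglob("*.wav"))
    with open(output_txt, "w") as f:
        for p in paths:
            f.write(f"{p.resolve()}\n")

write_file_list("clean", "datasets/clean.txt")
write_file_list("noise", "datasets/noise.txt")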

Regards
Yugesh

Difficulties with getting demos

Hi,

I am having trouble understanding how to run demos on my noisy audio with your model. You have multiple branches with different code, and the docs refer to an inference.py file that is not present in the main branch. I could find the needed code inside source.zip from the release, but using that code raises an exception that the checkpoint file does not fit the initialized model (errors in loading the state dict).
I also have a hard time understanding the config hierarchy and what I should change to simply get results for several audio files.

I would be very grateful if you could point out where the working code is located and how to launch it.

Best regards,
Vladyslav

Real-time speech enhancement quality is poor

With a block length of 32 ms and a block shift of 8 ms, real-time speech enhancement quality is poor, but enhancing a complete audio file in one pass works well.
What causes this?
How can I improve it?
Noisy: [spectrogram image]
Block length 32 ms, block shift 8 ms (real-time): [spectrogram image]
Single-pass enhancement: [spectrogram image]
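
For reference, a minimal sketch of the kind of block-wise overlap-add streaming loop being described (assumptions: 16 kHz audio, a Hann analysis/synthesis window, and a placeholder enhance_block standing in for the per-block model call):

# sketch of 32 ms block / 8 ms shift streaming with overlap-add
import numpy as np

SR = 16000
BLOCK = int(0.032 * SR)   # 512 samples
SHIFT = int(0.008 * SR)   # 128 samples
window = np.hanning(BLOCK)

def enhance_block(block):
    return block  # placeholder for the per-block model inference

def stream_enhance(x):
    out = np.zeros(len(x) + BLOCK)
    for start in range(0, len(x) - BLOCK + 1, SHIFT):
        block = window * x[start:start + BLOCK]
        out[start:start + BLOCK] += window * enhance_block(block)
    # squared Hann with 4x overlap sums to a near-constant; rescale
    return out[:len(x)] * SHIFT / (window ** 2).sum()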

Error: ModuleNotFoundError: No module named 'model'

hi @haoxiangsnr,
I ran the pre-trained model on Google Colab, following this guide:
https://github.com/haoxiangsnr/FullSubNet/blob/main/docs/getting_started.md
but I got an error like this:
command: !python inference.py -C /content/FullSubNet/recipes/dns_interspeech_2020/fullband_baseline/inference.toml -M /content/drive/MyDrive/Colab_Notebooks/FullSubNet/fullsubnet_best_model_58epochs.tar -O /content/drive/MyDrive/Colab_Notebooks/FullSubNet/output_dir
result:

Loading inference dataset...
Loading model...
Traceback (most recent call last):
File "inference.py", line 32, in
main(configuration, checkpoint_path, output_dir)
File "inference.py", line 16, in main
output_dir
File "/content/FullSubNet/recipes/dns_interspeech_2020/inferencer.py", line 50, in init
super().init(config, checkpoint_path, output_dir)
File "/content/FullSubNet/audio_zen/inferencer/base_inferencer.py", line 27, in init
self.model, epoch = self._load_model(config["model"], checkpoint_path, self.device)
File "/content/FullSubNet/audio_zen/inferencer/base_inferencer.py", line 91, in _load_model
model = initialize_module(model_config["path"], args=model_config["args"], initialize=True)
File "/content/FullSubNet/audio_zen/utils.py", line 87, in initialize_module
module = importlib.import_module(module_path)
File "/usr/lib/python3.7/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'model'

Please give me a hand, thanks a lot.
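
As the traceback shows, initialize_module imports the model from a dotted path in the config via importlib.import_module, so the import is resolved against sys.path and the current working directory. A minimal sketch of that mechanism (the diagnosis is an assumption, not a confirmed fix; running inference.py from inside the recipe directory, where the model package lives, is one thing to check):

# sketch of resolving a "package.module.Class" path from a config;
# the import fails unless the package is visible from sys.path / CWD
import importlib

def load_class_from_path(dotted_path):
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)  # ModuleNotFoundError here
    return getattr(module, class_name)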

Unable to fine-tune pre-trained model (fullsubnet_best_model_58epochs.tar)

I am trying to continue training the pre-trained FullSubNet model provided by this repo:

fullsubnet_best_model_58epochs.tar

I can confirm the model works for inference. However, I run into issues loading the state dictionary for training based on how the model was saved.

Here is the error in full:

(FullSubNet) $ torchrun --standalone --nnodes=1 --nproc_per_node=1 train.py -C fullsubnet/train.toml -R
1 process initialized.
Traceback (most recent call last):
  File "/home/github/FullSubNet/recipes/dns_interspeech_2020/train.py", line 99, in <module>
    entry(local_rank, configuration, args.resume, args.only_validation)
  File "/home/github/FullSubNet/recipes/dns_interspeech_2020/train.py", line 59, in entry
    trainer = trainer_class(
  File "/home/github/FullSubNet/recipes/dns_interspeech_2020/fullsubnet/trainer.py", line 17, in __init__
    super().__init__(dist, rank, config, resume, only_validation, model, loss_function, optimizer)
  File "/home/github/FullSubNet/audio_zen/trainer/base_trainer.py", line 84, in __init__
    self._resume_checkpoint()
  File "/home/github/FullSubNet/audio_zen/trainer/base_trainer.py", line 153, in _resume_checkpoint
    self.scaler.load_state_dict(checkpoint["scaler"])
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 502, in load_state_dict
    raise RuntimeError("The source state dict is empty, possibly because it was saved "
RuntimeError: The source state dict is empty, possibly because it was saved from a disabled instance of GradScaler.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1822537) of binary: /home/anaconda3/envs/FullSubNet/bin/python
Traceback (most recent call last):
  File "/home/anaconda3/envs/FullSubNet/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/distributed/run.py", line 724, in main
    run(args)
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/anaconda3/envs/FullSubNet/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-06-19_21:25:20
  host      : host-server
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1822537)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Are there specific modifications that need to be made to continue training?

Thank you for your help.
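
A possible workaround, sketched only from the error message above and not a maintainer-confirmed fix: skip restoring the scaler state when the checkpoint's "scaler" entry is empty, which per the PyTorch message happens when it was saved from a disabled GradScaler:

# sketch: restore GradScaler state only when the checkpoint has one
import torch

def load_scaler_state(scaler: torch.cuda.amp.GradScaler, checkpoint: dict):
    state = checkpoint.get("scaler")
    if state:  # empty dict => saved from a disabled GradScaler
        scaler.load_state_dict(state)
    # otherwise keep the freshly constructed scaler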

The results of released model

Since the released model achieves better results than those reported in the paper, is there some difference in the training list or settings?

Usage

Hi. How do I use this to remove the background noise from a .wav file?

I have been reading through the repo and I can't seem to find the starting point.

[Question] Real-time speech enhancement quality is poor, need help!

At present, I have completed the modification of cumulative_laplace_norm, fed the audio to the network in chunks through the STFT for streaming inference, and carried over the network's hidden state and cell state. But the results are poor. In a previous issue you said that the LSTM needs to be replaced with LSTMCell; what is the difference between the two, and why is that conversion needed? (A sketch contrasting the two follows the metrics below.)
Pictures are as follows:

  • Fullwav_load: [spectrogram image]

  • Stream_load: [spectrogram image]

Metrics (my experiment):

Model                      | NB_PESQ | WB_PESQ | SI_SDR | STOI
FullSubNet-cum (epoch 130) | 3.364   | 2.861   | 17.65  | 96.25
FullSubNet-cum-stream      | 3.155   | 2.466   | 14.77  | 94.30
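
A minimal sketch of the LSTM/LSTMCell difference being asked about (illustrative, not the repo's code): nn.LSTM consumes a whole sequence in one call, while nn.LSTMCell steps one frame at a time with explicit (h, c) state, which is what frame-by-frame streaming needs:

# sketch: nn.LSTM (whole sequence) vs nn.LSTMCell (one frame per step)
import torch
import torch.nn as nn

feat, hidden, T = 257, 512, 100
x = torch.randn(1, T, feat)  # (batch, time, features)

# offline: the full utterance is processed in one call
lstm = nn.LSTM(feat, hidden, batch_first=True)
y_offline, _ = lstm(x)

# streaming: carry (h, c) across frames, one frame per step;
# with shared weights this computes the same thing as the LSTM
cell = nn.LSTMCell(feat, hidden)
h = torch.zeros(1, hidden)
c = torch.zeros(1, hidden)
outputs = []
for t in range(T):
    h, c = cell(x[:, t, :], (h, c))
    outputs.append(h)
y_stream = torch.stack(outputs, dim=1)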

# of Epochs for training a FullBand baseline model

Hello,

My question is about training. I am trying to replicate the results with the DNS Challenge dataset. The number of epochs in fullband_baseline.toml is set to 9999, which seems a "little" high :) Could you please shed some light on this? Is this the default value?

Thank you for sharing your work.

B.R.

Error(s) in loading state_dict for Model: Missing key(s) in state_dict

Hi, thanks for uploading the project. I have the following issue.

I downloaded the src and the tar file and changed the paths in the toml. This is the command I ran.

python inference.py -C config/inference/fullsubnet.toml -M /content/fullsubnet_best_model_58epochs.tar -O /content/output

This is the output

Using CPU in the experiment.

Loading inference dataset...

Num of noisy files in /content/input: 1

Loading model...

Currently processing a model checkpoint in tar format; its epoch is 58.

Traceback (most recent call last):
  File "inference.py", line 37, in <module>
    main(configuration, checkpoint_path, output_dir)
  File "inference.py", line 15, in main
    output_dir
  File "/content/FullSubNet-0.1/src/inferencer/DNS_INTERSPEECH.py", line 52, in __init__
    super().__init__(config, checkpoint_path, output_dir)
  File "/content/FullSubNet-0.1/src/common/inferencer.py", line 25, in __init__
    self.model, epoch = self._load_model(config["model"], checkpoint_path, self.device)
  File "/content/FullSubNet-0.1/src/common/inferencer.py", line 94, in _load_model
    model.load_state_dict(new_state_dict)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
	Missing key(s) in state_dict: "fband_model.sequence_model.weight_ih_l0", "fband_model.sequence_model.weight_hh_l0", "fband_model.sequence_model.bias_ih_l0", "fband_model.sequence_model.bias_hh_l0", "fband_model.sequence_model.weight_ih_l1", "fband_model.sequence_model.weight_hh_l1", "fband_model.sequence_model.bias_ih_l1", "fband_model.sequence_model.bias_hh_l1", "fband_model.fc_output_layer.weight", "fband_model.fc_output_layer.bias", "sband_model.sequence_model.weight_ih_l0", "sband_model.sequence_model.weight_hh_l0", "sband_model.sequence_model.bias_ih_l0", "sband_model.sequence_model.bias_hh_l0", "sband_model.sequence_model.weight_ih_l1", "sband_model.sequence_model.weight_hh_l1", "sband_model.sequence_model.bias_ih_l1", "sband_model.sequence_model.bias_hh_l1", "sband_model.fc_output_layer.weight", "sband_model.fc_output_layer.bias". 
	Unexpected key(s) in state_dict: "l.sequence_model.weight_ih_l0", "l.sequence_model.weight_hh_l0", "l.sequence_model.bias_ih_l0", "l.sequence_model.bias_hh_l0", "l.sequence_model.weight_ih_l1", "l.sequence_model.weight_hh_l1", "l.sequence_model.bias_ih_l1", "l.sequence_model.bias_hh_l1", "l.fc_output_layer.weight", "l.fc_output_layer.bias".
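
A quick diagnostic sketch (hedged: it assumes the .tar checkpoint stores its weights under a "model" key). The mismatch above says the checkpoint contains a single submodule prefixed l. while the instantiated model expects fband_model. and sband_model., which usually means the inference config instantiates a different model class than the one the checkpoint was trained with:

# sketch: list the submodule prefixes stored in the checkpoint and
# compare them with the model you instantiated
import torch

ckpt = torch.load("fullsubnet_best_model_58epochs.tar", map_location="cpu")
print(ckpt.keys())  # assumption: weights live under ckpt["model"]
print({k.split(".")[0] for k in ckpt["model"]})
# e.g. {"l"} here, vs. the {"fband_model", "sband_model"} prefixes
# the configured model expects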

The batch size for the validation stage must be one

Hi Hao Xiang

I can't run the demo because of a cluster limit that requires GPU utilization to stay above 20 percent. I then found that the batch size for the validation stage must be one. Can I change the validation batch size?

Hoping for your reply!

Training duration

Hello,
I am training with roughly 11 hours of data, batch_size=48, on two K80 GPUs.
Currently one epoch takes about 50 minutes, so with the epochs=9999 setting in the .toml it would take nearly a year to train.
Is this training speed normal?

Reverberant data

When I look at the code that generates the training data, the reverberation step convolves the RIR with the clean signal, and the reverberated signal then acts as the clean target for training the network. Is something wrong, or am I misunderstanding the code?
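
For concreteness, a minimal sketch of the convolution step being described (names are illustrative, not the repo's):

# sketch: reverberate clean speech by convolving it with a room
# impulse response (RIR), truncated back to the original length
import numpy as np
from scipy.signal import fftconvolve

def reverberate(clean, rir):
    return fftconvolve(clean, rir, mode="full")[: len(clean)]

# the question above: is this reverberant signal also used as the
# training target, rather than the dry clean signal?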

Unexpected Training Time

I am trying to get FullSubNet up and running by following the repo instructions. It seems we must create a custom train.toml, where we specify the relevant file paths and provide text files with absolute paths. I am only looking at training without reverb.

I observe the following training time for one epoch on my system with two 2080 Ti GPUs.

This project contains 1 models, the number of the parameters is: 
        Network 1: 5.637635 million.
The amount of parameters in the project is 5.637635 million.
=============== 1 epoch ===============
[0 seconds] Begin training...
         Saving 1 epoch model checkpoint...
[966 seconds] Training has finished, validation is in progress...
         Saving 1 epoch model checkpoint...
         😃 Found a best score in the 1 epoch, saving...
[1031 seconds] This epoch is finished.

This is much faster than I would expect a 5M-parameter model to train on a dataset of this size. I am also not sure how to use the evaluation logs, since they seem to be in a proprietary format.

Could you tell us how long it takes to train a few epochs and what evaluation results we should expect early on?

Thank you for your help.

Sub-band model

Hi, thanks for sharing this excellent project.
I took a great interest in the model (Delayed Sub-Band LSTM) presented in the paper.
I have tried hard to reproduce this model, but still can't reach good performance.
Could you release the code for your sub-band models?
Thanks a lot!

Data Preprocessing

Hi,

I use the pretrained model and it almost always works very well, except for some special cases, so I need to fine-tune it for my data.
Currently I train the model like this:
1. Download the DNS INTERSPEECH 2020 branch data and split the clean data into 6 s segments with 3 s overlap, so the model sees 500 h of noisy data per epoch.
2. Use 8x 2080 Ti GPUs with the batch size set to 8; other parameters are as in train.toml.
3. The training loss looks like this: [loss curve image]
Is there any other preprocessing applied to the original training data? And do you have any advice for training with such a small batch size?
Thank you

Normalization

Dear authors,

I notice that in snr_mix the signal level is drawn from [-35, -15] dBFS, meaning the intensity changes randomly during training.
However, in inference.py, normalization is applied, which seems inconsistent.
From my understanding, we should either normalize all data or none of it; why do you normalize at inference time while skipping it during training?
Maybe I have some misunderstanding; please correct me if possible.

Best
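
For reference, a minimal sketch of the two level-handling strategies being contrasted (illustrative of the general idea, not the repo's exact functions):

# sketch: random target level during training-time mixing vs. a single
# fixed normalization at inference time
import numpy as np

def rms_dbfs(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def scale_to_dbfs(x, target_dbfs):
    return x * 10 ** ((target_dbfs - rms_dbfs(x)) / 20)

# training: the level varies randomly, here uniform in [-35, -15] dBFS
mixed = scale_to_dbfs(np.random.randn(16000), np.random.uniform(-35, -15))
# inference: one fixed target level instead (-25 is an arbitrary example)
test = scale_to_dbfs(np.random.randn(16000), -25.0)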

Audio input length

I am curious whether changing the model's input length to 80000 samples (5 s at a 16 kHz sample rate) affects performance. Do I need to enlarge the hidden layers or change other settings? Thank you~

Not an issue, but I wanted to show you some impressive results...

Greetings!
I have been helping restore an old (1980) recording of a recording of an interview with an elderly person relaying stories of the early history of the Baha'i Faith in the U.S. I have surveyed and tried a number of machine learning methods to denoise and enhance the recording. I just finished processing with your FullSubNet today, and it far surpassed the others I tried in removing the recording noise and making the voice easier to understand. Enclosed is a graphic comparing the frequency spectrograms of the three files: the top is the original recording, the middle is the result of another method (dozens of times slower than yours, and principally aimed at white noise), and the bottom is the result of FullSubNet with your pretrained checkpoint. The reduction in noise from the original recording to what your method produced was astonishing! I can check whether the archivist would allow me to provide the recordings (so if you are interested in getting them, please let me know). Thanks so much for making the code available here!
Regards
-Steve

[image: compare_orig_n2n_fullsubnet]

Training and Validation cRM Mismatch

During training, with batch size 10, we observe the following shapes:

cRM torch.Size([10, 128, 193, 2])
noisy_real torch.Size([10, 257, 193])
noisy_imag torch.Size([10, 257, 193])

However, during validation, we see:

cRM torch.Size([1, 257, 626, 2])
noisy_real torch.Size([1, 257, 626])
noisy_imag torch.Size([1, 257, 626])

Why are dimensions 1 and 2 of the cRM different from those of the noisy tensors during training but not during validation?

Without matching shapes, I am unable to compute the enhanced waveform during training, since this calculation fails:

# undo the cIRM compression to recover the complex ratio mask
cRM = decompress_cIRM(cRM)

# apply the complex mask to the noisy spectrogram:
# (Mr + j*Mi)(Yr + j*Yi) = (Mr*Yr - Mi*Yi) + j*(Mi*Yr + Mr*Yi)
enhanced_real = cRM[..., 0] * noisy_real - cRM[..., 1] * noisy_imag
enhanced_imag = cRM[..., 1] * noisy_real + cRM[..., 0] * noisy_imag

Question about DDP

Hello dear author, thank you for providing such detailed code.
I have a troubling question: have you compared the accuracy of models trained in DP mode versus DDP mode?
And have you compared full-precision and half-precision training in DDP mode?
I ask because in my experiments DP-mode accuracy is much higher than DDP, while the difference between DDP half precision and DDP full precision is small.

CPU inference is very slow

For a 4-5 minute audio clip, inference takes about 40-50 minutes on CPU but only about 2-3 seconds on GPU. How was the real-time factor reported for the challenge calculated?
CPU info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping: 4
CPU MHz: 2301.000
CPU max MHz: 2301.0000
CPU min MHz: 1000.0000
BogoMIPS: 4600.0

"WARNING: Audio type not supported" while synthesizing files

I'm getting an error when I try to run python noisyspeech_synthesizer_singleprocess.py:
Number of files to be synthesized: 12000
Generating file #0
WARNING: Audio type not supported
Generating file #1500
WARNING: Audio type not supported
Generating file #3000
WARNING: Audio type not supported
Generating file #4500
WARNING: Audio type not supported
Generating file #6000
WARNING: Audio type not supported
Generating file #7500
WARNING: Audio type not supported
Generating file #9000
WARNING: Audio type not supported
Generating file #10500
WARNING: Audio type not supported

How can I fix this?

Will there be a 44.1 or 48kHz pre-trained model released?

Hi guys! Your work is absolutely amazing and inspiring. Trying your model on data that was not on your datasets, it performed very well.

I did have to convert to 16 kHz first since, as I understand it, the model was trained on 16 kHz audio?

My question is: will you guys release a 44.1kHz or 48kHz pre-trained model in the near future or not?

While I could try my hand at training such a model, I'm nowhere near as experienced as you are at this, and I feel I'd miss so many things that I could not create a model that generalizes as well as yours, but maybe you can prove me wrong.

Training stalls after loading train.toml

I used your code to train on the full dataset, but the process stalls after loading train.toml.
Have you verified the correctness of the code?

'torch.onnx.export()' error

Hi, thanks for sharing this excellent project.
I'm having some problems with torch.onnx.export():

Failed to export an ONNX attribute 'onnx::Gather', since it's not constant, please try to make things (e.g., kernel size) static if possible

How can I fix it? Do you have any suggestions?

Thanks a lot!
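
A commonly cited trigger for this message (an assumption about this case, not confirmed) is an op attribute, such as a kernel size, computed from a traced tensor shape; wrapping it in int() makes it a constant at export time, as the error text itself suggests. A minimal sketch:

# sketch: make a shape-derived attribute static for ONNX export
import torch
import torch.nn.functional as F

class Pool(torch.nn.Module):
    def forward(self, x):
        # k = x.size(-1) can reach ONNX as a non-constant Gather;
        # int(...) turns it into a Python constant at trace time
        k = int(x.size(-1))
        return F.avg_pool1d(x, kernel_size=k)

torch.onnx.export(Pool(), torch.randn(1, 8, 100), "pool.onnx")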

How to use Pretrained models

Hello,
could you please guide me on how to use the pretrained model on a single audio file, and tell me where to download the weights?

Training Error

Hello, and thanks for your good idea. I'm new to deep learning and tried to train your model on my data with the following command:
torchrun --standalone --nnodes=1 --nproc_per_node=1 train.py -C fullsubnet/my_train.toml
The only change I made in train.toml (apart from the data paths) was in the [train_dataset.dataloader] section, where I set batch_size = 8 and num_workers = 36, but I got an error. The relevant part of the error output is as follows:

Training: 100%|██████████| 7500/7500 [1:07:11<00:00, 1.86it/s]
Validation: 0it [00:00, ?it/s]
Validation: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/p.mohammadian.student.sharif/FullSubNet/recipes/dns_interspeech_2020/train.py", line 103, in <module>
    entry(local_rank, configuration, args.resume, args.only_validation)
  File "/home/p.mohammadian.student.sharif/FullSubNet/recipes/dns_interspeech_2020/train.py", line 72, in entry
    trainer.train()
  File "/home/p.mohammadian.student.sharif/FullSubNet/audio_zen/trainer/base_trainer.py", line 337, in train
    metric_score = self._validation_epoch(epoch)
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/p.mohammadian.student.sharif/FullSubNet/recipes/dns_interspeech_2020/fullsubnet/trainer.py", line 111, in _validation_epoch
    self.writer.add_scalar(f"Loss/Validation_Total", loss_total / len(self.valid_dataloader), epoch)
ZeroDivisionError: float division by zero
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34536) of binary: /home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/bin/python
Traceback (most recent call last):
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.10.2', 'console_scripts', 'torchrun')())
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/p.mohammadian.student.sharif/.conda/envs/p.mohammadian.student.sharif/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-03-01_00:13:28
  host      : GPU-4-0-3.local
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 34536)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Thanks for your guidance!
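
The "Validation: 0it" lines above indicate the validation dataloader is empty, which makes len(self.valid_dataloader) zero in the loss averaging. A hedged sketch of a guard (not a maintainer-provided fix; checking the [validation_dataset] paths in your .toml is the first thing to verify):

# sketch: fail early with a clear message instead of dividing by zero
def average_validation_loss(loss_total, num_batches):
    if num_batches == 0:
        raise RuntimeError(
            "Validation dataloader is empty; check the "
            "[validation_dataset] paths in train.toml."
        )
    return loss_total / num_batches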

TOML decode error: "Reserved escape sequence used"

Can somebody help me fix this?

command : python inference.py -C C:\Users\punnp\Desktop\FullSubNet\recipes\dns_interspeech_2020\fullsubnet/inference.toml -M C:\Users\punnp\Desktop\FullSubNet\model\fullsubnet_best_model_58epochs.tar -O C:\Users\punnp\Desktop\FullSubNet\output

result :
Traceback (most recent call last):
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 512, in loads
    multibackslash)
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 778, in load_line
    value, vtype = self.load_value(pair[1], strictly_valid)
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 880, in load_value
    return (self.load_array(v), "array")
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 1026, in load_array
    nval, ntype = self.load_value(a[i])
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 866, in load_value
    raise ValueError("Reserved escape sequence used")
ValueError: Reserved escape sequence used

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 30, in <module>
    configuration = toml.load(config_path.as_posix())
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 134, in load
    return loads(ffile.read(), _dict, decoder)
  File "C:\Users\punnp\anaconda3\envs\FullSubNet\lib\site-packages\toml\decoder.py", line 514, in loads
    raise TomlDecodeError(str(err), original, pos)
toml.decoder.TomlDecodeError: Reserved escape sequence used (line 20 column 1 char 241)
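
The error comes from backslashes in a Windows path inside the .toml: in TOML basic (double-quoted) strings, sequences like \U are reserved escapes. A minimal illustration (the key name noisy_dir is hypothetical; apply the fix to whichever path sits at line 20 of the failing inference.toml):

# sketch: why a backslashed Windows path fails in TOML, and two fixes
import toml

bad = r'noisy_dir = "C:\Users\punnp\Desktop\FullSubNet\input"'
good_literal = r"noisy_dir = 'C:\Users\punnp\Desktop\FullSubNet\input'"
good_slashes = 'noisy_dir = "C:/Users/punnp/Desktop/FullSubNet/input"'

try:
    toml.loads(bad)  # raises: Reserved escape sequence used
except toml.TomlDecodeError as e:
    print(e)
print(toml.loads(good_literal))  # literal string keeps backslashes verbatim
print(toml.loads(good_slashes))  # forward slashes are fine in basic strings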

Questions about the training process

Very interesting project. Thank you for sharing.

I have a question: what are the text files noise.txt, rir.txt, and clean_0.6.txt?
Are they part of the original dataset, or dedicated files that you created for training?

Another question: is it possible to run it on Windows without the "dist" feature (using a single GPU)?
(I mean after commenting out all parts related to 'dist'.)

Questions about rir_dataset in fullband_train.toml

I am really interested in this project, but I am a little confused about the rir_dataset and want to clarify what it is. Is any RIR dataset from elsewhere compatible, or is there an official dataset for this project?

This project is very interesting, but the rir_dataset is not part of the original DNS dataset. I noticed some RIR datasets online, but I am not sure whether they can be used.
Could you share, for reference, which RIR dataset you used for training?
Thanks!

Batch size and GPU out of memory

Hi, I have been trying to train the FullSubNet model for a while using the code in this repo. I found that I must use a batch size of at most 12, resulting in very slow and inefficient training (the loss decreases quite slowly). If I try a larger batch size, I get a GPU out-of-memory error.

I have two Nvidia RTX 2080 Ti cards with 11 GB each. I see from train.toml that the default batch size is 48; any suggestions?

Demand + CSTR VCTK Corpus

Hi!

Thank you very much for releasing the code for your work.
I wonder when you plan to release a model trained on the DEMAND + CSTR VCTK corpus. It is a very important academic benchmark, and such a model would be a great asset for comparison with your work.

Thank you!

Delayed Sub-Band LSTM

Great job! Thanks for the open-source implementation. I saw that Delayed Sub-Band LSTM is among your planned models. When will you add it? Thank you!
