
facebookresearch / denoiser


Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities.

License: Other

Python 97.90% Shell 2.10%

denoiser's Issues

Why is bandmask applied on both input (noisy) and target (clean)??

Hi, thanks for the repo! I have a question: why is bandmask applied to both the input (noisy) and the target (clean)? Would it make sense to only apply bandmask to the input data and let the model reconstruct the missing frequencies? Sorry if this question is obvious and I just lack some fundamental understanding. I would greatly appreciate any insight on this.
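For concreteness, here is a minimal sketch of the input-only variant being asked about. It is not the repo's BandMask augmentation, just an illustration that zeroes a random frequency band on the noisy waveform while leaving the clean target untouched:

```python
import torch

def bandmask_input_only(noisy: torch.Tensor, max_width: float = 0.2,
                        n_fft: int = 512) -> torch.Tensor:
    """Zero a random frequency band on a mono waveform of shape (time,).

    Hypothetical input-only variant: the clean target is left untouched, so the
    model would have to reconstruct the masked band.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy, n_fft=n_fft, window=window, return_complex=True)
    n_bins = spec.shape[0]
    width = int(max_width * n_bins)
    start = torch.randint(0, n_bins - width, (1,)).item()
    spec[start:start + width] = 0                       # drop one band of frequency bins
    return torch.istft(spec, n_fft=n_fft, window=window, length=noisy.shape[-1])
```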

Linux-like support

Hey, really nice project, and cool things in the code, thanks for sharing!

I didn't know about audio loopback, so I checked whether it can be emulated on Linux, but it doesn't seem completely obvious.

At the moment, we do not provide official support for other OSes.

Do you have any plans for that?

Cheers,
Manu

Downsample audio files

Hi,
Does your repo have a script to downsample .wav files?
Or can you share the code you used?
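Not an official script, but a minimal sketch of one way to do it with torchaudio (paths and target rate below are placeholders):

```python
import torchaudio

def downsample_wav(in_path: str, out_path: str, target_sr: int = 16_000) -> None:
    """Resample a .wav file to target_sr and write it back out."""
    wav, sr = torchaudio.load(in_path)
    if sr != target_sr:
        wav = torchaudio.transforms.Resample(orig_freq=sr, new_freq=target_sr)(wav)
    torchaudio.save(out_path, wav, target_sr)

downsample_wav("input.wav", "input_16k.wav")
```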

KeyError: 'model' when trying to continue from a previously trained model 'best.th'

Yesterday I fine-tuned dns64 at 32 kHz on Colab. I exported the best.th model file to Google Drive and then tested it; it worked fine. However, I wanted to continue training from this model's checkpoint, so in the config file I set continue_from to /content/drive/MyDrive/best.th

Then when running, I got this error:

[2021-01-11 03:50:57,731][denoiser.solver][INFO] - Loading checkpoint model: /content/drive/MyDrive/best.th
[2021-01-11 03:50:59,057][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 78, in run
    solver = Solver(data, model, optimizer, args)
  File "/content/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/content/denoiser/denoiser/solver.py", line 111, in _reset
    self.model.load_state_dict(package['model']['state'])
KeyError: 'model'

Would love any help whatsoever. It'd be really cool if I could continue training from a previous checkpoint since Colab doesn't give continuous access.
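One quick sanity check (a sketch, nothing repo-specific) is to load the file on CPU and look at what it actually contains; a checkpoint usable for continue_from would need to expose the 'model' key the traceback is asking for:

```python
import torch

# Inspect the file without needing a GPU.
pkg = torch.load("/content/drive/MyDrive/best.th", map_location="cpu")
print(type(pkg))
if isinstance(pkg, dict):
    print(list(pkg.keys()))  # a full training checkpoint should have a 'model' entry
```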

Higher sample rate

Hello, thanks for this repo! Is there a pre-trained model for a higher sample rate? Something like 44.1 kHz, which is the most common sample rate found in audio?

Distributed training inconsistency in train.run(), denoiser/solver.train() and denoiser/solver._run_one_epoch()

Hi,
I noticed that across these functions, sometimes model is used and sometimes dmodel (the distributed wrapper).

examples:
train.py line 72:
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, betas=(0.9, args.beta2))

denoiser/solver.py in self.train() line 138, 153, 179, 185 :
self.model.train()
self.model.eval()
pesq, stoi = evaluate(self.args, self.model, self.tt_loader)
enhance(self.args, self.model, self.samples_dir)

denoiser/solver.py in self._run_one_epoch() line 216:
estimate = self.dmodel(noisy)

Is it still consistent with distributed training if the optimizer gets model.parameters() and not dmodel's?
The same question applies to model.train() and eval().
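For reference, a small sketch of why this is usually still consistent (general PyTorch behaviour, not a statement about this repo's intent): DistributedDataParallel wraps the module in place and shares its parameter tensors, so optimizing model.parameters() and calling model.train()/eval() act on the same objects that dmodel uses.

```python
import torch

model = torch.nn.Linear(4, 4)
# Constructing DDP needs an initialized process group, hence the commented lines:
# dmodel = torch.nn.parallel.DistributedDataParallel(model)
# assert next(dmodel.parameters()) is next(model.parameters())  # same tensor objects
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # updates the shared parameters
```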

Thanks in advance!!

During eval: ValueError: fs (sampling frequency) should be either 8000 or 16000

Hey guys, while training on Colab with a sample rate of 32kHz, the training was going fine up until it started the evaluation process.

I first got this warning multiple times:

 Run model on reference ref and degraded deg
       Sample rate (fs) - No default. Must select either 8000 or 16000.
       Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

[2021-01-10 11:40:29,138][denoiser.evaluate][INFO] - Eval estimates | 116/148 | 12.1 it/sec

 Run model on reference ref and degraded deg
       Sample rate (fs) - No default. Must select either 8000 or 16000.
       Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

 Run model on reference ref and degraded deg
       Sample rate (fs) - No default. Must select either 8000 or 16000.
       Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

And so on...

And then I got this error:

[2021-01-10 11:40:31,883][__main__][ERROR] - Some error happened
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/content/denoiser/denoiser/evaluate.py", line 91, in _run_metrics
    pesq_i = get_pesq(clean, estimate, sr=args.sample_rate)
  File "/content/denoiser/denoiser/evaluate.py", line 108, in get_pesq
    pesq_val += pesq(sr, ref_sig[i], out_sig[i], 'wb')
  File "/usr/local/lib/python3.6/dist-packages/pesq/__init__.py", line 28, in pesq
    raise ValueError("fs (sampling frequency) should be either 8000 or 16000")
ValueError: fs (sampling frequency) should be either 8000 or 16000
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 170, in train
    pesq, stoi = evaluate(self.args, self.model, self.tt_loader)
  File "/content/denoiser/denoiser/evaluate.py", line 72, in evaluate
    pesq_i, stoi_i = pending.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
ValueError: fs (sampling frequency) should be either 8000 or 16000

For the test set, should I convert the files to 16 kHz? That wouldn't make much sense since I'm training at 32 kHz.
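Since the pesq package only accepts 8 kHz or 16 kHz (see the error above), one common workaround is to resample both signals to 16 kHz just for the metric. A sketch, not the repo's evaluation code:

```python
import torch
import torchaudio
from pesq import pesq

def pesq_at_16k(ref: torch.Tensor, deg: torch.Tensor, sr: int) -> float:
    """Wideband PESQ between two mono (time,) tensors after resampling to 16 kHz."""
    if sr != 16_000:
        resample = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16_000)
        ref, deg = resample(ref), resample(deg)
    return pesq(16_000, ref.numpy(), deg.numpy(), "wb")
```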

KeyError: 'class'

No need to respond as I know I'm doing some weird experimental stuff here...

I trained a model with demucs.hidden set to 96, and a depth of 6.

I downloaded the checkpoint.th which was around 4.5 GB.

I ran a command to test it on an audio file, and I got this error:

INFO:denoiser.pretrained:Loading model from /Volumes/Transcend/checkpoint_44K_96.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/pretrained.py", line 62, in get_model
    pkg = torch.load(args.model_path, map_location='cuda:0')
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 765, in _legacy_load
    result = unpickler.load()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 721, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 800, in restore_location
    return default_restore_location(storage, map_location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 150, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 134, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

So then I edited pretrained.py

I changed:

pkg = torch.load(args.model_path)

to:

pkg = torch.load(args.model_path, map_location=torch.device('cpu'))

and I also tried:

pkg = torch.load(args.model_path, map_location='cpu')

And I get this error when I run the same command again:

INFO:denoiser.pretrained:Loading model from /Volumes/Transcend/checkpoint_44K_96.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/pretrained.py", line 65, in get_model
    model = deserialize_model(pkg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/denoiser/utils.py", line 39, in deserialize_model
    klass = package['class']
KeyError: 'class'
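If it helps, a hedged way to check whether the serialized model is simply nested inside the training checkpoint (this assumes the solver's checkpoint layout; the key names below are not guaranteed):

```python
import torch

pkg = torch.load("/Volumes/Transcend/checkpoint_44K_96.th", map_location="cpu")
print(list(pkg.keys()) if isinstance(pkg, dict) else type(pkg))
# If the serialized model lives under 'model' (an assumption), re-save just that part
# so that deserialize_model can find the 'class' / 'args' / 'state' entries it expects.
if isinstance(pkg, dict) and isinstance(pkg.get("model"), dict) and "class" in pkg["model"]:
    torch.save(pkg["model"], "/Volumes/Transcend/checkpoint_44K_96_model_only.th")
```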

Recommended architectural parameter changes for higher sample rate

Any recommendations on architecture changes if we want to train on higher sample rate data?

For example, the number of channels, kernel size, strides, etc. from this line of code, or even changes to the multi-resolution STFT loss found here? Or maybe even the parameters in BandMask found here.

My intuition is that with a higher sample rate you would want larger filters and kernel sizes to be able to model more frequency bands. Is that a correct assumption? Thanks!

your computer configuration

Hello, may I ask what your computer configuration is?
My video card is a 2080 Ti; is that suitable?
Thank you very much.

Very long audio files : sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED

Hi there!

Thanks for your work! I've been applying your model to short audio files with success, and the result is very impressive!
I'd like to go one step further and enhance 16-hour-long audio files.

When I launch:

python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --verbose --device cuda

I get:

Traceback (most recent call last):
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 130, in enhance
    estimate = get_estimate(model, noisy_signals, args)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 67, in get_estimate
    estimate = model(noisy)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/demucs.py", line 161, in forward
    mono = mix.mean(dim=1, keepdim=True)
RuntimeError: sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Reduce.cuh":928, please report a bug to PyTorch.

I tried to run the model on CPU, with and without the --streaming flag, but without success.
According to this thread, the error seems to occur when calling the sum function on very large tensors.

Here's the error I get on CPU:

/var/spool/slurmd/job1202815/slurm_script: line 40: 10526 Floating point exception(core dumped) python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --num_workers 10 --verbose

Does it seem unrealistic to you to enhance such long audio files? Can you think of any workaround?
I could cut my long audio files into multiple smaller chunks, but that would create artifacts and I'd prefer to avoid that pain :)
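For what it's worth, here is a rough sketch of the chunking workaround (my own, not part of the repo): run the model on overlapping chunks and average the overlaps, which keeps memory bounded and limits boundary artifacts.

```python
import torch

def enhance_in_chunks(model, noisy, sample_rate=16_000, chunk_sec=30.0, overlap_sec=1.0):
    """Enhance a (channels, time) waveform chunk by chunk, averaging the overlaps."""
    chunk = int(chunk_sec * sample_rate)
    hop = chunk - int(overlap_sec * sample_rate)
    total = noisy.shape[-1]
    out = torch.zeros_like(noisy)
    weight = torch.zeros_like(noisy)
    with torch.no_grad():
        for start in range(0, total, hop):
            piece = noisy[..., start:start + chunk]
            est = model(piece[None])[0]          # model expects (batch, channels, time)
            est = est[..., :piece.shape[-1]]     # guard against padding inside the model
            out[..., start:start + est.shape[-1]] += est
            weight[..., start:start + est.shape[-1]] += 1.0
            if start + chunk >= total:
                break
    return out / weight.clamp(min=1.0)
```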

Thanks a lot :)

torch error loading checkpoint + best.th

This seems like a new development. After training a new model and trying to run inference on my computer, I get this new warning paired with the following error:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
INFO:denoiser.pretrained:Loading model from /Users/yousseavx/Downloads/checkpoint.th
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/yousseavx/denoiser/denoiser/enhance.py", line 155, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/Users/yousseavx/denoiser/denoiser/enhance.py", line 113, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/Users/yousseavx/denoiser/denoiser/pretrained.py", line 59, in get_model
    pkg = torch.load(args.model_path)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

(This is in fact with the updated code that solved this error a while back.) So I tried modifying the line:

pkg = torch.load(args.model_path)

to:

pkg = torch.load(args.model_path, map_location=torch.device('cpu'))

I now get this:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
INFO:denoiser.pretrained:Loading model from /Users/yousseavx/Downloads/best.th

But now the program just terminates immediately after it prints that last line. I've tried this with both the checkpoint.th and best.th models.

The model was dns48, trained from scratch, training only, without cross-validation or testing.

I tried this with the pip3 install -U denoiser package, and I also tried running the same command from inside the git-cloned repo: same error / behavior.

master64 same model as reported in the paper?

Hi,
Thanks for the great repo. I ran the master64 pretrained model (trained on VCTK and DNS together) and evaluated it on the VCTK val set. I get a PESQ score of 3.019 and STOI of 95.00 (averaged across all 824 files in the val set). This model corresponds to H=64, U=4, S=4, so I looked at the paper, and the objective scores reported there for the same configuration are PESQ=2.94 and STOI=95. Does that look correct, or am I doing something wrong here?

Another question I had was regarding access to the non-causal model reported in the paper. Are you planning to release that soon? Thanks in advance for your response!!

RuntimeError: Offset past EOF

Hi,

I'm trying to reproduce your model.
I got an error when I started training on GPUs with launch_valentini.sh.
The error was 'Offset past EOF', but I'm not familiar with it.
I didn't change conf/conf.yaml except for the output directory of the logs.
Can you give me any advice on what to check next?

Thank you.

Script output:

$ bash launch_valentini.sh
[2021-02-19 21:30:45,199][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:45,719][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-19 21:30:46,017][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:49,350][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-19 21:30:49,351][denoiser.solver][INFO] - Training...
[2021-02-19 21:30:49,483][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 72, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

[2021-02-19 21:30:49,532][denoiser.executor][ERROR] - Worker 0 died, killing all workers
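One thing that can be checked independently of the repo (a sketch; it assumes the egs json files store [path, num_frames] pairs and uses soundfile as an extra dependency) is whether every file really has at least the number of frames recorded in the metadata, since "Offset past EOF" points at a read past the end of a file:

```python
import json
import soundfile as sf

# Path to one of the generated metadata files (placeholder).
with open("egs/valentini/tr/noisy.json") as f:
    meta = json.load(f)

for path, length in meta:
    frames = sf.info(path).frames
    if frames < length:
        print(f"{path}: metadata says {length} frames, file has {frames}")
```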

Question about the frame length and frame shift

Hello, sorry to disturb you.
I read the paper and the code, but I am still confused about the frame length and frame shift of the audio.

In the Training paragraph it says: "With this setup, the causal DEMUCS processes audio with a frame size of 37 ms and a stride of 16 ms."

Why are the frame length and frame shift 37 ms and 16 ms? How are they calculated?
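For what it's worth, one way to arrive at those numbers, assuming the paper's default hyperparameters (depth L=5, kernel size K=8, stride S=4, and U=4 upsampling, so the network effectively sees 64 kHz audio):

```python
# Receptive field of the encoder stack: R_L = K, R_{l-1} = (R_l - 1) * S + K.
K, S, L, U, sr = 8, 4, 5, 4, 16_000
receptive = K
for _ in range(L - 1):
    receptive = (receptive - 1) * S + K

frame_ms = 1000 * receptive / (sr * U)   # 2388 upsampled samples  -> ~37.3 ms
stride_ms = 1000 * S ** L / (sr * U)     # 4**5 = 1024 upsampled samples -> 16 ms
print(round(frame_ms, 1), round(stride_ms, 1))
```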

Hoping to hear from you.

Using `--model_path` with pretrained models returns error

Hi,
Inference runs perfectly if I specify the model using --dns48, --dns64 or --master64. For example:
python3 -m denoiser.enhance --noisy_dir=../noisy/ --out_dir=../cleaned/ --master64 --sample_rate 16000 does the job.
However, when I try to specify the path of the pre-trained model explicitly using --model_path or -m, it breaks. For example:
python3 -m denoiser.enhance --noisy_dir=../noisy/ --out_dir=../cleaned/ --model_path /root/.cache/torch/checkpoints/master64-8a5dfb4bb92753dd.th --sample_rate 16000
gives the following error:

INFO:denoiser.pretrained:Loading model from /root/.cache/torch/checkpoints/master64-8a5dfb4bb92753dd.th
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/denoiser/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/content/denoiser/denoiser/enhance.py", line 107, in enhance
    model = pretrained.get_model(args).to(args.device)
  File "/content/denoiser/denoiser/pretrained.py", line 60, in get_model
    model = deserialize_model(pkg)
  File "/content/denoiser/denoiser/utils.py", line 35, in deserialize_model
    klass = package['class']
KeyError: 'class'

What am I missing here?

Thank you for your awesome project!

Training on DNS - training parameters?

Hi,
I was trying to train the denoiser on DNS, so I downloaded the dataset from asteroid (https://github.com/mpariente/asteroid/blob/master/egs/dns_challenge) and finished up to stage 2, which is preprocessing the dataset. I get around 60K paired files after preprocessing. For training the denoiser, I am unable to run the code at batch_size=128 (which is what 'launch_dns.sh' uses). With a 12 GB 1080 Ti GPU I can do at most batch_size=4, which takes me around 3 days per epoch, which is a lot, so I was wondering if you did any additional preprocessing for DNS? Thanks!

Problems with the enhanced results when using different loss functions

Hello. When I train the demucs model with the L1 loss, the enhanced result is very good and the speech always sounds satisfactory. But when I change the loss from L1 to L1 + 0.1 × multi-resolution STFT loss (I only changed stft_loss to true in the config.yaml configuration file), the enhanced speech contains some added noise. It is clearly visible in the frequency domain and sounds like a sum of several single-frequency components. I have thought about it a lot and have no idea what is going on.
Hoping to hear from you.
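For reference, a minimal sketch of the combined objective being described (the 0.1 weight and the squeeze on the channel dimension are assumptions; check the STFT-loss weight entries in conf/config.yaml for the repo's actual values):

```python
import torch.nn.functional as F

def combined_loss(estimate, clean, mrstft, stft_weight=0.1):
    """L1 on the waveform plus a weighted multi-resolution STFT term."""
    loss = F.l1_loss(estimate, clean)
    sc_loss, mag_loss = mrstft(estimate.squeeze(1), clean.squeeze(1))
    return loss + stft_weight * (sc_loss + mag_loss)
```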

The script to generate the egs/ files

Hi, thanks for this great work.
I'm trying to reproduce your paper results and ran the script you provide in the README.md to generate the egs/ files for Valentini and DNS, but the json files weren't saved.
Please check my questions below.

  1. I took a look at denoiser/audio.py and I wonder whether json.dump(meta, sys.stdout, indent=4) at the bottom worked in your environment.
    When I changed it to json.dump(meta, sys.argv[2], indent=4), the script seemed to work in the Valentini case.
  2. When I tried the DNS script after cloning DNS-challenge (interspeech2020/master), the $testset/synthetic/reverb/noisy part seemed wrong because there is no such directory. I guess it means $testset/synthetic/with_reverb/noisy, but is that right? Or should I change the branch of the DNS-challenge repository?

Thank you!

Not processing audio fast enough on i7

Hello,
I'm running denoiser on Mac OS X:

ProductName:	Mac OS X
ProductVersion:	10.14.5
BuildVersion:	18F132

on a MacBook Pro with a 2.6 GHz Intel Core i7 and 16 GB 2133 MHz LPDDR3.

^@ip-192-168-178-22:denoiser loretoparisi$ python3 -m denoiser.live
Model loaded.
Ready to process audio.
Not processing audio fast enough, time per frame is 43.6ms !
Not processing audio fast enough, time per frame is 17.2ms !
Not processing audio fast enough, time per frame is 16.8ms !
Not processing audio fast enough, time per frame is 16.5ms !
Not processing audio fast enough, time per frame is 16.4ms !
Not processing audio fast enough, time per frame is 16.0ms !
Not processing audio fast enough, time per frame is 16.0ms !
Not processing audio fast enough, time per frame is 16.1ms !

No other audio processing application is running. This is the output of top while running denoiser:

Processes: 353 total, 4 running, 349 sleeping, 2030 threads                                                                                                   00:53:09
Load Avg: 1.50, 1.53, 1.75  CPU usage: 50.53% user, 5.12% sys, 44.33% idle  SharedLibs: 314M resident, 60M data, 119M linkedit.
MemRegions: 117477 total, 3538M resident, 140M private, 2092M shared. PhysMem: 13G used (2551M wired), 3477M unused.
VM: 1779G vsize, 1371M framework vsize, 17855884(0) swapins, 22605506(0) swapouts. Networks: packets: 8744381/7632M in, 10377030/7433M out.
Disks: 2747195/188G read, 2354610/124G written.

This is without denoiser running:

Processes: 351 total, 2 running, 349 sleeping, 1905 threads                                                                                                   00:54:44
Load Avg: 2.00, 1.72, 1.80  CPU usage: 2.99% user, 3.35% sys, 93.64% idle  SharedLibs: 314M resident, 60M data, 119M linkedit.
MemRegions: 117174 total, 3539M resident, 139M private, 2078M shared. PhysMem: 12G used (2550M wired), 3649M unused.
VM: 1770G vsize, 1370M framework vsize, 17855884(0) swapins, 22605506(0) swapouts. Networks: packets: 8744977/7632M in, 10377512/7433M out.
Disks: 2747324/188G read, 2355710/124G written.

If you need additional system info, I'm happy to provide it.

Thank you.

got response "Killed%"

$ python -m denoiser.enhance --noisy_dir="/home/16k_1" --out_dir="/home/op1"


    INFO:denoiser.pretrained:Loading pre-trained real time H=48 model trained on DNS.
    Killed%

My files have a sample rate of 16000 and a single channel, yet I still get the "Killed%" response.

ERROR: Command errored out with exit status 128

ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com/ludlows/python-pesq' /private/var/folders/_b/szqwdfn979n4fdg7f2j875_r0000gn/T/pip-install-hy9metiy/pesq Check the logs for full command output.

This happens with pip3 install -r requirements.txt, with or without python3 -m venv .venv; it does not happen when installing the package directly with pip.

Thank you

inference denoiser

Tell me in a nutshell, if possible, which functions from which files I need to implement this pipeline:
input wav -> denoiser function -> output wav
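Not an official answer, but a rough sketch of such a pipeline in Python, assuming the dns64 helper in denoiser.pretrained (the dns48/dns64/master64 names appear in tracebacks elsewhere on this page), mono 16 kHz input, and placeholder file names; the supported route is still the python -m denoiser.enhance command line:

```python
import torch
import torchaudio
from denoiser import pretrained

model = pretrained.dns64()                      # pre-trained real-time model, expects 16 kHz audio
model.eval()

wav, sr = torchaudio.load("noisy.wav")          # shape (channels, time)
with torch.no_grad():
    enhanced = model(wav[None])[0]              # add, then remove, the batch dimension
torchaudio.save("enhanced.wav", enhanced, sr)
```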

What parameters to set to train a deeper network?

Hey guys, thanks for all the help so far!

I'm curious: if I wanted to train a deeper network from scratch, which parameters would I need to change in the yaml config file?

When I fine-tuned dns48 and dns64 on 44.1 kHz it worked great, but I felt that for some reason the higher frequencies weren't as present in the denoised recording as in the original recording, and I'm wondering if that has something to do with the size of the network.

Ideally I'd want it to even 'fill in' a bit where the higher frequencies aren't really audible.

Although maybe this is getting into changing the actual architecture.

Training details

Hi!

First, thank you for the awesome work!

I was interested in getting more details about how you trained your SOTA model. If I understand correctly, you first trained the model on the Valentini dataset, but how did you partition the data into train, validation and test sets (there is no such partition when I download it)? Then you used the best model from this first training as a starting point for training on the DNS dataset, right? Did you then use both datasets together, or only DNS, in this second stage? Also for DNS, how did you build the noisy audio samples and how did you partition them into train and test sets?

Thank you in advance!

Why limit to single thread?

enhance.py
torch.set_num_threads(1)

def get_estimate(model, noisy, args):
    torch.set_num_threads(1)
    if args.streaming:
        streamer = DemucsStreamer(model, dry=args.dry)
        with torch.no_grad():
            estimate = torch.cat([
                streamer.feed(noisy[0]),
                streamer.flush()], dim=1)[None]
    else:
        with torch.no_grad():
            estimate = model(noisy)
            estimate = (1 - args.dry) * estimate + args.dry * noisy
    return estimate

solver.py:211: UserWarning: Using a target size (torch.Size([1, 1, 112543])) that is different to the input size (torch.Size([1, 1, 112499])).

In the config yaml file I set depth to 6 and hidden to 96, and I got this error during cross-validation (training was going fine up until then):

[2021-02-05 00:17:19,682][__main__][INFO] - For logs, checkpoints and samples check /content/drive/My Drive/outputs/exp_demucs.hidden=96
[2021-02-05 00:17:26,292][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-05 00:17:26,292][denoiser.solver][INFO] - Training...
[2021-02-05 00:39:43,596][denoiser.solver][INFO] - Train | Epoch 1 | 822/4114 | 0.6 it/sec | Loss 0.00932
[2021-02-05 01:02:00,280][denoiser.solver][INFO] - Train | Epoch 1 | 1644/4114 | 0.6 it/sec | Loss 0.00815
[2021-02-05 01:24:16,911][denoiser.solver][INFO] - Train | Epoch 1 | 2466/4114 | 0.6 it/sec | Loss 0.00757
[2021-02-05 01:46:33,454][denoiser.solver][INFO] - Train | Epoch 1 | 3288/4114 | 0.6 it/sec | Loss 0.00715
[2021-02-05 02:08:49,911][denoiser.solver][INFO] - Train | Epoch 1 | 4110/4114 | 0.6 it/sec | Loss 0.00687
[2021-02-05 02:08:56,199][denoiser.solver][INFO] - Train Summary | End of Epoch 1 | Time 6689.91s | Train Loss 0.00686
[2021-02-05 02:08:56,200][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-05 02:08:56,200][denoiser.solver][INFO] - Cross validation...
/content/denoiser/denoiser/solver.py:211: UserWarning: Using a target size (torch.Size([1, 1, 112543])) that is different to the input size (torch.Size([1, 1, 112499])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = F.l1_loss(clean, estimate)
[2021-02-05 02:09:00,276][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 148, in train
    valid_loss = self._run_one_epoch(epoch, cross_valid=True)
  File "/content/denoiser/denoiser/solver.py", line 211, in _run_one_epoch
    loss = F.l1_loss(clean, estimate)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2190, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 52, in broadcast_tensors
    return torch._C._VariableFunctions.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (112499) must match the size of tensor b (112543) at non-singleton dimension 2

I wonder if this is related to my settings or something else. The batch size was set to 6.
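A hedged note on the shape mismatch: the estimate and the target end up a few samples apart, whatever the root cause (padding to a larger "valid" length in a deeper model, or a small length mismatch between the noisy/clean pair). A minimal workaround sketch, not the repo's code, is to trim both to the shorter length before the loss:

```python
import torch.nn.functional as F

def l1_on_common_length(clean, estimate):
    """L1 loss after trimming both (batch, channels, time) tensors to the shorter length."""
    length = min(clean.shape[-1], estimate.shape[-1])
    return F.l1_loss(clean[..., :length], estimate[..., :length])
```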

Errors when using "python -m denoiser.live -i 5 --out 0"

I installed denoiser using "pip install denoiser".
OS: Ubuntu 16.04 LTS
Error log:
File "/home/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/live.py", line 142, in
main()
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/live.py", line 80, in main
model = get_model(args).to(args.device)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 69, in get_model
model = dns48()
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 31, in dns48
return _demucs(pretrained, DNS_48_URL, hidden=48)
File "/home/anaconda3/lib/python3.7/site-packages/denoiser/pretrained.py", line 25, in _demucs
state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
File "/home/anaconda3/lib/python3.7/site-packages/torch/hub.py", line 495, in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)
File "/home/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 772, in _legacy_load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 4906173 more bytes. The file might be corrupted.

It seems that something went wrong during model loading.
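The "unexpected EOF ... might be corrupted" message usually points at a truncated download. A hedged remedy sketch is to delete the cached weights so torch.hub re-downloads them on the next run (the cache location below is an assumption; torch.hub.get_dir() needs a reasonably recent PyTorch):

```python
import pathlib
import torch.hub

cache = pathlib.Path(torch.hub.get_dir()) / "checkpoints"
for f in cache.glob("dns*.th"):
    print("removing", f)   # the file will be fetched again on the next denoiser run
    f.unlink()
```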

Error when fine-tuning: RuntimeError: Error(s) in loading state_dict for Demucs

After running python3 train.py I get:

[2020-09-30 00:18:43,333][__main__][INFO] - For logs, checkpoints and samples check /Users/youssef/denoiser/outputs/exp_
[2020-09-30 00:18:44,810][denoiser.solver][INFO] - Fine tuning from pre-trained model dns64
Downloading: "https://dl.fbaipublicfiles.com/adiyoss/denoiser/dns64-a7761ff99a7d5bb6.th" to /Users/youssef/.cache/torch/checkpoints/dns64-a7761ff99a7d5bb6.th
100%|#################################################################################################################################| 128M/128M [02:00<00:00, 1.12MB/s]

[2020-09-30 00:20:47,197][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 75, in run
    solver = Solver(data, model, optimizer, args)
  File "/Users/youssef/denoiser/denoiser/solver.py", line 70, in __init__
    self._reset()
  File "/Users/youssef/denoiser/denoiser/solver.py", line 123, in _reset
    self.model.load_state_dict(model.state_dict())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Demucs:
	size mismatch for encoder.0.0.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for encoder.0.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for encoder.0.2.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for encoder.0.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.0.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for encoder.1.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for encoder.1.2.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for encoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.0.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for encoder.2.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for encoder.2.2.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for encoder.2.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.0.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for encoder.3.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for encoder.3.2.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for encoder.3.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.0.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for encoder.4.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for encoder.4.2.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for encoder.4.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.0.weight: copying a param with shape torch.Size([2048, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 768, 1]).
	size mismatch for decoder.0.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for decoder.0.2.weight: copying a param with shape torch.Size([1024, 512, 8]) from checkpoint, the shape in current model is torch.Size([768, 384, 8]).
	size mismatch for decoder.0.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.1.0.weight: copying a param with shape torch.Size([1024, 512, 1]) from checkpoint, the shape in current model is torch.Size([768, 384, 1]).
	size mismatch for decoder.1.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for decoder.1.2.weight: copying a param with shape torch.Size([512, 256, 8]) from checkpoint, the shape in current model is torch.Size([384, 192, 8]).
	size mismatch for decoder.1.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.2.0.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
	size mismatch for decoder.2.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for decoder.2.2.weight: copying a param with shape torch.Size([256, 128, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
	size mismatch for decoder.2.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.3.0.weight: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
	size mismatch for decoder.3.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for decoder.3.2.weight: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([96, 48, 8]).
	size mismatch for decoder.3.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([48]).
	size mismatch for decoder.4.0.weight: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
	size mismatch for decoder.4.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
	size mismatch for decoder.4.2.weight: copying a param with shape torch.Size([64, 1, 8]) from checkpoint, the shape in current model is torch.Size([48, 1, 8]).
	size mismatch for lstm.lstm.weight_ih_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l0: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l0: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.weight_ih_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.weight_hh_l1: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
	size mismatch for lstm.lstm.bias_ih_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for lstm.lstm.bias_hh_l1: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).

Before running this command, I changed the config.yaml file from:

continue_pretrained:

to

continue_pretrained: dns64

STFT Loss device issues

Hi,
When fine-tuning on a GPU machine with the STFT loss set to true in the config file, I get an error:

    solver.train()
  File "/home/wscuser/denoiser/denoiser/solver.py", line 143, in train
    train_loss = self._run_one_epoch(epoch)
  File "/home/wscuser/denoiser/denoiser/solver.py", line 50, in _run_one_epoch
    sc_loss, mag_loss = self.mrstftloss(estimate.squeeze(1), clean.squeeze(1))
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 138, in forward
    sc_l, mag_l = f(x, y)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 94, in forward
    x_mag = stft(x, self.fft_size, self.shift_size, self.win_length, self.window)
  File "/home/wscuser/denoiser/denoiser/stft_loss.py", line 28, in stft
    x_stft = torch.stft(x, fft_size, hop_size, win_length, window)
  File "/anaconda/lib/python3.7/site-packages/torch/functional.py", line 516, in stft
    normalized, onesided, return_complex)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!



Any idea why this is happening?

Thanks in advance!
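One possible remedy, a hedged sketch based only on the traceback above (an alternative would be to move the whole multi-resolution STFT loss module onto the same device as the model): make sure the window tensor is on the signal's device before calling torch.stft.

```python
import torch

def stft_magnitude(x, fft_size, hop_size, win_length, window):
    """Magnitude STFT with the window moved onto the signal's device first."""
    spec = torch.stft(x, fft_size, hop_size, win_length, window.to(x.device),
                      return_complex=True)
    return spec.abs().clamp(min=1e-7)
```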

RuntimeError: CUDA out of memory

Hey guys, in trying to take my first 'hello world' step toward training / fine-tuning this model, I basically replaced the debug noisy.json and clean.json files with my own json content pointing at my own dataset. The dataset contains around 2.5K files and is around 1 GB at 44 kHz (smaller at 16 kHz, as expected).

The problem is that when trying to run this on Colab (which worked with the original toy dataset provided), I'm now getting this unexpected error:

[2020-10-01 19:57:18,722][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_demucs.hidden=64
[2020-10-01 19:57:23,614][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-10-01 19:57:23,615][denoiser.solver][INFO] - Training...
[2020-10-01 19:57:26,054][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/content/denoiser/denoiser/solver.py", line 207, in _run_one_epoch
    estimate = self.dmodel(noisy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/denoiser/denoiser/demucs.py", line 184, in forward
    x = decode(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 914, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 502.00 MiB (GPU 0; 14.73 GiB total capacity; 13.03 GiB already allocated; 467.88 MiB free; 13.49 GiB reserved in total by PyTorch)

I have different versions of my dataset at 16 kHz, 22 kHz, 32 kHz, and 44.1 kHz. Every time I try one, I get a variant of the same error above. For example, when I try 44.1 kHz:

!python3 train.py demucs.hidden=64 sample_rate=44100

I get:

[2020-10-01 20:00:59,516][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_demucs.hidden=64,sample_rate=44100
[2020-10-01 20:01:04,753][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-10-01 20:01:04,754][denoiser.solver][INFO] - Training...
[2020-10-01 20:01:09,170][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/content/denoiser/denoiser/solver.py", line 207, in _run_one_epoch
    estimate = self.dmodel(noisy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/denoiser/denoiser/demucs.py", line 176, in forward
    x = encode(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 448, in forward
    return F.glu(input, self.dim)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 946, in glu
    return torch._C._nn.glu(input, dim)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 14.73 GiB total capacity; 11.28 GiB already allocated; 2.65 GiB free; 11.30 GiB reserved in total by PyTorch)

Whereas when I try 16kHz, I get:

[2020-10-01 19:57:18,722][__main__][INFO] - For logs, checkpoints and samples check /content/denoiser/outputs/exp_demucs.hidden=64
[2020-10-01 19:57:23,614][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2020-10-01 19:57:23,615][denoiser.solver][INFO] - Training...
[2020-10-01 19:57:26,054][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 99, in main
    _main(args)
  File "train.py", line 93, in _main
    run(args)
  File "train.py", line 76, in run
    solver.train()
  File "/content/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/content/denoiser/denoiser/solver.py", line 207, in _run_one_epoch
    estimate = self.dmodel(noisy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/denoiser/denoiser/demucs.py", line 184, in forward
    x = decode(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 914, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 502.00 MiB (GPU 0; 14.73 GiB total capacity; 13.03 GiB already allocated; 467.88 MiB free; 13.49 GiB reserved in total by PyTorch)

The 'tried to allocate' amount just varies, and the free memory is always just below the available memory.

The fact that the free memory varies when I change the dataset (the versions have different sizes) makes me think it's something other than CUDA actually running out of memory, though I could be wrong.

The PyTorch version I'm running is 1.4.0, with torchaudio 0.4.0, because otherwise I get an error saying the CUDA driver is out of date.

I get this error when I try to train, and when I try to fine tune.

Am I doing something wrong in setting all this up? Should I be arranging my files differently from the debug ones? I tried to place everything in its correct directory and point everything to its correct directory.

training loss increases when finetuning on master64 on VCTK

Hi!
Thanks for the great repo. The work is awesome!
When I tried to fine-tune a denoising network on the VCTK dataset, I observed that the training loss increased after a few epochs. I am loading the pretrained model "master64" and then fine-tuning on VCTK. Here is my forked repo with more details: link. Please look at history.json to check the training history.

Thanks!

Very small loss but bad performance

I should say first that this is an excellent project.

Once I cloned the code and trained on the sample (debug) dataset, the training process had no problems: the validation loss was around 0.05 and the enhanced noisy file sounded good. Then I tried to replace the debug dataset with my own dataset:

  1. move my dataset to dataset/
  2. generate the clean and noisy .json files with make_debug.sh
  3. modify conf/config.yaml and build a new conf/dset/mydata.yaml for my dataset

The new training runs with no errors and the validation loss is around 0.0008. But when I check the enhanced files in output/ex/samples, the level of the enhanced wav files is almost zero. The level of the input noisy files is around 500~1000, so there must be a mistake somewhere, but I don't know whether I missed something or whether my dataset has a problem.

Thanks in advance.
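One quick thing worth checking (a sketch, and only a guess given the 500~1000 levels mentioned above): whether the waveforms are being read at int16 scale instead of the roughly [-1, 1] float range, which would also make an L1 loss of 0.0008 look deceptively small.

```python
import torchaudio

wav, sr = torchaudio.load("dataset/noisy/example.wav")   # placeholder path
print(sr, wav.abs().max().item())   # expect a peak well below ~1.0 for normalized audio
```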

Which hydra version?

Hi, thanks for your repo.
When running './train.py ddp=1' I get this bug:

Traceback (most recent call last):
  File "./train.py", line 104, in main
    _main(args)
  File "./train.py", line 96, in _main
    start_ddp_workers()
  File "/data/code/denoiser/denoiser/executor.py", line 77, in start_ddp_workers
    log = utils.HydraConfig().hydra.job_logging.handlers.file.filename
AttributeError: 'HydraConfig' object has no attribute 'hydra'

Implementation of CSIG, CBAK, COVL metrics

Hi!
In the original paper you evaluated your model with the PESQ, STOI, CSIG, CBAK, and COVL metrics. I found the implementations of PESQ and STOI in the denoiser/evaluate.py file but didn't see CSIG, CBAK, or COVL. Are you going to add these metrics? It would be very helpful!

Thanks in advance!

FineTuning on custom dataset using pre-trained models

Hi!
The pre-trained model is performing really well on my dataset. I wanted to fine-tune this model, that is, master64 (trained on the Valentini and DNS datasets), on my own dataset and then check the results. Could you let me know how I can do that?
On a tangential note, in the config files there is a parameter called matching which is set to sort. If I have understood correctly, as long as the clean and noisy files are named similarly so that they end up in the same order when sorted, the matching is successful. Am I correct?
Thanks in advance for the clarification!

Resample kernel

Hi, in the resample kernel part, when generating the Hann window for the truncated sinc function, I'm confused about why you create a Hann window of length 4 × zeros (+1) and then select the odd part. Is there any difference from directly creating a 2 × zeros Hann window?

win = th.hann_window(4 * zeros + 1, periodic=False)
winodd = win[1::2]
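A quick empirical comparison (a sketch; zeros=56 is an arbitrary choice here): the odd-indexed samples of the length 4 * zeros + 1 window sit at half-sample offsets and exclude the exact zero endpoints, whereas a directly built window of length 2 * zeros starts and ends at zero.

```python
import torch as th

zeros = 56
win_odd = th.hann_window(4 * zeros + 1, periodic=False)[1::2]   # 2*zeros points, all strictly > 0
win_direct = th.hann_window(2 * zeros, periodic=False)          # 2*zeros points, endpoints exactly 0
print(win_odd.shape, win_direct.shape)
print(win_odd[0].item(), win_direct[0].item())
```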

hubconf models update

Hi! Thank you for your work! Can you add the existing models to hubconf.py? It seems there are some mistakes in the current model names:

from denoiser.pretrained import demucs_rt48, demucs_rt64 # noqa

Thank you!

Usage example denoiser

Hi, I apologize for the stupid question, but what commands need to be run to remove noise from a wav file (given the path to it)?

Denoising .wav files using a pre-trained model

Hi!
I have a couple of .wav files that I want to denoise using the model you have provided, without any training involved. Could you suggest how that can be done?
Thanks in advance.
