drethage / speech-denoising-wavenet
A neural network for end-to-end speech denoising
License: MIT License
Getting two errors while training the Wavenet model on Google Colab cloud GPU.
1.
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 409, in data_generator_task
generator_output = next(generator)
File "/content/drive/app/speech-denoising-wavenet/datasets.py", line 125, in get_random_batch_generator
noise = noisy - speech
ValueError: operands could not be broadcast together with shapes (65302,) (60727,)
File "main.py", line 169, in <module>
main()
File "main.py", line 163, in main
training(config, cla)
File "main.py", line 81, in training
config['training']['num_epochs'])
File "/content/drive/app/speech-denoising-wavenet/models.py", line 167, in fit_model
initial_epoch=self.epoch_num)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1481, in fit_generator
str(generator_output))
"ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None".
Any idea what the issue is and how to resolve it? @drethage
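The first traceback fails at `noise = noisy - speech` because the two arrays have different lengths (65302 vs. 60727 samples). The second error is probably a consequence of the first, since a generator thread that has died yields `None`. Below is a minimal sketch of one possible workaround, assuming it is acceptable to truncate each noisy/clean pair to its common length. The helper name `align_pair` is hypothetical and not part of the repository.

```python
import numpy as np


def align_pair(noisy, speech):
    """Truncate both 1-D signals to their common length (hypothetical helper)."""
    common_length = min(len(noisy), len(speech))
    return noisy[:common_length], speech[:common_length]


# Dummy signals with the lengths reported in the traceback.
noisy = np.zeros(65302, dtype=np.float32)
speech = np.zeros(60727, dtype=np.float32)

noisy, speech = align_pair(noisy, speech)
noise = noisy - speech  # shapes now match, so the subtraction no longer raises
print(noise.shape)
```

A mismatch this large may also mean that the noisy and clean files are paired incorrectly or were resampled differently. Truncation would hide that, so it is worth checking the file pairs first.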
Running main.py
with arguments !THEANO_FLAGS=optimizer=fast_compile,device=gpu python /content/speech-denoising-wavenet/main.py --mode inference --config /content/speech-denoising-wavenet/sessions/001/config.json --noisy_input_path /content/speech-denoising-wavenet/data/NSDTSEA/noisy_testset_wav --clean_input_path /content/speech-denoising-wavenet/data/NSDTSEA/clean_testset_wav
Using TensorFlow backend.
/usr/lib/python2.7/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING: Logging before flag parsing goes to stderr.
W0724 03:39:50.798191 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:310: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0724 03:39:50.807167 139764758083456 deprecation.py:506] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:619: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0724 03:39:53.131911 139764758083456 deprecation.py:506] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:480: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Loading model from epoch: 144
W0724 03:39:53.568438 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:106: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
W0724 03:39:53.568799 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:111: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
W0724 03:39:53.569037 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:116: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2019-07-24 03:39:53.569346: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-24 03:39:53.574141: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-07-24 03:39:53.574383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x28812c0 executing computations on platform Host. Devices:
2019-07-24 03:39:53.574417: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
W0724 03:39:53.574882 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:258: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2019-07-24 03:39:53.858215: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0724 03:39:54.107969 139764758083456 deprecation_wrapper.py:119] From /usr/lib/python2.7/site-packages/keras/optimizers.py:610: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
Performing inference..
The script then stops.
Could you provide additional information or an evaluation script for the enhanced speech (to obtain SIG, BAK, and OVL)?
Denoising: p257_156.wav
0%| | 0/1 [00:00<?, ?it/s]terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Thanks for your repo. Can you provide the cuDNN, cudatoolkit, and tensorflow_gpu versions? I was confused about which versions to use.
We need to port the model to Android, but there is no interface for reading from the checkpoints.
requirements.txt doesn't specify the required TensorFlow version.
@drethage Can you clarify whether the training/test data needs to be in .wav format or whether other formats are supported? I have audio data in .mp3 format, and converting it to .wav is resource intensive. How can I train and test the WaveNet model with data in .mp3 format directly?
Dear,
On Python 3+, I get "No module named optparse". I found the following note:
"Deprecated since version 2.7: The optparse module is deprecated and will not be developed further; development will continue with the argparse module."
and the link is here,
but could it be used with argparse?
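For reference, `optparse` still ships with the Python 3 standard library, so an import failure there may point to a problem elsewhere in the environment. If a move to `argparse` is wanted anyway, a minimal sketch might look like the following. The flag names are the ones that appear in commands quoted on this page. The defaults, the `choices` list, and the function name `build_parser` are assumptions, not the repository's actual definitions.

```python
import argparse


def build_parser():
    """Hypothetical argparse front end mirroring flags seen in the quoted commands."""
    parser = argparse.ArgumentParser(description="Speech denoising (sketch)")
    # Defaults below are illustrative assumptions.
    parser.add_argument("--mode", default="training", choices=["training", "inference"])
    parser.add_argument("--config", default="config.json")
    parser.add_argument("--noisy_input_path", default=None)
    parser.add_argument("--clean_input_path", default=None)
    parser.add_argument("--target_field_length", type=int, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
cla = build_parser().parse_args(
    ["--mode", "inference", "--config", "sessions/001/config.json", "--batch_size", "4"]
)
print(cla.mode, cla.config, cla.batch_size)
```

Because `argparse` returns a namespace with attribute access, code that reads values such as `cla.mode` would not need to change.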
I trained the model according to the usage instructions and downloaded the given dataset, but the following error occurs during training. Can anyone help me?
Using Theano backend.
E:\anaconda\lib\site-packages\theano\gpuarray\dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7401 on context None
Mapped name None to device cuda: GeForce GTX 1060 (0000:01:00.0)
Traceback (most recent call last):
File "main.py", line 169, in <module>
main()
File "main.py", line 165, in main
inference(config, cla)
File "main.py", line 108, in inference
load_checkpoint=cla.load_checkpoint, print_model_summary=cla.print_model_summary)
File "E:\speech-denoising-wavenet-master\models.py", line 67, in __init__
self.model = self.setup_model(load_checkpoint, print_model_summary)
File "E:\speech-denoising-wavenet-master\models.py", line 76, in setup_model
model = self.build_model()
File "E:\speech-denoising-wavenet-master\models.py", line 220, in build_model
name='data_input_target_field_length')(data_expanded)
File "E:\anaconda\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "E:\speech-denoising-wavenet-master\layers.py", line 47, in call
x = keras.backend.permute_dimensions(x, [0, 2, 1])
File "E:\anaconda\lib\site-packages\keras\backend\theano_backend.py", line 936, in permute_dimensions
y._keras_shape = tuple(np.asarray(x._keras_shape)[list(pattern)])
IndexError: index 2 is out of bounds for axis 0 with size 2
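The failing line indexes the tensor's shape tuple with the pattern `[0, 2, 1]`, so the error suggests that the tensor reaching `permute_dimensions` has only two axes instead of three. The sketch below reproduces just that indexing step with NumPy. The shapes are illustrative assumptions, and why `data_expanded` would arrive with two axes (for example, a Keras version that differs from the one the code was written for) is not established by the traceback.

```python
import numpy as np

pattern = [0, 2, 1]

two_d_shape = (None, 1601)       # only two axes: (batch, time)
three_d_shape = (None, 1601, 1)  # three axes: (batch, time, channels)

# Indexing a 2-element shape with index 2 raises the IndexError from the traceback.
try:
    np.asarray(two_d_shape)[pattern]
    raised = False
except IndexError as error:
    raised = True
    print(error)

# With three axes the same pattern simply swaps the last two dimensions.
permuted = tuple(np.asarray(three_d_shape)[pattern])
print(permuted)
```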
@drethage
I do not know why. All the audio after denoising is almost silent.
{
"dataset": {
"extract_voice": true,
"in_memory_percentage": 1,
"noise_only_percent": 0.02,
"num_condition_classes": 29,
"path": "data/ShowerNoise/",
"regain": 0.06,
"sample_rate": 16000,
"type": "nsdtsea"
},
"model": {
"condition_encoding": "binary",
"dilations": 7,
"filters": {
"lengths": {
"res": 3,
"final": [3, 3],
"skip": 1
},
"depths": {
"res": 128,
"skip": 128,
"final": [2048, 256]
}
},
"num_stacks": 3,
"target_field_length": 1601,
"target_padding": 1
},
"optimizer": {
"decay": 0.0,
"epsilon": 1e-08,
"lr": 0.001,
"momentum": 0.9,
"type": "adam"
},
"training": {
"batch_size": 4,
"early_stopping_patience": 16,
"loss": {
"out_1": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_2": {
"l1": 1,
"l2": 0,
"weight": 1
}
},
"num_epochs": 15,
"num_test_samples": 50,
"num_train_samples": 450,
"path": "sessions/ShowerNoise",
"verbosity": 1
}
}
I have 500 audio files for training. Of these, 100 files are clean audio and 7 files are noise-only with silent output. I do not know why.
Can you share your PESQ and STOI scores? I evaluated the test set and found the results to be poor: the logMMSE method gives PESQ 2.837 and STOI 0.89, while the WaveNet-denoised output gives PESQ 2.34 and STOI 0.92. How can I improve it?
The README requirements don't specify a Python version.
So apparently https://github.com/francoisgermain/SpeechDenoisingWithDeepFeatureLosses is picking up some steam... what are the differences between the two systems?
Hi,
I'm using a P100 GPU and tried to retrain the model (the pretrained one is too large), but it always
goes to early stopping after a small number of epochs. Is there any way to train for more epochs?
I'm using the default config, modified to use less memory by reducing "dilations" to 5.
Epoch 34/250
1000/1000 [==============================] - 66s - loss: 0.0765 - data_output_1_loss: 0.0382 - data_output_2_loss: 0.0382 - data_output_1_mean_absolute_error: 0.0382 - data_output_1_valid_mean_absolute_error: 0.0382 - data_output_2_mean_absolute_error: 0.0382 - data_output_2_valid_mean_absolute_error: 0.0382 - val_loss: 0.0749 - val_data_output_1_loss: 0.0374 - val_data_output_2_loss: 0.0374 - val_data_output_1_mean_absolute_error: 0.0374 - val_data_output_1_valid_mean_absolute_error: 0.0374 - val_data_output_2_mean_absolute_error: 0.0374 - val_data_output_2_valid_mean_absolute_error: 0.0374
Epoch 00033: early stopping
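A config posted in another issue on this page contains a `training.early_stopping_patience` entry, which is presumably the setting that controls this behavior. The sketch below shows generic patience-based early stopping so the effect of that value is easier to see. It assumes the monitored quantity is the validation loss and is not the exact logic of any particular Keras version. The function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best value: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses that plateau after the third epoch.
losses = [0.090, 0.080, 0.075, 0.076, 0.075, 0.077, 0.0751]

stop_with_small_patience = early_stopping_epoch(losses, patience=3)
stop_with_large_patience = early_stopping_epoch(losses, patience=16)
print(stop_with_small_patience, stop_with_large_patience)
```

With a larger patience the same loss curve is allowed to plateau for longer before training halts, so raising the value is one way to get more epochs.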
def get_dataset(config, model):
    if config['dataset']['type'] == 'vctk+demand':
        return datasets.VCTKAndDEMANDDataset(config, model).load_dataset()
    elif config['dataset']['type'] == 'nsdtsea':
        return datasets.NSDTSEADataset(config, model).load_dataset()
I cannot find the VCTKAndDEMANDDataset class.
Thanks.
The training process, started according to "readme.md", terminated early at epoch 34 with a loss of 0.034. I assumed the trained model was good enough to handle denoising, but the output audio is at 0 dB and contains nothing but silence. I do not know whether this is related to the early stop, but the very low loss makes it confusing.
I like this framework. I tested both the training and denoising phases and they work. I was wondering about ways to speed up the training phase. Aside from using a GPU, what other tricks can I use to speed up training, or what parameters would be ideal? Thanks a lot for your time and research!
Why do we need to input clean sound? Isn't it possible to output a clean sound file just by inputting noisy sound?
Ideally, during inference the trained model should take the noisy audio data and output the denoised version of the same audio files. I have some noisy audio files on which I want to run the pretrained model to get clean files. But during inference it asks for the path to clean audio data too, which I don't have, and the model doesn't run without it.
Why is a previously trained model asking for clean audio files? @drethage
Every time I run the model with the regular inputs from the README, I get this.
Denoising: p232_001.wav
0%| | 0/1 [00:00<?, ?it/s]
(wavenet) C:\Users\JSai\Documents\Cochlear_Implants\speech-denoising-wavenet>
And it terminates.
Any help would be great.
We have trained using our test data, but the following error is reported:
ValueError: Error when checking : expected condition_input to have shape (None, 1) but got array with shape (1, 5)
I wonder what the cause may be, how we can fix it, and how the designated target WAV files can be created. Note that we have only clean WAV files with a 16000 Hz sample rate.
Thanks
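The shape `(1, 5)` is consistent with a class index encoded as five binary digits, which is what 29 condition classes would need. The expected shape `(None, 1)` suggests the model was built from a configuration that needs only one digit. This reading is an inference from the shapes and from the `condition_encoding` and `num_condition_classes` entries in a config posted elsewhere on this page. It is not confirmed by the repository. A sketch of such an encoding, with a hypothetical function name:

```python
def binary_encode(class_id, num_classes):
    """Encode class_id as a fixed-width list of bits, most significant bit first."""
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id out of range")
    # Number of bits needed to represent the largest class index (at least one).
    num_bits = max(1, (num_classes - 1).bit_length())
    return [(class_id >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


five_bit_code = binary_encode(5, 29)  # 29 classes need 5 bits
one_bit_code = binary_encode(1, 2)    # 2 classes need 1 bit
print(five_bit_code, one_bit_code)
```

If this is the cause, the number of condition classes used at training time and the value used when preparing inputs would need to agree.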
Hi! I couldn't find the pre-trained model. Is it supposed to be in the folder sessions/001/models?
I am trying to run this model on my files. I have followed the instructions in the documentation. When I run this command:
THEANO_FLAGS=optimizer=fast_compile,device=gpu KERAS_BACKEND=theano python main.py --mode inference --target_field_length 16001 --batch_size 4 --config sessions/001/config.json --noisy_input_path ../noisy
I get an exception,
Exception: The nvidia driver version installed with this OS does not give good results for reduction.
I am using CUDA 9.0 and have installed all the requirements in requirements.txt. Has anyone faced a similar issue?
On running the same inference command given in the readme.md, I am getting the following OOM error. I am running it on Intel Core i5 7th gen CPU with 8GB RAM and NVidia 940MX 4GB GPU, Keras 1.2 and Theano 0.9.0.
THEANO_FLAGS=optimizer=fast_compile,device=gpu python main.py --mode inference --config sessions/001/config.json --noisy_input_path data/NSDTSEA/noisy_testset_wav --clean_input_path data/NSDTSEA/clean_testset_wav
Using TensorFlow backend.
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Loading model from epoch: 144
2018-02-18 17:40:19.280369: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-18 17:40:19.486539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-18 17:40:19.486944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.67GiB
2018-02-18 17:40:19.486961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
Performing inference..
Denoising: p232_001.wav
0%| | 0/2 [00:00<?, ?it/s]
2018-02-18 17:40:23.358141: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-02-18 17:40:33.358618: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 696.00MiB. Current allocation summary follows.
.
.
Stats:
Limit: 3605921792
InUse: 3542674176
MaxInUse: 3542674176
NumAllocs: 973
MaxAllocSize: 464153344
.
.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[192,128,2770,1]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 389.18MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
@drethage How can I solve this error? If you need any more information, please let me know.
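A quick size estimate shows why the allocation fails on a 4GB card. The sketch below assumes the tensor in the final error is dense and stored as float32 (4 bytes per element). The figure it prints need not match the sizes in the allocator messages, which may refer to other buffers.

```python
from functools import reduce
from operator import mul


def tensor_mebibytes(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB; 4 bytes per element assumes float32."""
    num_elements = reduce(mul, shape, 1)
    return num_elements * bytes_per_element / 2**20


reported_shape = (192, 128, 2770, 1)  # shape from the ResourceExhaustedError
tensor_mib = tensor_mebibytes(reported_shape)

limit_mib = 3605921792 / 2**20   # "Limit" from the allocator stats
in_use_mib = 3542674176 / 2**20  # "InUse" from the allocator stats
free_mib = limit_mib - in_use_mib

print(tensor_mib, free_mib)
```

Under these assumptions, one such tensor is several times larger than the memory still free. The `--batch_size` and `--target_field_length` options that appear in another command on this page look like the natural values to reduce, although their effect on this particular tensor is not verified here.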
The system gives an error saying that the data_expanded parameter is of float type but should be int. So I switched the type to int, and then the system gives an error saying that data_expanded is of int type but should be float. I'm confused!
While the provided model in sessions/001 works well on NSDTSEA test files, the results on my own noisy files (recorded in real conditions) are much worse.
What did I do:
A large part of the speech was suppressed, although the SNR is not very low.
Maybe I am doing something wrong?
To try it yourself: https://drive.google.com/open?id=1njlPLNjbTuY1QlW_19y06a1ywuImBUHo
As I use Python 3, I ran into some problems debugging this code.
I want to know the model size, i.e. the number of parameters. Could you tell me the size of the model?
Thank you!
Qiquan Zhang
The config file documentation mentions that:
"...in_memory_percentage: (float) Percentage of the dataset to load into memory, useful when dataset requires more memory than available..."
Does it mean that if I set the value to "1.00", the training toolchain will try to load 100% of the recordings into memory?
And when I set it to "0.10", will the training toolchain load only 10% of the recordings into memory at first and eventually load the other 90% during subsequent iterations, or will it just use 10% in total and ignore the other 90%?
Hi,
I am trying to reproduce the results of this repository. However, I am having problems running Theano. Can you tell me which NVIDIA driver, CUDA, and cuDNN versions you used? Also, could you tell me which pygpu version was used to run this project?
Thanks,
Akhil
I would like to predict a distribution rather than a single value per sample, as the original WaveNet does. What should I change in the source code? (I have seen that there is some preparation for this in util.py, where sound is converted from linear to ulaw and back...) Please help.
I have been digging into the code but haven't been able to make it work. All I want to achieve is to denoise files.
What workflow did you follow to achieve it?
Hi, I used this code and the fine-tuned model directly, but I encountered a problem: the output enhanced wave file is not as long as the original noisy wave. Some samples seem to be missing at the end of the input noisy wave. I can't figure out the reason.
Thanks very much.
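One possible explanation, offered as an assumption and not a confirmed diagnosis: a stack of dilated convolutions applied without padding produces fewer output samples than it receives, by the receptive field length minus one. The sketch below estimates that length. It assumes the config value `dilations` is the largest exponent (dilations 1, 2, ..., 2**dilations), that the filter length is 3 as in the `res` entry of a config posted on this page, and that the filters are symmetric. The function names are hypothetical.

```python
def receptive_field_length(num_stacks, max_dilation_exponent, filter_length=3):
    """Receptive field of stacked dilated convolutions with dilations 2**0 .. 2**max_dilation_exponent."""
    half_filter = (filter_length - 1) // 2
    # Each layer widens the field by its dilation on both sides.
    per_stack = 2 * half_filter * sum(2 ** i for i in range(max_dilation_exponent + 1))
    return num_stacks * per_stack + 1


def valid_output_length(num_input_samples, receptive_field):
    """Number of samples an unpadded ("valid") convolution stack can output."""
    return max(0, num_input_samples - (receptive_field - 1))


field_9 = receptive_field_length(num_stacks=3, max_dilation_exponent=9)
field_7 = receptive_field_length(num_stacks=3, max_dilation_exponent=7)
print(field_9, field_7, valid_output_length(65302, field_9))
```

If this is what happens, padding the noisy input with `receptive_field - 1` zeros before denoising would be one way to recover the full length. Whether the missing samples in this report come from this effect or from how the file is split into batches has not been checked.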
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
I was trying to train with my new set of data.
Windows 10, python 3.6.3
Full error when trying to run main.py:
Using TensorFlow backend.
Traceback (most recent call last):
File "\speech-denoising-wavenet-master\main.py", line 169, in <module>
main()
File "speech-denoising-wavenet-master\main.py", line 163, in main
training(config, cla)
File "\speech-denoising-wavenet-master\main.py", line 72, in training
model = models.DenoisingWavenet(config, load_checkpoint=cla.load_checkpoint, print_model_summary=cla.print_model_summary)
File "speech-denoising-wavenet-master\models.py", line 50, in __init__
self.samples_of_interest_indices = self.get_padded_target_field_indices()
File "speech-denoising-wavenet-master\models.py", line 184, in get_padded_target_field_indices
target_sample_index + self.half_target_field_length + self.target_padding + 1)
TypeError: 'float' object cannot be interpreted as an integer
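This error is typical of code written for Python 2 running under Python 3, where `/` between two integers returns a float. The sketch below reproduces it with names taken from the traceback. That `half_target_field_length` is computed with `/` is an assumption, and the numeric values are taken from a config posted elsewhere on this page.

```python
target_field_length = 1601
target_padding = 1

half_true_division = target_field_length / 2    # 800.5 under Python 3
half_floor_division = target_field_length // 2  # 800 under both Python 2 and 3

# range() rejects a float bound, which is the TypeError in the traceback.
try:
    range(0, half_true_division + target_padding + 1)
    raised = False
except TypeError as error:
    raised = True
    print(error)

# With floor division the bound is an int and range() works.
indices = range(0, half_floor_division + target_padding + 1)
print(len(indices))
```

Replacing `/` with `//` (or wrapping the result in `int(...)`) wherever a length or index is computed would be the usual fix. Other Python 2 idioms in the code base may need similar changes.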
Hi,
Initially I could not train the model with a batch size of 10 due to memory issues. I then trained the model with a batch size of 5 (by editing the config.json file) without touching any other parameters. The algorithm stopped after 41 epochs due to the early stopping condition. I used the 41st checkpoint as the model parameters while testing, and the returned denoised wave file is empty (complete silence).
What could be the problem? Does early stopping cause the null model? Does the batch size really matter when testing a model?
Hello,
When I train on the same NSDTSEA dataset without changing any of the config.json parameters, the resulting model has a validation error of 0.013. If you don't mind, could you tell me how you got a validation error of 0.00144 for the model you displayed? Are there any special parameters to tune to achieve that performance?
Hi,
I have a setup with an NVIDIA GTX 1080 card with 8GB RAM, but I'm still unable to perform a successful denoising run on the default dataset without getting an out-of-memory error from TensorFlow :(
I'm currently retraining the model with a smaller "dilations" parameter value, but just out of curiosity - what kind of HW are you using to run denoising without OOM error on your side?
Dear,
Could you please update the project to Python 3?
Thx
I followed the instructions in the README file and used the NSDTSEA dataset to reproduce the results, but the output speech is at 0 dB and is all silence. Did I miss something that might lead to this result? Also, I could not find the model under sessions/001/, so I used the configuration under sessions/001/ to train the model and used the trained model to denoise, but it still gives me silence. Could you tell me what else I could do to debug?
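As a first debugging step, it can help to check whether the output files are literally all zeros or just very quiet. The sketch below measures the peak level of a WAV file using only the standard library. It assumes 16-bit PCM files. Both helper names are hypothetical, and the two test files are synthetic stand-ins for real model output.

```python
import array
import math
import os
import sys
import tempfile
import wave


def peak_dbfs(path):
    """Peak level of a 16-bit PCM WAV in dBFS; -inf if every sample is zero."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return float("-inf") if peak == 0 else 20.0 * math.log10(peak / 32768.0)


def write_pcm16(path, samples, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file (used here only to create test inputs)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())


with tempfile.TemporaryDirectory() as folder:
    silent_path = os.path.join(folder, "silent.wav")
    half_scale_path = os.path.join(folder, "half_scale.wav")
    write_pcm16(silent_path, [0] * 1600)
    write_pcm16(half_scale_path, [0, 16384, 0, -16384] * 400)
    silent_level = peak_dbfs(silent_path)
    half_scale_level = peak_dbfs(half_scale_path)

print(silent_level, half_scale_level)
```

An all-zero file points to the model or the output-writing step. A very low but finite level would instead suggest a scaling issue somewhere in the pipeline.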