breizhn / dtln-aec
This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.
License: MIT License
It seems to take a long time to process data when I tried your demo.
Can it be used for real-time voice communication, or is it only for offline processing?
I am replicating the training process proposed in your paper, and I'm new to AEC.
Once I have the signals prepared, which two should be the inputs of the model, and which should be the label? Thanks.
Hi,
Thanks for your great work.
I checked your paper today, but I am still confused about some of the training details.
There is no doubt that this is a wonderful AEC project.
I am looking forward to more updates on the training details when you have time.
Best wishes.
In your paper "Acoustic echo cancellation with the dual-signal transformation LSTM network", it is mentioned that the size of the learned feature representation is also 512. Does that mean the encoder_size is 512? In your DNS-Challenge paper, the encoder_size is 256, and I would like to know the reason for changing it.
Does increasing encoder_size from 256 to 512 influence the model size, number of parameters, objective and subjective metrics, and execution time?
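For context, this is how I currently understand the learned feature transformation (a sketch following the 1x1 Conv1D encoder used in the DTLN repository; the exact layer arrangement here is my assumption):

import tensorflow as tf

block_len = 512                    # frame length in samples
encoder_size = 512                 # 256 in the DNS-Challenge model, 512 in DTLN-aec
frames = tf.keras.Input(shape=(None, block_len))   # framed time-domain signal
# 1x1 convolution that learns a feature representation from each frame;
# this layer alone has block_len * encoder_size weights, so doubling
# encoder_size doubles its parameter count.
encoded = tf.keras.layers.Conv1D(encoder_size, 1, strides=1, use_bias=False)(frames)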
Thanks a lot!
Hi Nils, after carefully reading the paper and the code in the 'DTLN' project, I modified the DTLN model into the DTLN-aec one. I checked the model structure again and again to make sure it is consistent with your paper. Then I composed the training data as you describe in the paper, except for the 'random spectral shaping', which I'm not sure how to implement. The dataset includes far-end speech and echo from the 'synthetic' and 'real' datasets, as well as echo data I synthesized myself with 'DNS-Challenge 3' speech data. But the model I got does not perform as well as the pretrained one. Could you help me verify the model structure or the training dataset composition process? Thanks in advance!
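P.S. For the random spectral shaping step mentioned above, my current guess is the following (a sketch assuming the second-order random-coefficient filter described in the RNNoise paper is what is meant; the coefficient range is my assumption):

import numpy as np
from scipy.signal import lfilter

def random_spectral_shaping(x, rng=None):
    # Second-order IIR filter with random numerator and denominator
    # coefficients drawn uniformly from [-3/8, 3/8].
    if rng is None:
        rng = np.random.default_rng()
    b = np.concatenate(([1.0], rng.uniform(-3/8, 3/8, 2)))
    a = np.concatenate(([1.0], rng.uniform(-3/8, 3/8, 2)))
    return lfilter(b, a, x)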
Hi, breizhn~
I have a question about the concatenate operation. I want to know whether the features of the microphone and the loopback signal are concatenated along the channel dimension or the time dimension.
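My current assumption is that the magnitude features of the two signals are stacked along the last (feature) axis, roughly like this (the shapes are hypothetical; please correct me if the time dimension is meant):

import numpy as np

mic_mag = np.zeros((1, 1, 257))   # (batch, time frame, frequency bins)
lpb_mag = np.zeros((1, 1, 257))
stacked = np.concatenate([mic_mag, lpb_mag], axis=-1)   # -> (1, 1, 514)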
I'm looking forward to your reply!
Good Luck!
I want to use DTLN-aec in real-time communication. Does DTLN-aec also include noise suppression, or should it be combined with other ANS/AGC modules? Would the processing chain be something like: DTLN-aec -> ANS (DTLN-like) -> AGC?
Best Regards
Thanks for your wonderful work in the acoustic echo cancellation field.
I'd like to reproduce the DTLN-aec training code using the DTLN repository code, but the SNR loss becomes inf or nan during training when I use a small amount of AEC-Challenge synthetic data.
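For reference, here is the negative-SNR loss I am currently using, with a small epsilon added before the log to keep it finite (a sketch; the epsilon handling is my own addition and may differ from the original training code):

import tensorflow as tf

def neg_snr_loss(y_true, y_pred, eps=1e-8):
    # negative signal-to-noise ratio in dB; eps keeps the log arguments positive
    signal = tf.reduce_mean(tf.square(y_true), axis=-1)
    noise = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    snr_db = 10.0 * (tf.math.log(signal + eps) - tf.math.log(noise + eps)) / tf.math.log(10.0)
    return -tf.reduce_mean(snr_db)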
I'm looking forward to your reply. Thanks
Hi, breizhn~ Do you plan to extend this repository with complete data preparation and model training instructions?
I tested with a wav file containing clean speech. After running your code, I found the speech is totally removed. Is that reasonable?
Thank you very much for your work. We first downsample the 16 kHz AEC dataset to 8 kHz and train the model, then apply it to 8 kHz telephone data. The result is not as good as upsampling the telephone data to 16 kHz and then downsampling back to 8 kHz. What is the reason for this? Thank you.
Hey! Can you please add the .h5 files for the dtln-aec models as well?
Hi, I tried running it on a mobile device, but it is very slow. Maybe a lite model is needed for mobile. : )
Thanks for your great work.
I have a problem with the training target. I do not know which I should take as the training target among: near-end speech with RIR and noise, near-end speech with RIR, and pure near-end speech. After reading the paper, I did some tests but got terrible training losses when selecting pure near-end speech as the training target, and good results when selecting near-end speech with RIR and noise.
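To make the question concrete, this is roughly how I compose a training example (the placeholder signals and variable names are mine, only there to make the sketch runnable; the three candidate targets are the ones listed above):

import numpy as np
from scipy.signal import fftconvolve

fs = 16000
nearend_speech = np.random.randn(4 * fs)          # placeholder signals
echo = np.random.randn(4 * fs)
noise = 0.1 * np.random.randn(4 * fs)
rir = np.random.randn(fs // 2) * np.exp(-np.linspace(0, 8, fs // 2))

nearend_rir = fftconvolve(nearend_speech, rir)[:len(nearend_speech)]  # near-end speech through the RIR
mic = nearend_rir + echo + noise        # microphone signal (model input, together with the far-end signal)
target_a = nearend_rir + noise          # candidate 1: near-end speech with RIR and noise (works for me)
target_b = nearend_rir                  # candidate 2: near-end speech with RIR
target_c = nearend_speech               # candidate 3: pure near-end speech (gives terrible losses)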
I would appreciate any advice.
Looking forward to your reply. @breizhn
Thanks Nils @breizhn for the tflite :) apologies for being dumb.
If you ever have the time to answer then please do.
I have been wondering if https://www.tensorflow.org/lite/examples/on_device_training/overview could be used to increase accuracy.
I have been reading about the proposed DTLN-aec model architecture, which seems to have a similar effect to my first tries with tflite-dtln, and I just thought I would ask: do you have any code examples for training?
For echo cancellation systems, two input signals are usually used, namely
the near-end microphone signal and the far-end signal, and the output is the near-end speech signal.
When training the model for the double-talk situation, it should take the near-end microphone signal and the far-end signal as inputs and output the near-end speech. However, the AEC-Challenge dataset only provides ***_doubletalk_lpb.wav (far-end signal) and ***_doubletalk_mic.wav (near-end microphone signal). Since we need the near-end speech as a label, where is the near-end speech?
Hi Nils, I encountered a problem while training DTLN-aec. When I train only on the real data, the loss stabilizes at 0.03.
However, when I train on both real and synthetic data, the loss decreases for some batches and then suddenly explodes to inf.
I think it is because of the log in the SNR loss. Is this normal, or will the loss stabilize at a finite number after several epochs?
We have tested DTLN, and it runs well on mobile devices. Great job 👍
Which model do you apply for DTLN-aec? Is it from the DTLN repo?
Hi,
I used your dtln repo to generate a bunch of noise-suppressed sound files by simply running:
$ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5
I want to use your aec model to generate the echo-suppressed files. It doesn't seem to work with:
$ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512
It looks like the model needs both a mic file and an lpb file to generate the processed file. Am I understanding that right? Would it be possible to just generate the enhanced file the same way as with dtln?
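For what it's worth, this is how I am checking that my input folder has matching pairs before running the script (the *_mic.wav / *_lpb.wav naming convention is what I assume run_aec.py expects; the check itself is mine):

import glob, os

in_folder = "/folder/with/input/files"                 # same folder passed to run_aec.py
for mic in sorted(glob.glob(os.path.join(in_folder, "*_mic.wav"))):
    lpb = mic.replace("_mic.wav", "_lpb.wav")          # expected loopback counterpart
    if not os.path.isfile(lpb):
        print("missing loopback file for", mic)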
Thanks,
Hi, breizhn. Thanks for this repository!
I am a beginner in machine learning.
Following the readme, I have trained a DTLN model with 48 kHz audio data, and it seems to work correctly when running the real-time processing test with TFLite. I can also run AEC with the pretrained DTLN-aec model on 16 kHz audio.
But when I run AEC with my DTLN model (whether 16 kHz or 48 kHz), the error shown below occurs:
/run_aec.py", line 116, in process_file
    interpreter_1.set_tensor(input_details_1[2]["index"], lpb_mag)
IndexError: list index out of range
It's caused by the different model params, I guess.
I'm not sure if I am missing something.
My params are:
self.fs = 48000
self.batchsize = 22
self.len_samples = 15
self.activation = 'sigmoid'
self.numUnits = 128
self.numLayer = 2
self.blockLen = 512 * 3
self.block_shift = 128 * 3
self.dropout = 0.25
self.lr = 1e-3
self.max_epochs = 3  # just for test
self.encoder_size = 256
self.eps = 1e-7
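For debugging, I am printing the input details of the interpreter to see how many inputs the converted model actually expects (standard tf.lite API; the model path is just an example):

import tensorflow as tf

interpreter_1 = tf.lite.Interpreter(model_path="./pretrained_models/dtln_aec_512_1.tflite")
for detail in interpreter_1.get_input_details():
    print(detail["index"], detail["name"], detail["shape"])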
The question is how can I get a DTLN model for AEC?
Thanks a lot!
Hi,
I am currently using DTLN-aec for real-time acoustic echo cancellation testing.
For me, this model's performance is lovely, but it needs to be more lightweight for use in a real-time process, so I'm trying to quantize the tflite file.
I want to change float32 to float16 through dynamic range quantization.
However, in the process of quantization, the tf.lite.TFLiteConverter.from_saved_model function takes a TensorFlow (.pb) model as a parameter, so I need a .pb file.
Therefore, could I get a .pb file or a quantized tflite file?
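This is the conversion script I am planning to use once I have a SavedModel (a sketch using the standard post-training float16 conversion path; the export directory name is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("./dtln_aec_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]   # float16 post-training quantization
tflite_model = converter.convert()
with open("dtln_aec_fp16.tflite", "wb") as f:
    f.write(tflite_model)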
If I succeed in quantization, I can share my code and results with you.
Thank you for reading it. Have a nice day.
Hi
Do you know the NISQA (non-intrusive speech quality assessment) project? The author's current model is focused on distortions that occur in communication networks, not on speech enhancement. Could you fine-tune this model with your data so that it also covers front-end signal processing?
Thanks!