breizhn / dtln-aec
This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.
License: MIT License
It seems to take a long time to process data when I tried your demo.
Can it be used for real-time voice communication, or is it only for offline processing?
I am replicating the training process proposed in your paper, and I'm new to AEC.
Once I have the signals prepared, which two should be the inputs of the model, and which should be the label? Thanks.
Hi,
Thanks for your great work.
I checked your paper today, but I am still confused about some of the training details.
There is no doubt that this is a wonderful AEC project.
I am looking forward to more updates on the training details when you have time.
Best wishes.
In your paper "Acoustic echo cancellation with the dual-signal transformation LSTM network", it is mentioned that the size of the learned feature representation is also 512. Does that mean the encoder_size is 512? In your DNS-Challenge paper, the encoder_size is 256, and I would like to know the reason for changing it.
Does increasing encoder_size from 256 to 512 influence the model size, number of parameters, objective and subjective metrics, and execution time?
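For context, this is how I currently understand the learned feature transformation (a sketch following the 1x1 Conv1D encoder used in the DTLN repository; the exact layer arrangement here is my assumption):

import tensorflow as tf

block_len = 512                    # frame length in samples
encoder_size = 512                 # 256 in the DNS-Challenge model, 512 in DTLN-aec
frames = tf.keras.Input(shape=(None, block_len))   # framed time-domain signal
# 1x1 convolution that learns a feature representation from each frame;
# this layer alone has block_len * encoder_size weights, so doubling
# encoder_size doubles its parameter count.
encoded = tf.keras.layers.Conv1D(encoder_size, 1, strides=1, use_bias=False)(frames)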
Thanks a lot!
Hi Nils, after carefully reading the paper and the code in the 'DTLN' project, I modified the DTLN model into the DTLN-aec one. I checked the model structure again and again to make sure it is consistent with your paper. Then I composed the training data as you describe in the paper, except for the 'random spectral shaping', which I'm not sure how to implement. The dataset includes far-end speech and echo from the 'synthetic' and 'real' datasets, as well as echo data I synthesized myself with 'DNS-Challenge 3' speech data. But the model I got does not perform as well as the pretrained one. Could you help me verify the model structure or the training dataset composition process? Thanks in advance!
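P.S. For the random spectral shaping step mentioned above, my current guess is the following (a sketch assuming the second-order random-coefficient filter described in the RNNoise paper is what is meant; the coefficient range is my assumption):

import numpy as np
from scipy.signal import lfilter

def random_spectral_shaping(x, rng=None):
    # Second-order IIR filter with random numerator and denominator
    # coefficients drawn uniformly from [-3/8, 3/8].
    if rng is None:
        rng = np.random.default_rng()
    b = np.concatenate(([1.0], rng.uniform(-3/8, 3/8, 2)))
    a = np.concatenate(([1.0], rng.uniform(-3/8, 3/8, 2)))
    return lfilter(b, a, x)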
Hi, breizhn~
I have a question about the concatenate operation. I want to know whether the features of the microphone and the loopback signal are concatenated along the channel dimension or the time dimension.
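My current assumption is that the magnitude features of the two signals are stacked along the last (feature) axis, roughly like this (the shapes are hypothetical; please correct me if the time dimension is meant):

import numpy as np

mic_mag = np.zeros((1, 1, 257))   # (batch, time frame, frequency bins)
lpb_mag = np.zeros((1, 1, 257))
stacked = np.concatenate([mic_mag, lpb_mag], axis=-1)   # -> (1, 1, 514)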
I'm looking forward to your reply!
Good Luck!
I want to use DTLN-aec in real-time communication. Does DTLN-aec also include noise suppression, or should it be combined with other ANS/AGC modules? Would the processing chain be something like: DTLN-aec -> ANS (DTLN-like) -> AGC?
Best Regards
Thanks for your wonderful work in the acoustic echo cancellation field.
I'd like to reproduce the DTLN-aec training code using the DTLN repository code, but the SNR loss becomes inf or nan during training when I use a small amount of AEC-Challenge synthetic data.
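For reference, here is the negative-SNR loss I am currently using, with a small epsilon added before the log to keep it finite (a sketch; the epsilon handling is my own addition and may differ from the original training code):

import tensorflow as tf

def neg_snr_loss(y_true, y_pred, eps=1e-8):
    # negative signal-to-noise ratio in dB; eps keeps the log arguments positive
    signal = tf.reduce_mean(tf.square(y_true), axis=-1)
    noise = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    snr_db = 10.0 * (tf.math.log(signal + eps) - tf.math.log(noise + eps)) / tf.math.log(10.0)
    return -tf.reduce_mean(snr_db)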
I'm looking forward to your reply. Thanks
Hi, breizhn~ Do you plan to extend this repository with complete data preparation and model training instructions?
I tested with a wav file containing clean speech. After running your code, I found the speech is totally removed. Is that reasonable?
Thank you very much for your work. We first downsample the 16 kHz AEC dataset to 8 kHz and train the model, then apply it to 8 kHz telephone data. The result is not as good as upsampling the telephone data to 16 kHz and then downsampling back to 8 kHz. What is the reason for this? Thank you.
Hey! Can you please add the .h5 files for the dtln-aec models as well?
Hi, I tried running it on a mobile device, but it is very slow. Maybe a lite model is needed for mobile. : )
Thanks for your great work.
I have a problem with the training target. I do not know which I should take as the training target among: near-end speech with RIR and noise, near-end speech with RIR, and pure near-end speech. After reading the paper, I did some tests but got terrible training losses when selecting pure near-end speech as the training target, and good results when selecting near-end speech with RIR and noise.
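To make the question concrete, this is roughly how I compose a training example (the placeholder signals and variable names are mine, only there to make the sketch runnable; the three candidate targets are the ones listed above):

import numpy as np
from scipy.signal import fftconvolve

fs = 16000
nearend_speech = np.random.randn(4 * fs)          # placeholder signals
echo = np.random.randn(4 * fs)
noise = 0.1 * np.random.randn(4 * fs)
rir = np.random.randn(fs // 2) * np.exp(-np.linspace(0, 8, fs // 2))

nearend_rir = fftconvolve(nearend_speech, rir)[:len(nearend_speech)]  # near-end speech through the RIR
mic = nearend_rir + echo + noise        # microphone signal (model input, together with the far-end signal)
target_a = nearend_rir + noise          # candidate 1: near-end speech with RIR and noise (works for me)
target_b = nearend_rir                  # candidate 2: near-end speech with RIR
target_c = nearend_speech               # candidate 3: pure near-end speech (gives terrible losses)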
I would appreciate any advice.
Looking forward to your reply. @breizhn
Thanks Nils @breizhn for the tflite :) apologies for being dumb.
If you ever have the time to answer then please do.
I have been wondering if https://www.tensorflow.org/lite/examples/on_device_training/overview could be used to increase accuracy.
I have been reading about the proposed DTLN-aec model architecture, which seems to have a similar effect to my first tries with tflite-dtln, and I just thought I would ask: do you have any code examples for training?
For echo cancellation systems, two input signals are usually used, namely
the near-end microphone signal and the far-end signal, and the output is the near-end speech signal.
When training the model for the double-talk situation, it should take the near-end microphone signal and the far-end signal as inputs and output the near-end speech. However, the AEC-Challenge dataset only provides ***_doubletalk_lpb.wav (far-end signal) and ***_doubletalk_mic.wav (near-end microphone signal). Since we need the near-end speech as a label, where is the near-end speech?
Hi Nils, I encountered a problem while training DTLN-aec. When I train only on the real data, the loss stabilizes at 0.03.
However, when I train on both real and synthetic data, the loss decreases for some batches and then suddenly explodes to inf.
I think it is because of the log in the SNR loss. Is this normal, or will the loss stabilize at a finite number after several epochs?
We have tested DTLN, and it runs well on mobile devices. Great job 👍
Which model do you apply for DTLN-aec? Is it from the DTLN repo?
Hi,
I used your dtln repo to generate a bunch of noise-suppressed sound files by simply running:
$ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5
I want to use your aec model to generate the echo-suppressed files. It doesn't seem to work with:
$ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512
It looks like the model needs both a mic file and an lpb file to generate the processed file. Am I understanding that right? Would it be possible to just generate the enhanced file the same way as with dtln?
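For what it's worth, this is how I am checking that my input folder has matching pairs before running the script (the *_mic.wav / *_lpb.wav naming convention is what I assume run_aec.py expects; the check itself is mine):

import glob, os

in_folder = "/folder/with/input/files"                 # same folder passed to run_aec.py
for mic in sorted(glob.glob(os.path.join(in_folder, "*_mic.wav"))):
    lpb = mic.replace("_mic.wav", "_lpb.wav")          # expected loopback counterpart
    if not os.path.isfile(lpb):
        print("missing loopback file for", mic)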
Thanks,
Hi, breizhn. Thanks for this repository!
I am a beginner in machine learning.
Following the readme, I have trained a DTLN model with 48 kHz audio data, and it seems to work correctly when running the real-time processing test with TFLite. I can also run AEC with the pretrained DTLN-aec model on 16 kHz audio.
But when I run AEC with my DTLN model (whether 16 kHz or 48 kHz), the error shown below occurs:
/run_aec.py", line 116, in process_file
    interpreter_1.set_tensor(input_details_1[2]["index"], lpb_mag)
IndexError: list index out of range
It's caused by the different model params, I guess.
I'm not sure if I am missing something.
My params are:
self.fs = 48000
self.batchsize = 22
self.len_samples = 15
self.activation = 'sigmoid'
self.numUnits = 128
self.numLayer = 2
self.blockLen = 512 * 3
self.block_shift = 128 * 3
self.dropout = 0.25
self.lr = 1e-3
self.max_epochs = 3  # just for test
self.encoder_size = 256
self.eps = 1e-7
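For debugging, I am printing the input details of the interpreter to see how many inputs the converted model actually expects (standard tf.lite API; the model path is just an example):

import tensorflow as tf

interpreter_1 = tf.lite.Interpreter(model_path="./pretrained_models/dtln_aec_512_1.tflite")
for detail in interpreter_1.get_input_details():
    print(detail["index"], detail["name"], detail["shape"])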
The question is how can I get a DTLN model for AEC?
Thanks a lot!
Hi,
I am currently using DTLN-aec for real-time acoustic echo cancellation testing.
For me, this model's performance is lovely, but it needs to be more lightweight for use in a real-time process, so I'm trying to quantize the tflite file.
I want to change float32 to float16 through dynamic range quantization.
However, in the process of quantization, the tf.lite.TFLiteConverter.from_saved_model function takes a TensorFlow (.pb) model as a parameter, so I need a .pb file.
Therefore, could I get a .pb file or a quantized tflite file?
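This is the conversion script I am planning to use once I have a SavedModel (a sketch using the standard post-training float16 conversion path; the export directory name is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("./dtln_aec_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]   # float16 post-training quantization
tflite_model = converter.convert()
with open("dtln_aec_fp16.tflite", "wb") as f:
    f.write(tflite_model)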
If I succeed in quantization, I can share my code and results with you.
Thank you for reading it. Have a nice day.
Hi
Do you know the NISQA (non-intrusive speech quality assessment) project? The author's current model is focused on distortions that occur in communication networks, not on speech enhancement. Could you fine-tune this model with your data so that it also covers front-end signal processing?
Thanks!