dtln-aec's Issues

Can it be used for real-time voice?

It seems to take a long time to process data when I tried your demo.
Can it be used in real-time voice communication, or is it only meant for offline processing?
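
A minimal sketch of how real-time capability can be checked, assuming 16 kHz audio processed in 512-sample blocks with a 128-sample shift (8 ms hop) as in run_aec.py; process_block is a placeholder for the two TFLite interpreter calls, so the numbers are illustrative only:

import time
import numpy as np

fs = 16000
block_len, block_shift = 512, 128

def process_block(mic_block, lpb_block):
    # placeholder for the two-stage TFLite inference done in run_aec.py
    return mic_block

mic = np.random.randn(fs * 5).astype("float32")  # 5 s of dummy microphone audio
lpb = np.random.randn(fs * 5).astype("float32")  # 5 s of dummy loopback audio

times = []
for idx in range(0, len(mic) - block_len, block_shift):
    start = time.perf_counter()
    process_block(mic[idx:idx + block_len], lpb[idx:idx + block_len])
    times.append(time.perf_counter() - start)

print(f"mean block time: {1000 * np.mean(times):.2f} ms "
      f"(real-time budget: {1000 * block_shift / fs:.1f} ms per block)")

If the mean block time stays below the 8 ms hop on the target machine, block-wise real-time processing is in principle possible.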

Problem reproducing the training process

I am trying to reproduce the training process proposed in your paper, and I am new to AEC.

Given the following files:

  • *_doubletalk_lpb/mic.wav
  • *_doubletalk_with_movement_lpb/mic.wav
  • *_farend_singletalk_lpb/mic.wav
  • *_farend_singletalk_with_movement_lpb/mic.wav
  • *_nearend_singletalk_mic.wav
  • *_sweep_lpb/mic.wav

which two should be the inputs of the model, and which one should be the label? Thanks.
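
For context, a hedged sketch of how a training example is typically assembled for a two-input AEC model (microphone plus loopback in, near-end speech out); the file roles and the mixing below are illustrative assumptions, not the authors' exact recipe:

import numpy as np
import soundfile as sf

def build_example(nearend_path, farend_path, echo_path, ser_db=10.0):
    nearend, _ = sf.read(nearend_path, dtype="float32")  # clean near-end speech (training target)
    farend, _ = sf.read(farend_path, dtype="float32")    # loopback / far-end reference (input 2)
    echo, _ = sf.read(echo_path, dtype="float32")        # far-end signal after the echo path

    n = min(len(nearend), len(farend), len(echo))
    nearend, farend, echo = nearend[:n], farend[:n], echo[:n]

    # scale the echo to a chosen signal-to-echo ratio and build the microphone signal
    gain = np.std(nearend) / (np.std(echo) * 10 ** (ser_db / 20) + 1e-8)
    mic = nearend + gain * echo                           # microphone signal (input 1)

    return (mic, farend), nearend                         # ((input 1, input 2), target)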

Looking forward to more updates about training details

Thanks for your great work.
I read your paper today, but I am still confused about some of the training details.
There is no doubt that this is a wonderful AEC system.
I am looking forward to more updates on the training details whenever you have time.
Best wishes.

The influence of the hyperparameter encoder_size

In your paper "Acoustic echo cancellation with the dual-signal transformation LSTM network", it is mentioned that the size of the learned feature representation is also 512. Does that mean the encoder_size is 512? In your DNS-Challenge paper, the encoder_size is 256. I would like to know why you changed the encoder size.
Does increasing encoder_size from 256 to 512 influence the model size, number of parameters, objective and subjective metrics, and execution time?
Thanks a lot!
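
A small, hedged way to gauge the effect on parameter count, assuming the learned feature representation is produced by a frame-wise Conv1D over 512-sample blocks as in the DTLN papers (the block length and layer choice are assumptions):

import tensorflow as tf

def encoder_params(block_len, encoder_size):
    # 1x1 convolution mapping each time-domain block to the learned feature space
    inp = tf.keras.Input(shape=(None, block_len))
    out = tf.keras.layers.Conv1D(encoder_size, 1, use_bias=False)(inp)
    return tf.keras.Model(inp, out).count_params()

for size in (256, 512):
    print(f"encoder_size={size}: {encoder_params(512, size)} encoder parameters")

A larger encoder also widens the input of the second separation core and the decoder, so the total parameter count and execution time grow by more than this encoder-only estimate suggests.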

A model built from the 'DTLN' project and your 'DTLN-aec' paper cannot be trained to match your pretrained one

Hi Nils, after carefully reading the paper and the code of the 'DTLN' project, I modified the DTLN model into the DTLN-aec architecture. I checked the model structure again and again to make sure it is consistent with your paper. I then composed the training data as described in the paper, except for the random spectral shaping, which I am not sure how to implement (one possible reading is sketched below). The dataset includes far-end speech and echo from the 'synthetic' and 'real' datasets, plus echo I synthesized myself from DNS-Challenge 3 speech data. However, the model I trained does not perform as well as your pretrained one. Could you help me verify the model structure or the training-data composition process? Thanks in advance!
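
For reference, one common reading of "random spectral shaping" is filtering each signal with a random second-order filter, in the spirit of RNNoise-style augmentation; the coefficient range below is an assumption and is not confirmed for DTLN-aec:

import numpy as np
from scipy.signal import lfilter

def random_spectral_shaping(x, rng=np.random.default_rng(), r=0.375):
    # random biquad: small random coefficients in numerator and denominator
    b = [1.0, rng.uniform(-r, r), rng.uniform(-r, r)]
    a = [1.0, rng.uniform(-r, r), rng.uniform(-r, r)]
    return lfilter(b, a, x).astype(np.float32)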

A question about the concatenation operation in the DTLN-aec model

Hi breizhn,
I have a question about the concatenation operation: are the features of the microphone and the loopback signal concatenated along the channel (feature) dimension or along the time dimension?
I'm looking forward to your reply!
Good luck!
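
For illustration, a minimal Keras sketch of the two options; the feature size of 257 is a placeholder and not taken from the repository:

import tensorflow as tf

mic_feat = tf.keras.Input(shape=(None, 257))  # frame-wise features of the microphone signal
lpb_feat = tf.keras.Input(shape=(None, 257))  # frame-wise features of the loopback signal

# axis=-1 gives (batch, time, 514): concatenation along the channel/feature dimension
channel_concat = tf.keras.layers.Concatenate(axis=-1)([mic_feat, lpb_feat])

# axis=1 gives (batch, 2*time, 257): concatenation along the time dimension
time_concat = tf.keras.layers.Concatenate(axis=1)([mic_feat, lpb_feat])

print(channel_concat.shape, time_concat.shape)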

Does DTLN-aec also include noise suppression?

I want to use DTLN-aec in real-time communication. Does DTLN-aec also perform noise suppression, or should it be combined with other ANS/AGC modules, with a processing chain like DTLN-aec -> ANS (DTLN-like) -> AGC?

Best regards

The SNR loss is inf or NaN during training

Thanks for your wonderful work in the acoustic echo cancellation field.

I would like to reproduce the DTLN-aec training with the DTLN repository code, but the SNR loss becomes inf or NaN during training when I use a small amount of AEC-Challenge synthetic data.

I'm looking forward to your reply. Thanks.
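
One likely culprit is taking the logarithm of (near-)zero energies, for example on far-end single-talk segments where the target is silence. A minimal, hedged sketch of a negative-SNR loss with epsilon guards (the DTLN repository defines its own SNR cost; the epsilon handling here is an assumption):

import tensorflow as tf

def neg_snr_loss(y_true, y_pred, eps=1e-8):
    signal = tf.reduce_mean(tf.square(y_true), axis=-1)
    noise = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    # eps keeps the logarithm finite when the target or the residual is almost silent
    snr_db = 10.0 * (tf.math.log(signal + eps) - tf.math.log(noise + eps)) / tf.math.log(10.0)
    return -tf.reduce_mean(snr_db)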

Speech is totally removed

I tested with a wav file containing clean speech. After running your code, I found that the speech is totally removed. Is that reasonable?

Failed on mobile device

Hi, I tried running it on a mobile device, but it is very slow. Maybe a lite model is needed for mobile. :)

About the training target: near-end speech

Thanks for your great work.
I have a question about the training target. I do not know which of the following I should use as the training target: near-end speech with RIR and noise, near-end speech with RIR, or pure near-end speech. After reading the paper, I ran some tests but got terrible training losses when selecting pure near-end speech as the target, and good results when selecting near-end speech with RIR and noise.
I would appreciate any advice.
Looking forward to your reply. @breizhn

Training problems in double-talk situations

An echo cancellation system usually takes two input signals, the near-end microphone signal and the far-end (loopback) signal, and outputs the near-end speech signal.

When training the model on double-talk data, the inputs should be the near-end microphone signal and the far-end signal, and the output should be the near-end speech. However, the AEC-Challenge dataset only provides *_doubletalk_lpb.wav (far-end signal) and *_doubletalk_mic.wav (near-end microphone signal). Since the clean near-end speech is needed as the label, where can it be found?

Loss decreases for a while and then suddenly explodes to inf

Hi Nils, I encountered a problem while training DTLN-aec. When I train only on the real data, the loss stabilizes at about 0.03.
However, when I train on both the real and the synthetic data, the loss decreases for some batches and then suddenly explodes to inf.
I suspect it is caused by the log in the SNR loss. Is this normal, or will the loss settle at a finite value after several epochs?
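
A hedged sketch of two common mitigations, gradient clipping in the optimizer and an epsilon guard inside the logarithm of the SNR loss; the clipnorm value and epsilon are illustrative assumptions:

import tensorflow as tf

# gradient clipping limits the damage done by a single pathological batch
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=3.0)

# an epsilon inside the logarithm keeps near-silent targets from producing inf
def stable_log10(x, eps=1e-8):
    return tf.math.log(x + eps) / tf.math.log(10.0)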

Model for AEC

Which model do you use for DTLN-aec? Is it the one from the DTLN repo?

How to use the model to generate the echo-cancelled file

Hi,

I used your DTLN repo to generate a bunch of noise-suppressed sound files by simply running:
$ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5

I want to use your AEC model to generate echo-suppressed files, but it doesn't seem to work with:
$ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512

It looks like the model needs both a mic file and an lpb file to generate each processed file. Am I understanding that right? Would it be possible to generate the enhanced file the same way as with DTLN?
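
For reference, a small hedged check of the expected input layout: run_aec.py appears to process one microphone file together with a matching loopback file per recording; the *_mic.wav / *_lpb.wav naming below follows my reading of the README and should be verified against it:

import glob
import os

in_dir = "/folder/with/input/files"
for mic_path in glob.glob(os.path.join(in_dir, "*_mic.wav")):
    lpb_path = mic_path.replace("_mic.wav", "_lpb.wav")
    if not os.path.isfile(lpb_path):
        print(f"missing loopback file for {os.path.basename(mic_path)}")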

Thanks,

How to train the DTLN-aec model

Hi breizhn, thanks for this repository!
I am a beginner in machine learning.
Following the README, I trained a DTLN model with 48 kHz audio data, and it seems to work correctly when running the real-time processing test with TFLite. I can also run AEC with the pretrained DTLN-aec model on 16 kHz audio.

But when I run AEC with my DTLN model (whether 16 kHz or 48 kHz), the error below appears:

/run_aec.py", line 116, in process_file
interpreter_1.set_tensor(input_details_1[2]["index"], lpb_mag)
IndexError: list index out of range

I guess this is caused by the different model parameters.
I'm not sure if I am missing something.
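
As a hedged illustration of why the IndexError appears: the DTLN noise-suppression model presumably exposes fewer TFLite input tensors than the DTLN-aec stages that run_aec.py expects (it has no loopback input), so input_details_1[2] does not exist. Inspecting the inputs makes this visible; the model path below is illustrative:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_1.tflite")  # illustrative path
for detail in interpreter.get_input_details():
    print(detail["name"], detail["shape"], detail["dtype"])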

My parameters are:

self.fs = 48000
self.batchsize = 22
self.len_samples = 15
self.activation = 'sigmoid'
self.numUnits = 128
self.numLayer = 2
self.blockLen = 512 * 3
self.block_shift = 128 * 3
self.dropout = 0.25
self.lr = 1e-3
self.max_epochs = 3  # just for testing
self.encoder_size = 256
self.eps = 1e-7

The question is: how can I get a DTLN model that works for AEC?
Thanks a lot!

Can I get the files needed to quantize the model?

Hi,
I am currently using DTLN-aec for real-time acoustic echo cancellation testing.

For me, this model's performance is lovely, but it needs to be made lighter for real-time use, so I am trying to quantize the TFLite file.

I want to convert the weights from float32 to float16 through post-training quantization.

However, in the quantization process, the tf.lite.TFLiteConverter.from_saved_model function takes a TensorFlow SavedModel (.pb) as its input, so I need the .pb file.

Therefore, could you provide a .pb file or a quantized TFLite file?

If I succeed in quantization, I can share my code and results with you.

Thank you for reading it. Have a nice day.
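
For what it's worth, a minimal sketch of post-training float16 quantization with the standard TFLite converter; the SavedModel path is a placeholder and the sketch assumes you have the SavedModel rather than only the released .tflite files:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/dtln_aec_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_fp16 = converter.convert()

with open("dtln_aec_fp16.tflite", "wb") as f:
    f.write(tflite_fp16)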

Can you open source your crowdsourced test data and results?

Hi
Do you know the NISQA (non-intrusive speech quality assessment) project? The current model is focused on distortions that occur in communication networks, not on speech enhancement. Could you fine-tune this model with your data so that it also covers front-end signal processing?
Thanks!
