
Comments (25)

aaronhsueh0506 commented on August 29, 2024

By the way, if I point base_dir/ to a non-existent folder, it will cause another error
[screenshot of the error]


Rikorose commented on August 29, 2024

I encounter some error, as shown in the figure below.

This is related to the dataset length. I will try to provide a better error message.

In the command python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/:
- Is data_dir a folder of wav files or of hdf5 files? (I think it is an hdf5 folder.)
- Can base_dir/ be non-existent? (But we need to provide config.ini, so here I point it to pretrained_model/ and delete the .ckpt.)

It should be a folder containing the prepared hdf5 files. At some point it worked without an initialized model base dir, but I haven't tested that in a while. You could just copy the config from the pretrained model.

I found that the log says dataloader len:0, is this a problem?

Yes, you didn't set up the dataset correctly.

I removed all the 'df.' prefixes from the imports in each file (e.g. from df.config import ... -> from config import ...); otherwise it causes an import error.

Other options would be to set the Python path or install df locally (e.g. as an editable wheel, via pip install -e .).

By the way, if I point base_dir/ to a non-existent folder, it will cause another error

A config file was found but is not complete. You are running on an old version. This error is fixed in main.


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,
thanks for your reply.

I found that the log says dataloader len:0, is this a problem?

Yes, you didn't set up the dataset correctly.

Does "setup the dataset incorrectly" and "length of dataset" refer to the same thing?
So I think increasing dataset can solve this problem for me.
BTW, I guess the error is caused by the seed setting at 42?

I will update the version later.

Thanks,


Rikorose commented on August 29, 2024

Since the data loader length is 0, your dataset is either not correctly prepared or smaller than the batch size. The seeding should not have any effect on errors.


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,

I followed your suggestion and increased the dataset length, and it works now!

By the way, I really like the way you display the log.
I will keep working with and exploring this masterpiece.

Thanks,


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,
I am tracing the code for the mixed signal, but I got lost at some point.

loader.iter_epoch(...)  # line 231, df/train.py

self.loader = _FdDataLoader(...)  # line 99, pyDF-data/libdfdata/torch_dataloader.py

impl _FdDataLoader {
    fn new(...) { ... }
    fn get_batch<'py>(...) {
        ...
        Some(batch) => { ... }
        None => { ... }
    }
}  // line 263, pyDF-data/src/lib.rs

I'm not sure where 'batch' comes from, nor where the speech and noise get mixed.
On the other hand, I found a lot of 'seed' variables in your code. What does this variable mean?

Now I am using your code to run train.py with the DNS-challenge files (only speech and noise). The log says that it takes 4 minutes per 100 iterations; is this speed normal?

Thanks,


Rikorose commented on August 29, 2024

The data loading stuff is implemented in Rust.
E.g. the mixing is done here: https://github.com/Rikorose/DeepFilterNet/blob/main/libDF/src/dataset.rs#L1078

Seed is a way to control the randomness. This guarantees that I get the exact same noisy mixtures in a specific epoch, and that the network is initialized in the same way. This is often done for reproducibility.
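
For illustration, a minimal sketch of how seeding is typically done on the PyTorch side (a generic example, not the project's exact code):

import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    # Seed all relevant RNGs so that data augmentation and weight init repeat.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds the CUDA generators in current PyTorch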

The speed could be correct; this highly depends on the hardware.


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,
Thanks for your reply.

Now I have increased the batch size, workers, and prefetch, but I didn't see an obvious speed-up.

I checked that the number of workers is 24 via multiprocessing.cpu_count(), set the batch size to 64, increased prefetch from 4 to 10, and so on in config.ini.

Now it takes 7 minutes per 100 iterations, but the total number of iterations has become half of the original. (I think this is related to the batch size, since the number of iterations per epoch is the dataset size divided by the batch size.)

Thanks,


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,
I have encountered this issue twice while training the model.

2021-12-23 09:48:23 | ERROR | DF | An error has been caught in function '', process 'MainProcess' (3266460), thread 'MainThread' (140391931664192):
Traceback (most recent call last):

File "df/train.py", line 428, in
main()
└ <function main at 0x7fae84e71430>

File "df/train.py", line 150, in main
train_loss = run_epoch(
└ <function run_epoch at 0x7fae84e770d0>

File "df/train.py", line 281, in run_epoch
raise e

File "df/train.py", line 274, in run_epoch
clip_grad_norm_(model.parameters(), 1.0, error_if_nonfinite=True)
│ │ └ <function Module.parameters at 0x7faf84116dc0>
│ └ RecursiveScriptModule(
│ original_name=DfNet
│ (enc): RecursiveScriptModule(
│ original_name=Encoder
│ (erb_conv0): Recur...
└ <function clip_grad_norm_ at 0x7fae84e435e0>

File "/home/myhsueh/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/df/utils.py", line 211, in clip_grad_norm_
raise RuntimeError(

RuntimeError: The total norm of order 2.0 for gradients from parameters is non-finite, so it cannot be clipped. To disable this error and scale the gradients by the non-finite norm anyway, set error_if_nonfinite=False

Can I modify 'error_if_nonfinite' from True to False as instructed?

Best regards,


Rikorose commented on August 29, 2024

Hi there, yes, you could try error_if_nonfinite=False; not sure though if this works.
I got these issues as well from time to time. I just restarted the training when this happened.

I guess in some cases a gradient in the backward pass becomes NaN, e.g. due to atan2. A fix would be appreciated.
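
For reference, a minimal sketch of disabling the error and skipping non-finite updates (illustrative only; model and opt are assumed to exist in the training loop):

import torch
from torch.nn.utils import clip_grad_norm_

# Clip without raising; skip the optimizer step if the gradient norm is non-finite.
total_norm = clip_grad_norm_(model.parameters(), 1.0, error_if_nonfinite=False)
if torch.isfinite(total_norm):
    opt.step()
else:
    opt.zero_grad()  # drop this batch's gradients instead of applying them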


aaronhsueh0506 commented on August 29, 2024

Hi,
I am trying to retrain and I have not encountered NaN so far.
By the way, I am tracing the mix_audio_signal function at the same time.
I found that we apply a gain g, roughly:

let clean_out = &clean * g;
if let Some(atten_db) = atten_db { ... }
noise *= k;

I am a little confused:

  1. What does the atten_db branch do here? Does it change the values of clean_out and noise?
  2. The paper says that the clean speech is mixed with up to 5 noise signals at an SNR from {-5, 0, 5, 10, 20, 40}. Is the signal-to-noise ratio of each noise different?
  3. In the function parameters, we pass snr_db. But I think g and k do not actually satisfy this snr_db, do they?

Thanks,


Rikorose commented on August 29, 2024
  1. atten_db (e.g. 10 dB) limits the attenuation of the algorithm by providing a noisy training target that has 10 dB less noise than the noisy input. Your network will learn not to remove all noise, but only 10 dB (see the sketch below).
  2. The SNR is computed over all noises. The different noises may have different energies though.
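
A rough illustration of the attenuation limit in plain Python (a hypothetical sketch, not the Rust implementation):

import numpy as np

def attenuation_limited_target(clean, noise, atten_db=10.0):
    # Keep the noise attenuated by atten_db in the training target instead of
    # removing it completely; the model then learns to suppress at most atten_db.
    k = 10 ** (-atten_db / 20)
    return clean + k * noise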

I don't understand question 3


aaronhsueh0506 commented on August 29, 2024

Hi,
The third question I want to ask is: we pass snr_db when using mix_audio_signal

fn mix_audio_signal(
    clean: Array2<f32>,
    clean_rev: Option<Array2<f32>>,
    mut noise: Array2<f32>,
    snr_db: f32,
    gain_db: f32,
    atten_db: Option<f32>,
    noise_resample: Option<LpParam>,
) -> Result<(Signal, Signal, Signal)>

And we compute k in

let k = mix_f(clean.view(), noise.view(), snr_db);

I don't know why clean.view() is used instead of clean_out.view().
I think here we have to calculate the noise gain k.
If we want to satisfy snr_db, doesn't clean.view() need to be changed to clean_out.view()?
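
For intuition, a rough Python sketch of what a mix_f-style gain computation does, assuming it matches signal energies (the actual Rust implementation may differ):

import numpy as np

def mix_f(clean, noise, snr_db):
    # Noise gain k such that energy(clean) / energy(k * noise) corresponds to snr_db.
    e_clean = np.mean(clean ** 2)
    e_noise = np.mean(noise ** 2)
    return np.sqrt(e_clean / (e_noise * 10 ** (snr_db / 10)))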

Thanks,


Rikorose commented on August 29, 2024

Good point, might be a bug. I will have a look. The expected value of the resulting SNR does not change, since the gain can be one of 6, 0, -6.

Indeed, however, this only affects models with an attenuation limit. By default no attenuation limit is applied. Fixed in 7f2120b.


aaronhsueh0506 commented on August 29, 2024

Hi,
Sorry to bother you again; I have not fully understood this code yet.
But I want to check:

The expected value of the resulting SNR does not change, since the gain can be one of 6, 0, -6.

Are the SNR and the gain (for speech) independent?

  1. Looking at fn mix_audio_signal, I think your process is:
    (i.) Set an SNR.
    (ii.) Calculate k (the noise gain) from the original speech and the original noise.
    (iii.) Speech gain 1 and noise gain k satisfy the SNR.
    (iv.) Then choose a speech gain from {-6, 0, 6} dB.
    (v.) The mixture equals clean_mix + noise (i.e. clean * g + noise * k).
    => if the speech gain is 6 dB, the SNR of the mixture becomes (SNR+6) dB
    => if the speech gain is -6 dB, the SNR of the mixture becomes (SNR-6) dB

  2. In my opinion, it should be:
    (i.) Set an SNR and choose a speech gain from {-6, 0, 6} dB.
    (ii.) Calculate the noise gain k from the speech with the gain applied and the original noise.
    (iii.) Speech gain g and noise gain k satisfy the SNR.
    (iv.) The mixture equals clean_mix + noise (i.e. clean * g + noise * k).
    => no matter what the gain is, the SNR of the mixture is SNR dB

These two ways yield different k and a different resulting SNR.

Do you want to achieve the first behavior (SNR = SNR + gain_dB)?
Thanks,


aaronhsueh0506 commented on August 29, 2024

Hi,
I read the "2.5 Data Preprocessing" section of the paper again.

In the paper:

"We mix a clean speech signal with up to 5 noise signals at SNRs of {-5, 0, 5, 10, 20, 40}."
"To further increase variability, we augment speech as well as noise signals with ..., random gains of {-6, 0, 6} dB."

So I think your intention is the same as increasing the SNR by random gains (way 1 above)?


Rikorose commented on August 29, 2024

Are the SNR and the gain (for speech) independent?

They should be independent. Gain should only modify the overall energy (i.e. loudness). I will have a look and maybe add some tests.
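
For illustration, a sketch of a mixing routine where SNR and gain stay independent, building on the hypothetical mix_f above (apply the speech gain first, then fit the noise gain):

import numpy as np

def mix(clean, noise, snr_db, gain_db):
    # Apply the speech gain first, then scale the noise against the gained speech,
    # so the resulting SNR does not depend on gain_db.
    g = 10 ** (gain_db / 20)
    clean_out = clean * g
    k = np.sqrt(np.mean(clean_out ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean_out + k * noise, clean_out, k * noise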


aaronhsueh0506 commented on August 29, 2024

Hi,
I am tracing the PyTorch code to construct the model, and I am confused because some code might not match the figure.
[screenshot of the model architecture figure]

In df/deepfilternet.py

class Encoder(nn.Module):
    def __init__(self):
        ...

    def forward(...):
        ...
        e3 = self.erb_conv3(e2)  # [B, C*4, T, F/4]
        c0 = self.df_conv0(feat_spec)  # [B, C, T, Fc]
        c1 = self.df_conv1(c0)  # [B, C*2, T, Fc]
        cemb = c1.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * Fc/4]
        cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]
        emb = e3.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * F/4]
        emb = emb + cemb
        emb, _ = self.emb_gru(emb)
        ...

e0, e1, e2, e3, c0, c1 look like the parts I marked in red on the figure.
cemb is the output of the GLinear in the DF Net. (Is this correct?)

I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (the GGRU in the Encoder)?

Thanks,


aaronhsueh0506 commented on August 29, 2024

Hi Rikorose,

I am trying to figure out the model architecture, and I have traced the code in modules.py.
I think Fig. 2 in the paper may need some red connection lines added, as below, to match your code.
[figure: modified Fig. 2 with the proposed red connections]

Other questions: I checked the sizes of the input features,
noisy: [batch, 1, 300, 481], feat_erb: [batch, 1, 300, 32], feat_spec: [batch, 1, 300, 96]

  1. I think '300' is the time axis, and 'lookahead' defaults to 2. (Is this two seconds?)
     [960 samples per frame with a hop size of 480; after the FFT, we get a 99 (frames) * 480 (complex) spectrum, and we can get 300 frames with a lookahead of 2 seconds?]
  2. Is 'noisy' a spectrum? What is the difference between 'noisy' and 'feat_spec'?

Thanks for your reply. Have a nice day!


Rikorose commented on August 29, 2024

I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (the GGRU in the Encoder)?

Here:

cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]

Fully connected is not a perfect name since it is grouped. GLinear also contains a skip connection (i.e. emb = emb + cemb).
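
A minimal sketch of the grouped-linear-with-skip idea (illustrative shapes and names, not the project's exact module):

import torch
import torch.nn as nn

class GroupedLinearSkip(nn.Module):
    # Split the feature dim into groups, apply a small linear map per group,
    # and add the input back (skip connection).
    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.weight = nn.Parameter(torch.randn(groups, dim // groups, dim // groups))

    def forward(self, x):  # x: [T, B, dim]
        t, b, _ = x.shape
        xg = x.view(t, b, self.groups, -1)
        yg = torch.einsum("tbgi,gio->tbgo", xg, self.weight)
        return x + yg.reshape(t, b, -1)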

I think '300' is the time axis, and 'lookahead' defaults to 2. (Is this two seconds?)

2 corresponds to the lookahead in time steps, which depends on the FFT hop size, e.g. 2 * 10 ms = 20 ms.

Is 'noisy' a spectrum? What is the difference between 'noisy' and 'feat_spec'?

feat_spec is basically the normalized noisy spectrum. This is done in the Rust source code.
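
As a rough idea of what such a normalization could look like (an assumption for illustration; the actual Rust feature code may differ), an exponential unit normalization over time:

import numpy as np

def unit_norm(spec, alpha=0.99, eps=1e-10):
    # Normalize each frame of a complex spectrum [T, F] by a running
    # exponential mean of its magnitude.
    state = np.mean(np.abs(spec[0]))
    out = np.empty_like(spec)
    for t in range(spec.shape[0]):
        state = alpha * state + (1 - alpha) * np.mean(np.abs(spec[t]))
        out[t] = spec[t] / (state + eps)
    return out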

I think Fig. 2 in the paper may need some red connection lines added, as below, to match your code.

No, I don't think this matches the code. Could you make an argument for that?


aaronhsueh0506 commented on August 29, 2024

Hi,
Here is my visualization of the PyTorch code, with black-background blocks following the code divisions.
I think there are some differences, so I am asking whether I need to add the red lines.
[figure: block diagram of the PyTorch code]

Another point: I have files containing broken-glass noise. I followed #38 and revised DF_gamma to 0.3, and this approach helped somewhat. What does this parameter influence?

If I want to do real-time inference, do I need to queue a buffer, update the frame, and run the model every time (but only take the last frame of output)? Because I think enhance.py works offline.

Thanks again.


Rikorose commented on August 29, 2024

Ah I see what you mean. True, there is another interconnection. I will think about how to include this in the figure while still keeping the figure simple and clear.
Also the linear layer for alpha is not shown for simplicity.

Another point: I have files containing broken-glass noise. I followed #38 and revised DF_gamma to 0.3, and this approach helped somewhat. What does this parameter influence?

It's a compression factor, similar to logarithmic compression, to model loudness perception. E.g. PercepNet has a reference for why 0.3 was chosen.
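
For intuition, a generic sketch of such power-law compression of a spectrum:

import numpy as np

def compress(spec, gamma=0.3):
    # Compress the magnitude with an exponent < 1 while keeping the phase;
    # small magnitudes are boosted relative to large ones, like log compression.
    return np.abs(spec) ** gamma * np.exp(1j * np.angle(spec))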

Wrt. real-time: you need to set up a loop and call the model on each time step. Here is a Python example from a previous project.
Also take a look at PR #13 (e.g. bin/df-tract.rs, line 277 and following). There, the whole processing loop is implemented in Rust. Note that there is a bug somewhere in the DF component (i.e. it produces worse results than the Python implementation).
Overall, the buffer handling might be the trickiest part. For the most part, I used tract for this.
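
A minimal sketch of such a per-frame loop (model, audio, and the process_frame API are hypothetical, for illustration only):

import numpy as np

hop = 480  # assumed hop size in samples
state = None
enhanced = []
for start in range(0, len(audio) - hop + 1, hop):
    frame = audio[start:start + hop]
    out, state = model.process_frame(frame, state)  # assumed streaming API
    enhanced.append(out)
enhanced = np.concatenate(enhanced)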


aaronhsueh0506 commented on August 29, 2024

Hi,
Thank you for your prompt response.
Ok, I will try to do real-time inference and maybe port it to Keras.

Best Regards,


stonelazy commented on August 29, 2024

@aaronhsueh0506 Your discussion here was educational, thanks.

Ok, I will try to do real-time inference and maybe port it to Keras.

Just wanted to know whether you have done the real-time implementation and whether you were able to reproduce the offline results. If yes, any plans of making it public?


aaronhsueh0506 commented on August 29, 2024

Hi,

You can save the GRU state and pass it in for the next loop iteration, so you can reduce the number of frames you need to process each step.
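
A minimal sketch of carrying GRU state across frames (an illustrative module, not the DeepFilterNet model):

import torch
import torch.nn as nn

gru = nn.GRU(input_size=64, hidden_size=64)
h = None  # hidden state carried across calls
stream = [torch.randn(1, 1, 64) for _ in range(10)]  # one frame per step: [T=1, B=1, F]
outputs = []
for frame in stream:
    out, h = gru(frame, h)  # reuse h instead of re-processing past frames
    outputs.append(out)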

