Comments (8)
Hi, I read your generator again. In your code, both w1 and w2 need to be at least 3 sec long. You then take the first 3 sec of each and add them, so the resulting target utterance is fully overlapped by the other utterance. Since the two have the same volume, the SDR should be nearly 0 dB in this case. Why did you get a median SDR of 1.9 dB?
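A quick numeric check of the 0 dB expectation, as a minimal sketch. It assumes the plain energy-ratio definition of SDR, 10*log10(|s|^2 / |x - s|^2), which is simpler than what bss_eval computes, and it uses synthetic noise in place of the two 3-second utterances. The sample rate and the name `simple_sdr` are hypothetical.

```python
import math
import random


def simple_sdr(target, estimate):
    """Energy-ratio SDR in dB: 10*log10(|target|^2 / |estimate - target|^2).

    Assumption: a simplified stand-in, not the bss_eval definition.
    """
    signal_energy = sum(t * t for t in target)
    error_energy = sum((e - t) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


rng = random.Random(0)
sample_rate = 8000               # hypothetical sample rate
num_samples = 3 * sample_rate    # 3-second segments, as in the generator

# Two equal-volume, fully overlapping signals (synthetic noise stand-ins).
w1 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
w2 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
mixed = [a + b for a, b in zip(w1, w2)]

# Score the unprocessed mixture against the clean target.
baseline_sdr = simple_sdr(w1, mixed)
print(f"SDR of the unprocessed mixture: {baseline_sdr:.2f} dB")
```

Under these assumptions the error term is just `w2`, so the result is the energy ratio of the two signals in dB, which stays close to 0 when their volumes match.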
from voicefilter.
Hi, @weedwind
Thank you for letting me know! Yes, I was aware of that test list, but haven’t tried to measure the actual performance with that.
Considering the following, I think the experimental result (1.5 dB, which turned out to be far worse than Google’s) is not really wrong:
- my d-vector system shows larger EER than Google’s (due to lack of training time + data)
- I didn’t use the correct loss function: see #14 (this might be the main cause of the blurry spectrogram mask at high frequencies, according to a personal conversation with Quan Wang at InterSpeech 2019)
Shall we leave this issue open, since it is a somewhat critical issue? Thanks a lot!
from voicefilter.
TL;DR (to the title of this issue):
No, I haven’t tried yet but I don’t think I can.
from voicefilter.
Hi, @seungwonpark
Thank you for your reply. I mean the SDR before applying the voice filter, not after. In Table 4 of their paper, this is the mean SDR in the first row, which is 10.1 dB. But I only got 1.5 dB. I used the same bss_eval Python function as you did, feeding it the clean target utterance and the mixed utterance to compute the SDR before applying the voice filter. Do you have a clue why this SDR is so low?
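The measurement described here (score each unprocessed mixture against its clean target, then aggregate over the test list) can be sketched as below. This is only an illustration: `simple_sdr` is an energy-ratio stand-in rather than the bss_eval routine, and the pairs and interference gains are synthetic and hypothetical. It reports both the mean and the median, since the two can differ noticeably on the same list.

```python
import math
import random
import statistics


def simple_sdr(target, estimate):
    """Energy-ratio SDR in dB (assumption: simplified, not bss_eval)."""
    signal_energy = sum(t * t for t in target)
    error_energy = sum((e - t) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


def baseline_sdrs(pairs):
    """SDR of each unprocessed mixture against its clean target."""
    return [simple_sdr(target, mixed) for target, mixed in pairs]


rng = random.Random(1)
pairs = []
for gain in (0.25, 0.5, 1.0, 1.0, 2.0):   # hypothetical interference gains
    target = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    interference = [gain * rng.gauss(0.0, 1.0) for _ in range(4000)]
    mixed = [t + n for t, n in zip(target, interference)]
    pairs.append((target, mixed))

sdrs = baseline_sdrs(pairs)
mean_sdr = statistics.mean(sdrs)
median_sdr = statistics.median(sdrs)
print(f"mean = {mean_sdr:.2f} dB, median = {median_sdr:.2f} dB")
```

In this toy list a few quiet interferers pull the mean above the median, so it is worth checking which statistic a reported number refers to before comparing it with another.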
from voicefilter.
Oh, looks like I had misunderstood your question. Sorry for that.
10.1 dB is a relatively high SDR for mixed audio to have. The authors of VoiceFilter mentioned that the SDR before VoiceFilter came out high because silent parts of utterances were sampled and mixed. (Note that fixed-length audio segments are sampled here.)
But I’m not sure why you’re not getting 10.1dB. Perhaps we should review the preprocessing part and the SDR calculation code in bss_eval.
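To illustrate how silent stretches could raise the SDR of an unprocessed mixture, here is a small sketch under stated assumptions: synthetic noise stands in for speech, the interference is silent outside an initial "active" fraction of the segment, and SDR is the plain energy ratio rather than the bss_eval definition. The function name and parameters are hypothetical.

```python
import math
import random


def simple_sdr(target, estimate):
    """Energy-ratio SDR in dB (assumption: simplified, not bss_eval)."""
    signal_energy = sum(t * t for t in target)
    error_energy = sum((e - t) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_with_partly_silent_interference(active_fraction, num_samples=16000, seed=0):
    """Mix a target with interference that is silent after `active_fraction`."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    active = int(num_samples * active_fraction)
    # Interference is noise for the first `active` samples, then silence.
    interference = [rng.gauss(0.0, 1.0) if i < active else 0.0
                    for i in range(num_samples)]
    mixed = [t + n for t, n in zip(target, interference)]
    return simple_sdr(target, mixed)


for fraction in (1.0, 0.5, 0.1):
    sdr = sdr_with_partly_silent_interference(fraction)
    print(f"interference active {fraction:.0%} of the time: {sdr:.2f} dB")
```

With equal volume while active, the SDR in this toy setup is roughly -10*log10(active_fraction), so the less the interferer actually overlaps the target, the higher the baseline SDR. Whether this accounts for the gap discussed here is not established by the sketch.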
from voicefilter.
I noticed that your code uses the first 3 sec and throws away the rest. I did not use a fixed length. I used the entire length of the clean target signal and truncated or zero-padded the interference signal to the same length. Then I computed the SDR. Did you ever compute the mean SDR for your test set?
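The variable-length mixing described here might look like the following sketch. The helper names `fit_length` and `mix_full_target` are hypothetical and are not taken from the repository; plain Python lists stand in for audio arrays.

```python
def fit_length(signal, length):
    """Truncate `signal` to `length` samples, or zero-pad it at the end."""
    if len(signal) >= length:
        return signal[:length]
    return signal + [0.0] * (length - len(signal))


def mix_full_target(target, interference):
    """Keep the whole target and fit the interference to its length."""
    fitted = fit_length(interference, len(target))
    return [t + n for t, n in zip(target, fitted)]


# A short interferer is zero-padded, so the tail of the target stays clean.
print(mix_full_target([1.0, 1.0, 1.0], [0.5]))
# A long interferer is truncated to the target length.
print(mix_full_target([1.0, 1.0], [0.5, 0.5, 0.5]))
```

Compared with taking a fixed 3-second window from both signals, this leaves part of the target free of interference whenever the interferer is shorter, which changes the SDR of the unprocessed mixture.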
from voicefilter.
> Did you ever compute the mean SDR for your test set?

Not yet.

> Why did you get a median SDR of 1.9 dB?

Actually, the value 1.9 dB was not calculated from all datasets; it was from a single dataset. I should fix the table in the README accordingly.
from voicefilter.
@weedwind I'm getting the same result as you (1.5 dB SDR over the Google LibriSpeech test list). Have you managed to solve this problem?
from voicefilter.