Comments (12)
So the window size is 32 ms, and consecutive windows overlap by 16 ms, i.e. a new 32 ms window starts every 16 ms. The .mat file holds the a priori SNR values. Applying 10*log10( ) to them gives the a priori SNR values in dB. Averaging each 257-point vector then gives the average a priori SNR in dB for the corresponding frame. A value of 30 dB would indicate that the frame is largely dominated by speech. A value of -10 dB would indicate that the frame is largely dominated by noise.
Hope this helps.
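The steps described above can be sketched in Python roughly as follows. This is only a minimal sketch, not code from Deep Xi: it assumes the a priori SNR values have already been loaded from the .mat file as a list of 257-element frames in linear scale (the variable name inside the .mat file is not stated in this thread), and the function names and the `floor` guard are made up for illustration.

```python
import math

FRAME_SHIFT_MS = 16   # a new window starts every 16 ms
FRAME_LENGTH_MS = 32  # each window is 32 ms long


def frame_snr_db(xi_frame, floor=1e-12):
    """Average a priori SNR in dB for one frame (one 257-point vector).

    Assumes xi_frame holds linear-scale a priori SNR values.
    `floor` is a hypothetical guard against taking log10 of zero.
    """
    db_values = [10.0 * math.log10(max(x, floor)) for x in xi_frame]
    return sum(db_values) / len(db_values)


def snr_db_per_frame(xi_frames):
    """One averaged dB value per frame of a chunk."""
    return [frame_snr_db(frame) for frame in xi_frames]


def frame_times_ms(num_frames):
    """(start, end) of each frame in ms, given the 16 ms shift and 32 ms length."""
    return [
        (i * FRAME_SHIFT_MS, i * FRAME_SHIFT_MS + FRAME_LENGTH_MS)
        for i in range(num_frames)
    ]
```

Pairing `snr_db_per_frame` with `frame_times_ms` would show which time regions of a chunk look speech-dominated (high dB) and which look noise-dominated (low dB).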
from deepxi.
Thanks, will look into that direction. But we are also interested in "eliminating" the noise and have tried your tool with some success. We are considering transferring/retraining it with our own data, as we are already using DeepSpeech and know which chunks are of good quality. But first, we would like to see how good the current model is. We haven't looked too deeply into your code and were therefore wondering what to do with the .mat files.
Did I reply to this?
Not yet :-) Would be great. Do you think a simple standard deviation of the included 257-item vectors would yield something useful?
You could simply use deepxi.py --infer 1 --out_type y --gain srwf
to save the enhanced speech .wav files, and then give them to DeepSpeech. This would be very easy to do.
A more complex alternative would be to include the enhanced speech magnitude spectrum produced by Deep Xi as part of the front-end of Deep Speech. Deep Speech utilises MFCCs as features, which are computed from the magnitude spectrum of the given wav file.
Thanks, we already tried that with mixed results. We would therefore like to find out what type of background noise your algo detects better. So it would be great to have some sort of measurement that shows how noisy your algo rates a certain chunk. Do you see a way to do that?
With the audio that you are using, do you have a reference version? i.e. an ideal version, or a version without noise?
No, we have around 100 000 chunks and around a third are manually labelled as noisy, with heavy or light noise labels. It would be great to see whether your algo would label them the same way or where it differs. We could then label them automatically or clean them before feeding them to DeepSpeech to get better results.
You could use the a priori SNR in dB averaged over the frame to understand how much noise is in each time-region of a chunk, or averaged over the chunk if you just want to know the overall SNR of the chunk.
The overall SNR of the chunk could then be used as the label.
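A rough Python sketch of this chunk-level idea is given below. It is not part of Deep Xi: it assumes the chunk's a priori SNR values are available as a list of 257-element frames in linear scale, and the function names, the `floor` guard, and especially the two dB thresholds are hypothetical placeholders. The thresholds would need to be tuned, for example against the manually labelled chunks mentioned earlier in this thread.

```python
import math


def chunk_snr_db(xi_frames, floor=1e-12):
    """Overall a priori SNR in dB for a chunk.

    Converts every linear-scale value to dB and averages over all frames
    and all frequency bins. `floor` is a hypothetical guard against log10(0).
    """
    db_values = [
        10.0 * math.log10(max(x, floor))
        for frame in xi_frames
        for x in frame
    ]
    return sum(db_values) / len(db_values)


def label_chunk(snr_db, clean_above_db=20.0, heavy_below_db=5.0):
    """Map a chunk-level SNR to a coarse label.

    Both thresholds are made-up example values, not recommendations.
    """
    if snr_db >= clean_above_db:
        return "clean"
    if snr_db < heavy_below_db:
        return "heavy noise"
    return "light noise"
```

Comparing the output of `label_chunk` with the manual labels on the already-labelled chunks would be one way to see where the two disagree before labelling the rest automatically.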
Great, so if I understand you correctly, I could average the vector output in the .mat files, as each 257-element vector represents a 16 ms window. And the .mat values are the normalized dB values. Is there any indication of what values are noisy or clean?
Perfect, thanks a lot mate and happy holidays