Comments (3)
In our other work on speech bandwidth extension, we used narrowband log-magnitude spectra as input, predicted the high-frequency log-magnitude spectra, and added them together to obtain the wideband log-magnitude spectra.
Since adding log-magnitude spectra is equivalent to multiplying magnitude spectra, we found that bandwidth extension can be achieved by applying an unbounded mask to the magnitude spectra.
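This equivalence can be checked numerically. A minimal sketch (the toy shapes and values below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
nb_mag = rng.uniform(0.1, 1.0, size=257)  # toy narrowband magnitude spectrum
hf_log = rng.uniform(0.0, 2.0, size=257)  # toy predicted high-frequency log-magnitude

# Adding in the log-magnitude domain ...
wb_log = np.log(nb_mag) + hf_log

# ... is the same as multiplying the magnitude by the unbounded mask exp(hf_log)
mask = np.exp(hf_log)        # mask >= 1 here; not bounded above by 1 like a sigmoid
wb_mag = nb_mag * mask

assert np.allclose(np.exp(wb_log), wb_mag)
```

This is why the mask for bandwidth extension must be allowed to exceed 1, unlike the bounded sigmoid mask used for denoising.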
Regarding your question: after upsampling, the high-frequency part of the magnitude spectrum of a speech waveform consists of very small values close to zero.
Therefore, a mask with large values can be used to predict the high-frequency magnitude spectrum.
Here, we also applied power-law compression to narrow the range of this mask, making it easier to predict.
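As a numerical sketch of why compression helps, note that applying the same power law to both spectra raises the implied mask to that power, which shrinks its dynamic range. The exponent 0.3 and the toy magnitudes below are assumptions for illustration:

```python
import numpy as np

c = 0.3                                    # assumed power-law compression exponent
noisy_mag = np.array([1e-4, 1e-2, 1.0])    # toy input magnitudes over a wide range
target_mag = noisy_mag * np.array([50.0, 10.0, 1.2])  # implies a large-value mask

linear_mask = target_mag / noisy_mag                  # range [1.2, 50]
compressed_mask = target_mag**c / noisy_mag**c        # equals linear_mask**c

assert np.allclose(compressed_mask, linear_mask**c)
assert compressed_mask.max() < linear_mask.max()      # range narrowed to ~[1.06, 3.2]
```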
Additionally, in the paper, the models for these three tasks were trained separately.
We also tried training a general model using all the data to handle these three tasks simultaneously.
We found that this model's performance slightly decreased on the speech denoising and bandwidth extension tasks, but improved on the dereverberation task.
This improvement might be due to the inclusion of noisy data, which acts as data augmentation.
from mp-senet.
I have some questions about BWE task.
Currently, I am trying to apply the MP-SENet model to the BWE task. As written in the long version of the paper, I am conducting experiments with the VCTK dataset.
- I changed the `lsigmoid()` of the mask decoder to `PReLU()`, but the loss becomes NaN as soon as training starts. `LeakyReLU()` showed the same phenomenon, so I am currently training with `ReLU()`. Can you provide any advice regarding this issue?
```python
class MaskDecoder(nn.Module):
    def __init__(self, h, out_channel=1):
        super(MaskDecoder, self).__init__()
        self.dense_block = DenseBlock(h, depth=4)
        self.mask_conv = nn.Sequential(
            nn.ConvTranspose2d(h.dense_channel, h.dense_channel, (1, 3), (1, 2)),
            nn.Conv2d(h.dense_channel, out_channel, (1, 1)),
            nn.InstanceNorm2d(out_channel, affine=True),
            nn.PReLU(out_channel),
            nn.Conv2d(out_channel, out_channel, (1, 1))
        )
        self.lsigmoid = LearnableSigmoid_2d(h.n_fft // 2 + 1, beta=h.beta)
        self.prelu = nn.PReLU()

    def forward(self, x):
        x = self.dense_block(x)
        x = self.mask_conv(x)
        x = x.permute(0, 3, 2, 1).squeeze(-1)
        # LearnableSigmoid for denoising / dereverberation
        # x = self.lsigmoid(x).permute(0, 2, 1).unsqueeze(1)
        # PReLU for bandwidth extension
        x = self.prelu(x).permute(0, 2, 1).unsqueeze(1)
        return x
```
- When conducting experiments, aside from the metric scores, I found that the output samples contain audible artifacts (buzzing-like sound). I am curious if you have encountered the same issue.
Great job!
As mentioned above, was the "g_best" file in "/best_ckpt" trained for denoising? I found that it has no ability to do bandwidth extension.
Will you release your general model for the three tasks? I am interested in its PESQ improvement.
Related Issues (20)
- Will the dereverberation dataset be released? HOT 11
- Dereverberation HOT 1
- Are there good ways to reduce MetricLoss? HOT 2
- Weights trained on DNS HOT 2
- Performance with PCS HOT 1
- Questions with regards to reproducing training and inference result HOT 2
- NS and BE(SR) in one model design HOT 1
- Complex loss calculated using compressed magnitude HOT 2
- Error dividing by zero if noisy_audio is silence HOT 1
- About Time Loss And STFT Consistency Loss HOT 2
- conformer structure
- Training on my own dataset fails to learn the high-frequency part of the signal, and PESQ improves slowly
- Out of Memory when inferencing with 60secs file HOT 2
- Gradio Demo App on HuggingFace Spaces with ZeroGPU Support HOT 1
- performance scores
- OutOfMemoryError HOT 3
- Use for Music? HOT 1
- Which performs better, this or DFN?
- Training on a more diverse dataset