
Comments (3)

yxlu-0102 commented on August 26, 2024

In our other work on speech bandwidth extension, we used narrowband log-magnitude spectra as input, predicted the high-frequency log-magnitude spectra, and added them together to obtain the wideband log-magnitude spectra.
Since adding log-magnitude spectra is equivalent to multiplying magnitude spectra, we found that bandwidth extension can be achieved by applying an unbounded mask to the magnitude spectra.
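The equivalence above can be checked with a minimal numeric sketch (the spectra and residual values below are illustrative, not from the paper): adding a predicted high-frequency residual in the log-magnitude domain is exactly the same as multiplying the magnitude spectrum by an unbounded mask `exp(residual)`.

```python
import numpy as np

# Toy narrowband magnitudes and a predicted high-frequency log residual
# (illustrative values only).
mag_nb = np.array([1.0, 0.5, 1e-3, 1e-4])
log_residual = np.array([0.0, 0.0, 5.0, 6.0])

# Adding in the log-magnitude domain ...
wideband_from_log = np.exp(np.log(mag_nb) + log_residual)

# ... equals applying an unbounded multiplicative mask in the linear domain.
mask = np.exp(log_residual)  # >= 0, can greatly exceed 1
wideband_from_mask = mag_nb * mask

assert np.allclose(wideband_from_log, wideband_from_mask)
```

Note the mask values for the high-frequency bins (`exp(5)`, `exp(6)`) are far above 1, which is why a bounded sigmoid-style mask cannot express them.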

Regarding your question, after upsampling, the high-frequency part of the magnitude spectrum of a speech waveform consists of very small values close to zero.
Therefore, a large-value mask can be used to predict the high-frequency magnitude spectrum.
Here, we also applied power-law compression to narrow the range of this mask, making it easier to predict.
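A small sketch of how power-law compression narrows the mask range (the exponent 0.3 is an assumption for illustration; the paper's exact value may differ): a mask `m` on the raw spectrum becomes `m**c` on the compressed spectrum, collapsing its dynamic range.

```python
import numpy as np

c = 0.3  # compression exponent (assumed for illustration)

# Raw magnitudes spanning five orders of magnitude
mag = np.array([1e-4, 1e-2, 1.0, 10.0])
compressed = mag ** c  # dynamic range shrinks from 1e5 to about 31.6x

# A mask of 1000 on the raw spectrum becomes a much smaller target
# on the compressed spectrum, which is easier for a network to predict.
mask_raw = 1000.0
mask_compressed = mask_raw ** c  # ~7.94 instead of 1000
```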

Additionally, in the paper, the models for these three tasks were trained separately.
We also tried training a general model using all the data to handle these three tasks simultaneously.
We found that the performance of this model slightly decreased in the tasks of speech denoising and bandwidth extension, but it improved in the dereverberation task.
This improvement might be due to the inclusion of noisy data, which acts as data augmentation.

from mp-senet.

JangyeonKim commented on August 26, 2024

I have some questions about the BWE task.

Currently, I am trying to apply the MP-SENet model to the BWE task. As written in the long version of the paper, I am conducting experiments with the VCTK dataset.

  1. I changed the lsigmoid() of the mask decoder to PReLU(), but the loss becomes NaN as soon as training starts. LeakyReLU() showed the same behavior, so I am currently training with ReLU. Can you offer any advice on this issue?

```python
class MaskDecoder(nn.Module):
    def __init__(self, h, out_channel=1):
        super(MaskDecoder, self).__init__()
        self.dense_block = DenseBlock(h, depth=4)
        self.mask_conv = nn.Sequential(
            nn.ConvTranspose2d(h.dense_channel, h.dense_channel, (1, 3), (1, 2)),
            nn.Conv2d(h.dense_channel, out_channel, (1, 1)),
            nn.InstanceNorm2d(out_channel, affine=True),
            nn.PReLU(out_channel),
            nn.Conv2d(out_channel, out_channel, (1, 1))
        )
        self.lsigmoid = LearnableSigmoid_2d(h.n_fft // 2 + 1, beta=h.beta)
        self.prelu = nn.PReLU()

    def forward(self, x):
        x = self.dense_block(x)
        x = self.mask_conv(x)
        x = x.permute(0, 3, 2, 1).squeeze(-1)

        # # LearnableSigmoid for denoising / dereverberation
        # x = self.lsigmoid(x).permute(0, 2, 1).unsqueeze(1)

        # PReLU for bandwidth extension
        x = self.prelu(x).permute(0, 2, 1).unsqueeze(1)

        return x
```
  2. When conducting experiments, I found that, aside from the metric scores, the output samples contain audible artifacts (a buzzing-like sound). Have you encountered the same issue?
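One possible cause of the NaN I considered (my assumption, not confirmed by the authors): unlike ReLU, PReLU and LeakyReLU can output negative mask values, and raising a negative masked magnitude to a fractional power under power-law compression yields NaN. A minimal sketch (the exponent 0.3 is assumed):

```python
import numpy as np

c = 0.3  # assumed power-law compression exponent

# PReLU / LeakyReLU can emit negative mask values; ReLU cannot.
masked_mag = np.array([0.5, -0.2])

# Fractional power of a negative number is NaN in floating point,
# which would then propagate through the loss.
compressed = np.power(masked_mag, c)  # -> [0.812..., nan]
```

If this is the cause, clamping the masked magnitude to be non-negative before compression would be a natural thing to try.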


jeffery-work commented on August 26, 2024

> In our other work on speech bandwidth extension, we used narrowband log-magnitude spectra as input, predicted the high-frequency log-magnitude spectra, and added them together to obtain the wideband log-magnitude spectra. Since adding log-magnitude spectra is equivalent to multiplying magnitude spectra, we found that bandwidth extension can be achieved by applying an unbounded mask to the magnitude spectra.
>
> Regarding your question, after upsampling, the high-frequency part of the magnitude spectrum of a speech waveform consists of very small values close to zero. Therefore, a large-value mask can be used to predict the high-frequency magnitude spectrum. Here, we also applied power-law compression to narrow the range of this mask, making it easier to predict.
>
> Additionally, in the paper, the models for these three tasks were trained separately. We also tried training a general model using all the data to handle these three tasks simultaneously. We found that the performance of this model slightly decreased in the tasks of speech denoising and bandwidth extension, but it improved in the dereverberation task. This improvement might be due to the inclusion of noisy data, which acts as data augmentation.

Great job!
As mentioned above, was the "g_best" file in "/best_ckpt" trained only for denoising? I found that it cannot perform bandwidth extension.
Will you release your general model for the three tasks? I am interested in its PESQ improvement.

