https://arxiv.org/abs/2108.0

Incorporate residual F0 prediction into acoustic model (nnsvs, closed)

nnsvs commented on June 12, 2024

Incorporate residual F0 prediction into acoustic model

from nnsvs.

Comments (5)

r9y9 commented on June 12, 2024

An initial attempt:

Conv1dResnet + skip connection https://soundcloud.com/r9y9/20220309-nit-song070-svs-world-conv-sine-based-vibrato-modeling-skip-connection-nitech-jp-song070-f001-003

I think the pitch trajectory gets smoother compared to the following samples:

Conv1dResnet: https://soundcloud.com/r9y9/20220308-nit-song070-svs-world-conv-vibrato-modeling-nitech-jp-song070-f001-003
Conv1dResnetMDN (w/ vibrato modeling): https://soundcloud.com/r9y9/20201116-nit-song070-svs-world-conv-mdn-dim-wise-mdn-nitech-jp-song070-f001-003

r9y9 commented on June 12, 2024

Here's a prototype of Conv1dResnet + skip connection for the record:

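import torch
from torch import nn

# BaseModel, WNConv1d and ResnetBlock are nnsvs helpers;
# their import paths are not shown in this thread.


class ResConv1dResnet(BaseModel):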
    def __init__(
        self,
        in_dim, hidden_dim, out_dim, num_layers=4,
        in_lf0_idx=300,
        in_lf0_min=5.3936276,
        in_lf0_max=6.491111,
        out_lf0_idx=180,
        out_lf0_mean=5.953093881972361,
        out_lf0_scale=0.23435173188961034,
    ):
        super().__init__()
        self.in_lf0_idx = in_lf0_idx
        self.in_lf0_min = in_lf0_min
        self.in_lf0_max = in_lf0_max
        self.out_lf0_idx = out_lf0_idx
        self.out_lf0_mean = out_lf0_mean
        self.out_lf0_scale = out_lf0_scale

        model = [
            nn.ReflectionPad1d(3),
            WNConv1d(in_dim, hidden_dim, kernel_size=7, padding=0),
        ]
        for n in range(num_layers):
            model.append(ResnetBlock(hidden_dim, dilation=2 ** n))
        model += [
            nn.LeakyReLU(0.2),
            nn.ReflectionPad1d(3),
            WNConv1d(hidden_dim, out_dim, kernel_size=7, padding=0),
        ]
        self.model = nn.Sequential(*model)

    def forward(self, x, lengths=None):
        out = self.model(x.transpose(1, 2)).transpose(1, 2)

        # denormalized lf0 from the input musical score
        lf0_score = x[:, :, self.in_lf0_idx].unsqueeze(-1)
        lf0_score_denorm = lf0_score * (self.in_lf0_max - self.in_lf0_min) + self.in_lf0_min

        # TODO: must be careful about dynamic features
        # Residual connection in denormalized f0
        lf0_res = out[:, :, self.out_lf0_idx].unsqueeze(-1)
        # Bound the residual with tanh; 0.693 is approximately ln(2),
        # i.e. about one octave if lf0 is the natural log of F0
        lf0_res = 0.693 * torch.tanh(lf0_res)
        lf0_pred_denorm = lf0_res + lf0_score_denorm
        # Back to normalized f0
        lf0_pred = (lf0_pred_denorm - self.out_lf0_mean) / self.out_lf0_scale

        out[:, :, self.out_lf0_idx] = lf0_pred.squeeze(-1)

        return out
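
For readers who want to check the arithmetic without PyTorch, here is a minimal single-frame sketch of the residual connection in the prototype above. The function name `residual_lf0` and the `max_residual` argument are hypothetical and introduced only for illustration; the constants are the default values from the prototype. It assumes, as the prototype does, that the score log-F0 is min-max normalized on the input side and that the output log-F0 is mean-variance normalized.

```python
import math

# Default statistics copied from the prototype above.
IN_LF0_MIN, IN_LF0_MAX = 5.3936276, 6.491111
OUT_LF0_MEAN, OUT_LF0_SCALE = 5.953093881972361, 0.23435173188961034


def residual_lf0(score_lf0_norm, raw_residual, max_residual=0.693):
    """Return the normalized log-F0 prediction for one frame (illustrative helper)."""
    # Undo the min-max normalization of the score log-F0 (input side).
    score_lf0 = score_lf0_norm * (IN_LF0_MAX - IN_LF0_MIN) + IN_LF0_MIN
    # Squash the unbounded network output into [-max_residual, max_residual].
    residual = max_residual * math.tanh(raw_residual)
    # Add the residual in the denormalized domain, then apply the
    # mean-variance normalization used for the output features.
    return (score_lf0 + residual - OUT_LF0_MEAN) / OUT_LF0_SCALE


# With a zero residual the prediction is just the re-normalized score pitch.
base = residual_lf0(0.5, 0.0)
# A very large network output saturates at +max_residual in log-F0.
saturated = residual_lf0(0.5, 100.0)
```

Because of the `tanh`, the predicted log-F0 can never move further than `max_residual` away from the score pitch, no matter what the convolutional stack outputs.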

r9y9 commented on June 12, 2024

The PR is up: #79. The heuristic parameter 0.693 was replaced with a better value, and I added some more comments to the code. I will add Sinsy's acoustic model soon.

Here's the distribution of residual log-F0 for the nit-song070 database:

Most of the data (>99.7%) lies in the range [-0.35, 0.35], i.e. roughly [-600, 600] cents.
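
The correspondence between log-F0 and cents can be checked with a short sketch. It assumes log-F0 is the natural logarithm of F0 in Hz, which is consistent with the numbers quoted in this thread; the helper name `lf0_to_cents` is hypothetical.

```python
import math


def lf0_to_cents(delta_lf0):
    """Convert a difference in natural-log F0 to cents (1200 cents per octave)."""
    # One octave is a factor of 2 in F0, i.e. ln(2) in natural-log F0.
    return 1200.0 * delta_lf0 / math.log(2.0)


# Residual range reported for nit-song070: about 606 cents, i.e. roughly 600.
range_in_cents = lf0_to_cents(0.35)
# The earlier heuristic bound of 0.693: just under 1200 cents (one octave).
heuristic_in_cents = lf0_to_cents(0.693)
```

Under that assumption, the observed residual range is about half of what the earlier 0.693 bound allowed.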

r9y9 commented on June 12, 2024

I have now merged #73. Next, I will revise my local implementation for the new acoustic model and open a PR soon.

r9y9 commented on June 12, 2024

Fixed by #81.
