Comments (5)

r9y9 commented on June 12, 2024

An initial attempt:

I think the pitch trajectory gets smoother compared to the following samples:

from nnsvs.

r9y9 commented on June 12, 2024

Here's a prototype of Conv1dResnet + skip connection for the record:

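import torch
from torch import nn

# NOTE: BaseModel, WNConv1d and ResnetBlock are not defined in this snippet;
# they are assumed to be nnsvs's own helpers.


class ResConv1dResnet(BaseModel):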
    def __init__(
        self,
        in_dim, hidden_dim, out_dim, num_layers=4,
        in_lf0_idx=300,
        in_lf0_min=5.3936276,
        in_lf0_max=6.491111,
        out_lf0_idx=180,
        out_lf0_mean=5.953093881972361,
        out_lf0_scale=0.23435173188961034,
    ):
        super().__init__()
        self.in_lf0_idx = in_lf0_idx
        self.in_lf0_min = in_lf0_min
        self.in_lf0_max = in_lf0_max
        self.out_lf0_idx = out_lf0_idx
        self.out_lf0_mean = out_lf0_mean
        self.out_lf0_scale = out_lf0_scale

        model = [
            nn.ReflectionPad1d(3),
            WNConv1d(in_dim, hidden_dim, kernel_size=7, padding=0),
        ]
        for n in range(num_layers):
            model.append(ResnetBlock(hidden_dim, dilation=2 ** n))
        model += [
            nn.LeakyReLU(0.2),
            nn.ReflectionPad1d(3),
            WNConv1d(hidden_dim, out_dim, kernel_size=7, padding=0),
        ]
        self.model = nn.Sequential(*model)

    def forward(self, x, lengths=None):
        out = self.model(x.transpose(1, 2)).transpose(1, 2)

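        # Denormalize the log-F0 (lf0) taken from the input musical score
        # (undo the min-max scaling applied to the input features)
        lf0_score = x[:, :, self.in_lf0_idx].unsqueeze(-1)
        lf0_score_denorm = lf0_score * (self.in_lf0_max - self.in_lf0_min) + self.in_lf0_min

        # TODO: must be careful about dynamic features
        # Residual connection in the denormalized log-F0 domain.
        # 0.693 is roughly ln(2), so (assuming natural-log F0) tanh bounds
        # the residual to about +/- one octave
        lf0_res = out[:, :, self.out_lf0_idx].unsqueeze(-1)
        lf0_res = 0.693 * torch.tanh(lf0_res)
        lf0_pred_denorm = lf0_res + lf0_score_denorm
        # Back to normalized log-F0 (mean/scale normalization of the output features)
        lf0_pred = (lf0_pred_denorm - self.out_lf0_mean) / self.out_lf0_scale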

        out[:, :, self.out_lf0_idx] = lf0_pred.squeeze(-1)

        return out
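
For readers who want to follow the residual connection without PyTorch, here is a minimal scalar sketch of the same three steps (denormalize the score log-F0, add a tanh-bounded residual, renormalize). The constants are the defaults from the prototype above; the function name `predict_normalized_lf0` and the module-level constant names are hypothetical and exist only for illustration.

```python
import math

# Default constants copied from the prototype above.
IN_LF0_MIN = 5.3936276
IN_LF0_MAX = 6.491111
OUT_LF0_MEAN = 5.953093881972361
OUT_LF0_SCALE = 0.23435173188961034
MAX_RESIDUAL = 0.693  # roughly ln(2); the heuristic bound used in the prototype


def predict_normalized_lf0(score_lf0_norm, raw_residual):
    """Hypothetical scalar version of the residual log-F0 step in forward()."""
    # Undo the min-max scaling of the score's log-F0.
    score_lf0 = score_lf0_norm * (IN_LF0_MAX - IN_LF0_MIN) + IN_LF0_MIN
    # Squash the unbounded network output into [-MAX_RESIDUAL, MAX_RESIDUAL].
    residual = MAX_RESIDUAL * math.tanh(raw_residual)
    # Add the residual in the denormalized log-F0 domain.
    lf0 = score_lf0 + residual
    # Normalize with the output feature statistics (mean and scale).
    return (lf0 - OUT_LF0_MEAN) / OUT_LF0_SCALE
```

With `raw_residual=0.0` the prediction is just the score's log-F0 re-expressed in the output normalization, and however large the network output gets, the denormalized prediction stays within 0.693 of the score.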

r9y9 commented on June 12, 2024

The PR is up: #79. The heuristic parameter 0.693 was replaced with a better value. I also added some more comments to the code. I will add Sinsy's acoustic model soon.

Here's the distribution of residual log-F0 for the nit-song070 database:
[Screenshot 2022-03-09 23 21 17]

Most of the data (>99.7%) lies in the range [-0.35, 0.35] (i.e., roughly [-600, 600] cents).
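
The correspondence between log-F0 differences and cents can be checked with a short sketch. It assumes log-F0 is the natural logarithm of F0 in Hz, which is consistent with the 0.35 ≈ 600 cents figure above; the function names are hypothetical.

```python
import math


def lf0_diff_to_cents(delta_lf0):
    """Convert a natural-log F0 difference to cents (1200 cents per octave)."""
    return 1200.0 * delta_lf0 / math.log(2.0)


def cents_to_lf0_diff(cents):
    """Inverse conversion: cents to a natural-log F0 difference."""
    return cents * math.log(2.0) / 1200.0


if __name__ == "__main__":
    for delta in (0.35, 0.693):
        print(f"{delta:.3f} in log-F0 is {lf0_diff_to_cents(delta):.1f} cents")
```

Under that assumption, 0.35 comes out at about 606 cents (half an octave), while the earlier heuristic bound of 0.693 is just under 1200 cents (one octave).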

r9y9 commented on June 12, 2024

I have now merged #73. Next, I will revise my local implementation of the new acoustic model and open a PR soon.

r9y9 commented on June 12, 2024

Fixed by #81.
