Comments (12)

taroushirani commented on June 12, 2024

Thank you for your rapid response. This change seems to work well for me.

from nnsvs.

r9y9 commented on June 12, 2024

@taroushirani I apologize if you are already working on this. It was easier to implement than I originally thought, so I did it while working on #76. Could you check whether the implementation in commit 62d633d looks correct?

taroushirani commented on June 12, 2024

Hello. In the latest Sinsy paper [1], the time-lag-adjusted note length is calculated as follows:

[image: duration_paper]

  1. https://arxiv.org/pdf/2108.02776.pdf

https://github.com/r9y9/nnsvs/blob/8e5a96967095d56e38f840cc828f506f3b3ea787/nnsvs/gen.py#L195-L201

I think the correct form is "L_hat = L - (lag[i - 1] - lag[i]) / 50000".
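
A minimal sketch of this adjustment, under stated assumptions: note lengths are in frames, time-lags are in HTS time units of 100 ns, and the frame shift is 5 ms (hence the divisor 50000). The function name, the 0-based array layout, and the choice of a zero lag after the last note are illustrative and are not taken from nnsvs/gen.py.

```python
import numpy as np

FRAME_SHIFT_100NS = 50000  # assumed: 5 ms frame shift in 100 ns units


def adjust_note_lengths(note_lengths, lags):
    """Return time-lag-adjusted note lengths in frames.

    note_lengths: length of each note in frames.
    lags: predicted onset time-lag of each note in 100 ns units.
    """
    L = np.asarray(note_lengths, dtype=np.float64)
    lag = np.asarray(lags, dtype=np.float64)
    # Lag of the following note; assumed to be 0 after the last note.
    next_lag = np.append(lag[1:], 0.0)
    # L_hat = L - (lag of this note - lag of next note) / 50000
    return L - (lag - next_lag) / FRAME_SHIFT_100NS


L_hat = adjust_note_lengths([100, 80, 60], [-100000, 50000, 0])
print(L_hat)
```

In this example the first note starts 2 frames early and the second starts 1 frame late, so the first note grows from 100 to 103 frames and the second shrinks from 80 to 79.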

r9y9 commented on June 12, 2024

Ah yes, you are right. Will fix it soon.

r9y9 commented on June 12, 2024

Fixed in 5a5d0ca.

taroushirani commented on June 12, 2024

I found that when L_hat is smaller than mu.sum() (there may be too many phonemes in a short note), rho and d_norm can become negative. d_norm.sum() then ends up larger than L_hat, because the negative d_norm values are corrected to 1.

https://github.com/r9y9/nnsvs/blob/8e5a96967095d56e38f840cc828f506f3b3ea787/nnsvs/gen.py#L230-L232

The code above may generate negative d_norm values and cause an error in nnmnkwii/io/hts.py.
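
The failure can be reproduced with a small numeric sketch. It assumes the variance-dependent scaling has the form d_norm = mu + rho * sigma_sq with rho = (L_hat - mu.sum()) / sigma_sq.sum(), and that "corrected as 1" means non-positive entries are replaced by 1. Both are readings of the discussion, not code copied from gen.py, and the function name is hypothetical.

```python
import numpy as np


def variance_scaled_durations(mu, sigma_sq, L_hat):
    """Fit durations to L_hat, shifting each mean in proportion to its variance."""
    mu = np.asarray(mu, dtype=np.float64)
    sigma_sq = np.asarray(sigma_sq, dtype=np.float64)
    rho = (L_hat - mu.sum()) / sigma_sq.sum()
    return mu + rho * sigma_sq, rho


# Two phonemes whose means sum to 14 frames, squeezed into a 6-frame note.
d_norm, rho = variance_scaled_durations(mu=[10.0, 4.0], sigma_sq=[1.0, 9.0], L_hat=6.0)
print(rho, d_norm, d_norm.sum())

# Assumed correction: replace non-positive durations with 1 frame.
clamped = np.where(d_norm <= 0, 1.0, d_norm)
print(clamped, clamped.sum())
```

Here rho is negative, the high-variance phoneme receives a negative duration, and after the correction the durations no longer sum to L_hat.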

taroushirani commented on June 12, 2024

I found another issue. L_hat can become smaller than 1 once the time-lag is applied, which may cause estimation errors. A check on L_hat may be needed.
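
One possible shape for such a check, as a sketch only: it reports which notes have an adjusted length below a threshold, and leaves the decision of how to handle them to the caller. The function name and the default threshold of 1 frame are assumptions.

```python
import numpy as np


def find_short_notes(L_hat, min_frames=1.0):
    """Return indices of notes whose adjusted length is below min_frames."""
    L_hat = np.asarray(L_hat, dtype=np.float64)
    return np.flatnonzero(L_hat < min_frames)


bad = find_short_notes([12.0, 0.4, -2.0, 30.0])
print(bad)
```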

taroushirani commented on June 12, 2024

I think it may be permissible to use the conventional method to estimate durations in a short note, because the estimation error of consonant durations in a short note may be less noticeable than in a long note.

Sample code (to replace line 228):

    if is_mdn and np.any(d_norm <= 0):
        # eq (12) (using mu as d_hat)
        d_hat = pred_durations[0][note_indices[i - 1] : note_indices[i]]
        d_norm = L_hat * d_hat / d_hat.sum()
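
A self-contained version of the same idea, for experimenting outside gen.py. It assumes the variance-dependent step is d_norm = mu + rho * sigma_sq with rho = (L_hat - mu.sum()) / sigma_sq.sum(); the function name and arguments are hypothetical stand-ins for the per-note slices of pred_durations.

```python
import numpy as np


def normalize_durations(mu, sigma_sq, L_hat):
    """Variance-dependent scaling with a fallback to uniform scaling."""
    mu = np.asarray(mu, dtype=np.float64)
    sigma_sq = np.asarray(sigma_sq, dtype=np.float64)
    rho = (L_hat - mu.sum()) / sigma_sq.sum()
    d_norm = mu + rho * sigma_sq
    if np.any(d_norm <= 0):
        # Fallback: scale the means proportionally so they sum to L_hat.
        d_norm = L_hat * mu / mu.sum()
    return d_norm


short = normalize_durations([10.0, 4.0], [1.0, 9.0], L_hat=6.0)   # fallback path
long_ = normalize_durations([10.0, 4.0], [1.0, 9.0], L_hat=20.0)  # variance path
print(short, long_)
```

In both cases the result sums to L_hat; the fallback is only taken when the variance-dependent step would produce a non-positive duration.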

r9y9 commented on June 12, 2024

Thank you very much for your comments.

> I found that when L_hat is smaller than mu.sum() (there may be too many phonemes in a short note), rho and d_norm can become negative. d_norm.sum() then ends up larger than L_hat, because the negative d_norm values are corrected to 1.

I think negative rho would be okay as it just makes phoneme durations shorter, but negative d_norm is not expected behavior and we need to fix it.

> I found another issue. L_hat can become smaller than 1 once the time-lag is applied, which may cause estimation errors. A check on L_hat may be needed.

This means... our time-lag model is not well trained T.T

> I think it may be permissible to use the conventional method to estimate durations in a short note, because the estimation error of consonant durations in a short note may be less noticeable than in a long note.

If I understand correctly, negative d_norm could happen even if we use the conventional uniform duration scaling. Do you think uniform scaling is better than variance-dependent scaling? Do you observe improvements?

To prevent negative phoneme duration, I wonder if we should set a minimum duration for each phoneme. I'm thinking about:

Perhaps similar to https://twitter.com/canon_73/status/1451517876247007239?

taroushirani commented on June 12, 2024

> If I understand correctly, negative d_norm could happen even if we use the conventional uniform duration scaling. Do you think uniform scaling is better than variance-dependent scaling? Do you observe improvements?

I think conventional uniform duration scaling would not generate negative d_norm, because d_hat (= mu) > 0, d_hat.sum() > 0, and L_hat > 0. Since it is merely a proportional division, d_norm.sum() always equals L_hat, so the adjustment at line 232 would never be executed.

I think that setting a minimum duration, either uniformly or per phoneme, may work well if we remove the adjustment code at lines 230-232.
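
One way a uniform minimum duration could be combined with proportional scaling, as a sketch rather than the change that was committed: every phoneme first receives min_dur frames, and the remaining frames are shared in proportion to mu. The function name, the min_dur default, and the equal split for notes that are too short are all assumptions; rounding to integer frames is left out.

```python
import numpy as np


def scale_with_min_duration(mu, L_hat, min_dur=1.0):
    """Scale durations to sum to L_hat while keeping each at least min_dur."""
    mu = np.asarray(mu, dtype=np.float64)
    n = len(mu)
    spare = L_hat - n * min_dur
    if spare < 0:
        # Note too short to give every phoneme min_dur: split it equally.
        return np.full(n, L_hat / n)
    # Reserve min_dur per phoneme, share the rest in proportion to mu.
    return min_dur + spare * mu / mu.sum()


d = scale_with_min_duration([10.0, 4.0, 0.5], L_hat=6.0)
tiny = scale_with_min_duration([10.0, 4.0, 0.5], L_hat=2.0)
print(d, d.sum(), tiny)
```

Because the floor is built into the scaling itself, the result always sums to L_hat and no separate correction step is needed afterwards.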

r9y9 commented on June 12, 2024

I think I see your point. I thought L_hat could be negative (L_hat = L - (lag[i - 1] - lag[i]) / 50000) depending on the estimated time-lags, but that's a different issue and rarely happens.

r9y9 commented on June 12, 2024

If I understand right, ad6900f and 11ad72b address your concern. Do the changes look right?
