Duration modeling considering variances about nnsvs HOT 12 CLOSED

nnsvs commented on June 12, 2024

Duration modeling considering variances

from nnsvs.

Comments (12)

taroushirani commented on June 12, 2024 1

Thank you for your rapid response. This change seems to work well for me.

from nnsvs.

r9y9 commented on June 12, 2024

@taroushirani I apologize if you are already working on this. It was easier to implement than I originally thought, so I did it while working on #76. Could you check whether the implementation in commit 62d633d looks correct?

from nnsvs.

taroushirani commented on June 12, 2024

Hello. In the latest Sinsy paper [1], the time-lag-adjusted note length is calculated as follows:

https://arxiv.org/pdf/2108.02776.pdf

https://github.com/r9y9/nnsvs/blob/8e5a96967095d56e38f840cc828f506f3b3ea787/nnsvs/gen.py#L195-L201

Compared with the code linked above, I'm afraid that "L_hat = L - (lag[i - 1] - lag[i]) / 50000" is the correct form.
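To make the sign convention concrete, here is a minimal sketch of the formula above. The function name and arguments are hypothetical and not taken from gen.py. It assumes the lags are in units that the 50000 divisor converts to frames (assumed here to be 100 ns HTS label units with a 5 ms frame shift).

```python
def adjusted_note_length(L, lag_prev, lag_cur, units_per_frame=50000):
    """Time-lag-adjusted note length in frames.

    L        : note length in frames, taken from the score
    lag_prev : time-lag at the start of this note (lag[i - 1])
    lag_cur  : time-lag at the start of the next note (lag[i])
    units_per_frame is an assumption: 5 ms frames in 100 ns units.
    """
    return L - (lag_prev - lag_cur) / units_per_frame


# A note that starts 2 frames early and whose successor starts 1 frame late
# becomes 3 frames longer.
example = adjusted_note_length(100, -100000, 50000)
```

If the note starts late and the next one starts early, the same formula shrinks the note instead.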

from nnsvs.

r9y9 commented on June 12, 2024

Ah yes, you are right. Will fix it soon.

from nnsvs.

r9y9 commented on June 12, 2024

Fixed 5a5d0ca

from nnsvs.

taroushirani commented on June 12, 2024

I found that when L_hat is smaller than mu.sum() (there may be too many phonemes in a short note), rho and d_norm can become negative, and then d_norm.sum() ends up bigger than L_hat because the negative d_norm values are corrected to 1.

https://github.com/r9y9/nnsvs/blob/8e5a96967095d56e38f840cc828f506f3b3ea787/nnsvs/gen.py#L230-L232

The above code may generate a negative d_norm and result in an error in nnmnkwii/io/hts.py.
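A small numeric sketch of the problem, assuming the variance-dependent rule has the form d_norm = mu + rho * sigma_sq with rho = (L_hat - mu.sum()) / sigma_sq.sum(). The numbers are made up for illustration, and the clamp to 1 only stands in for the correction described above.

```python
import numpy as np

# Made-up per-phoneme means and variances (in frames) for one short note.
mu = np.array([3.0, 4.0, 10.0])
sigma_sq = np.array([4.0, 1.0, 1.0])
L_hat = 8.0  # shorter than mu.sum() == 17

# Assumed form of the variance-dependent scaling.
rho = (L_hat - mu.sum()) / sigma_sq.sum()  # negative because L_hat < mu.sum()
d_norm = mu + rho * sigma_sq               # the high-variance phoneme goes negative

# Clamping non-positive durations to 1 breaks the sum constraint.
clipped = np.where(d_norm <= 0, 1.0, d_norm)
```

Before the clamp d_norm still sums to L_hat. After it, the total is larger than the note.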

from nnsvs.

taroushirani commented on June 12, 2024

I found another issue. L_hat can become smaller than 1 as a result of applying the time-lag, and this may result in an estimation error. A check on L_hat may be needed.

from nnsvs.

taroushirani commented on June 12, 2024

I think it may be acceptable to fall back to the conventional method for estimating durations in a short note, because duration estimation errors for consonants are likely less noticeable in a short note than in a long one.

Sample code (to replace line 228):

    if is_mdn and np.any(d_norm <= 0):
        # eq (12) (using mu as d_hat)
        d_hat = pred_durations[0][note_indices[i - 1] : note_indices[i]]
        d_norm = L_hat * d_hat / d_hat.sum()
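For reference, a self-contained sketch of the same fallback outside gen.py. The function name and inputs are hypothetical, and the variance-dependent branch assumes the form d_norm = mu + rho * sigma_sq with rho = (L_hat - mu.sum()) / sigma_sq.sum().

```python
import numpy as np


def note_durations(mu, sigma_sq, L_hat):
    """Phoneme durations (in frames) for one note of length L_hat."""
    mu = np.asarray(mu, dtype=float)
    sigma_sq = np.asarray(sigma_sq, dtype=float)

    # Variance-dependent scaling (assumed form).
    rho = (L_hat - mu.sum()) / sigma_sq.sum()
    d_norm = mu + rho * sigma_sq

    if np.any(d_norm <= 0):
        # Fallback: conventional proportional scaling, using mu as d_hat.
        d_norm = L_hat * mu / mu.sum()
    return d_norm


short = note_durations([3.0, 4.0, 10.0], [4.0, 1.0, 1.0], 8.0)   # fallback path
long_ = note_durations([3.0, 4.0, 10.0], [4.0, 1.0, 1.0], 20.0)  # variance path
```

With positive mu and positive L_hat, the fallback path stays positive and sums to L_hat.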

from nnsvs.

r9y9 commented on June 12, 2024

Thank you very much for your comments.

> I found that when L_hat is smaller than mu.sum() (there may be too many phonemes in a short note), rho and d_norm can become negative, and then d_norm.sum() ends up bigger than L_hat because the negative d_norm values are corrected to 1.

I think negative rho would be okay as it just makes phoneme durations shorter, but negative d_norm is not expected behavior and we need to fix it.

> I found another issue. L_hat can become smaller than 1 as a result of applying the time-lag, and this may result in an estimation error. A check on L_hat may be needed.

This means... our time-lag model is not well trained T.T

> I think it may be acceptable to fall back to the conventional method for estimating durations in a short note, because duration estimation errors for consonants are likely less noticeable in a short note than in a long one.

If I understand correctly, negative d_norm could happen even if we use the conventional uniform duration scaling. Do you think uniform scaling is better than variance-dependent scaling? Do you observe improvements?

To prevent negative phoneme duration, I wonder if we should set a minimum duration for each phoneme. I'm thinking about:

As is: set minimum duration to 1 for all phonemes https://github.com/r9y9/nnsvs/blob/19acb79bd355f515016528be00d7c954a8b12783/nnsvs/gen.py#L218
To be: set minimum duration for each phoneme
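A rough sketch of what the "to be" option could look like. The floor table, its values, and the function name are all hypothetical placeholders. Nothing here is taken from gen.py.

```python
import numpy as np

# Hypothetical per-phoneme minimum durations in frames (placeholder values).
MIN_FRAMES = {"k": 2, "a": 3}
DEFAULT_MIN_FRAMES = 1  # mirrors the current "minimum duration of 1" behavior


def apply_min_durations(phonemes, d_norm):
    """Raise each duration to at least its phoneme-specific floor."""
    floors = np.array(
        [MIN_FRAMES.get(p, DEFAULT_MIN_FRAMES) for p in phonemes], dtype=float
    )
    return np.maximum(np.asarray(d_norm, dtype=float), floors)


result = apply_min_durations(["k", "a", "t"], [-3.0, 2.5, 8.5])
```

Note that a floor alone can still push the total above L_hat, so the remaining phonemes would have to absorb the difference.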

from nnsvs.

taroushirani commented on June 12, 2024

> If I understand correctly, negative d_norm could happen even if we use the conventional uniform duration scaling. Do you think uniform scaling is better than variance-dependent scaling? Do you observe improvements?

I think the conventional uniform duration scaling would not generate a negative d_norm, because d_hat (= mu) > 0, d_hat.sum() > 0, and L_hat > 0. And since it is merely a proportional division, d_norm.sum() is always equal to L_hat, so the adjustment at line 232 may never be executed.

I think that setting a minimum duration, either uniformly or per phoneme, may work well if we remove the adjustment code at lines 230-232.

from nnsvs.

r9y9 commented on June 12, 2024

I think I see your point. I thought L_hat could be negative (L_hat = L - (lag[i - 1] - lag[i]) / 50000), depending on the estimated time-lags, but that's a different issue and rarely happens.

from nnsvs.

r9y9 commented on June 12, 2024

If I understand correctly, ad6900f and 11ad72b address your concern. Do the changes look right?

from nnsvs.
