
Comments (6)

w11wo commented on August 18, 2024

I've bumped into this issue a couple of times while training 44.1kHz TTS models. To fix your issue, you have to modify the config such that the product of upsample_rates == hop_length.

At the moment, you have "upsample_rates": [4,4], whose product is 16 and != "hop_length": 256.

This 44.1kHz config here is correct; it has "upsample_rates": [8,8,2,2,2] whose product is 512 == "hop_length": 512.
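The arithmetic above can be checked in a few lines. This is only a sketch using the numbers quoted in this comment; the labels and variable names are illustrative, not taken from the repo.

```python
import math

# (upsample_rates, hop_length) pairs quoted in this comment
configs = {
    "reported config": ([4, 4], 256),
    "44.1kHz config": ([8, 8, 2, 2, 2], 512),
}

results = {}
for name, (upsample_rates, hop_length) in configs.items():
    total = math.prod(upsample_rates)  # overall upsampling factor of the stack
    results[name] = (total, total == hop_length)
    print(f"{name}: product={total}, hop_length={hop_length}, match={total == hop_length}")
```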

So there are a couple of ways to fix your issue. The easiest option would be to just follow the config linked above, but if you still want to use a smaller hop length of 256, then I'd suggest you use:

"upsample_rates": [8,8,2,2],
"upsample_initial_channel": 512,
"upsample_kernel_sizes": [16,16,4,4],

which is the default setup from here.
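As a rough illustration of why this setup lines up with a hop length of 256, the sketch below walks a frame count through the four stages. It assumes each stage is a transposed 1-D convolution with stride equal to the rate and padding `(kernel - rate) // 2`, as is common in HiFi-GAN-style decoders; that padding choice is an assumption and is not stated in this thread. The helper `upsampled_length` is hypothetical.

```python
def upsampled_length(frames, rates, kernel_sizes):
    """Output length of a stack of transposed convolutions
    (dilation 1, no output padding)."""
    length = frames
    for rate, kernel in zip(rates, kernel_sizes):
        padding = (kernel - rate) // 2  # assumed HiFi-GAN-style padding
        length = (length - 1) * rate - 2 * padding + kernel
    return length


rates = [8, 8, 2, 2]
kernel_sizes = [16, 16, 4, 4]
frames = 32  # arbitrary number of spectrogram frames

samples = upsampled_length(frames, rates, kernel_sizes)
print(samples, samples // frames)
```

Under that padding assumption, each kernel size being twice its rate keeps `kernel - rate` even, so every stage multiplies the length by exactly its rate.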

Hope this helps!

from mb-istft-vits2.

w11wo commented on August 18, 2024

To be honest, I'm not too sure which of the changes you've suggested will work in the end, though they seem reasonable. It's probably a good idea to have assertion checks in place to avoid cases like this, i.e. ensuring that the config used by the user makes sense calculation-wise.
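A minimal sketch of what such an assertion check might look like is below. It encodes only the rule from the earlier comment (product of upsample_rates == hop_length); if other stages of the decoder also upsample, the expected product would need adjusting. The function name `check_upsampling` is hypothetical, and placing upsample_rates and upsample_kernel_sizes under "model" is an assumption about the config layout.

```python
import json
import math


def check_upsampling(config):
    """Raise AssertionError if the upsampling settings look inconsistent."""
    rates = config["model"]["upsample_rates"]
    kernels = config["model"]["upsample_kernel_sizes"]
    hop_length = config["data"]["hop_length"]

    assert len(rates) == len(kernels), (
        f"{len(rates)} upsample_rates but {len(kernels)} upsample_kernel_sizes"
    )
    total = math.prod(rates)
    assert total == hop_length, (
        f"product of upsample_rates is {total}, expected hop_length={hop_length}"
    )


good = json.loads(
    '{"data": {"hop_length": 256},'
    ' "model": {"upsample_rates": [8, 8, 2, 2],'
    ' "upsample_kernel_sizes": [16, 16, 4, 4]}}'
)
check_upsampling(good)  # passes silently

bad = json.loads(
    '{"data": {"hop_length": 256},'
    ' "model": {"upsample_rates": [4, 4],'
    ' "upsample_kernel_sizes": [16, 16]}}'
)
try:
    check_upsampling(bad)
    bad_message = None
except AssertionError as err:
    bad_message = str(err)
print(bad_message)
```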


p0p4k commented on August 18, 2024

@ggpid Please let me know the outcome of the debug test, because I am interested in KSS dataset performance as well. Thanks.
(cc. p0p4k/vits2_pytorch#49)


FENRlR commented on August 18, 2024

In addition to p0p4k's suggestion, I found an issue where reducing the sampling rate to 16000 Hz resulted in slower durations.
MasayaKawamura/MB-iSTFT-VITS#7 (comment)

Also, there are repos especially targeting the 44100 Hz sampling rate.
https://github.com/tonnetonne814/MB-iSTFT-VITS-44100-Ja/blob/main/configs/jsut_44100.json
https://github.com/tonnetonne814/unofficial-vits2-44100-Ja/blob/main/configs/vits2_jsut_nosdp.json

Both share a doubled segment size, FFT sizes, etc. (the changes are listed below),
plus subbands, which is exclusive to MB/MS-iSTFT.

{
  "train": {
    "segment_size": 16384,
    "fft_sizes": [768, 1366, 342],
    "hop_sizes": [60, 120, 20],
    "win_lengths": [300, 600, 120]
  },
  "data": {
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "add_blank": false
  },
  "model": {
    "subbands": 8,
    "upsample_initial_channel": 512
  }
}
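A quick way to sanity-check values like these is to look at how many spectrogram frames fit into one training segment. The sketch below uses only the numbers from the config above; the "halved" comparison assumes the baseline is exactly half of each value (segment_size 8192, hop_length 256, sampling_rate 22050), which is an assumption about the original defaults rather than something stated here.

```python
config = {
    "train": {"segment_size": 16384},
    "data": {"sampling_rate": 44100, "hop_length": 512},
}


def segment_stats(segment_size, hop_length, sampling_rate):
    """Return (frames per segment, leftover samples, segment duration in seconds)."""
    frames, remainder = divmod(segment_size, hop_length)
    return frames, remainder, segment_size / sampling_rate


doubled = segment_stats(
    config["train"]["segment_size"],
    config["data"]["hop_length"],
    config["data"]["sampling_rate"],
)
# Assumed baseline: every value halved
halved = segment_stats(
    config["train"]["segment_size"] // 2,
    config["data"]["hop_length"] // 2,
    config["data"]["sampling_rate"] // 2,
)

print("doubled:", doubled)
print("halved: ", halved)
```

Doubling segment_size together with hop_length keeps the number of frames per segment unchanged, and doubling the sampling rate as well keeps the segment duration unchanged.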


FENRlR commented on August 18, 2024

@w11wo Thank you for the detailed explanation. I still wonder whether it is okay to go with those settings from the original VITS, though. "upsample_rates": [4,4] (which would be [8,8] without "subbands": 4) and "upsample_kernel_sizes": [16,16] are exactly what the MB-iSTFT-VITS defaults change relative to the original VITS defaults, while keeping the same hop_length. So with a doubled hop_length, perhaps the calculation should start from that default "upsample_rates": [4,4] instead, e.g. "upsample_rates": [4,4,2] (though that comes from the same author), or keep [4,4] and double the subbands.
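To compare these options, one can compute how much of hop_length is left over after the upsample_rates and subbands are accounted for. This sketch assumes that subbands multiplies into the total upsampling alongside upsample_rates, which the "[8,8] without subbands" remark suggests but this thread does not spell out. Whatever supplies the remaining factor (for example the iSTFT stage) is left open, and `remaining_factor` is a hypothetical helper.

```python
import math


def remaining_factor(hop_length, upsample_rates, subbands):
    """Factor still needed to reach hop_length after the upsampling stack
    and the subbands (assumed to multiply together)."""
    covered = math.prod(upsample_rates) * subbands
    quotient, remainder = divmod(hop_length, covered)
    if remainder:
        raise ValueError(f"{covered} does not divide hop_length={hop_length}")
    return quotient


# (hop_length, upsample_rates, subbands) for the options discussed above
options = {
    "[8,8], no subbands, hop 256": (256, [8, 8], 1),
    "[4,4], subbands 4, hop 256": (256, [4, 4], 4),
    "[4,4,2], subbands 4, hop 512": (512, [4, 4, 2], 4),
    "[4,4], subbands 8, hop 512": (512, [4, 4], 8),
}

factors = {name: remaining_factor(*args) for name, args in options.items()}
for name, factor in factors.items():
    print(f"{name}: remaining factor = {factor}")
```

Under that assumption, all four options leave the same remaining factor, so the extra x2 stage and the doubled subbands are equivalent as far as this arithmetic goes.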

