yqzhishen commented on September 6, 2024

The depth input is introduced by the shallow diffusion mechanism; you can read the documentation for details. Briefly, it equals K_step in the training configuration file.
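As a minimal sketch of that relationship, the helper below takes the training config as an already-loaded dict and turns K_step into the `depth` input. The helper name, the example K_step value, and the int64 dtype are assumptions; check the expected input type with `session.get_inputs()` on your own exported model.

```python
import numpy as np


def depth_from_train_config(train_config: dict) -> np.ndarray:
    """Build the `depth` input for the acoustic ONNX model from K_step.

    Assumption: the exported model expects a 0-d int64 tensor for depth.
    """
    k_step = int(train_config["K_step"])
    if k_step <= 0:
        raise ValueError(f"K_step must be positive, got {k_step}")
    return np.array(k_step, dtype=np.int64)


# Hypothetical excerpt of a training config (normally loaded from its YAML file).
config = {"K_step": 400}
depth = depth_from_train_config(config)
print(depth, depth.dtype, depth.shape)  # 400 int64 ()
```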

loct824 commented on September 6, 2024

I got another error:

[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'/fs2/txt_embed/Gather' Status Message: indices element out of data bounds, idx=50 must be within the inclusive range [-50,49]

Should I use the ONNX model for deployment (e.g. building an API)? Would it take significant effort to refactor the code given the latest changes?

yqzhishen commented on September 6, 2024

It seems your model has a different phoneme set from the default one in MiniEngine. You should use the correct dictionary when running inference with the model.
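The error above says the Gather in '/fs2/txt_embed', which appears to be the phoneme embedding lookup, has 50 rows, so any phoneme ID of 50 or more is out of range. A cheap way to catch a dictionary mismatch before calling `session.run` is to check the IDs in Python first. The sketch below is not part of MiniEngine; `check_token_ids` is a hypothetical helper, and `vocab_size` is something you supply yourself (50 here, as implied by the error message).

```python
def check_token_ids(token_ids, vocab_size):
    """Raise early if any phoneme ID would fall outside the embedding table.

    ONNX Gather accepts indices in [-vocab_size, vocab_size - 1], but phoneme
    IDs should never be negative, so this check only allows [0, vocab_size - 1].
    """
    bad = [(pos, i) for pos, i in enumerate(token_ids) if not 0 <= i < vocab_size]
    if bad:
        pos, idx = bad[0]
        raise ValueError(
            f"token {idx} at position {pos} is outside [0, {vocab_size - 1}]; "
            "check the dictionary and reserved_tokens setting"
        )


check_token_ids([1, 7, 49], vocab_size=50)  # passes silently
try:
    check_token_ids([2, 8, 50], vocab_size=50)
except ValueError as err:
    print(err)
```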

However, MiniEngine is no longer maintained. If you do not have a strong need to run models from the CLI or on a remote host, please consider using OpenUTAU for a modern user experience. I also recommend referring to it for the whole inference procedure.

loct824 commented on September 6, 2024

I managed to run inference after making the following changes:

  1. Changing reserved_tokens to 2 in the config file:

     ```
     filename: assets/dictionaries/dictionary.txt
     reserved_tokens: 2
     ```

My understanding is that reserved tokens are tokens like AP and SP that are not in the phoneme dictionary. Is that correct?

  2. Adding the depth parameter in the acoustic_infer method:

```python
import numpy as np
import utils  # assumed: DiffSingerMiniEngine's utils module, which provides create_session


def acoustic_infer(model: str, providers: list, tokens, durations, f0, speedup):
    session = utils.create_session(model, providers)
    # depth: shallow-diffusion depth, which should equal K_step in the training config
    inputs = {'tokens': tokens, 'durations': durations, 'f0': f0,
              'speedup': speedup, 'depth': np.array(1000)}
    mel = session.run(['mel'], inputs)[0]
    return mel
```

However, I noticed a significant difference in output waveform quality compared to the results I obtained with infer.py in the DiffSinger repo. Could you give some advice on how to refactor the code in DiffSingerMiniEngine to get quality similar to the DiffSinger repo?

yqzhishen commented on September 6, 2024

No, reserved tokens are padding tokens that exist for historical reasons, and most models nowadays have only 1 reserved token. AP and SP are real tokens. You should make sure the phoneme IDs are correct to get reasonable results.
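Assuming IDs are assigned so that the first `reserved_tokens` IDs are padding and the real phonemes (AP and SP included) follow in sorted order, the sketch below shows why the setting matters: a mismatch shifts every phoneme ID by one, so each phoneme reads its neighbor's embedding, and the last phoneme lands one past the end of the embedding table. `build_phoneme_ids` and the five-phoneme dictionary are hypothetical; check how your version of the code actually encodes phonemes.

```python
def build_phoneme_ids(phonemes, reserved_tokens=1):
    """Map each phoneme to an integer ID, leaving room for padding IDs.

    Assumption: IDs 0 .. reserved_tokens - 1 are reserved for padding and the
    real phonemes follow in sorted order.
    """
    return {ph: i + reserved_tokens for i, ph in enumerate(sorted(set(phonemes)))}


phonemes = ["AP", "SP", "a", "i", "n"]  # hypothetical 5-phoneme dictionary
trained = build_phoneme_ids(phonemes, reserved_tokens=1)  # what the model was trained with
wrong = build_phoneme_ids(phonemes, reserved_tokens=2)    # mismatched inference setting
vocab_size = len(phonemes) + 1  # embedding rows for a model trained with 1 reserved token

print(trained)  # {'AP': 1, 'SP': 2, 'a': 3, 'i': 4, 'n': 5}
print(max(wrong.values()), vocab_size)  # 6 6 -> ID 6 is outside [0, 5]
```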

Are you sure you are using the correct dictionary for the model?

loct824 commented on September 6, 2024

Yes, I am using the exact same dictionary as the one used for training.

I just changed reserved_tokens to 1, and it now gives the same results and quality as infer.py. I guess the reserved_tokens setting was shifting the indices used for the phonemes.

Thank you so much for your help!
