Comments (6)
The depth input is introduced by the shallow diffusion mechanism; see the documentation for details. Briefly, it equals K_step in the training configuration file.
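As a rough illustration of the point above, the depth value could be read from the training config rather than hard-coded. This is only a sketch under stated assumptions: the helper name read_k_step is hypothetical, the config is assumed to contain a flat `K_step: <integer>` line, and the value 400 is made up for the example.

```python
import re


def read_k_step(config_text: str) -> int:
    """Return the integer K_step value from flat `key: value` config text.

    Hypothetical helper: assumes K_step appears on its own line as an integer.
    """
    match = re.search(r"^\s*K_step\s*:\s*(\d+)", config_text, flags=re.MULTILINE)
    if match is None:
        raise KeyError("K_step not found in config text")
    return int(match.group(1))


# Illustrative config excerpt; the values here are assumptions, not real settings.
example_config = "timesteps: 1000\nK_step: 400\n"
depth = read_k_step(example_config)
```

The resulting value would then be passed as the depth input in place of a fixed constant.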
from diffsinger.
I got another error:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'/fs2/txt_embed/Gather' Status Message: indices element out of data bounds, idx=50 must be within the inclusive range [-50,49]
Should I use the ONNX model for code deployment (e.g. building an API)? Does it require significant effort to refactor the code given the latest changes?
from diffsinger.
It seems your model has a different phoneme set from the default one in MiniEngine. You should use the correct dictionary when running inference with the model.
However, MiniEngine is no longer maintained. If you do not have a strong need to run models from the CLI or on a remote host, please consider using OpenUTAU for a modern user experience. I also recommend using it as a reference for the whole inference procedure.
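The Gather error above says that index 50 was looked up in an embedding table whose valid indices are 0 to 49. A minimal sketch of a pre-flight check that catches this kind of dictionary mismatch before the model runs is shown below. The function name check_token_ids and the vocabulary size of 50 are assumptions taken from the error message, not part of MiniEngine.

```python
def check_token_ids(token_ids, vocab_size):
    """Raise ValueError if any token ID falls outside [0, vocab_size - 1].

    Hypothetical helper: vocab_size is the number of rows in the model's
    phoneme embedding table (50 in the error message above).
    """
    bad = [i for i in token_ids if not 0 <= i < vocab_size]
    if bad:
        raise ValueError(
            f"token IDs {bad} are outside [0, {vocab_size - 1}]; "
            "the dictionary probably does not match the model"
        )
    return True
```

Calling check_token_ids(ids, 50) before session.run would surface the problem with a clearer message than the ONNX Runtime error.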
from diffsinger.
I managed to run inference after making the following changes:
- Changing the reserved tokens to 2 in the config file:
    filename: assets/dictionaries/dictionary.txt
    reserved_tokens: 2
  I understand that reserved tokens are tokens like AP and SP that are not in the phoneme dictionary. Is that correct?
- Adding the depth parameter in the acoustic_infer method:
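import numpy as np

# utils is the helper module of DiffSingerMiniEngine that creates the ONNX Runtime session.
def acoustic_infer(model: str, providers: list, tokens, durations, f0, speedup):
    session = utils.create_session(model, providers)
    # Debug output: show the types of the inputs.
    print(type(tokens))
    print(type(durations))
    print(type(f0))
    print(type(speedup))
    # depth is passed as a fixed value of 1000 here.
    mel = session.run(['mel'], {'tokens': tokens, 'durations': durations, 'f0': f0, 'speedup': speedup, 'depth': np.array(1000)})[0]
    return mel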
However, I noticed a significant difference in output waveform quality compared with the results I obtained using infer.py in the DiffSinger repo. Could you advise how we might refactor the code in DiffSingerMiniEngine to obtain results similar to those of the DiffSinger repo?
from diffsinger.
No. Reserved tokens are padding tokens that exist for historical reasons, and most models nowadays have only 1 reserved token. AP and SP are real tokens. You should make sure the phoneme IDs are correct to get reasonable results.
Are you sure you are using the correct dictionary of the model?
from diffsinger.
Yes, I am using exactly the same dictionary as the one used for training.
I just changed reserved_tokens to 1, and it now gives the same results and quality as infer.py. I guess reserved_tokens affected the indices used for phonemes.
Thank you so much for your help!
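To illustrate why the reserved token count matters, here is a minimal sketch of how phoneme IDs shift with it. It assumes that IDs are assigned to the sorted phoneme list starting right after the reserved (padding) IDs. The function build_phoneme_ids and the four-phoneme list are hypothetical and not taken from MiniEngine or DiffSinger.

```python
def build_phoneme_ids(phonemes, reserved_tokens=1):
    """Map each phoneme to an ID, leaving the first IDs for reserved tokens.

    Hypothetical helper: assumes IDs follow the sorted order of the phonemes.
    """
    unique = sorted(set(phonemes))
    return {ph: idx + reserved_tokens for idx, ph in enumerate(unique)}


# Illustrative phoneme set; AP and SP are real tokens, as noted in the thread.
phonemes = ["AP", "SP", "a", "i"]
ids_one = build_phoneme_ids(phonemes, reserved_tokens=1)
ids_two = build_phoneme_ids(phonemes, reserved_tokens=2)
```

With reserved_tokens=2, every phoneme ID is one higher than with reserved_tokens=1. For a model trained with one reserved token, the inputs then point at the wrong embeddings, and the highest ID falls one past the end of the embedding table, which is consistent with both the degraded quality and the earlier out-of-bounds error.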
from diffsinger.