Comments (3)
In summary, with different batch sizes the codec outputs should be very similar: fewer than 3 tokens should differ for each quantizer. In my case, I tested 10 utterances from the Librispeech test-clean subset with batch sizes of 1, 4 and 8, and the codec outputs were identical. Here are some insights that may help you figure out your problem:
- To enable batchified inference, utterances in a mini-batch are padded at the end with numpy's `wrap` mode. You can find more details at https://github.com/alibaba-damo-academy/FunCodec/blob/master/funcodec/bin/codec_inference.py#L260 and https://github.com/alibaba-damo-academy/FunCodec/blob/master/funcodec/modules/nets_utils.py#L65
- To speed up data loading at the inference stage, multi-threaded torch Dataloader workers are employed. Therefore, if you set `num_workers` larger than 0 in the `encoding_decoding.sh` script (the default value is 4), the utterance order of the outputs may differ due to the randomness of the Dataloader workers. If you want to maintain the utterance order, please set the `num_workers` parameter to 0.
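The `wrap` padding mentioned in the first point can be sketched as follows. This is an illustrative example, not FunCodec's actual code: it just shows how numpy's `wrap` mode fills the padded tail by repeating the utterance from its beginning.

```python
import numpy as np

# Illustrative sketch (not FunCodec code): batch two variable-length
# "utterances" by padding the shorter one at the end with mode="wrap",
# which repeats the signal from the start.
utts = [np.arange(5, dtype=np.float32),   # length 5
        np.arange(8, dtype=np.float32)]   # length 8

max_len = max(len(u) for u in utts)
batch = np.stack([np.pad(u, (0, max_len - len(u)), mode="wrap")
                  for u in utts])

print(batch.shape)  # (2, 8)
print(batch[0])     # [0. 1. 2. 3. 4. 0. 1. 2.] -- tail wraps around to the start
```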
If your test cases still differ substantially after you check the things mentioned above, please provide a reproducible recipe and I will look into it. Thanks.
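The ordering point can be illustrated with a generic sketch (plain Python threads, not FunCodec's Dataloader): when several workers process items in parallel and results are collected as they complete, the output order can vary between runs, whereas a single sequential worker preserves the input order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def encode(utt_id):
    """Stand-in for encoding one utterance, with variable latency."""
    time.sleep(random.uniform(0, 0.01))
    return utt_id

utt_ids = list(range(8))

# Parallel workers (analogous to num_workers > 0): results are gathered
# in completion order, which is nondeterministic.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(encode, u) for u in utt_ids]
    parallel_order = [f.result() for f in as_completed(futures)]

# Sequential processing (analogous to num_workers = 0) keeps input order.
sequential_order = [encode(u) for u in utt_ids]

print(sequential_order)                            # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(parallel_order) == sequential_order)  # True: same items, order may differ
```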
from funcodec.
The variations in the outputs are not really noticeable in terms of effects. I am just keen to understand the reason for the differences across batch sizes. From my understanding, proper masking should avoid inconsistencies caused by different batch sizes.
from funcodec.
Since there are only convolutions and uni-directional LSTM layers in the VAE-RVQ model, I didn't implement batchified inference with masking; instead, I use proper padding (numpy's wrap mode). I think the different padding lengths may cause the very limited inconsistencies in the ending codes, while the other codes should be identical across batch sizes.
from funcodec.
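The argument above (causal layers make the non-padded prefix insensitive to padding length) can be checked with a toy causal 1-D convolution. This is a hypothetical demonstration, not the VAE-RVQ model: because each output frame depends only on past samples, the outputs over the original frames are identical no matter how much wrap padding is appended; only the padded tail can differ.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Toy causal convolution: output[t] depends only on x[:t+1]."""
    xp = np.pad(x, (len(kernel) - 1, 0))       # left-pad for causality
    return np.convolve(xp, kernel, mode="valid")

rng = np.random.default_rng(0)
x = rng.standard_normal(10)                    # one "utterance" of 10 samples
kernel = rng.standard_normal(3)

# Same signal with two different amounts of wrap padding, as would
# happen when it is batched with shorter vs. longer utterances.
short = causal_conv1d(np.pad(x, (0, 2), mode="wrap"), kernel)
long_ = causal_conv1d(np.pad(x, (0, 6), mode="wrap"), kernel)

# Outputs over the original 10 samples agree exactly; only the
# padded tails (which are discarded anyway) differ in length.
print(np.allclose(short[:10], long_[:10]))  # True
```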