Comments (3)
In summary, with different batch sizes the codec outputs should be very similar: fewer than 3 tokens should differ for each quantizer. In my case, I tested 10 utterances from the Librispeech test-clean subset with batch sizes of 1, 4 and 8, and the codec outputs were identical. Here are some insights that may help you figure out your problem:
- To enable batchified inference, utterances in a mini-batch are padded at the end with numpy's `wrap` mode. You can find more details at https://github.com/alibaba-damo-academy/FunCodec/blob/master/funcodec/bin/codec_inference.py#L260 and https://github.com/alibaba-damo-academy/FunCodec/blob/master/funcodec/modules/nets_utils.py#L65
- To speed up data loading at the inference stage, multi-threaded torch Dataloader workers are employed. Therefore, if you set `num_workers` larger than 0 in the `encoding_decoding.sh` script (the default value is 4), the utterance order of the outputs may differ due to the randomness of the Dataloader workers. If you want to maintain the utterance order, please set the `num_workers` parameter to 0.
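The `wrap` padding mentioned in the first point can be sketched as follows. This is an illustrative example, not FunCodec's actual code: it just shows how numpy's `wrap` mode fills the padded tail by repeating the utterance from its beginning.

```python
import numpy as np

# Illustrative sketch (not FunCodec code): batch two variable-length
# "utterances" by padding the shorter one at the end with mode="wrap",
# which repeats the signal from the start.
utts = [np.arange(5, dtype=np.float32),   # length 5
        np.arange(8, dtype=np.float32)]   # length 8

max_len = max(len(u) for u in utts)
batch = np.stack([np.pad(u, (0, max_len - len(u)), mode="wrap")
                  for u in utts])

print(batch.shape)  # (2, 8)
print(batch[0])     # [0. 1. 2. 3. 4. 0. 1. 2.] -- tail wraps around to the start
```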
If your test cases still differ substantially after you check the things mentioned above, please provide a reproducible recipe and I will look into it. Thanks.
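The ordering point can be illustrated with a generic sketch (plain Python threads, not FunCodec's Dataloader): when several workers process items in parallel and results are collected as they complete, the output order can vary between runs, whereas a single sequential worker preserves the input order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def encode(utt_id):
    """Stand-in for encoding one utterance, with variable latency."""
    time.sleep(random.uniform(0, 0.01))
    return utt_id

utt_ids = list(range(8))

# Parallel workers (analogous to num_workers > 0): results are gathered
# in completion order, which is nondeterministic.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(encode, u) for u in utt_ids]
    parallel_order = [f.result() for f in as_completed(futures)]

# Sequential processing (analogous to num_workers = 0) keeps input order.
sequential_order = [encode(u) for u in utt_ids]

print(sequential_order)                            # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(parallel_order) == sequential_order)  # True: same items, order may differ
```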
from funcodec.
The variations in the outputs are not really noticeable in terms of effects. I am just keen to understand the reason for the differences across batch sizes. From my understanding, proper masking should avoid inconsistencies caused by different batch sizes.
from funcodec.
Since there are only convolutions and uni-directional LSTM layers in the VAE-RVQ model, I didn't implement batchified inference with masking; instead, I use proper padding (numpy's wrap mode). I think the different padding lengths may cause the very limited inconsistencies in the ending codes, while the other codes should be identical across batch sizes.
from funcodec.
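The argument above (causal layers make the non-padded prefix insensitive to padding length) can be checked with a toy causal 1-D convolution. This is a hypothetical demonstration, not the VAE-RVQ model: because each output frame depends only on past samples, the outputs over the original frames are identical no matter how much wrap padding is appended; only the padded tail can differ.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Toy causal convolution: output[t] depends only on x[:t+1]."""
    xp = np.pad(x, (len(kernel) - 1, 0))       # left-pad for causality
    return np.convolve(xp, kernel, mode="valid")

rng = np.random.default_rng(0)
x = rng.standard_normal(10)                    # one "utterance" of 10 samples
kernel = rng.standard_normal(3)

# Same signal with two different amounts of wrap padding, as would
# happen when it is batched with shorter vs. longer utterances.
short = causal_conv1d(np.pad(x, (0, 2), mode="wrap"), kernel)
long_ = causal_conv1d(np.pad(x, (0, 6), mode="wrap"), kernel)

# Outputs over the original 10 samples agree exactly; only the
# padded tails (which are discarded anyway) differ in length.
print(np.allclose(short[:10], long_[:10]))  # True
```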