Hi Thanks for great work and prompt feedback. I am trying to finetun

Question for commons.slice_segments(wav_padded, ids_slice * 480, wav_seglen) about freevc (closed)

olawod commented on July 23, 2024

Question for commons.slice_segments(wav_padded, ids_slice * 480, wav_seglen)

from freevc.

Comments (4)

OlaWod commented on July 23, 2024

This may be caused by data preprocessing: the audio length doesn't match the spectrogram length.
librosa.effects.trim can trim different segments at different sampling rates. For example, wav, _ = librosa.effects.trim(<24k_wav>, top_db=20) might trim off the first 0.3 seconds, while wav, _ = librosa.effects.trim(<22k_wav>, top_db=20) might trim off the first 0.33 seconds.
The 24k wav, the 16k spectrogram, and the 16k WavLM feature should all come from the same segment.
And by the way, I just found that I had forgotten that librosa loads wav at 22k by default, so the original downsampling code was upsampling a 22k wav to 24k. Just fixed it.
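
One way to keep every stream on the same segment is to trim once and reuse that interval at the other sampling rate, rather than trimming each rate independently. The sketch below is only an illustration, not the repository's preprocessing code: map_interval is a hypothetical helper and the sample indices are made-up values. In practice the interval could come from the second value returned by librosa.effects.trim, which is the [start, end] sample index of the non-silent region.

```python
def map_interval(start, end, sr_from, sr_to):
    """Map a [start, end) sample interval at sr_from to the same time span at sr_to.

    Hypothetical helper for illustration; integer arithmetic avoids float rounding.
    """
    return start * sr_to // sr_from, end * sr_to // sr_from


# Made-up example: trimming the 16 kHz wav kept samples 4800..52800 (0.3 s to 3.3 s).
start_16k, end_16k = 4800, 52800

# Reuse the same time span for the 24 kHz wav instead of trimming it separately.
start_24k, end_24k = map_interval(start_16k, end_16k, 16000, 24000)

# wav_24k_trimmed = wav_24k[start_24k:end_24k] would then cover the same 3 seconds.
print(start_24k, end_24k)
```

Both slices then span the same stretch of time, so features computed from the 16k slice line up with the 24k waveform.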


lsw5835 commented on July 23, 2024

Thanks for your kind answer.
Then, if there is no leading or trailing silence, is there no difference in length between the 16k spectrogram and the 24k wav?


OlaWod commented on July 23, 2024

If you didn't trim either the 16k wav (where the 16k spectrogram comes from) or the 24k wav, then yes.
librosa.effects.trim may still trim a little even when there is no apparent leading or trailing silence.
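
A quick sanity check during preprocessing can catch this kind of mismatch before training. The sketch below assumes, based only on the ids_slice * 480 in the call from the title, that one spectrogram frame corresponds to 480 samples of the 24k wav; the function names and the example numbers are hypothetical.

```python
def expected_wav_len(n_frames, samples_per_frame=480):
    """Length the 24k wav should have for a spectrogram with n_frames frames."""
    return n_frames * samples_per_frame


def slice_fits(wav_len, frame_start, seg_frames, samples_per_frame=480):
    """True if a slice of seg_frames frames starting at frame_start stays inside the wav."""
    start = frame_start * samples_per_frame
    return start + seg_frames * samples_per_frame <= wav_len


# Made-up example: a 150-frame spectrogram expects 150 * 480 = 72000 wav samples.
n_frames = 150
print(expected_wav_len(n_frames))

# A matching wav can be sliced at the last valid position...
print(slice_fits(72000, frame_start=140, seg_frames=10))

# ...but a wav that lost 1000 samples to a separate trim cannot.
print(slice_fits(71000, frame_start=140, seg_frames=10))
```

If the second check fails for some file, the wav and the spectrogram for that file probably did not come from the same segment.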


lsw5835 commented on July 23, 2024

Thanks very much. After matching the lengths of the 16k and 24k audio, it works fine.

