Hi, thank you for the quick replies and kindness. While testing through

s-o-p pronunciation, high-low tone distortion (freevc issue, closed)

olawod commented on July 23, 2024


Comments (5)

OlaWod commented on July 23, 2024

I hadn't noticed the distortion in s-o-p pronunciation before; it's an interesting finding. Does it occur in words like 'Ok', 'appleS', 'helP', etc.? Maybe that's because these sounds are harder to model?
I don't quite understand the second problem: what does 'rather than target speaker data' mean?

Maybe providing extra information such as pitch to the model could solve these problems?
So far, the only distortion-related objective metrics I know of are WER/CER/PER.
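For reference, WER, CER and PER all reduce to a Levenshtein edit distance between a reference transcript and a recognizer's transcript of the converted audio, normalized by the reference length; they differ only in the unit compared (words, characters, or phonemes). Below is a minimal pure-Python sketch. The function names are illustrative, and the hypothesis transcripts are assumed to come from whatever ASR or phoneme recognizer you run on the converted speech.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for sub/ins/del)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(ref_text, hyp_text):
    """Word error rate: edit distance over whitespace-separated words."""
    ref = ref_text.split()
    return edit_distance(ref, hyp_text.split()) / max(len(ref), 1)


def cer(ref_text, hyp_text):
    """Character error rate: edit distance over the raw character strings."""
    return edit_distance(ref_text, hyp_text) / max(len(ref_text), 1)


def per(ref_phones, hyp_phones):
    """Phoneme error rate: edit distance over lists of phoneme symbols."""
    return edit_distance(ref_phones, hyp_phones) / max(len(ref_phones), 1)
```

Note that these metrics measure intelligibility; a metallic-sounding 'S' that is still recognized correctly would not necessarily raise them.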


steven850 commented on July 23, 2024

I have noticed the same: when a model is fine-tuned to a specific speaker, the S sounds are really bad, especially in words ending in S, and more so if it's a longer S. I have only noticed it with S sounds, though; I haven't seen a problem with P or O.

It's a metallic sound, like a phase offset from the vocoder.

I think he meant that the output does not sound like the target speaker?


lsw5835 commented on July 23, 2024


Yes, that error does occur from time to time.
What I meant is that if a pitch or pronunciation in the source wav is not present in the target speaker's data, distortion occurs.
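One rough way to check this explanation is to compare the source utterance's F0 against the target speaker's F0 distribution. The sketch below is a hypothetical diagnostic, not part of FreeVC: it assumes you already have frame-level F0 arrays from some pitch tracker, with unvoiced frames encoded as 0 Hz, and reports the fraction of voiced source frames that fall inside the target speaker's 5th to 95th percentile F0 range.

```python
import numpy as np


def pitch_coverage(source_f0, target_f0, low_pct=5.0, high_pct=95.0):
    """Fraction of voiced source frames whose F0 lies within the target
    speaker's [low_pct, high_pct] percentile F0 range.

    Assumes unvoiced frames are marked with F0 == 0 in both arrays.
    Returns NaN if either input has no voiced frames.
    """
    src = np.asarray(source_f0, dtype=float)
    tgt = np.asarray(target_f0, dtype=float)
    src = src[src > 0]  # keep voiced frames only
    tgt = tgt[tgt > 0]
    if src.size == 0 or tgt.size == 0:
        return float("nan")
    lo, hi = np.percentile(tgt, [low_pct, high_pct])
    inside = (src >= lo) & (src <= hi)
    return float(inside.mean())
```

A low value would suggest that much of the source pitch lies outside what the model saw for that speaker; the percentile bounds are arbitrary choices.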

While looking for other augmentation methods, I found a `stretch` function in utils.py. Have you ever tested horizontal augmentation?
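For comparison, a "horizontal" augmentation stretches the spectrogram along the time axis only. The NumPy sketch below assumes a mel spectrogram shaped (n_mels, n_frames); it is an illustration, not FreeVC's `stretch` from utils.py, whose exact implementation may differ.

```python
import numpy as np


def stretch_time(mel, width):
    """Resize a (n_mels, n_frames) spectrogram to `width` frames by linear
    interpolation along the time axis only (a horizontal stretch).
    The frequency axis is left untouched."""
    n_mels, n_frames = mel.shape
    old_t = np.linspace(0.0, 1.0, n_frames)
    new_t = np.linspace(0.0, 1.0, width)
    # Interpolate each mel bin's trajectory over time independently.
    return np.stack([np.interp(new_t, old_t, row) for row in mel])
```

Because each output frame is a blend of neighboring input frames along time, the per-frame spectral shape (and thus pitch and timbre) is essentially preserved; mainly the duration changes.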


OlaWod commented on July 23, 2024

It'll be harder to convert if the voices of the source and target are very different.
I did not use horizontal augmentation because it does not change the speaker information of the source wav.
By the way, I'd like to share some files:

original.mp4

change_duration.mp4

change_pitch.mp4

change_volume.mp4

I think that by applying different vertical SR rates to different parts of a single wav, the augmentation can be made stronger and help disentanglement more.
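The idea of applying different vertical rates within one wav can be sketched as follows, assuming (as a simplification) that "vertical SR" means resampling a mel spectrogram along its frequency axis to a random height, then cropping or padding it back to the original number of bins, with the result later vocoded back to audio. The function names, the 68 to 92 height range for an 80-bin mel, the noise-padded top bins, and the equal-length segmentation are all illustrative assumptions, not FreeVC code.

```python
import numpy as np


def resize_freq(mel, height):
    """Linearly resample the frequency axis of a (n_mels, n_frames) mel to `height` bins."""
    n_mels = mel.shape[0]
    old_f = np.linspace(0.0, 1.0, n_mels)
    new_f = np.linspace(0.0, 1.0, height)
    # Interpolate each frame (column) independently; the time axis is untouched.
    return np.stack([np.interp(new_f, old_f, frame) for frame in mel.T], axis=1)


def vertical_shift(mel, height, rng):
    """Resize along frequency, then crop (if taller) or pad (if shorter) back
    to n_mels bins. Padding repeats the top bin plus small Gaussian noise
    (an assumed choice, not taken from FreeVC)."""
    n_mels = mel.shape[0]
    tgt = resize_freq(mel, height)
    if height >= n_mels:
        return tgt[:n_mels]
    pad = np.repeat(tgt[-1:], n_mels - height, axis=0)
    pad = pad + rng.standard_normal(pad.shape) / 10
    return np.concatenate([tgt, pad], axis=0)


def piecewise_vertical_aug(mel, n_segments=4, min_h=68, max_h=92, seed=None):
    """Split the mel into equal-length time segments and apply a different,
    randomly drawn vertical height to each segment (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    bounds = np.linspace(0, mel.shape[1], n_segments + 1).astype(int)
    pieces = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:  # skip empty segments when n_segments > n_frames
            continue
        height = int(rng.integers(min_h, max_h + 1))  # inclusive range
        pieces.append(vertical_shift(mel[:, start:end], height, rng))
    if not pieces:
        return mel.copy()
    return np.concatenate(pieces, axis=1)
```

Hard cuts between segments may produce audible discontinuities after vocoding; cross-fading or randomized segment boundaries would be natural refinements to try.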


lsw5835 commented on July 23, 2024

Thanks very much for answering.

