
Comments (5)

OlaWod commented on July 23, 2024
  1. I hadn't noticed the distortion in s-o-p pronunciation before; it's an interesting finding. Does it occur in words like 'Ok', 'appleS', 'helP', etc.? Maybe that's because these pronunciations are harder to model?
  2. I don't quite understand what the second problem is. What does 'rather than target speaker data' mean?

Maybe providing extra knowledge like pitch to the model can solve these problems?
So far the only distortion-related objective metrics I know of are WER/CER/PER.
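
For reference, a minimal sketch of how WER and CER are usually computed: an edit distance between a reference transcript and a hypothesis, divided by the reference length. In a voice conversion setting the hypothesis would normally come from running an ASR model on the converted audio; that step is not shown here, and the function names and example strings below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Hypothetical transcripts: the converted audio lost the final 's' in "apples".
reference = "the apples help"
hypothesis = "the apple help"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

PER follows the same pattern with phoneme sequences in place of words or characters.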

from freevc.

steven850 commented on July 23, 2024

I have noticed the same: when a model is fine-tuned to a specific speaker, the S sounds are really bad, so anything ending in S, especially if it's a longer S. I have only noticed it with S sounds though; I haven't seen a problem with P or O.

It's a metallic sound, like a phase offset from the vocoder.

I think he meant the output is not that of the target speaker?


lsw5835 commented on July 23, 2024
  1. I hadn't noticed the distortion in s-o-p pronunciation before; it's an interesting finding. Does it occur in words like 'Ok', 'appleS', 'helP', etc.? Maybe that's because these pronunciations are harder to model?
  2. I don't quite understand what the second problem is. What does 'rather than target speaker data' mean?

Maybe providing extra knowledge like pitch to the model can solve these problems? So far the only distortion-related objective metrics I know of are WER/CER/PER.

  1. Yes, that error occurs from time to time.
  2. It means that distortion occurs when a pitch or pronunciation in the source wav does not appear in the target speaker's data.

While looking for other augmentation methods, I found that there is a 'stretch' function in utils.py. Have you ever tested horizontal augmentation?


OlaWod commented on July 23, 2024

it'll be harder to convert if the voices of the source and target are very different.
i did not use horizontal augmentation because it does not change the speaker information of the source wav.
btw i'd like to share some files: ↓

original.mp4
change_duration.mp4
change_pitch.mp4
change_volume.mp4

i think that by applying different vertical SR (spectrogram-resize) ratios within a single wav, the augmentation can be made stronger and help the disentanglement more.
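
A rough sketch of that idea, assuming a mel-spectrogram stored as a NumPy array of shape (n_mels, frames): the spectrogram is split into time segments and each segment is resized along the frequency axis with its own random ratio. This is not the FreeVC implementation; the function names, the 0.85 to 1.15 ratio range, the number of segments, and the padding strategy are all assumptions made for illustration.

```python
import numpy as np


def resize_vertical(mel, ratio):
    """Resize a (n_mels, frames) array along the frequency axis, keeping its shape."""
    n_mels, n_frames = mel.shape
    new_height = max(1, int(round(n_mels * ratio)))
    # Positions on the original frequency axis sampled by each resized bin.
    src_pos = np.linspace(0, n_mels - 1, new_height)
    orig_pos = np.arange(n_mels)
    resized = np.stack(
        [np.interp(src_pos, orig_pos, mel[:, t]) for t in range(n_frames)],
        axis=1,
    )
    if new_height >= n_mels:
        # Stretched: crop the bins that no longer fit.
        return resized[:n_mels]
    # Compressed: pad the missing top bins by repeating the highest bin
    # (an assumption of this sketch).
    pad = np.repeat(resized[-1:], n_mels - new_height, axis=0)
    return np.concatenate([resized, pad], axis=0)


def segmentwise_vertical_resize(mel, n_segments=4, low=0.85, high=1.15, rng=None):
    """Apply a different random vertical resize ratio to each time segment."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for seg in np.array_split(mel, n_segments, axis=1):
        if seg.shape[1] == 0:
            continue  # skip empty segments when frames < n_segments
        out.append(resize_vertical(seg, rng.uniform(low, high)))
    return np.concatenate(out, axis=1)


# Usage with a random stand-in for a real mel-spectrogram.
mel = np.random.default_rng(0).random((80, 100))
augmented = segmentwise_vertical_resize(mel, rng=np.random.default_rng(1))
print(mel.shape, augmented.shape)
```

The output keeps the input shape, so it could in principle be dropped into the same pipeline position as a single-ratio resize. Abrupt ratio changes at segment boundaries may introduce discontinuities, which a real implementation would probably want to smooth.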


lsw5835 commented on July 23, 2024

Thanks very much for answering.

