Comments (7)

yl4579 commented on June 11, 2024

Thanks for letting me know about this work. I actually didn't know about it beforehand, so I didn't compare against it. I think it's still quite different from Vocos: in this work we optimize for quality over speed, while Vocos optimizes for speed over quality.

In the paper, the author shows that Vocos is four times faster than iSTFTNet, with performance comparable to BigVGAN-base. (I believe the BigVGAN in the paper refers to BigVGAN-base, because BigVGAN has 114M parameters while the paper lists only 14M.) Our work, by contrast, is nearly twice as slow as iSTFTNet but significantly outperforms BigVGAN-base, with performance comparable to the full BigVGAN.

I tried Vocos myself, and perceptually it sounds slightly worse than HiFTNet, but it is indeed much faster. I think one big advantage of HiFTNet is that it works well for singing synthesis, where Vocos lags behind because it does not have the hn-NSF (harmonic-plus-noise neural source filter). Overall, though, if you care more about speed, Vocos is definitely the better choice.

Vocos: https://drive.google.com/file/d/1GTZaNlukv0jkNStPJ644oD1s2RJ2GEZW/view?usp=sharing
HiFTNet: https://drive.google.com/file/d/1Phu9Z3Q55L08uWd3RKw9q3rVT3DrczWe/view?usp=sharing
BigVGAN (not base): https://drive.google.com/file/d/1r-qYcRqk7Qt90Ik55msVlwKhyjcsL787/view?usp=sharing
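For readers unfamiliar with hn-NSF: its source module turns an F0 track into a sine-based excitation (harmonics in voiced regions, noise in unvoiced ones) that the rest of the network then shapes. Below is a minimal, simplified sketch of that idea, not HiFTNet's actual code. Assumptions: F0 is already given per sample, the harmonics are simply averaged (real implementations typically merge them with a trainable layer), and `alpha=0.1`, `sigma=0.003` are illustrative defaults. The function name `sine_source` is hypothetical.

```python
import math
import random


def sine_source(f0, sample_rate, n_harmonics=8, alpha=0.1, sigma=0.003, seed=0):
    """Simplified harmonic-plus-noise excitation from a per-sample F0 track.

    f0: per-sample fundamental frequency in Hz (0 means unvoiced).
    Returns a list of floats with the same length as f0.
    Hypothetical sketch: harmonics are averaged instead of merged by a
    trainable layer as in a real hn-NSF source module.
    """
    rng = random.Random(seed)
    phases = [0.0] * n_harmonics  # running phase of each harmonic
    out = []
    for f in f0:
        voiced = f > 0
        sample = 0.0
        for h in range(n_harmonics):
            freq = f * (h + 1)
            # Integrate instantaneous frequency into phase, kept in [0, 2*pi).
            phases[h] = (phases[h] + 2 * math.pi * freq / sample_rate) % (2 * math.pi)
            # Drop harmonics at or above Nyquist to avoid aliasing.
            if voiced and freq < sample_rate / 2:
                sample += math.sin(phases[h])
        if voiced:
            # Voiced: scaled sum of sines plus a little Gaussian noise.
            sample = alpha * sample / n_harmonics + rng.gauss(0.0, sigma)
        else:
            # Unvoiced: noise only, with standard deviation alpha / 3.
            sample = alpha / (3 * sigma) * rng.gauss(0.0, sigma)
        out.append(sample)
    return out
```

For example, `sine_source([220.0] * 24000, 24000)` gives one second of a 220 Hz voiced excitation at 24 kHz.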

yl4579 commented on June 11, 2024

I think it's a good idea. I'll try combining the two and test the result against Vocos to see whether it is better while still offering a significant speed improvement. If it works well, I'll add it to the paper later.

yl4579 commented on June 11, 2024

I have tried incorporating hn-NSF into Vocos, but the quality is worse than without it. I think it could be related to how the source signal should be fed into the model (e.g., applying an STFT to it before feeding it in). It doesn't seem to be a trivial task, so more experiments are needed. If anyone else has time, please take a look.
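One possible reading of "STFT before feeding it": Vocos works at frame rate and predicts STFT coefficients, so the sample-level source signal could be converted into frame-level STFT features using the same hop length as the mel input, then concatenated with the mel channels. The sketch below shows only that conversion (pure Python, naive DFT, zero padding rather than reflect padding). The names `stft_frames` and `source_features` and the tiny `n_fft`/`hop_length` defaults are hypothetical, and whether this actually helps quality is untested.

```python
import cmath
import math


def stft_frames(signal, n_fft=16, hop_length=4):
    """Naive STFT with a periodic Hann window (n_fft must be even).

    The signal is zero-padded by n_fft // 2 on both sides so frame i is
    centred on sample i * hop_length, giving len(signal) // hop_length + 1
    frames of n_fft // 2 + 1 complex bins each.
    """
    pad = n_fft // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for i in range(len(signal) // hop_length + 1):
        start = i * hop_length
        seg = [padded[start + n] * window[n] for n in range(n_fft)]
        # Direct DFT of the windowed segment, non-negative frequencies only.
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(bins)
    return frames


def source_features(frames, eps=1e-5):
    """Per frame: log-magnitude, cos(phase) and sin(phase) of every bin, concatenated."""
    feats = []
    for bins in frames:
        log_mag = [math.log(abs(b) + eps) for b in bins]
        phase = [cmath.phase(b) for b in bins]
        feats.append(log_mag + [math.cos(p) for p in phase] + [math.sin(p) for p in phase])
    return feats
```

Encoding phase as cos/sin avoids the wrap-around discontinuity of raw angles, which a convolutional backbone would otherwise have to learn around.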

yl4579 commented on June 11, 2024

I will leave this issue open in case someone is interested in comparing HiFTNet to Vocos. For singing synthesis, someone could probably combine the two to make a fast, high-quality singing vocoder. I developed this vocoder primarily for singing voice conversion with SLMGAN, but there was no singing data to compare on, so I compared on LJSpeech and LibriTTS instead.

Ryu1845 commented on June 11, 2024

Thank you for your quick reply! This is a great comparison. I can definitely see your work being better for singing synthesis, considering it uses NSF. I'm looking forward to an eventual fast HQ singing vocoder!

TechInterMezzo commented on June 11, 2024

> I think one big advantage of HiFTNet is it works well for singing synthesis while Vocos lags behind because it does not have the hn-NSF.

The 22 kHz sample rate of the pretrained HiFTNet models is a bit low for my purpose; I think I would need 32 kHz. Vocos also only supports 24 kHz. Would you recommend retraining with different parameters, or using some kind of upsampling model at the end? Speed is not that important in my case.

yl4579 commented on June 11, 2024

@TechInterMezzo If speed is not a concern, I would recommend just training an NSF-BigVGAN with the current setup (i.e., a pre-trained F0 network to extract F0). Basically, you add NSF to BigVGAN, with F0 extracted from the mel-spectrograms by a pre-trained F0 network.
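An F0 network run on mel-spectrograms produces one F0 value per frame, while an NSF source operates at the audio sample rate, so the F0 track has to be upsampled by the hop length before it can drive the source. A minimal sketch, assuming linear interpolation between neighbouring voiced frames and holding the value otherwise (many implementations simply repeat each frame value). The name `upsample_f0` is hypothetical; when retraining at 32 kHz, `hop_length` must match the hop used to compute the mel-spectrograms.

```python
def upsample_f0(f0_frames, hop_length):
    """Upsample frame-level F0 (Hz, 0 = unvoiced) to one value per audio sample.

    Returns len(f0_frames) * hop_length values. Between two voiced frames the
    value is linearly interpolated; otherwise the frame's value is held, so
    voiced/unvoiced boundaries stay sharp instead of gliding through 0 Hz.
    """
    out = []
    for i, f in enumerate(f0_frames):
        # F0 of the following frame, or 0.0 (treated as unvoiced) past the end.
        nxt = f0_frames[i + 1] if i + 1 < len(f0_frames) else 0.0
        for t in range(hop_length):
            if f > 0 and nxt > 0:
                out.append(f + (nxt - f) * t / hop_length)
            else:
                out.append(f)
    return out
```

For example, `upsample_f0([100.0, 200.0], 4)` returns `[100.0, 125.0, 150.0, 175.0, 200.0, 200.0, 200.0, 200.0]`; the resulting per-sample track can then be fed to an NSF source module.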
