Comments (14)

sainathadapa commented on May 26, 2024 5

Fortunately, MMdnn (https://github.com/microsoft/MMdnn) worked perfectly for my needs. You can also use that. If you want to see the commands I used to port the model, look here: https://github.com/sainathadapa/urban-sound-tagging/tree/after_challenge/nbs/openl3

from openl3.

auroracramer commented on May 26, 2024 2

Hi! We haven't ported the model to PyTorch; feel free to do so!

turian commented on May 26, 2024 1

Hi @adrienchaton, we have been working on a PyTorch port. It was a little more fiddly than we expected but seems to be relatively stable now. We are prepping it for a more general release: https://github.com/turian/torchopenl3

Please feel free to email me, [email protected]. I'm a huge fan of your work, and Philippe is on the committee for a NeurIPS audio representation competition I am proposing. I'd love to talk more.

sainathadapa commented on May 26, 2024

I'm going to try porting the model to PyTorch. Before I work on that, can you tell me if you already have a PyTorch version of the model right now? This is so that I do not waste effort in doing something that is already done.

janaal1 commented on May 26, 2024

Where can the PyTorch models be found?
Thanks in advance!

adrienchaton commented on May 26, 2024

@sainathadapa thank you for sharing your code to convert openl3 to PyTorch.

I see you took the mel128 embedding. Are the PyTorch weights available anywhere, please?

turian commented on May 26, 2024

The main issue was that the difference from the Kapre 0.1.3 STFTs is in the high frequencies. This means that on the chirp audio, our MAE versus tfopenl3 was maybe 2e-3 when using mels (I'd have to double-check). On 100 random sounds from FSD50K, it was far lower.
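
For readers who want to run a similar check, here is a minimal sketch, assuming the two embeddings come from the TF and PyTorch models run on the same audio. The helper names `linear_chirp` and `mean_absolute_error` are hypothetical, and the embedding values below are made-up placeholders, not real model output.

```python
import math

def linear_chirp(f0, f1, duration, sr):
    """Linear sine sweep from f0 to f1 Hz, returned as a list of float samples."""
    n = int(duration * sr)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / sr for i in range(n))]

def mean_absolute_error(a, b):
    """MAE between two equally shaped 2-D embeddings (frames x dims)."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("embeddings must have the same shape")
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Test signal that exercises high frequencies (sweeps to just under Nyquist).
sr = 48000
audio = linear_chirp(20.0, 0.45 * sr, duration=1.0, sr=sr)

# Placeholders (assumption): in practice these would be the embeddings
# each model produces for `audio`.
emb_tf = [[0.10, 0.20], [0.30, 0.40]]
emb_torch = [[0.11, 0.19], [0.30, 0.42]]
mae = mean_absolute_error(emb_tf, emb_torch)
```

A chirp is a useful stress test here because it spends part of its duration in the high-frequency range where the front-ends reportedly disagree, while typical recordings carry most of their energy lower down.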

adrienchaton commented on May 26, 2024

Great, thank you for sharing your port and your awesome research too!
Right now I would use it for an art project, so it does not need to be perfectly reproduced.
I am following up by email.

turian commented on May 26, 2024

@adrienchaton great, looking forward to talking. And for any issues, questions, snags, etc., file an issue on GitHub.

justinsalamon commented on May 26, 2024

This is awesome, thanks for putting this together!

Heads up - we're working on an update to openl3 that will include:

  • Support for TF 2.x (using an updated version of kapre)
  • Support for using Librosa instead of Kapre as the audio front-end

@turian if you think it makes sense, it would be awesome to merge torchopenl3 into openl3 eventually, so that a single library provides support for both TF and PyTorch backends.

turian commented on May 26, 2024

@justinsalamon thank you, I wanted to reach out and make sure this is all copacetic before making any public move. Happy to integrate. TBH getting the MAE low with the old version of kapre was quite gnarly, and we had to reimplement a lot of the Mel stuff ourselves. (We still get high error on high frequencies, like chirp.)

BTW my email inbox is open. [email protected]

I have talked with Zeyu, who is on the committee for the accepted NeurIPS 2021 competition I'm organizing on learning general-purpose audio representations. If possible, I'd love to confirm that your model and weights could potentially be included as pretraining for the dev-kit. Let me know if you'd like to sync over email or chat on Zoom for 30 minutes.

turian commented on May 26, 2024

@justinsalamon my one request would be that if numpy librosa is used, we make sure to find a compatible GPU spectrogram / mel implementation. Matching the kapre spectrograms was quite hellacious. I'd want to sanity check torchlibrosa etc.

I think this is of interest to people who are synthesizing audio on the GPU.

justinsalamon commented on May 26, 2024

What we've done is the following:

  • Update Kapre (let's call it v2) and try to match the old version (v1) as closely as possible
  • Implement a librosa front-end and try to match it as closely as possible to v1 as well
  • Compare the embeddings
  • Compare performance on downstream classification of the UrbanSound8K dataset

As you might expect, the embeddings don't match perfectly when we replace the audio front-end. However, performance on the downstream classification task was the same (or within the margin of error), which we hope is good enough.
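
As an illustration of the "compare the embeddings" step, here is a minimal sketch that scores two embeddings frame by frame. Cosine similarity is an assumed choice of metric, not necessarily what was used for OpenL3, and the function names and numbers are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def framewise_similarity(emb_a, emb_b):
    """One similarity score per frame for two (frames x dims) embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same number of frames")
    return [cosine_similarity(u, v) for u, v in zip(emb_a, emb_b)]

# Placeholder embeddings (assumption): stand-ins for the output of two
# front-ends on the same audio clip.
emb_v1 = [[1.0, 0.0], [0.0, 2.0]]
emb_v2 = [[1.0, 0.0], [1.0, 1.0]]

sims = framewise_similarity(emb_v1, emb_v2)
worst = min(sims)  # the least similar frame is often the most informative
```

Looking at the worst frame rather than only the mean helps locate where two front-ends diverge, which an average over many frames can hide.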

So, the updated version of OpenL3 will let you choose between the Kapre and Librosa front-end, but they are not interchangeable. Models trained on embeddings from a specific front-end should continue to use the same front-end for inference. The same would apply if we incorporated a pytorch version - it would be close but probably not interchangeable with the TF versions.
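
One way to guard against accidentally mixing front-ends is to store the configuration alongside a downstream model and check it at inference time. This is only a sketch: `EmbeddingConfig`, `FrontendMismatchError`, and `check_compatible` are hypothetical names, and the strings used here are illustrative labels rather than confirmed OpenL3 parameter values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingConfig:
    """Records which backend and audio front-end produced the embeddings."""
    backend: str   # illustrative label, e.g. "tf" or "pytorch"
    frontend: str  # illustrative label, e.g. "kapre" or "librosa"

class FrontendMismatchError(ValueError):
    """Raised when training and inference embeddings come from different setups."""

def check_compatible(trained_with, inferring_with):
    """Refuse to run inference on embeddings from a different configuration."""
    if trained_with != inferring_with:
        raise FrontendMismatchError(
            f"model was trained on {trained_with} embeddings "
            f"but is being given {inferring_with} embeddings"
        )

# Saved next to the downstream classifier at training time.
train_cfg = EmbeddingConfig(backend="tf", frontend="kapre")

# Same configuration at inference time: passes silently.
check_compatible(train_cfg, EmbeddingConfig(backend="tf", frontend="kapre"))
```

Comparing the whole configuration, not just the front-end name, also covers the case mentioned above where a PyTorch port is close to the TF version but not interchangeable with it.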

Yes, I owe you an email. Coming soon.

justinsalamon commented on May 26, 2024

P.S. @turian happy to find a time for a quick chat if that would be helpful.
