
OpenL3

OpenL3 is an open-source Python library for computing deep audio and image embeddings.


Please refer to the documentation for detailed instructions and examples.

UPDATE: OpenL3 now has TensorFlow 2 support!

NOTE: Whoops! A bug was reported in the training code, with the effect that positive audio-image pairs that come from the same video do not necessarily overlap in time. Nonetheless, the embedding still seems to capture useful semantic information.

The audio and image embedding models provided here are published as part of [1], and are based on the Look, Listen and Learn approach [2]. For details about the embedding models and how they were trained, please see:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Aurora Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

Installing OpenL3

Dependencies

libsndfile

OpenL3 depends on the pysoundfile module to load audio files, which in turn depends on the non-Python library libsndfile. On Windows and macOS, libsndfile is installed automatically via pip, so you can skip this step. However, on Linux it must be installed manually via your platform's package manager. For Debian-based distributions (such as Ubuntu), this can be done by simply running

apt-get install libsndfile1

Alternatively, if you are using conda, you can install libsndfile simply by running

conda install -c conda-forge libsndfile

For more detailed information, please consult the pysoundfile installation documentation.

TensorFlow

Starting with openl3>=0.4.0, OpenL3 has been upgraded to TensorFlow 2. Because TensorFlow 2 includes GPU support by default, tensorflow>=2.0.0 is included as a dependency and no longer needs to be installed separately.

If you are interested in using TensorFlow 1.x, please install using pip install 'openl3<=0.3.1'.

TensorFlow 1.x & OpenL3 <= v0.3.1

Because TensorFlow 1.x comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their use case.

On most platforms, either of the following commands should properly install TensorFlow:

pip install "tensorflow<1.14" # CPU-only version
pip install "tensorflow-gpu<1.14" # GPU version

For more detailed information, please consult the TensorFlow installation documentation.

Installing OpenL3

The simplest way to install OpenL3 is by using pip, which will also install the additional required dependencies if needed. To install OpenL3 using pip, simply run

pip install openl3

To install the latest version of OpenL3 from source:

  1. Clone or pull the latest version, only retrieving the main branch to avoid downloading the branch where we store the model weight files (these will be properly downloaded during installation).

     git clone git@github.com:marl/openl3.git --branch main --single-branch
    
  2. Install using pip to handle python dependencies. The installation also downloads model files, which requires a stable network connection.

     cd openl3
     pip install -e .
    

Using OpenL3

To help you get started with OpenL3, please see the tutorial.
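
For quick orientation, here is a minimal usage sketch based on the documented API (the file path is a placeholder):

import openl3
import soundfile as sf

# Load an audio file and compute frame-level embeddings.
audio, sr = sf.read('/path/to/file.wav')
emb, ts = openl3.get_audio_embedding(audio, sr)
# emb has shape (n_frames, 6144) with default settings;
# ts holds the corresponding frame timestamps in seconds.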

Acknowledging OpenL3

Please cite the following papers when using OpenL3 in your work:

[1] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Aurora Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

[2] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.

Model Weights License

The model weights are made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

openl3's People

Contributors

auroracramer, beasteers, bmcfee, hohsiangwu, jonnor, justinsalamon, lclichen, lostanlen


openl3's Issues

PyTorch models

Any chance you have PyTorch model files saved, in addition to the Keras models present in this repo?

Supporting multiple GPU models

Should support for running the embedding models on multiple GPUs be prioritized? Here are the pros/cons as I see them (not necessarily equally weighted in terms of importance):

Pros

  • Allows users to take advantage of multiple GPUs for faster running time

Cons

  • Adds an extra parameter to most API calls, though this can be optional
  • Adds meat to the codebase (though we already have it)
  • Can we test this on Travis?

All in all, I think that if we believe that using multiple GPUs will be a common use case, then we should include it. But if it's something that will be rarely used, if at all, we shouldn't prioritize it (at least for an MVP).

Implement image embedding API

Add the image embedding API to the library. This should be fairly similar to the existing audio API. I'll add a candidate interface once I've given it more thought.

About pre-trained model and paper

Hello there, I would like more details about the pre-trained model used by the OpenL3 project:

  1. On which dataset was the pre-trained model trained? AudioSet, the SoundNet-Flickr dataset, or another dataset? And how many examples were used?
  2. Did you reproduce the results described in the original L^3-Net paper or the latest version, AVE-Net (Objects that Sound, ECCV 2018)?

Also, congratulations to the authors on the paper's acceptance. However, it seems the paper is not available yet; could you provide a way to download it?

Thank you!

CPU only support

I imagine plenty of people will want to run this on machines without GPUs, and as a result will need to use the CPU-only version of tensorflow, so I think we should aim to support CPU-only installations of this package. That being said, what's the best way to handle this when people are installing via pip, rather than creating an entirely separate PyPI package?

Handling silence

There are a few different ways we could handle audio with (true) silence, i.e. all zero samples. We could throw an error, but there could be reasons that people want to process silence. We might want to raise a warning so the user is aware. Or we could just do nothing, which could also be reasonable. What are your thoughts?
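
For illustration, a minimal sketch of the "raise a warning" option discussed above; the helper name is hypothetical and this is not the library's actual behavior:

import warnings

import numpy as np

def check_silence(audio):
    # Hypothetical helper: warn on all-zero input instead of
    # raising an error or processing silently.
    if audio.size and np.max(np.abs(audio)) == 0:
        warnings.warn('Input audio is all zeros; embeddings may not be meaningful.')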

Drop support for Python <=3.5 and add support for Python 3.7, and 3.8

Python 3.5 has reached EOL as of October 2020, so we should drop support for it. Since 3.6 is now the oldest version, and 3.7 and 3.8 should be pretty robust by now, we should also add explicit support for Python 3.7 and 3.8. We can probably avoid 3.9 since it was just released last month and may be somewhat unstable.

Streaming / real-time usage?

For many applications it is desirable to be able to classify a real-time audio stream, assuming that a delay of one analysis window (1 second) is acceptable. Is this use case something you aim to support with OpenL3, and maybe have tested? If so, it would be great to have an example or documentation for it.

Looking briefly at the API docs and source code, it seems this could be possible by combining model = load_embedding_model(...) with get_embeddings(chunk, sr, model=model, ...), where chunk is the just-received one second of audio. Any caveats?

Also, any estimates on what the hardware requirements would be to reach real-time performance?
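
A sketch of that idea using the current API names (load_audio_embedding_model and get_audio_embedding); next_chunk is a hypothetical stand-in for your audio capture callback:

import numpy as np
import openl3
from openl3.models import load_audio_embedding_model

# Load the model once, outside the streaming loop.
model = load_audio_embedding_model(input_repr='mel256',
                                   content_type='env',
                                   embedding_size=512)
sr = 48000  # sample rate of the incoming stream

def next_chunk():
    # Hypothetical stand-in for the audio capture (e.g. a mic callback);
    # it should return the most recent one second of samples.
    return np.zeros(sr, dtype=np.float32)

for _ in range(10):  # in practice: while the stream is open
    chunk = next_chunk()
    emb, ts = openl3.get_audio_embedding(chunk, sr, model=model, verbose=0)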

error in the documentation of the image embedding sizes ?

Hi,

In the API documentation it is written that openl3.models.load_image_embedding_model accepts 6144 and 512 as embedding_size. It seems that it is not the same as openl3.models.load_audio_embedding_model and that in fact it accepts 8192 and 512 as sizes.

It does not seem to be specified in your paper; shall I assume that the (image, audio) models have been trained as pairs of embedding sizes (8192, 6144) and (512, 512) for the different configurations of input_repr and content_type?

Thanks!
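
For reference, a sketch of loading an image model with the 8192 size reported above; the input shape is an assumption about what the model expects:

import numpy as np
import openl3
from openl3.models import load_image_embedding_model

model = load_image_embedding_model(input_repr='mel256',
                                   content_type='music',
                                   embedding_size=8192)
image = np.zeros((224, 224, 3), dtype=np.uint8)  # assumed RGB input size
emb = openl3.get_image_embedding(image, model=model)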

Drop support for Python 2.x

Since Python 2.x has reached EoL, we should probably plan on dropping support for it at some point as we develop further.

Add function to API for loading custom model weights

This is motivated by the HEAR2021 challenge, which prefers that model weights are loaded by filename (downloaded to the evaluation machine) rather than loaded internally using configurations. This also allows for users to more safely load any custom weights if they want to for whatever reason.
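
One possible call shape for such a function; the function name, parameters, and path below are illustrative assumptions, not a confirmed API:

from openl3.models import load_audio_embedding_model_from_path

# Hypothetical usage: load weights from a local .h5 file instead of
# the packaged configurations.
model = load_audio_embedding_model_from_path('/path/to/custom_weights.h5',
                                             input_repr='mel256',
                                             embedding_size=512)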

API docs broken

ReadTheDocs build passed, but silently failed 😰

WARNING: autodoc: failed to import module u'core' from module u'openl3'; the following exception was raised:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/openl3/envs/stable/lib/python2.7/site-packages/sphinx/ext/autodoc/importer.py", line 154, in import_module
    __import__(modname)
  File "/home/docs/checkouts/readthedocs.org/user_builds/openl3/checkouts/stable/openl3/__init__.py", line 3, in <module>
    from .core import (
  File "/home/docs/checkouts/readthedocs.org/user_builds/openl3/checkouts/stable/openl3/core.py", line 10, in <module>
    from .models import load_audio_embedding_model, load_image_embedding_model, _validate_audio_frontend
  File "/home/docs/checkouts/readthedocs.org/user_builds/openl3/checkouts/stable/openl3/models.py", line 42
    def get_spectrogram(*a, return_decibel=False, **kw):
                                         ^
SyntaxError: invalid syntax

It looks to be an issue of ReadTheDocs using Python 2.x instead of 3.x.

WARNING: autodoc: failed to import module 'core' from module 'openl3'; the following exception was raised:
No module named 'tensorflow.keras'; 'tensorflow' is not a package

The above also seems to be because tensorflow is not mocked in docs/conf.py.

Add support for Python 3.4

We get the following build error in Travis when building for 3.4:

Running setup.py bdist_wheel for llvmlite ... error
  Complete output from command /home/travis/virtualenv/python3.4.6/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6ji57j2r/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmptmrhzy74pip-wheel- --python-tag cp34:
  running bdist_wheel
  /home/travis/virtualenv/python3.4.6/bin/python /tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py
  LLVM version... 5.0.0git-929163d
  
  Traceback (most recent call last):
    File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 167, in <module>
      main()
    File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 157, in main
      main_posix('linux', '.so')
    File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 119, in main_posix
      raise RuntimeError(msg)
  RuntimeError: Building llvmlite requires LLVM 6.0.x. Be sure to set LLVM_CONFIG to the right executable path.
  Read the documentation at http://llvmlite.pydata.org/ for more information about building llvmlite.
  
  error: command '/home/travis/virtualenv/python3.4.6/bin/python' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for llvmlite
  Running setup.py clean for llvmlite
Successfully built kapre resampy librosa future numba audioread pycparser
Failed to build llvmlite
Installing collected packages: scipy, keras, audioread, scikit-learn, joblib, decorator, llvmlite, numba, resampy, librosa, future, kapre, pycparser, cffi, PySoundFile, openl3
  Running setup.py install for llvmlite ... error
    Complete output from command /home/travis/virtualenv/python3.4.6/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6ji57j2r/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ocnebo16-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/travis/virtualenv/python3.4.6/include/site/python3.4/llvmlite:
    running install
    running build
    got version from file /tmp/pip-build-6ji57j2r/llvmlite/llvmlite/_version.py {'version': '0.25.0', 'full': '9af98a608a49278dbc4ce5dc743152f2341b6a87'}
    running build_ext
    /home/travis/virtualenv/python3.4.6/bin/python /tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py
    LLVM version... 5.0.0git-929163d
    
    Traceback (most recent call last):
      File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 167, in <module>
        main()
      File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 157, in main
        main_posix('linux', '.so')
      File "/tmp/pip-build-6ji57j2r/llvmlite/ffi/build.py", line 119, in main_posix
        raise RuntimeError(msg)
    RuntimeError: Building llvmlite requires LLVM 6.0.x. Be sure to set LLVM_CONFIG to the right executable path.
    Read the documentation at http://llvmlite.pydata.org/ for more information about building llvmlite.
    
    error: command '/home/travis/virtualenv/python3.4.6/bin/python' failed with exit status 1
    
    ----------------------------------------
Command "/home/travis/virtualenv/python3.4.6/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6ji57j2r/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ocnebo16-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/travis/virtualenv/python3.4.6/include/site/python3.4/llvmlite" failed with error code 1 in /tmp/pip-build-6ji57j2r/llvmlite/
The command "pip install -e .[tests]" failed and exited with 1 during .
Your build has been stopped.

Towards getting an MVP out, we'll ignore 3.4 for now and revisit at some point in the future.

Refactor code and models to support TF 2.x and tf.keras

At some point in the somewhat near future, we should establish support for TF 2.x and tf.keras. The main reasons for this are:

  • To remain compatible with new releases of TF and Keras (the official version of which is now tf.keras) and make use of bug fixes, etc. As we have found (#42, #43), installing newer versions of either package breaks installation and usage.
  • To address multiple vulnerabilities contained in tensorflow < 1.15.2.
  • To simplify the installation process; since TF 2.x includes support for both CPU and GPU, we can now directly include tensorflow in the project dependencies (as brought up in #39).

A priori, it seems like the main things to do are:

  • Updating the dependencies in setup.py to include tensorflow
  • Modifying the model definitions to be tf.keras compatible
  • Porting the model files to a format that can be loaded by tf.keras with TF 2.x

The main concern that comes to mind is the regression tests. We have already seen that tensorflow > 1.13 causes regression tests to fail. I imagine that this will only worsen as we introduce not only a new major release to TF, but also a divergence in Keras with tf.keras. @justinsalamon, what are your thoughts?

Expected new release

Hello community!

I've seen in the documentation and in some pre-releases that you already have the new version (supporting TF2) almost ready. I wonder when it will be available.

Best regards and thank you very much for your work!

Pretrained fusion layers

Hi !

Thanks for providing OpenL3 as a package.
I would like to ask if one could access the pretrained fusion layers, which would classify whether a pair of image/audio embeddings are matched or not.

Otherwise I would train it myself, on top of the pretrained fixed embedding extractors you provide.
But as a starting point, I would take your classifier if possible.

Best

Bug in CLI centering?

I am invoking openl3 on one-second audio. Centering is enabled:

openl3 1secondsounds/ --content-type music --input-repr mel256 --audio-embedding-size 512

This should pad with 0.5 seconds of silence at the beginning, which means I should have timestamps at 0.0, 0.1, ... 0.5, correct?

Instead I only get a (1, 512) embedding and a single timestamp, 0.0.

Why?
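
For reference, the same configuration checked through the Python API; the audio here is random noise, and with centering roughly six frames at timestamps 0.0, 0.1, ..., 0.5 would be expected:

import numpy as np
import openl3

sr = 48000
audio = np.random.randn(sr).astype(np.float32)  # one second of audio
emb, ts = openl3.get_audio_embedding(audio, sr,
                                     input_repr='mel256',
                                     content_type='music',
                                     embedding_size=512,
                                     center=True, hop_size=0.1)
print(emb.shape, ts)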

Failed to construct network. Tried both Tensorflow 2.0.0 and 1.9.5, Keras 2.3.1 and 2.2.5

openl3.models._construct_mel128_audio_network()
WARNING:tensorflow:From /home/zhaos/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:66: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/zhaos/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:541: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/openl3/models.py", line 265, in _construct_mel128_audio_network
    y_a = Melspectrogram(n_dft=n_dft, n_hop=n_hop, n_mels=n_mels, sr=asr, power_melgram=1.0, htk=True, return_decibel_melgram=True, padding='same')(x_a)
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 881, in __call__
    inputs, outputs, args, kwargs)
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2043, in _set_connectivity_metadata_
    input_tensors=inputs, output_tensors=outputs, arguments=arguments)
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2059, in _add_inbound_node
    input_tensors)
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/nest.py", line 536, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/nest.py", line 536, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/home/zhaos/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2058, in <lambda>
    inbound_layers = nest.map_structure(lambda t: t._keras_history.layer,
AttributeError: 'tuple' object has no attribute 'layer'

pip error when installing

I'm getting an error during pip install:
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

I'd appreciate the help! :)

console output:

(openl3_venv) hugoffg is at openl3: >:-)  pip3 install openl3
Collecting openl3
  Using cached openl3-0.3.1.tar.gz (16 kB)
    ERROR: Command errored out with exit status 1:
     command: /Users/hugoffg/Documents/notebooks/openl3_venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/vl/n_1wx0vx41s979274z7hctnr0000gn/T/pip-install-bg57tw39/openl3/setup.py'"'"'; __file__='"'"'/private/var/folders/vl/n_1wx0vx41s979274z7hctnr0000gn/T/pip-install-bg57tw39/openl3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/vl/n_1wx0vx41s979274z7hctnr0000gn/T/pip-pip-egg-info-jle_emrm
         cwd: /private/var/folders/vl/n_1wx0vx41s979274z7hctnr0000gn/T/pip-install-bg57tw39/openl3/
    Complete output (55 lines):
    Downloading weight file openl3_audio_linear_music-v0_2_0.h5.gz ...
    Traceback (most recent call last):
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
        encode_chunked=req.has_header('Transfer-encoding'))
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1252, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1298, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1247, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1026, in _send_output
        self.send(msg)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 966, in send
        self.connect()
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1414, in connect
        super().connect()
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 938, in connect
        (self.host,self.port), self.timeout, self.source_address)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 707, in create_connection
        for res in getaddrinfo(host, port, 0, SOCK_STREAM):
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 748, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno 8] nodename nor servname provided, or not known
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/vl/n_1wx0vx41s979274z7hctnr0000gn/T/pip-install-bg57tw39/openl3/setup.py", line 35, in <module>
        urlretrieve(base_url + compressed_file, compressed_path)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 563, in error
        result = self._call_chain(*args)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 755, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
        response = self._open(req, data)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
        '_open', req)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
        context=self._context, check_hostname=self._check_hostname)
      File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

What is missing for Image embedding?

It seems like the repo includes trained parameters for the image model. Is the missing part the API implementation, or are the parameters wrong?
Thanks

Pytorch version

I too would like a Pytorch version, as per #35

Our audio models are in PyTorch and we have no idea how to efficiently convert them, on the GPU, to TensorFlow tensors to get OpenL3 output.

conda-forge packaging?

Shipping conda-forge packages would be helpful for a slightly smoother installation process, given the dependency on pysoundfile/libsndfile. I'm happy to help get this set up if y'all are interested.

tensorflow 2.1 doesn't require separate pip installs for GPU and CPU

Thanks for this great package! We love to use it!

You state

Because Tensorflow comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their usecase.

This is no longer the case as of 2.1, so you could (if 2.1 is supported) make tensorflow part of the standard requirements.

Unable to find models

Hello, thank you for this wonderful tool

I am unable to find the "h5" models when I call openl3.get_audio_embedding function.

The error is:

OSError: Unable to open file (unable to open file: name = '/content/openl3/openl3/openl3_audio_linear_env.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I checked the openl3 folder and there are indeed no model files there. Can you please confirm if the model links are active? Perhaps the models are not being downloaded due to a broken link.

Thank you

Write unit tests

Write unit tests with assumed API structure to ensure correct behavior. Additionally, stub out functions.

Why can't I install openl3 now?

Whether I use pip or git clone the GitHub repository, installation fails. When I use pip, it hangs at this step (screenshot omitted). When I git clone the repository, the same thing happens.

Disable logging

Is there any way to disable the logging messages?

21/21 [==============================] - 0s 1ms/step

I find no way to disable them.
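
The progress bar above comes from Keras prediction; a minimal sketch of the likely fix, assuming the verbose parameter documented for the embedding functions suppresses it:

import numpy as np
import openl3

sr = 48000
audio = np.zeros(sr, dtype=np.float32)  # dummy one-second clip
# verbose=0 should suppress the per-batch Keras progress bar shown above.
emb, ts = openl3.get_audio_embedding(audio, sr, verbose=0)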

Explicitly require tensorflow < 2.0

As brought up in #39, pip install openl3 force installs tensorflow 2.*, which is incompatible with the current package.

We'll need to ensure that keras < 2.3.0 (2.3.0 uses TF 2.0), add a note to the README that TF 2.* is not yet supported, and make a v0.3.1 release. We'll aim to move to TF 2.0 (and thus tf.keras) in a future release at some point.
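
A sketch of the proposed pin in setup.py; all other arguments and dependencies are omitted for brevity:

from setuptools import setup

setup(
    name='openl3',
    install_requires=[
        'keras<2.3.0',  # keras 2.3.0 moves to TF 2.0, which is not yet supported
    ],
)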

Implement API

Implement all of the functions for API, make sure to add descriptive docstrings that conform with the Numpy docstring format.

Output file format and naming convention

I have some questions about how to deal with embedding outputs:

  • Should we include the timestamps? If so do we save it in the same file?
  • What format should we use?
    • h5: Nice compression options, but since these typically shouldn't be large, it might be more annoying to deal with than other options
    • npy/npz: Standard approach, can easily load numpy arrays directly
    • JAMS: Using JAMS would help expand its use and would provide a natural way to associate the timestamps with each embedding, but storing all of the values as text might be cumbersome and make the files big, especially if they are long
  • Should we use the embedding type to name the embedding? e.g. example_audio_openl3_6144emb_linear_music.<ext> Or should we just keep it simple?
    • It might be good if the user is comparing different embeddings, but it might be cumbersome if people just want to use a single type of embeddings. Of course we could add an option for this, but adding another option for something like this might be excessive.
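
For illustration, a sketch of the npz option from the list above, storing embeddings and their timestamps together in one file (the arrays and filename are placeholders):

import numpy as np

emb = np.zeros((10, 6144), dtype=np.float32)  # placeholder embeddings
ts = np.arange(10) * 0.1                      # placeholder timestamps

# Keep timestamps associated with their embeddings in one compressed file.
np.savez_compressed('example_audio.npz', embedding=emb, timestamps=ts)

data = np.load('example_audio.npz')
emb, ts = data['embedding'], data['timestamps']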

git clone pulls weight files as well by default

I wonder if it would make sense to move the model files to their own repo or if there's a way to not have them pull by default.

Basically, this is a tad annoying haha.

Cloning into 'openl3'...
remote: Enumerating objects: 1460, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (36/36), done.
Receiving objects:  24% (357/1460), 444.15 MiB | 2.55 MiB/s  
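
As noted in the installation instructions above, cloning only the main branch avoids pulling the weight-file branch:

git clone git@github.com:marl/openl3.git --branch main --single-branch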

Handling short audio

I'm thinking about how to handle these cases with short audio:

  • Empty audio array
    • Proposed solution: Throw an error. We shouldn't try to process empty audio forms
  • Audio array has less than one second of audio
    • Proposed solution: Pad audio to one second and process as usual. Maybe warn the user? (See the sketch after this list.)

What are your thoughts?
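
A minimal sketch of the proposed behavior, assuming a hypothetical helper name:

import warnings

import numpy as np

def prepare_short_audio(audio, sr):
    # Error on empty input; zero-pad (and warn) when the clip is
    # shorter than one second, per the proposals above.
    if audio.size == 0:
        raise ValueError('Cannot compute embeddings for empty audio.')
    one_second = sr
    if len(audio) < one_second:
        warnings.warn('Audio is shorter than 1 s; padding with zeros.')
        audio = np.pad(audio, (0, one_second - len(audio)))
    return audio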

[DOC] Tutorial could include a classification example

One of the common questions that I get is "how should i build an audio classifier?"

My canned response is "openl3 + sklearn RandomForest". The tutorial in the docs is great for the first half, but if it were expanded to show how to combine it with a classifier on some example dataset (e.g., URBAN-SED or something), it would be much easier for novices to pick up and run with.

Upload models

Create a new branch and upload compressed models to it. Make sure we come up with a reasonable naming convention based on API.

Design API

Determine the structure of the API and function signatures

Make an object-oriented interface for embedding models

An object interface for extracting embeddings would make it simpler to extract multiple embeddings and would allow for the embedding model to just be loaded once instead of every time the get_embedding function is called.

At least to begin with, the interface could just be something like

m = EmbeddingModel(input_repr="mel256", content_type="music",
                   embedding_size=6144, center=True, hop_size=0.1)
emb, ts = m.get_embedding(audio, sr, verbose=1)
m.process_file(filepath, output_dir=None, suffix=None)

Basically, we'd just put all of the functions from https://github.com/marl/openl3/blob/master/openl3/core.py into a class.

Add batch processing mode

Something else to consider is a batch processing mode. i.e. making more efficient use of the GPU by predicting multiple files at once.

Probably the least messy option would be to separate some of the interior code of get_audio_embedding for the case of audio into their own functions and make a get_audio_embedding_batch function that calls most of the same functions. We would also have a process_audio_file_batch function.

I thought about changing get_audio_embedding so that it can either take in a single audio array, or a list of audio arrays (and probably a list of corresponding sample rates). While this might consolidate multiple usecases into one function, it'd probably get pretty messy so it's probably best we don't do this.

Regarding the visual frame embedding extraction, we could ask the same question, though there might be more nuance depending on whether we allow individual images to be processed (I think we should). In the case of videos, multiple frames are already being provided at once. So it raises the question (to me at least) of whether get_vframe_embedding (as I'm currently calling it) should support both a single frame and multiple frames. This also raises the question of whether we allow frames of multiple sizes or not.

Thoughts?
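
A naive sketch of the proposed batch function (the name is taken from the discussion above; the implementation just shares one loaded model, whereas a real version would also batch frames on the GPU):

import openl3
from openl3.models import load_audio_embedding_model

def get_audio_embedding_batch(audio_list, sr_list, **kwargs):
    # Load the model once and reuse it for every input in the batch.
    model = kwargs.pop('model', None) or load_audio_embedding_model(
        input_repr='mel256', content_type='music', embedding_size=6144)
    return [openl3.get_audio_embedding(audio, sr, model=model, **kwargs)
            for audio, sr in zip(audio_list, sr_list)]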

Replace resampy with pysox for audio resampling

Turns out resampy is SLOW. We could speed up openl3 quite a bit by swapping it out for pysox: resample the audio into a tempfile and load the tempfile for further processing.

import os
import tempfile

import librosa
import sox

with tempfile.TemporaryDirectory() as tempdir:
    # Resample to 48000 Hz with sox, writing to a temporary wav file
    filename48 = os.path.join(
        tempdir, os.path.basename(os.path.splitext(file_path)[0]) + "_48.wav")
    tfm = sox.Transformer()
    tfm.convert(samplerate=48000, n_channels=1)
    tfm.build(file_path, filename48)

    # Load the resampled audio file
    sr = 48000
    audio, srload = librosa.load(filename48, sr=None)
    assert srload == sr

NOTE: this example assumes wav input, and would have to be adapted to support various formats (in particular mp3)
