Python audio and music signal processing library

Home Page: https://madmom.readthedocs.io

License: Other

Python 97.21% Cython 2.79%

audio-analysis signal-processing machine-learning music-information-retrieval python numpy scipy cython

madmom's Introduction

madmom

Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks.

The library is internally used by the Department of Computational Perception, Johannes Kepler University, Linz, Austria (http://www.cp.jku.at) and the Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria (http://www.ofai.at).

Possible acronyms are:

Madmom Analyzes Digitized Music Of Musicians
Mostly Audio / Dominantly Music Oriented Modules

It includes reference implementations for some music information retrieval algorithms; please see the References section.

Documentation

Documentation of the package can be found online at http://madmom.readthedocs.org

License

The package has two licenses, one for source code and one for model/data files.

Source code

Unless indicated otherwise, all source code files are published under the BSD license. For details, please see the LICENSE file.

Model and data files

Unless indicated otherwise, all model and data files are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.

If you want to include any of these files (or a variation or modification thereof) or technology which utilises them in a commercial product, please contact Gerhard Widmer.

Installation

Please do not try to install from the .zip files provided by GitHub. Instead, install it from the package (if you just want to use it) or from source (if you plan to use it for development) by following the instructions below. Whichever variant you choose, please make sure that all prerequisites are installed.

Prerequisites

To install the madmom package, you must have either Python 2.7 or Python 3.5 or newer and the following packages installed:

In order to test your installation, process live audio input, or have improved FFT performance, additionally install these packages:

If you need support for audio files other than .wav with a sample rate of 44.1kHz and 16 bit depth, you need ffmpeg (avconv on Ubuntu Linux has some decoding bugs, so we advise not to use it!).

Please refer to the requirements.txt file for the minimum required versions and make sure that these modules are up to date, otherwise it can result in unexpected errors or false computations!

Install from package

The instructions given here should be used if you just want to install the package, e.g. to run the bundled programs or use some functionality for your own project. If you intend to change anything within the madmom package, please follow the steps in the next section.

The easiest way to install the package is via pip from the PyPI (Python Package Index):

pip install madmom

This includes the latest code and trained models and will install all dependencies automatically.

You might need higher privileges (use su or sudo) to install the package, model files and scripts globally. Alternatively you can install the package locally (i.e. only for you) by adding the --user argument:

pip install --user madmom

This will also install the executable programs to a common place (e.g. /usr/local/bin), which should be in your $PATH already. If you installed the package locally, the programs will be copied to a folder which might not be included in your $PATH (e.g. ~/Library/Python/2.7/bin on Mac OS X or ~/.local/bin on Ubuntu Linux; pip will tell you). In that case, either call the programs explicitly by their full path or add their install path to your $PATH environment variable:

export PATH='path/to/scripts':$PATH

Install from source

If you plan to use the package as a developer, clone the Git repository:

git clone --recursive https://github.com/CPJKU/madmom.git

Since the pre-trained model/data files are not included in this repository but rather added as a Git submodule, you have to clone the repo recursively. This is equivalent to these steps:

git clone https://github.com/CPJKU/madmom.git
cd madmom
git submodule update --init --remote

Then you can simply install the package in development mode:

python setup.py develop --user

To run the included tests:

python setup.py pytest

Upgrade of existing installations

To upgrade the package, please use the same mechanism (pip vs. source) as you did for installation. If you want to change from package to source, please uninstall the package first.

Upgrade a package

Simply upgrade the package via pip:

pip install --upgrade madmom [--user]

If some of the provided programs or models changed (please refer to the CHANGELOG) you should first uninstall the package and then reinstall:

pip uninstall madmom
pip install madmom [--user]

Upgrade from source

Simply pull the latest sources:

git pull

To update the models contained in the submodule:

git submodule update

If any of the .pyx or .pxd files changed, you have to recompile the modules with Cython:

python setup.py build_ext --inplace

Package structure

The package has a very simple structure, divided into the following folders:

/bin: this folder includes example programs (i.e. executable algorithms)
/docs: package documentation
/madmom: the actual Python package
/madmom/audio: low level features (e.g. audio file handling, STFT)
/madmom/evaluation: evaluation code
/madmom/features: higher level features (e.g. onsets, beats)
/madmom/ml: machine learning stuff (e.g. RNNs, HMMs)
/madmom/models: pre-trained model/data files (see the License section)
/madmom/utils: misc stuff (e.g. MIDI and general file handling)
/tests: tests

Executable programs

The package includes executable programs in the /bin folder. If you installed the package, they were copied to a common place.

All scripts can be run in different modes: in single file mode to process a single audio file and write the output to STDOUT or the given output file:

DBNBeatTracker single [-o OUTFILE] INFILE

If multiple audio files should be processed, the scripts can also be run in batch mode to write the outputs to files with the given suffix:

DBNBeatTracker batch [-o OUTPUT_DIR] [-s OUTPUT_SUFFIX] FILES

If no output directory is given, the program writes the output files to the same location as the audio files.

Some programs can also be run in online mode, i.e. operate on live audio signals. This requires pyaudio to be installed:

DBNBeatTracker online [-o OUTFILE] [INFILE]

The pickle mode can be used to store the used parameters to be able to exactly reproduce experiments.

Please note that the program itself as well as the modes have help messages:

DBNBeatTracker -h

DBNBeatTracker single -h

DBNBeatTracker batch -h

DBNBeatTracker online -h

DBNBeatTracker pickle -h

will give different help messages.

Additional resources

Mailing list

The mailing list should be used to get in touch with the developers and other users.

Wiki

The wiki can be found here: https://github.com/CPJKU/madmom/wiki

FAQ

Frequently asked questions can be found here: https://github.com/CPJKU/madmom/wiki/FAQ

Citation

If you use madmom in your work, please consider citing it:

@inproceedings{madmom,
   Title = {{madmom: a new Python Audio and Music Signal Processing Library}},
   Author = {B{\"o}ck, Sebastian and Korzeniowski, Filip and Schl{\"u}ter, Jan and Krebs, Florian and Widmer, Gerhard},
   Booktitle = {Proceedings of the 24th ACM International Conference on
   Multimedia},
   Month = {10},
   Year = {2016},
   Pages = {1174--1178},
   Address = {Amsterdam, The Netherlands},
   Doi = {10.1145/2964284.2973795}
}

References

[1]	Florian Eyben, Sebastian Böck, Björn Schuller and Alex Graves, Universal Onset Detection with bidirectional Long Short-Term Memory Neural Networks, Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), 2010.

[2]	Sebastian Böck and Markus Schedl, Enhanced Beat Tracking with Context-Aware Neural Networks, Proceedings of the 14th International Conference on Digital Audio Effects (DAFx), 2011.

[3]	Sebastian Böck and Markus Schedl, Polyphonic Piano Note Transcription with Recurrent Neural Networks, Proceedings of the 37th International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

[4]	Sebastian Böck, Andreas Arzt, Florian Krebs and Markus Schedl, Online Real-time Onset Detection with Recurrent Neural Networks, Proceedings of the 15th International Conference on Digital Audio Effects (DAFx), 2012.

[5]	Sebastian Böck, Florian Krebs and Markus Schedl, Evaluating the Online Capabilities of Onset Detection Methods, Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.

[6]	Sebastian Böck and Gerhard Widmer, Maximum Filter Vibrato Suppression for Onset Detection, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.

[7]	Sebastian Böck and Gerhard Widmer, Local Group Delay based Vibrato and Tremolo Suppression for Onset Detection, Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 2013.

[8]	Florian Krebs, Sebastian Böck and Gerhard Widmer, Rhythmic Pattern Modelling for Beat and Downbeat Tracking in Musical Audio, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.

[9]	Sebastian Böck, Jan Schlüter and Gerhard Widmer, Enhanced Peak Picking for Onset Detection with Recurrent Neural Networks, Proceedings of the 6th International Workshop on Machine Learning and Music (MML), 2013.

[10]	Sebastian Böck, Florian Krebs and Gerhard Widmer, A Multi-Model Approach to Beat Tracking Considering Heterogeneous Music Styles, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.

[11]	Filip Korzeniowski, Sebastian Böck and Gerhard Widmer, Probabilistic Extraction of Beat Positions from a Beat Activation Function, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.

[12]	Sebastian Böck, Florian Krebs and Gerhard Widmer, Accurate Tempo Estimation based on Recurrent Neural Networks and Resonating Comb Filters, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

[13]	Florian Krebs, Sebastian Böck and Gerhard Widmer, An Efficient State Space Model for Joint Tempo and Meter Tracking, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

[14]	Sebastian Böck, Florian Krebs and Gerhard Widmer, Joint Beat and Downbeat Tracking with Recurrent Neural Networks, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

[15]	Filip Korzeniowski and Gerhard Widmer, Feature Learning for Chord Recognition: The Deep Chroma Extractor, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

[16]	Florian Krebs, Sebastian Böck, Matthias Dorfer and Gerhard Widmer, Downbeat Tracking Using Beat-Synchronous Features and Recurrent Networks, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

[17]	Filip Korzeniowski and Gerhard Widmer, A Fully Convolutional Deep Auditory Model for Musical Chord Recognition, Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2016.

[18]	Filip Korzeniowski and Gerhard Widmer, Genre-Agnostic Key Classification with Convolutional Neural Networks, Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.

[19]	Rainer Kelz, Sebastian Böck and Gerhard Widmer, Deep Polyphonic ADSR Piano Note Transcription, Proceedings of the 44th International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

Acknowledgements

Supported by the European Commission through the GiantSteps project (FP7 grant agreement no. 610591) and the Phenicx project (FP7 grant agreement no. 601166) as well as the Austrian Science Fund (FWF) project Z159.

madmom's Issues

Python 3 compatibility

remove `fref` attribute of `Filterbank`

Move it to the subclasses which actually use/need it.

Additionally, the PitchClassProfileFilterbank has neither corner_frequencies nor center_frequencies. This should be refactored as well.

docstring fixes

_assemble_ffmpeg_call has an extra buf_size option.

move norm_observations out of observation models

If needed, the normalisation can be performed in the beat tracking classes before the observations are passed to the Viterbi algorithm.

remove obsolete class constants

Formerly, most of these class constants were needed to set the default values for both the __init__() and the add_arguments() method. Since the latter moved to use None as default for most arguments, the class constants are more or less obsolete. We should remove them before someone starts using them.

madmom.audio.spectrogram.tuning_frequency() needs to be tested

Namespaces

Clean up namespaces of all modules, i.e. delete all imports needed only during the loading of the module.

rename quantize_events 'fps' to 'resolution' or similar?

While we're at it, make it also 2D capable, i.e. work also with beats or notes

fix ParallelProcess.init() to use super()

features.notes.write_mirex_format overwrites the length of notes even if it was given

remove block_size from Spectrogram

This is a leftover without functionality. block_size should be moved to the process() method of the Processor.

add convenience methods to MIDIFile to add notes, set tempo and time signature

It would be nice to have some convenience methods to:

add notes
set tempo
set time signature

of a MIDIFile. The methods should accept input given in either seconds or beats.
These methods should be added to MIDIFile since the events need to be put into a track, but the tempo and time signature events can be in another track.

Suggestion: let the method accept an argument to indicate the unit to be used (seconds/beats), if none is given, it should use the (recently removed) instance attribute.
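The suggestion could look roughly like the sketch below. Everything in it is hypothetical: ToyMIDIFile, to_beats and the unit and tempo parameters are illustration names, not the madmom MIDIFile API, and a single constant tempo in beats per minute is assumed.

```python
def to_beats(position, unit, tempo=120.0):
    """Convert a position given in 'seconds' or 'beats' to beats.

    Assumes one constant tempo (in beats per minute) for the whole file.
    """
    if unit == 'beats':
        return float(position)
    if unit == 'seconds':
        # one beat lasts 60 / tempo seconds
        return position * tempo / 60.0
    raise ValueError("unit must be 'seconds' or 'beats', got %r" % (unit,))


class ToyMIDIFile:
    """Hypothetical stand-in for MIDIFile, storing notes in beats."""

    def __init__(self, unit='seconds', tempo=120.0):
        self.unit = unit      # default unit used if none is passed
        self.tempo = tempo
        self.notes = []

    def add_note(self, onset, pitch, duration, unit=None):
        # fall back to the instance attribute if no unit is given
        unit = self.unit if unit is None else unit
        self.notes.append((to_beats(onset, unit, self.tempo), pitch,
                           to_beats(duration, unit, self.tempo)))
```

With this layout a caller can mix units per call, e.g. add_note(0.5, 60, 0.25) in seconds and add_note(2, 62, 1, unit='beats') on the same object.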

change README for PyPI

Right now, on PyPI the same README is displayed. It includes a lot of information not needed for PyPI users but lacks other stuff such as acks.

Signal class does not provide the same functionality as SignalProcessor

Usually, all Processors provide the same functionality as the underlying class. However, the Signal class does not provide the same functionality as the SignalProcessor: it lacks the norm and att parameters to normalise or attenuate the signal, respectively.

make sure that pip install works as desired

Commit a75b388 removed the install_requires list from setup.py because this was the easiest way to get http://madmom.readthedocs.org working.

Before that change all builds failed due to missing atlas/blas libraries when upgrading numpy/scipy.
Now, building the docs at least works with the "Install your project inside a virtualenv using setup.py install" option checked.

If pip install madmom fails, we must look into using conda, which readthedocs added support for recently.

new segment_axis default hop_size?

unify negative indices behaviour of FramedSignal

The behaviour of negative indices for the FramedSignal is not consistent:

if a single frame at position -1 is requested, the frame left of the first one is returned (as documented),
if a slice [-1:] is requested, the last frame is returned.

The idea of returning the frame left of the first one was to be able to calculate a correct first order difference, but it is somehow not really what people expect.
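The inconsistency can be reproduced with a small toy class. This is a simplified sketch, not the madmom FramedSignal: ToyFramedSignal is a hypothetical name, frames are not centred, and positions outside the signal are padded with zeros.

```python
class ToyFramedSignal:
    """Hypothetical, simplified stand-in for FramedSignal."""

    def __init__(self, samples, frame_size, hop_size):
        self.samples = list(samples)
        self.frame_size = frame_size
        self.hop_size = hop_size
        # number of frames whose start lies inside the signal (ceil division)
        self.num_frames = -(-len(self.samples) // hop_size)

    def _frame(self, index):
        # frame `index` starts at index * hop_size; pad with zeros outside
        start = index * self.hop_size
        return [self.samples[i] if 0 <= i < len(self.samples) else 0
                for i in range(start, start + self.frame_size)]

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slices follow normal Python semantics: -1 means the last frame
            return [self._frame(i)
                    for i in range(*index.indices(self.num_frames))]
        # single integers are NOT wrapped: -1 is the frame left of the first
        return self._frame(index)

    def __len__(self):
        return self.num_frames


frames = ToyFramedSignal(range(1, 9), frame_size=2, hop_size=2)
print(frames[-1])   # frame left of the first one (all padding here)
print(frames[-1:])  # list containing the last frame
```

Unifying the behaviour would mean choosing one of the two branches in __getitem__ for both cases.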

set default value for norm_observations in GMMDownBeatTrackingObservationModel

refactor beats_hmm.pyx

There are numerous glitches in this module:

no clear distinction of singular/plural; i.e. BeatTrackingStateSpace refers to a single beat to be modelled, whereas PatternTrackingStateSpace models multiple patterns.
no way to model a bar with tempo transitions at the beat level
very long class names, e.g. the "tracking" part could be removed completely

refactor CRFBeatDetectionProcessor.add_tempo_arguments()

add **kwargs to process()

Add/pass **kwargs to/from the process() methods of all processors. This is needed if we want to be able to set/change/overwrite some processing options during run time.
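A minimal sketch of what this could mean in practice, assuming a chain of processors that each read only the keyword arguments they know. The classes Processor, Scale, Offset and Chain are hypothetical and are not the madmom implementation.

```python
class Processor:
    """Hypothetical minimal base class."""

    def process(self, data, **kwargs):
        raise NotImplementedError


class Scale(Processor):
    def __init__(self, factor=1.0):
        self.factor = factor

    def process(self, data, **kwargs):
        # a run-time 'factor' overrides the value set at construction
        factor = kwargs.get('factor', self.factor)
        return [x * factor for x in data]


class Offset(Processor):
    def __init__(self, offset=0.0):
        self.offset = offset

    def process(self, data, **kwargs):
        # a run-time 'offset' overrides the value set at construction
        offset = kwargs.get('offset', self.offset)
        return [x + offset for x in data]


class Chain(Processor):
    def __init__(self, processors):
        self.processors = list(processors)

    def process(self, data, **kwargs):
        # forward the same keyword arguments to every processor;
        # each one ignores the options it does not know
        for processor in self.processors:
            data = processor.process(data, **kwargs)
        return data
```

For example, Chain([Scale(2.0), Offset(1.0)]).process([1, 2], factor=3.0) overrides only the scaling step for that one call and leaves the configured objects unchanged.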

Downmixing integer signals clips loud signals

convert docstrings to numpydoc

add option to choose method to compute TempoEstimationProcessor.interval_histogram

Right now, it always uses self.method. Also propagate this option down to process() (see Issue #33).

Also refactor the 'dbn' method functionality to its own function.

batch processing stops if non-audio files are given

This does the trick, but I am not sure what kind of error to raise:

error_loading_file.txt
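Independent of the attached patch, the general idea of skipping unreadable files can be sketched as follows. UnreadableAudioError, process_batch and fake_load are hypothetical names for illustration only; which exception type the library should raise is exactly the open question above.

```python
import sys


class UnreadableAudioError(Exception):
    """Hypothetical error type for files that cannot be decoded as audio."""


def process_batch(files, load, process):
    """Process all files, skipping those which `load` cannot decode."""
    results, skipped = {}, []
    for path in files:
        try:
            signal = load(path)
        except UnreadableAudioError as error:
            # report the problem and continue with the next file
            sys.stderr.write('skipping %s: %s\n' % (path, error))
            skipped.append(path)
            continue
        results[path] = process(signal)
    return results, skipped


def fake_load(path):
    """Stand-in loader for the example: only '.wav' names are 'audio'."""
    if not path.endswith('.wav'):
        raise UnreadableAudioError('not an audio file')
    return [1, 2, 3]
```

Catching one dedicated exception type, rather than every Exception, keeps genuine bugs in the processing code visible.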

extend evaluation.beats to downbeat evaluation

refactor add_arguments of all FilteredSpectrogramProcessor and MultiBandSpectrogramProcessor

Most of the duplicated code could be refactored to audio.filters.

Pickling of Processors broken for Python 3

MultiBandSpectrogram behaviour if no spectrogram is given

Right now the MultiBandSpectrogram instantiates a FilteredSpectrogram, which is surprising (to say the least). It should be a normal Spectrogram.

unify sample_rate type

Sometimes it's float, sometimes int.

Rewrite tests to use fixtures

http://pytest.org/latest/fixture.html#fixtures gives some nice examples, but the tests don't have to be rewritten for py.test necessarily.

Refactor the way the spectrograms and diffs are stacked

Right now it is a bit limited in how different settings can be used, e.g. it is not possible to use different filterbanks for various frame sizes.

remove TempoEstimator.dominant_interval() method

This method is more or less meaningless; the module-level function dominant_interval() can be used directly instead.

ImportError: No module named TempoDetector

When I run: python TempoDetector single test.wav
the following error comes up. Part of the log:
File "D:\Program Files\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\multiprocessing\forking.py", line 489, in prepare
file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named TempoDetector

Please help, thanks.

What parameters were used to generate stereo_sample.notes in the tests?

When applying the PianoTranscriptor script to the tests/data/stereo_sample.wav sample I get note predictions much different than those currently present in the tests/data/stereo_sample.notes file. I'm wondering if the tests/data/stereo_sample.notes was generated by a human hand? If not it may be helpful to provide a concrete example as it seems some of the documentation for the scripts in /bin is sparse and out of date.

I'd be happy to pitch in and help update some of the documentation if it's useful to others.

remove deprecated code

rename norm_bands of MultiBandSpectrogram to norm_filter?

This is a minor inconsistency which could be resolved easily by renaming the argument.
Any thoughts?

MIDI: note_ticks_to_beats broken

While refactoring the code to use enumerate instead of range(len()) (see attached patch), I discovered that the note_ticks_to_beats method does alter the notes if called multiple times. Maybe we should save the state (i.e. if ticks are given in beats or seconds) similar to what we do with make_ticks_abs and make_ticks_rel.
midi_enumerate.txt
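The idea of saving the state can be sketched like this. ToyNotes is a hypothetical container, not the madmom MIDI classes; it only shows how a unit flag makes repeated calls harmless.

```python
class ToyNotes:
    """Hypothetical note container with onsets stored in ticks or beats."""

    def __init__(self, notes, ticks_per_beat):
        # notes are (onset, pitch) pairs with onsets given in ticks
        self.notes = [tuple(note) for note in notes]
        self.ticks_per_beat = ticks_per_beat
        self.unit = 'ticks'

    def note_ticks_to_beats(self):
        # convert only if the onsets are still in ticks, so that calling
        # the method a second time leaves the notes untouched
        if self.unit == 'ticks':
            self.notes = [(onset / self.ticks_per_beat, pitch)
                          for onset, pitch in self.notes]
            self.unit = 'beats'
        return self.notes
```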

refactor `DBNBeatTrackingProcessor` and `DownbeatTrackingProcessor` `add_arguments`

fix import positions/orders

While we're at it, also try to fix the cyclic-import warnings.

reorder DBNBeatTrackingProcessor arguments

To be more consistent, put correct to the end.

add documentation

rename DownBeatTracker to be more specific

set default window function for stft() function?

unify ParallelProcessor.add_arguments()

ParallelProcessor.add_arguments() is the only add_arguments method which does not follow the convention that an argument parser is not added to the group if it is None. The meaning of None and negative numbers for num_threads should be reversed.
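Reading the convention as "an argument whose default is None is not added to the parser", a minimal argparse sketch could look like this. The function name, the -j flag and the group title are assumptions for illustration and may differ from the actual madmom code.

```python
import argparse


def add_arguments(parser, num_threads=None):
    """Add a parallel-processing group; skip arguments that are None."""
    group = parser.add_argument_group('parallel processing arguments')
    if num_threads is not None:
        # only expose the option if a default value was given
        group.add_argument('-j', dest='num_threads', type=int,
                           default=num_threads,
                           help='number of threads [default=%(default)s]')
    return group
```

Under this reading, a caller hides the option from the command line simply by leaving num_threads at None.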

refactor PropertyMixin

It would be nice to not have this imported at several places. Better make a private Mixin per module, this also helps to keep the namespace clean.

redo MFCCs

Set the filterbank and transform to MelFilterbank and dct, respectively? There's always the Cepstrogram class if other parameters are needed / wanted.

Move the FFCC_* constants into the class.

mm.audio.ffmpeg.get_file_info fails extracting sample_rate when using avprobe

avprobe (version 9.18) prints sample_rate as a float:

[streams.stream.0]
index=0
codec_name=flac
codec_long_name=FLAC (Free Lossless Audio Codec)
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
sample_rate=44100.000000
channels=1
bits_per_sample=0
avg_frame_rate=0/0
time_base=1/44100
start_time=N/A
duration=216.685714

which is why

info['sample_rate'] = int(line[len('sample_rate='):])

does not work.
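One possible workaround is to parse the value as a float first and then truncate it to an integer. The helper below is a hypothetical sketch, not the code in madmom.audio.ffmpeg, and assumes the value is always a plain decimal string.

```python
def parse_sample_rate(line):
    """Parse a 'sample_rate=...' line as printed by ffprobe or avprobe."""
    prefix = 'sample_rate='
    if not line.startswith(prefix):
        raise ValueError('not a sample_rate line: %r' % line)
    # int('44100.000000') raises a ValueError, so go through float first
    return int(float(line[len(prefix):]))


print(parse_sample_rate('sample_rate=44100'))
print(parse_sample_rate('sample_rate=44100.000000'))
```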

Python 3 compatibility of evaluation.alignment.load_alignment

Possible solution:

if values is None:
    # return 'empty' alignment
    return np.array([[0, -1]])
elif isinstance(values, (list, np.ndarray)):
    values = np.atleast_2d(values)
else:
    values = np.loadtxt(values, ndmin=2)

Check if np.atleast_2d(values) is enough and unify the other loading functions (beats, etc.).

pickling / unpickling of data object

While working on issue #44, I discovered that not all information is recovered after pickling the data class objects. E.g. the Spectrogram does not save its stft and frames attributes (which is totally ok, since it would require a lot of extra space), but in turn is not able to obtain the bin_frequencies, since this requires information about the sample_rate of the underlying audio. Possible solutions would be:

1) save the crucial information when pickling and use it after unpickling,
2) remove all the pickling of data classes functionality,
3) clearly state that not everything can be done after pickling data objects.

Of course 1) is the desired solution, but if no-one uses the functionality right now (it is a leftover of how I prepared the data for training of neural networks) 2) would also be a valid solution. We can always re-add the functionality later if needed.

Any thoughts?
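Option 1) could be sketched with the standard __getstate__ / __setstate__ hooks. ToySpectrogram is a hypothetical class, not the madmom Spectrogram; it drops the large frames attribute when pickled but keeps the small sample_rate needed to recompute bin_frequencies.

```python
import pickle


class ToySpectrogram:
    """Hypothetical data class illustrating selective pickling."""

    def __init__(self, frames, sample_rate, num_bins):
        self.frames = frames            # large, not worth pickling
        self.sample_rate = sample_rate  # small, but needed after unpickling
        self.num_bins = num_bins

    @property
    def bin_frequencies(self):
        # linearly spaced bin frequencies from 0 up to Nyquist (exclusive)
        nyquist = self.sample_rate / 2.0
        return [i * nyquist / self.num_bins for i in range(self.num_bins)]

    def __getstate__(self):
        # copy the instance dict and drop only the bulky attribute
        state = self.__dict__.copy()
        state['frames'] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


def roundtrip(obj):
    """Pickle and unpickle an object in memory."""
    return pickle.loads(pickle.dumps(obj))
```

After roundtrip(spec), the restored object has frames set to None but still answers bin_frequencies, because sample_rate and num_bins survived the pickling.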
