marl / medleydb
Home Page: http://medleydb.weebly.com/
License: MIT License
Installing a Python package with setuptools, that has medleydb as a dependency, results in the following issue:
warning: no files found matching 'medleydb/taxonomy.yaml'
Here's the setup.py for reference:
setup(
    ...
    install_requires=["medleydb==1.2"],
    dependency_links=["git+git://github.com/marl/medleydb.git@medleydb_v1.2#egg=medleydb-1.2"]
)
In medleydb/medleydb/__init__.py, there's a tiny typo in the assertion, and it's not clear which it should be (L11):
AssertionError: The environment variable MEDLEYDB_PATH
is not set. Set the value of MEDLEYDB_DIR to your local path to MedeleyDB.
(Thank you to Yukara Ikemiya for reporting this mistake.)
The rankings for the two melody stems in TheDistricts_Vermont are both 1, and the resulting melody annotations are incorrect as a result.
Fix:
stem 05 should have rank 1, stem 07 should have rank 2
rerun generate melody annotations script
Acknowledging that this is probably a result of how I've "installed" the code, I'm having an issue with the auto-derived taxonomy path.
To install, I cloned the repository locally and moved the whole dealy into my site-packages folder so I could access it system-wide. After configuring where the dataset lives, I tried to get files for an instrument voice. This results in an error, reporting that it's looking for the taxonomy here:
/blah/blah/site-packages/medleydb/medleydb/taxonomy.yaml
while the file is actually in ...
/blah/blah/site-packages/medleydb/taxonomy.yaml
I made a symlink into the subdirectory where it's looking because I'm a hack (and proud of it), but I'd up-vote either more specific install instructions so that this is avoided, or a setup installer that makes this moot.
I re-read the medleydb paper, and it doesn't actually say how the instrument segment annotations were computed from the activation functions. Is it just threshold at 0.5 and then run-length encode samples to intervals? Or is there some smoothing involved?
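For concreteness, here is a minimal sketch of the procedure the question hypothesizes: threshold the confidence curve, then merge consecutive active frames into intervals. It is not confirmed to be what was used for MedleyDB; the function name, the 0.5 default, and the demo values are all assumptions made for illustration.

```python
def activations_to_intervals(times, confidences, threshold=0.5):
    """Threshold a confidence curve and merge consecutive active frames.

    Returns a list of (start_time, end_time) tuples. An interval ends at the
    timestamp of the first inactive frame after it, or at the last timestamp
    if the curve is still active when the data runs out.
    """
    intervals = []
    start = None  # start time of the interval currently being built
    for t, c in zip(times, confidences):
        active = c >= threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, times[-1]))
    return intervals


# Made-up demo data, not taken from the dataset.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
conf = [0.1, 0.7, 0.9, 0.2, 0.6, 0.8]
intervals = activations_to_intervals(times, conf)
print(intervals)
```

Any smoothing (for example a median filter over the confidences) would go before the thresholding step; this sketch does none.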
MusicDelta_Country2_RAW_04_01 seems to be a (not exact) duplicate of
MusicDelta_Country2_RAW_03_01
Hi,
Some instrument activation confidence file names (in Annotations/Instrument Activations/ACTIVATION_CONF) have discrepancies with the audio files they refer to. I think this is due to the presence of non-alphanumeric characters: parentheses, hyphens, and apostrophes.
Here is the list of before / after names:
CroqueMadame_Pilot(Lakelot) -> CroqueMadame_Pilot
JoelHelander_IntheAtticBedroom(SuitePartThree) -> JoelHelander_IntheAtticBedroom
Phoenix_BrokenPledge-ChicagoReel -> Phoenix_BrokendPledge
Phoenix_Elzic'sFarewell -> Phoenix_ElzicsFarewell
Phoenix_LarkOnTheStrand-DrummondCastle -> Phoenix_LarkOnTheStrandDrummodCastle
Phoenix_SeanCaughlin's-TheScartaglan -> Phoenix_SeanCaughlinsTheScartaglen
There's no activation lab file for Wolf_DieBekherte here:
https://github.com/marl/medleydb/tree/master/medleydb/data/Annotations/Activation_Confidence
There's a file here though:
https://github.com/marl/medleydb/blob/master/medleydb/data/Annotations/Activation_Confidence/original_annotations/Wolf_DieBekherte_ACTIVATION_CONF.lab
This is breaking some code of mine that iterates over every mdb mix and does stuff with the activations, since track.stem_activations for this track is empty.
Why is the file missing?
There doesn't seem to be an easy (documented) way to get a list of the tracks without calling load_all_multitracks, which is expensive.
There is a TRACK_LIST array in the package, but it's not documented.
It would be useful to have access to this if I want to load an arbitrary track, but don't have the names/indices pre-computed somewhere.
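As a stopgap, a cheap way to get track names is to read a track list file directly instead of constructing every multitrack. The sketch below assumes such a file holds one track ID per line; read_track_ids is a hypothetical helper, not part of medleydb, and the demo writes its own small file rather than touching the package.

```python
import os
import tempfile


def read_track_ids(tracklist_path):
    """Return the non-empty lines of a track list file, stripped."""
    with open(tracklist_path) as fhandle:
        return [line.strip() for line in fhandle if line.strip()]


# Demo with a throwaway file standing in for a real track list.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "tracklist_demo.txt")
    with open(demo_path, "w") as fhandle:
        fhandle.write("LizNelson_ImComingHome\nPhoenix_ScotchMorris\n\n")
    track_ids = read_track_ids(demo_path)

print(track_ids)
```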
Hi,
The version 2 of the dataset contains 74 audio files, but only 23 of them have info about activation. Am I looking in the wrong place?
Thanks a lot!
PyYAML 5.1 has deprecated the old yaml.load() API. As the page https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation suggests, the lines
INST_TAXONOMY = yaml.load(fhandle)
MIXING_COEFFICIENTS = yaml.load(fhandle)
in medleydb's __init__.py would be better changed to yaml.load(input, Loader=yaml.FullLoader) or something similar.
Hello,
I would like to know where the wav files are and how to get the dataset.
I am looking for a direct download link, not a link to another website.
Thanks
This way, the medleydb version is always available through the data without going via the python module.
#14 seems to break the track loading
A simple
import medleydb as mdb
mtrack_list = mdb.load_all_multitracks()
for track in mtrack_list:
    print(track)
now results in:
(.env)$ py test.py
Traceback (most recent call last):
File "medleydb/utils.py", line 62, in load_multitracks
yield M.MultiTrack(multitrack)
File "medleydb/multitrack.py", line 75, in __init__
self.title = _path_basedir(mtrack_path).split('_')[1]
IndexError: list index out of range
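The traceback shows the title being taken from the second piece of an underscore split, which raises a bare IndexError for any directory name without an underscore. A minimal sketch of a more defensive parse follows; split_track_id is a hypothetical helper, not the package's code, and unlike the original split it keeps everything after the first underscore as the title.

```python
def split_track_id(track_id):
    """Split 'Artist_Title' into (artist, title), failing with a clear message."""
    artist, sep, title = track_id.partition("_")
    if not sep or not artist or not title:
        raise ValueError(
            "expected a name of the form 'Artist_Title', got %r" % track_id
        )
    return artist, title


artist, title = split_track_id("LizNelson_ImComingHome")
print(artist, title)

try:
    split_track_id("Audio")  # a directory name with no underscore
except ValueError as err:
    message = str(err)
    print(message)
```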
This way one could easily sort the stems or load specific stem tracks. Also, special stems like the predominant one could just be pointers into the stems dict and wouldn't need to create a new object.
mytrack.stems['1'] should output stem_idx 1 as a track object.
Using a list is prone to errors since mytrack.stems[1] does not necessarily exist.
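A small sketch of the idea, using a hypothetical Stem class and made-up instrument names rather than the package's own objects. It keys the dict by integer index; string keys as in the example above would work the same way.

```python
class Stem:
    """Hypothetical stand-in for a stem track object."""

    def __init__(self, stem_idx, instrument):
        self.stem_idx = stem_idx
        self.instrument = instrument


# Stem indices need not be contiguous, so key by index instead of position.
stems = {
    idx: Stem(idx, name)
    for idx, name in [(1, "vocals"), (2, "bass"), (4, "drums")]
}

# A special stem can simply point at an existing entry instead of being a new object.
predominant_stem = stems[4]

# Sorting is explicit, and a missing index is easy to detect.
ordered = [stems[idx] for idx in sorted(stems)]
missing = stems.get(3)  # None, rather than an IndexError or the wrong stem
```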
Setting up a new machine with this and it seems there are some issues with the latest installer. It crashes on copying over the information:
Installing collected packages: medleydb
Running setup.py install for medleydb: started
Running setup.py install for medleydb: finished with status 'error'
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-1gmlwvfa-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tsnxo8i-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/medleydb
copying medleydb/version.py -> build/lib/medleydb
copying medleydb/mix.py -> build/lib/medleydb
copying medleydb/download.py -> build/lib/medleydb
copying medleydb/multitrack.py -> build/lib/medleydb
copying medleydb/__init__.py -> build/lib/medleydb
copying medleydb/utils.py -> build/lib/medleydb
creating build/lib/medleydb/resources
copying medleydb/resources/taxonomy.yaml -> build/lib/medleydb/resources
copying medleydb/resources/tracklist_bach10.txt -> build/lib/medleydb/resources
copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib/medleydb/resources
copying medleydb/resources/tracklist_v1.txt -> build/lib/medleydb/resources
copying medleydb/resources/instrument_f0_type.json -> build/lib/medleydb/resources
copying medleydb/resources/pyin.n3 -> build/lib/medleydb/resources
copying medleydb/resources/artist_index.json -> build/lib/medleydb/resources
copying medleydb/resources/client_secrets.json -> build/lib/medleydb/resources
copying medleydb/resources/mixing_coefficients.yaml -> build/lib/medleydb/resources
copying medleydb/resources/tracklist_extra.txt -> build/lib/medleydb/resources
copying medleydb/resources/tracklist_v2.txt -> build/lib/medleydb/resources
creating build/lib/medleydb/data
error: can't copy 'medleydb/data/Annotations': doesn't exist or not a regular file
My guess is it is trying to copy the Annotations directory as a file, rather than the contents.
The latest release version installed fine though.
Currently, medleydb.utils.artist_conditional_split relies on the ShuffleLabelsOut class, which is using the soon to be deprecated sklearn.cross_validation. It needs to be updated to use sklearn.model_selection but things have changed enough that a simple switch breaks the code. @bmcfee I took a stab at this myself but don't understand what you did in the original version well enough to troubleshoot why my changes weren't working. Can you give it a go?
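While the sklearn port is pending, the core behaviour (no artist on both sides of a split) can be sketched in plain Python. This is not the ShuffleLabelsOut logic and makes no claim to match its sampling; artist_split is a hypothetical helper that assumes the artist is the part of the track ID before the first underscore. In sklearn.model_selection, GroupShuffleSplit is the class that covers group-aware shuffling.

```python
import random


def artist_split(track_ids, test_fraction=0.2, seed=None):
    """Split track IDs so that no artist appears in both train and test."""
    by_artist = {}
    for track_id in track_ids:
        artist = track_id.split("_")[0]
        by_artist.setdefault(artist, []).append(track_id)

    # Shuffle artists, not tracks, then assign whole artists to the test set.
    artists = sorted(by_artist)
    random.Random(seed).shuffle(artists)
    n_test = max(1, int(round(test_fraction * len(artists))))
    test_artists = set(artists[:n_test])

    train = [t for t in track_ids if t.split("_")[0] not in test_artists]
    test = [t for t in track_ids if t.split("_")[0] in test_artists]
    return train, test


tracks = [
    "Phoenix_ScotchMorris",
    "Phoenix_ElzicsFarewell",
    "LizNelson_ImComingHome",
    "Creepoid_OldTree",
    "AClassicEducation_NightOwl",
    "TablaBreakbeatScience_WhoIsIt",
]
train, test = artist_split(tracks, test_fraction=0.4, seed=0)
```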
In trying to parse the data in the pitch annotations, I found read_csv_file ... it seems like what I want, but the docstring doesn't really give me an idea of what I'm going to get back.
In [18]: M.multitrack.read_csv_file?
Type: function
String Form:<function read_csv_file at 0x108e8c1b8>
File: /Library/Python/2.7/site-packages/medleydb/medleydb/multitrack.py
Definition: M.multitrack.read_csv_file(fpath, maxcols=None)
Docstring:
Read a csv file.
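To illustrate the kind of docstring that would answer the question, here is a sketch of a reader whose return value is spelled out. It is not read_csv_file itself; it assumes a two-column file of time and value with no header, and the demo numbers are made up.

```python
import csv
import io


def read_time_series_csv(fhandle):
    """Read rows of 'time,value' from an open text file.

    Returns
    -------
    times : list of float
        First column of each row.
    values : list of float
        Second column of each row.
    """
    times, values = [], []
    for row in csv.reader(fhandle):
        if not row:
            continue  # skip blank lines
        times.append(float(row[0]))
        values.append(float(row[1]))
    return times, values


# Made-up demo content standing in for an annotation file.
demo = io.StringIO("0.0,0.0\n0.0058,220.5\n0.0116,221.0\n")
times, freqs = read_time_series_csv(demo)
```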
It seems tracklist_v1.txt moved to a subdirectory but __init__.py wasn't updated to reflect this. See: https://github.com/marl/medleydb/blob/medleydb_v1.2/medleydb/__init__.py#L50
The init script expects metadata to be placed in a repository root, but if building with pip directly from git such as
pip install git+git://github.com/marl/medleydb.git
the directory structure becomes different and the MedleyDB Python tools won't work.
Since using pip with git is a fairly common way of installing from source it would be nice if the directory structure wasn't assumed. A neat and transparent fix would be to look for all resources (taxonomy.yaml, tracklist_v1.txt, Annotations/, Metadata/, etc.) in the MEDLEYDB_PATH as well, in the init script.
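The proposed lookup could be sketched roughly as follows: try the package directory first, then fall back to the directory named by MEDLEYDB_PATH. Both helper names are hypothetical, and the environment is passed in as a mapping so the demo does not modify the real one.

```python
import os
import tempfile


def resource_search_dirs(package_dir, environ=os.environ):
    """Directories to search, in priority order."""
    dirs = [package_dir]
    if environ.get("MEDLEYDB_PATH"):
        dirs.append(environ["MEDLEYDB_PATH"])
    return dirs


def find_resource(name, search_dirs):
    """Return the first existing path for `name`, or None if it is not found."""
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return path
    return None


# Demo: the resource is missing from the package dir but present in the data dir.
with tempfile.TemporaryDirectory() as package_dir, \
        tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, "taxonomy.yaml"), "w").close()
    dirs = resource_search_dirs(package_dir, environ={"MEDLEYDB_PATH": data_dir})
    found = find_resource("taxonomy.yaml", dirs)
    found_in_data_dir = found == os.path.join(data_dir, "taxonomy.yaml")
    missing = find_resource("tracklist_v1.txt", dirs)
```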
(Thanks @lostanlen for noting this)
The following stems have bleed, but are not labeled as having bleed:
BrandonWebster_DontHearAThing_STEM_02.wav
ClaraBerryAndWooldog_Boys_STEM_05.wav
LizNelson_ImComingHome_STEM_02.wav
TablaBreakbeatScience_WhoIsIt_RAW_03_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_02.wav
all have a lot of bleed.
They should probably be added to ERRATA.md
Not sure if the correct error code would be
1 Stem/Raw contains bleed, track not tagged as has_bleed
or
4 Raw does not match Stem
both seem to apply.
The initial release of MedleyDB contained human-generated melody annotations using the Tony tool [4]. However, the process was difficult to sustain in the long term, thus for this iteration of the dataset we rely primarily on automatic annotations. The automatic annotations include instrument activations and synthetic melody, multi-f0 and bass annotations.
I'm looking forward to MedleyDB 2.0 (when will it be available?) and intend to use it for various MIR tasks. However, I read that the new annotations are automatically generated and worry that this causes a chicken and egg problem. My hope was to use the annotations for training multi-f0 estimation models, but surely an upper bound on f-measure will be introduced by the fact that the annotations have been automatically generated themselves.
Could you expand a little on how the new annotations have been developed? How much do they differ compared to what human listeners would annotate? Particularly multi-f0 annotations are difficult to get right but I'm also concerned about onset annotations (and even melody annotations to some extent).
The sql submodule exists already, but this might be better suited within a pandas data frame.
Recently the annotations have been added to the data dir of this repo
In [1]: import medleydb
File "/home/bmcfee/data/medleydb/medleydb/__init__.py", line 20
the top level Audio folder for MedeleyDB."""
^
SyntaxError: Missing parentheses in call to 'print'
.. or at least do a from __future__ import print_function. But I suspect there are other less obvious gotchas kicking around.
(also Medley is misspelled :))
When running pip install . inside the root medleydb directory I get this error:
Installing collected packages: medleydb
Running setup.py install for medleydb ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/medleydb
copying medleydb/mix.py -> build/lib.linux-x86_64-2.7/medleydb
copying medleydb/version.py -> build/lib.linux-x86_64-2.7/medleydb
copying medleydb/multitrack.py -> build/lib.linux-x86_64-2.7/medleydb
copying medleydb/__init__.py -> build/lib.linux-x86_64-2.7/medleydb
copying medleydb/utils.py -> build/lib.linux-x86_64-2.7/medleydb
copying medleydb/download.py -> build/lib.linux-x86_64-2.7/medleydb
creating build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/tracklist_bach10.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/taxonomy.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/instrument_f0_type.json -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/mixing_coefficients.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/tracklist_v1.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/client_secrets.json -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/tracklist_extra.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/artist_index.json -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/tracklist_v2.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
copying medleydb/resources/pyin.n3 -> build/lib.linux-x86_64-2.7/medleydb/resources
creating build/lib.linux-x86_64-2.7/medleydb/data
error: can't copy 'medleydb/data/Metadata': doesn't exist or not a regular file
----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-3m5VSF/
It feels similar to the error thrown when you attempt to cp a directory without specifying --recursive. I recommend a fix according to this Stack Overflow thread
--- Relevant Specifications ---
Ubuntu 16.04
Python 2.7.12
setuptools 20.7.0
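One common way to avoid this class of error is to hand setuptools a list of regular files instead of a pattern that also matches directories. The sketch below is an assumption about the cause, written without having seen the project's setup.py or the linked thread; list_data_files is a hypothetical helper.

```python
import os
import tempfile


def list_data_files(package_dir, data_subdir):
    """Return all regular files under package_dir/data_subdir,
    as paths relative to package_dir."""
    paths = []
    root = os.path.join(package_dir, data_subdir)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full_path, package_dir))
    return sorted(paths)


# Demo on a throwaway tree that mimics data/Annotations and data/Metadata.
with tempfile.TemporaryDirectory() as package_dir:
    for subdir, filename in [("Annotations", "a.lab"), ("Metadata", "b.yaml")]:
        target_dir = os.path.join(package_dir, "data", subdir)
        os.makedirs(target_dir)
        open(os.path.join(target_dir, filename), "w").close()
    data_files = list_data_files(package_dir, "data")

print(data_files)
```

In a setup.py this list could be passed as package_data={"medleydb": list_data_files("medleydb", "data")}, so that only files, never directories, are handed to the copy step.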
Installs fine but when I run import medleydb ...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nick/Downloads/medleydb-1.1/medleydb/__init__.py", line 20
the top level Audio folder for MedeleyDB."""
^
SyntaxError: Missing parentheses in call to 'print'
Add the mix duration to the top level of the metadata files to make it possible to work with the annotations without the audio.
The MedleyDB website makes no mention of the ISMIR 2016 paper yet, and the link in the paper isn't working. I'm wondering where I can find the new MedleyDB 2.0 dataset?
I get that everyone has a lot on their plate, which is great for MIR, but I'd really like to hear the new tracks.
Quick example:
... or maybe the reverse. I'm not sure.
>>> t = next(medleydb.load_all_multitracks())
>>> t.activation_conf_from_stem(0)[:10]
[[0.0, 0.0],
[0.0464, 0.0464],
[0.0929, 0.0929],
[0.1393, 0.1393],
[0.1858, 0.1858],
[0.2322, 0.2322],
[0.2786, 0.2786],
[0.3251, 0.3251],
[0.3715, 0.3715],
[0.418, 0.418]]
This seems to come from an error at this line, or possibly in the parsing of the confidence file.
I am iterating over all of the tracks to get their annotations and I get a FileNotFoundError. I double- and triple-checked and there are no annotation files for this particular track. Do the files exist somewhere?
Loading multitracks (e.g. via the load_multitracks generator) is quite slow: in an informal timing of 20 tracks, loading took 0.7 s per track on average, with one track taking 1.6 s. When you only need a single bit of information about each track this becomes quite penalizing (especially during a dev phase where you iterate over your code).
It would be helpful if either the loading time was improved across the board somehow (if possible?) or, alternatively, there was the option to only load partial information about each multitrack (e.g. via an optional parameter that takes a list of the things you want to load) so that loading can be made more agile when not all the multitrack info is needed.
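One alternative to an explicit "what to load" parameter is lazy loading, where cheap fields are set up front and expensive ones are computed on first access. A minimal sketch with a hypothetical LazyTrack class and a stand-in loader, not the package's MultiTrack:

```python
class LazyTrack:
    """Hypothetical track object that defers expensive loads until first use."""

    def __init__(self, track_id, loader):
        self.track_id = track_id  # cheap: always available
        self._loader = loader     # callable that does the expensive work
        self._metadata = None

    @property
    def metadata(self):
        # Load once on first access, then reuse the cached result.
        if self._metadata is None:
            self._metadata = self._loader(self.track_id)
        return self._metadata


load_calls = []


def slow_loader(track_id):
    """Stand-in for parsing metadata and annotation files from disk."""
    load_calls.append(track_id)  # record each expensive load
    return {"title": track_id.split("_", 1)[1]}


tracks = [
    LazyTrack(t, slow_loader)
    for t in ["LizNelson_ImComingHome", "Phoenix_ScotchMorris"]
]
ids = [t.track_id for t in tracks]         # triggers no loads
title = tracks[0].metadata["title"]        # loads only the first track
title_again = tracks[0].metadata["title"]  # served from the cache
```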
AClassicEducation_NightOwl_RAW_13_01.wav and AClassicEducation_NightOwl_RAW_13_02.wav have some kind of univibe-like modulation effect applied to the vocals, and should not be considered raw.
I'm also guessing many raw tracks are dynamic range compressed and frequency equalized etc. (guessing artists tend to use channel-strips in their recording chain maybe) but that might be less of an issue in most applications. Are there any clear definitions on what constitutes a raw track in MedleyDB?
For reference, compare with AClassicEducation_NightOwl_RAW_13_04.wav, which doesn't have the modulation effect.
I've been rolling my own mixing code for MedleyDB but saw in #40 that there are branches with mixing functionality builtin that look very useful. What's the status on these? Could we expect the master branch to have these soon? https://github.com/marl/medleydb/tree/hanna/mix https://github.com/marl/medleydb/tree/mixing-tools
PS: It would be neat IMHO if master was https://github.com/marl/medleydb/tree/medleydb_v1.2 because it seems stable and also has Python 3 support.
Only the scikit-learn dependency specifies a version. The other dependencies should have a version too.
The metadata entry for Creepoid_OldTree_RAW says it has no bleed, but Creepoid_OldTree_RAW_02_01.wav contains bass and drums instead of bass only.
After discussing with @lostanlen, it makes sense to have has_bleed annotations at the stem level.
Open questions
has_bleed annotations, and if so, what is the criterion (at least one stem with bleed, majority of stems with bleed, etc.)
I noticed that there are plenty more songs in the EXTRA dataset. I also noticed that there is a download script pointing to a private Google Drive. Is it possible to gain access to this extra data?
There are (rare) cases where the activation confidence annotations have only a subset of the stems annotated. Specifically, any stem labeled as instrument='Main System' is not annotated with stem activations. Moreover, the subset might not be ordered numerically, breaking the assumption in activation_conf_from_stem that all stems are listed and in order.
An example where this occurs:
>> import medleydb as mdb
>> mtrack = mdb.MultiTrack("Phoenix_ScotchMorris")
>> mtrack.stems.keys()
[1, 2, 3, 4]
>> mtrack.activation_conf_from_stem(4)
IndexError: list index out of range
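A possible way around the positional assumption is to look the stem up by column name. The sketch assumes the annotation file carries a header row with names like time, S01, S02, which should be checked against the actual files; activation_for_stem is a hypothetical helper and the numbers are made up.

```python
def activation_for_stem(header, rows, stem_idx):
    """Return [time, confidence] pairs for one stem, or None if it is absent.

    `header` is the list of column names and `rows` the numeric rows.
    """
    column_name = "S%02d" % stem_idx
    if column_name not in header:
        return None
    col = header.index(column_name)
    return [[row[0], row[col]] for row in rows]


# Stem 3 is not annotated and the columns are not in numeric order.
header = ["time", "S01", "S04", "S02"]
rows = [[0.0, 0.1, 0.9, 0.5], [0.0464, 0.2, 0.8, 0.4]]
stem4 = activation_for_stem(header, rows, 4)
stem3 = activation_for_stem(header, rows, 3)
```

Returning None for an unannotated stem lets the caller decide how to handle it, instead of failing with an IndexError or silently reading the wrong column.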
Currently this package does not support returning the track activations. Is this of interest to you? I can implement this and send a PR.
Track objects corresponding to stems could report activation percentage.
Two points:
mtrack.stem_activations is a list of lists, but the docs say it's an ndarray
The first point is an easy fix.
The second point is confusing, and it's generally not good style to mix indexing/addressing (ie timestamps) with observation data. (If you do so, it should definitely be documented.) I recommend refactoring this so that the time index is stored separately.
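The suggested refactor could look roughly like this, assuming the first entry of each row is the timestamp and the remaining entries are per-stem values. split_time_index is a hypothetical helper and the rows are made up.

```python
def split_time_index(stem_activations):
    """Separate the time column from the observation columns."""
    times = [row[0] for row in stem_activations]
    values = [list(row[1:]) for row in stem_activations]
    return times, values


rows = [[0.0, 0.1, 0.9], [0.0464, 0.2, 0.8], [0.0929, 0.3, 0.7]]
times, values = split_time_index(rows)
```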
Also, the command:
pip install -e .[sql]
gives the following:
medleydb 1.2.9 does not provide the extra 'sql'
although medleydb gets installed fine by this command and can be imported without any error. Further, the command
medleydb-export
gives the following error:
bash: medleydb-export: command not found
My OS is CentOS 6.4 and Python version is 3.5.2.
Even though pip can be used with GitHub, it would be neat to have the Python tools available on PyPI for easier installation with conda, for example.
The repo tag indicates a 1.0.0 release of the medleydb repo.
The Python package, however, is still at version='0.1.0'.
I would suggest bumping the version here as well.