marl / medleydb



Home Page: http://medleydb.weebly.com/

License: MIT License

Python 99.17% Shell 0.83%


medleydb's Issues

warning: no files found matching 'medleydb/taxonomy.yaml'

Installing a Python package with setuptools that lists medleydb as a dependency produces the following warning:

warning: no files found matching 'medleydb/taxonomy.yaml'

Here's the setup.py for reference:

from setuptools import setup

setup(
    # ...
    install_requires=["medleydb==1.2"],
    dependency_links=["git+git://github.com/marl/medleydb.git@medleydb_v1.2#egg=medleydb-1.2"]
)

Environment variable name inconsistency

In medleydb/medleydb/__init__.py (L11), the assertion message names two different environment variables, MEDLEYDB_PATH and MEDLEYDB_DIR, and it's not clear which one is intended:

AssertionError: The environment variable MEDLEYDB_PATH
is not set. Set the value of MEDLEYDB_DIR to your local path to MedeleyDB.
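A minimal sketch of a check that reads and reports a single name consistently, assuming MEDLEYDB_PATH is the intended variable (the issue itself does not settle which name is correct). get_medleydb_path is a hypothetical helper, not part of the package:

```python
import os

# Assumption: MEDLEYDB_PATH is the one variable name the package should use.
ENV_VAR = "MEDLEYDB_PATH"


def get_medleydb_path(environ=None):
    """Return the MedleyDB root from the environment.

    The error message is built from the same constant that is checked,
    so the two names can never drift apart.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        raise EnvironmentError(
            "The environment variable {0} is not set. Set {0} to your "
            "local path to MedleyDB.".format(ENV_VAR)
        )
    return path
```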

incorrect RANKINGS file - TheDistricts_Vermont

(Thank you to Yukara Ikemiya for reporting this mistake.)

The two melody stems in TheDistricts_Vermont are both ranked 1, and as a result the melody annotations are incorrect.

Fix:
stem 05 should have rank 1, stem 07 should have rank 2
rerun generate melody annotations script
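The fix relies on each melody stem having a distinct rank. A small check for this, assuming the rankings have already been read into a dict mapping stem index to rank (that dict form is an assumption, not the RANKINGS file layout):

```python
from collections import Counter


def duplicate_ranks(ranks):
    """Return the ranks that are assigned to more than one stem.

    `ranks` is a hypothetical dict mapping stem index -> melody rank.
    """
    counts = Counter(ranks.values())
    return sorted(rank for rank, n in counts.items() if n > 1)


# TheDistricts_Vermont as reported (both stems rank 1) and after the fix.
before = {5: 1, 7: 1}
after = {5: 1, 7: 2}
```

Running such a check over every track before regenerating the melody annotations would flag any other tracks with the same problem.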

Code has incorrect taxonomy.yaml path

Acknowledging that this is probably a result of how I've "installed" the code, I'm having an issue with the auto-derived taxonomy path.

To install, I cloned the repository locally and moved the whole thing into my site-packages folder so I could access it system-wide. After configuring where the dataset lives, I tried to get the files for an instrument voice. This resulted in an error reporting that it was looking for the taxonomy here:

/blah/blah/site-packages/medleydb/medleydb/taxonomy.yaml

while the file is actually in ...

/blah/blah/site-packages/medleydb/taxonomy.yaml

I made a symlink into the subdirectory where it's looking because I'm a hack (and proud of it), but I'd up-vote either more specific install instructions so that this is avoided, or a setup installer that makes this moot.
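One way to make the lookup independent of how the package is copied around is to resolve data files relative to the module's own directory. A sketch, where resource_path is a hypothetical helper and the resources/ subdirectory is taken from the later build logs in this list:

```python
import os


def resource_path(module_file, *parts):
    """Build a path to a bundled data file relative to the directory of
    `module_file`, independent of the current working directory."""
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, *parts)
```

Inside medleydb/__init__.py this would be called as resource_path(__file__, "resources", "taxonomy.yaml"), so the taxonomy is always found next to the module that needs it.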

Converting instrument activations to segments?

I re-read the MedleyDB paper, and it doesn't actually say how the instrument segment annotations were computed from the activation functions. Is it just thresholding at 0.5 and then run-length encoding the samples into intervals? Or is there some smoothing involved?
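For concreteness, the procedure the question describes (a 0.5 threshold followed by run-length encoding, no smoothing) can be sketched as below. This illustrates the question's guess, not necessarily how the published segments were made; times and conf are assumed to be parallel arrays for one stem:

```python
import numpy as np


def activation_to_intervals(times, conf, threshold=0.5):
    """Convert a sampled activation-confidence curve to (start, end) intervals.

    Samples with conf >= threshold count as active; each run of consecutive
    active samples becomes one interval spanning the times of its first and
    last samples. No smoothing is applied.
    """
    times = np.asarray(times, dtype=float)
    active = np.asarray(conf, dtype=float) >= threshold
    # Pad with False so every run has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], active, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1  # index of each run's last sample
    return [(float(times[s]), float(times[e])) for s, e in zip(starts, ends)]
```

Ending each interval at the last active sample is one choice; ending it at the following sample's time (or adding half a hop) would give slightly longer segments.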

Duplicate raw track

MusicDelta_Country2_RAW_04_01 seems to be a (not exact) duplicate of MusicDelta_Country2_RAW_03_01.

Non-alphanumerics in ACTIVATION_CONF file names

Hi,
Some instrument activation confidence file names — in Annotations/Instrument Activations/ACTIVATION_CONF — have discrepancies with the audio files they refer to. I think this is due to the presence of non-alphanumeric characters: parentheses, hyphens, and apostrophes.
Here is the list of before / after names.

CroqueMadame_Pilot(Lakelot)
CroqueMadame_Pilot

JoelHelander_IntheAtticBedroom(SuitePartThree)
JoelHelander_IntheAtticBedroom

Phoenix_BrokenPledge-ChicagoReel
Phoenix_BrokendPledge

Phoenix_Elzic'sFarewell
Phoenix_ElzicsFarewell

Phoenix_LarkOnTheStrand-DrummondCastle
Phoenix_LarkOnTheStrandDrummodCastle

Phoenix_SeanCaughlin's-TheScartaglan
Phoenix_SeanCaughlinsTheScartaglen
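Until the file names are corrected, mismatches like these can be detected (and a likely partner suggested) with fuzzy string matching. A sketch using the standard-library difflib; match_conf_files is a hypothetical helper and the 0.6 cutoff is an arbitrary choice:

```python
import difflib


def match_conf_files(track_ids, conf_ids, cutoff=0.6):
    """Pair each track ID with an ACTIVATION_CONF file ID.

    Exact matches are used first; otherwise the most similar name by
    difflib's ratio is suggested, or None if nothing reaches `cutoff`.
    """
    conf_set = set(conf_ids)
    matches = {}
    for track_id in track_ids:
        if track_id in conf_set:
            matches[track_id] = track_id
        else:
            close = difflib.get_close_matches(track_id, conf_ids, n=1, cutoff=cutoff)
            matches[track_id] = close[0] if close else None
    return matches
```

Suggested matches still need a manual look, since names like Phoenix_BrokendPledge differ by more than punctuation.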

Missing activation for Wolf_DieBekherte

There's no activation lab file for Wolf_DieBekherte here:
https://github.com/marl/medleydb/tree/master/medleydb/data/Annotations/Activation_Confidence

There's a file here though:
https://github.com/marl/medleydb/blob/master/medleydb/data/Annotations/Activation_Confidence/original_annotations/Wolf_DieBekherte_ACTIVATION_CONF.lab

This is breaking some code of mine that iterates over every mdb mix and does stuff with the activations, since track.stem_activations for this track is empty.

Why is the file missing?
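Until the file is restored, code that iterates over every mix can skip (and report) tracks without activations instead of failing. A sketch assuming each multitrack object exposes track_id and stem_activations; the stand-in objects below replace mdb.load_all_multitracks() for illustration:

```python
from types import SimpleNamespace


def split_by_activations(tracks):
    """Separate tracks with non-empty stem_activations from those without."""
    kept, skipped = [], []
    for track in tracks:
        # Empty when no activation file was found for the track.
        if len(track.stem_activations) == 0:
            skipped.append(track.track_id)
        else:
            kept.append(track)
    return kept, skipped


# Stand-in objects; real code would pass the loaded multitracks instead.
demo = [
    SimpleNamespace(track_id="Artist_Song", stem_activations=[[0.0, 0.5]]),
    SimpleNamespace(track_id="Wolf_DieBekherte", stem_activations=[]),
]
kept, skipped = split_by_activations(demo)
```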

Expose/document interface to track list

There doesn't seem to be an easy (documented) way to get a list of the tracks without calling load_all_multitracks, which is expensive.

There is a TRACK_LIST array in the package, but it's not documented.

It would be useful to have access to this if I want to load an arbitrary track, but don't have the names/indices pre-computed somewhere.
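If the track lists stay as plain text files (such as resources/tracklist_v1.txt), a cheap accessor only needs to read the IDs without loading any multitracks. A sketch assuming one track ID per line, which is an assumption about the file format; both function names are hypothetical:

```python
def parse_track_list(lines):
    """Return track IDs from an iterable of lines, one ID per line,
    ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def read_track_list(path):
    """Read a track list file without loading any multitrack data."""
    with open(path) as fhandle:
        return parse_track_list(fhandle)
```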

Missing data for version 2 of the dataset

Hi,
Version 2 of the dataset contains 74 audio files, but only 23 of them have activation annotations. Am I looking in the wrong place?
Thanks a lot!

PyYAML yaml.load(input) Deprecation

PyYAML 5.1 deprecated calling yaml.load() without an explicit Loader. As the page https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation suggests, the lines

INST_TAXONOMY = yaml.load(fhandle)
MIXING_COEFFICIENTS = yaml.load(fhandle)

in medleydb's __init__.py should be changed to yaml.load(input, Loader=yaml.FullLoader) or similar.
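A sketch of the replacement, assuming the taxonomy and mixing-coefficient files contain only plain mappings, lists, and scalars (in which case yaml.safe_load is sufficient and avoids the warning); load_yaml is a hypothetical helper:

```python
import yaml


def load_yaml(fhandle):
    # yaml.safe_load passes an explicit safe Loader, so PyYAML >= 5.1 does
    # not warn. It only builds plain Python types (dict, list, str, ...).
    # yaml.load(fhandle, Loader=yaml.FullLoader) is the other option
    # suggested by the PyYAML wiki.
    return yaml.safe_load(fhandle)
```

In __init__.py this would read INST_TAXONOMY = load_yaml(fhandle) inside the existing with block.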

Where is the WAV dataset (not the annotations)?

Hello,
I want to know where the WAV files are. How do I get the dataset?
I want the download link, not another website!
Thanks

Embed version string in metadata.yaml

This way, the medleydb version is always available through the data without going via the python module.

File loading error

#14 seems to break the track loading

A simple

import medleydb as mdb

mtrack_list = mdb.load_all_multitracks()

for track in mtrack_list:
    print(track)

now results in:

(.env)$ py test.py
Traceback (most recent call last):
  File "medleydb/utils.py", line 62, in load_multitracks
    yield M.MultiTrack(multitrack)
  File "medleydb/multitrack.py", line 75, in __init__
    self.title = _path_basedir(mtrack_path).split('_')[1]
IndexError: list index out of range
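The failing line takes the second element of the directory name split on underscores, so one way this IndexError can arise is a directory whose name contains no underscore at all. A more defensive parse, as a sketch (parse_track_dir is hypothetical, and splitting on the first underscore only is a design choice that keeps titles containing underscores intact):

```python
import os


def parse_track_dir(mtrack_path):
    """Split a multitrack directory name of the form Artist_Title.

    Raises ValueError naming the offending path, instead of an opaque
    IndexError, when the name does not have that form.
    """
    basename = os.path.basename(os.path.normpath(mtrack_path))
    artist, sep, title = basename.partition("_")
    if not sep or not artist or not title:
        raise ValueError("Expected an 'Artist_Title' directory, got %r" % mtrack_path)
    return artist, title
```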

Stem track list might be better as a track dict

This way one could easily sort the stems or load specific stem tracks. Special stems, such as the predominant stem, could also simply be references into the stems dict rather than newly created objects.

Proposal

mytrack.stems['1'] should output stem_idx 1 as a track object.

Using a list is error-prone, since mytrack.stems[1] does not necessarily exist.
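A sketch of the proposal with a hypothetical minimal Stem class. It keys by the integer stem_idx rather than the string '1' used above, which is a design choice here, not part of the proposal:

```python
class Stem:
    """Minimal stand-in for a stem Track object (hypothetical)."""

    def __init__(self, stem_idx, instrument):
        self.stem_idx = stem_idx
        self.instrument = instrument


def stems_by_index(stem_list):
    """Key stems by stem_idx so lookups do not depend on list position."""
    return {stem.stem_idx: stem for stem in stem_list}


# Stem numbering starts at 1, and some indices may be absent.
stems = stems_by_index([Stem(1, "drum set"), Stem(2, "electric bass"), Stem(4, "vocalist")])
predominant = stems[4]  # a reference into the dict, not a new object
```

Missing indices then surface as an explicit KeyError (or can be checked with `in`), and sorted(stems) gives the stems in index order.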

Problem installing latest commit

While setting up a new machine, I ran into issues with the latest installer. It crashes while copying over the data files:

Installing collected packages: medleydb
  Running setup.py install for medleydb: started
    Running setup.py install for medleydb: finished with status 'error'
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-1gmlwvfa-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tsnxo8i-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib
    creating build/lib/medleydb
    copying medleydb/version.py -> build/lib/medleydb
    copying medleydb/mix.py -> build/lib/medleydb
    copying medleydb/download.py -> build/lib/medleydb
    copying medleydb/multitrack.py -> build/lib/medleydb
    copying medleydb/__init__.py -> build/lib/medleydb
    copying medleydb/utils.py -> build/lib/medleydb
    creating build/lib/medleydb/resources
    copying medleydb/resources/taxonomy.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_bach10.txt -> build/lib/medleydb/resources
    copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_v1.txt -> build/lib/medleydb/resources
    copying medleydb/resources/instrument_f0_type.json -> build/lib/medleydb/resources
    copying medleydb/resources/pyin.n3 -> build/lib/medleydb/resources
    copying medleydb/resources/artist_index.json -> build/lib/medleydb/resources
    copying medleydb/resources/client_secrets.json -> build/lib/medleydb/resources
    copying medleydb/resources/mixing_coefficients.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_extra.txt -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_v2.txt -> build/lib/medleydb/resources
    creating build/lib/medleydb/data
    error: can't copy 'medleydb/data/Annotations': doesn't exist or not a regular file

My guess is that it is trying to copy the Annotations directory as a file rather than copying its contents.

The latest release version installed fine though.

Cannot download the dataset

I want to download this dataset to run some code based on it, but the website seems to be unreachable.

I wonder whether the website has moved.

Update ShuffleLabelsOut to use sklearn.model_selection

Currently, medleydb.utils.artist_conditional_split relies on the ShuffleLabelsOut class, which uses the soon-to-be-deprecated sklearn.cross_validation module. It needs to be updated to use sklearn.model_selection, but things have changed enough that a simple switch breaks the code. @bmcfee I took a stab at this myself but don't understand what you did in the original version well enough to troubleshoot why my changes weren't working. Can you give it a go?
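For reference, an artist-conditional split can be expressed directly with sklearn.model_selection.GroupShuffleSplit by using artist names as the groups. This is a sketch of the goal, not a port of ShuffleLabelsOut, and its signature is not the existing artist_conditional_split:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def artist_conditional_split(track_ids, artists, test_size=0.2, random_state=0):
    """Split tracks into train/test so that no artist appears in both.

    With groups=artists, a float test_size is the fraction of *artists*
    placed in the test split (rounded up), not the fraction of tracks.
    """
    track_ids = np.asarray(track_ids)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(track_ids, groups=artists))
    return list(track_ids[train_idx]), list(track_ids[test_idx])
```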

Docstring completeness

While trying to parse the data in the pitch annotations, I found read_csv_file. It seems like what I want, but the docstring doesn't tell me what I'm going to get back.

In [18]: M.multitrack.read_csv_file?
Type:       function
String Form:<function read_csv_file at 0x108e8c1b8>
File:       /Library/Python/2.7/site-packages/medleydb/medleydb/multitrack.py
Definition: M.multitrack.read_csv_file(fpath, maxcols=None)
Docstring:
Read a csv file.
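As an illustration of the level of detail being asked for, here is a hypothetical stand-in with a complete numpy-style docstring. It assumes a headerless numeric CSV and that maxcols truncates each row; it is not the package's actual read_csv_file, whose return type this issue leaves unclear:

```python
import csv


def read_numeric_csv(fpath, maxcols=None):
    """Read a headerless numeric CSV file (hypothetical stand-in).

    Parameters
    ----------
    fpath : str
        Path to a comma-separated file, e.g. a pitch annotation with one
        ``time, frequency`` row per frame.
    maxcols : int or None
        If given, keep only the first ``maxcols`` columns of each row.

    Returns
    -------
    rows : list of list of float
        One inner list per non-empty line, in file order.
    """
    with open(fpath, newline="") as fhandle:
        return [[float(v) for v in row[:maxcols]] for row in csv.reader(fhandle) if row]
```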

FileNotFoundError: [Errno 2] No such file or directory: tracklist_v1.txt

It seems tracklist_v1.txt moved to a subdirectory, but __init__.py wasn't updated to reflect this. See: https://github.com/marl/medleydb/blob/medleydb_v1.2/medleydb/__init__.py#L50

Installing directly from git with pip places metadata and annotations directories incorrectly

The init script expects metadata to be placed in the repository root, but when installing with pip directly from git, e.g.

pip install git+git://github.com/marl/medleydb.git

the directory structure becomes different and the MedleyDB Python tools won't work.

Since using pip with git is a fairly common way of installing from source, it would be nice if the directory structure weren't assumed. A neat and transparent fix would be for the init script to also look for all resources (taxonomy.yaml, tracklist_v1.txt, Annotations/, Metadata/, etc.) in MEDLEYDB_PATH.
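A minimal sketch of that lookup order: MEDLEYDB_PATH first (if set), then the installed package directory. find_resource is hypothetical, and the precedence is one possible choice:

```python
import os


def find_resource(relpath, package_dir, environ=None):
    """Return the first existing copy of `relpath`, checking MEDLEYDB_PATH
    (if set) before the installed package directory; None if not found."""
    environ = os.environ if environ is None else environ
    roots = []
    if environ.get("MEDLEYDB_PATH"):
        roots.append(environ["MEDLEYDB_PATH"])
    roots.append(package_dir)
    for root in roots:
        candidate = os.path.join(root, relpath)
        if os.path.exists(candidate):
            return candidate
    return None
```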

Bleed label issue

(Thanks @lostanlen for noting this)

The following stems have bleed, but are not labeled as having bleed:

BrandonWebster_DontHearAThing_STEM_02.wav
ClaraBerryAndWooldog_Boys_STEM_05.wav
LizNelson_ImComingHome_STEM_02.wav

Add TablaBreakbeatScience_WhoIsIt RAW files to Errata

TablaBreakbeatScience_WhoIsIt_RAW_03_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_02.wav
all have a lot of bleed.

They should probably be added to ERRATA.md

Not sure if the correct error code would be

1 Stem/Raw contains bleed, track not tagged as has_bleed
or
4 Raw does not match Stem

both seem to apply.

How much do the new automatic annotations differ from what a human listening test would conclude?

The initial release of MedleyDB contained human-generated melody annotations using the Tony tool [4]. However, the process was difficult to sustain in the long term, thus for this iteration of the dataset we rely primarily on automatic annotations. The automatic annotations include instrument activations and synthetic melody, multi-f0 and bass annotations.

(source)

I'm looking forward to MedleyDB 2.0 (when will it be available?) and intend to use it for various MIR tasks. However, I read that the new annotations are automatically generated and worry that this creates a chicken-and-egg problem. My hope was to use the annotations for training multi-f0 estimation models, but surely the fact that the annotations are themselves automatically generated will impose an upper bound on the achievable F-measure.

Could you expand a little on how the new annotations have been developed? How much do they differ compared to what human listeners would annotate? Particularly multi-f0 annotations are difficult to get right but I'm also concerned about onset annotations (and even melody annotations to some extent).

Pandas dataframe for searching the database

A sql submodule already exists, but this might be better suited to a pandas DataFrame.
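A sketch of what searching could look like with pandas. The records and column names are illustrative only; in practice the rows would be assembled from the per-track metadata files:

```python
import pandas as pd

# Hypothetical per-track records with illustrative column names.
records = [
    {"track_id": "ArtistA_Song1", "artist": "ArtistA", "genre": "Rock", "has_bleed": False},
    {"track_id": "ArtistB_Song2", "artist": "ArtistB", "genre": "Jazz", "has_bleed": True},
    {"track_id": "ArtistC_Song3", "artist": "ArtistC", "genre": "Rock", "has_bleed": True},
]
df = pd.DataFrame(records)

# Boolean-mask query: rock tracks without bleed.
rock_clean = df[(df["genre"] == "Rock") & ~df["has_bleed"]]

# Aggregate query: number of tracks per genre.
tracks_per_genre = df.groupby("genre")["track_id"].count()
```

Compared with SQL, this keeps queries in plain Python and needs no database file.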

Annotations in data dir

Recently, the annotations were added to the data dir of this repo.

Could you elaborate (in the README) on why this was done?
What about the annotations in the original medleydb folder? I guess the ones within the data folder are more up to date, so the original ones will be discarded, correct?
If different versions of the dataset are in circulation, it would make sense to version the tar.gz files as well (and add MD5 hashes) and to provide more information on the MedleyDB website.
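Publishing checksums lets users verify which archive they have. A sketch of computing an MD5 digest in chunks with the standard-library hashlib, so large tar.gz files need not fit in memory (md5sum is a hypothetical helper name):

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```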

Support python 3

In [1]: import medleydb
  File "/home/bmcfee/data/medleydb/medleydb/__init__.py", line 20
    the top level Audio folder for MedeleyDB."""
                                               ^
SyntaxError: Missing parentheses in call to 'print'

...or at least add from __future__ import print_function. But I suspect there are other, less obvious gotchas lurking.

(also Medley is misspelled :))

pip failed to build medleydb

When running pip install . inside the root medleydb directory, I get this error:

Installing collected packages: medleydb
  Running setup.py install for medleydb ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/mix.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/version.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/multitrack.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/__init__.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/utils.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/download.py -> build/lib.linux-x86_64-2.7/medleydb
    creating build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_bach10.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/taxonomy.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/instrument_f0_type.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/mixing_coefficients.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_v1.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/client_secrets.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_extra.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/artist_index.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_v2.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/pyin.n3 -> build/lib.linux-x86_64-2.7/medleydb/resources
    creating build/lib.linux-x86_64-2.7/medleydb/data
    error: can't copy 'medleydb/data/Metadata': doesn't exist or not a regular file
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-3m5VSF/

It looks like the error you get when you try to cp a directory without --recursive. I recommend a fix along the lines of this Stack Overflow thread.
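Both build logs fail on a directory (data/Annotations, data/Metadata) being copied as if it were a file, which is what happens when package_data contains a directory instead of file paths or globs. A common workaround is to expand the directory into individual files first. A sketch, where package_files is a hypothetical helper:

```python
import os


def package_files(package_dir, subdir):
    """List every file under package_dir/subdir, relative to package_dir.

    setuptools' package_data expects file paths or glob patterns, not
    directories, so a tree such as data/Annotations is expanded here.
    """
    base = os.path.join(package_dir, subdir)
    paths = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for filename in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, filename), package_dir))
    return sorted(paths)
```

In setup.py this would be used as package_data={"medleydb": package_files("medleydb", "data")}, since package_data paths are relative to the package directory.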

--- Relevant Specifications ---
Ubuntu 16.04
Python 2.7.12
setuptools 20.7.0

version 1.2 docs out of date

Issue executing on python3.5.2

It installs fine, but when I run import medleydb I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nick/Downloads/medleydb-1.1/medleydb/__init__.py", line 20
    the top level Audio folder for MedeleyDB."""
                                               ^
SyntaxError: Missing parentheses in call to 'print'

Add duration to metadata

Add the mix duration to the top level of the metadata files to make it possible to work with the annotations without the audio.
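Assuming the mixes are PCM WAV files, the duration can be read from the file header without decoding the audio, for example with the standard-library wave module (wav_duration is a hypothetical helper):

```python
import wave


def wav_duration(path):
    """Duration in seconds of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())
```

The value could then be written once into each metadata file (e.g. under a duration key) so later users need no audio at all.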

Download links to MedleyDB 2.0, where?

The MedleyDB website makes no mention of the ISMIR 2016 paper yet, and the link in the paper isn't working. I'm wondering where I can find the new MedleyDB 2.0 dataset?

I get that everyone has a lot on their plate which is great for MIR but I'd really like to hear the new tracks. 😄

activation_conf_from_stem returns time twice, no confidence

Quick example:

... or maybe the reverse. I'm not sure.

>>> t = next(medleydb.load_all_multitracks())
>>> t.activation_conf_from_stem(0)[:10]
[[0.0, 0.0],
 [0.0464, 0.0464],
 [0.0929, 0.0929],
 [0.1393, 0.1393],
 [0.1858, 0.1858],
 [0.2322, 0.2322],
 [0.2786, 0.2786],
 [0.3251, 0.3251],
 [0.3715, 0.3715],
 [0.418, 0.418]]

This seems to come from an error at this line, or possibly in the parsing of the confidence file.
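The output pairs each time with itself, which is what reading the time column twice would produce. A sketch of the intended extraction, assuming rows of the form [time, conf_1, conf_2, ...] with stem k in column k (that layout is an assumption, and the function is hypothetical):

```python
def activation_conf_for_column(rows, column):
    """Return [time, confidence] pairs for one stem column.

    Column 0 is assumed to hold the time and `column` the stem's
    confidence; using 0 for both entries would yield [t, t] pairs.
    """
    return [[row[0], row[column]] for row in rows]
```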

Missing Annotations for Allegria_MendelssohnMovement1

I am iterating over all of the tracks to get their annotations, and I get a FileNotFoundError. I double- and triple-checked, and there are no annotation files for this particular track. Do the files exist somewhere?

Partial loading of multitracks for speed

Loading multitracks (e.g. via the load_multitracks generator) is quite slow: timing the loading of 20 random tracks gave an average of 0.7 s per track, with one track taking 1.6 s. When you only need a single piece of information about each track, this becomes quite costly (especially during development, when you iterate over your code repeatedly).

It would be helpful if either the loading time was improved across the board somehow (if possible?) or, alternatively, there was the option to only load partial information about each multitrack (e.g. via an optional parameter that takes a list of the things you want to load) so that loading can be made more agile when not all the multitrack info is needed.
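One way to get the second option without a new loading parameter is lazy attributes: only cheap fields are set up front, and each expensive part is loaded on first access and cached. A sketch assuming Python 3.8+ for functools.cached_property; LazyMultiTrack and the loader callable are hypothetical, not the package's MultiTrack:

```python
from functools import cached_property


class LazyMultiTrack:
    """Multitrack wrapper that defers expensive loading (hypothetical)."""

    def __init__(self, track_id, loader):
        self.track_id = track_id
        self._loader = loader  # callable(track_id, part) -> data

    @cached_property
    def metadata(self):
        # Loaded on first access, then stored on the instance.
        return self._loader(self.track_id, "metadata")

    @cached_property
    def stem_activations(self):
        return self._loader(self.track_id, "activations")
```

A loop that only reads metadata then never pays for the activations.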

Some raw tracks still have clearly audible effects processing applied

AClassicEducation_NightOwl_RAW_13_01.wav and AClassicEducation_NightOwl_RAW_13_02.wav have a Uni-Vibe-like modulation effect applied to the vocals and should not be considered raw.

I also suspect that many raw tracks are dynamic-range compressed, equalized, etc. (artists may use channel strips in their recording chain), but that is probably less of an issue for most applications. Is there a clear definition of what constitutes a raw track in MedleyDB?

For reference, compare with AClassicEducation_NightOwl_RAW_13_04.wav which doesn't have the modulation effect.

Status on mixing tools?

I've been rolling my own mixing code for MedleyDB but saw in #40 that there are branches with mixing functionality builtin that look very useful. What's the status on these? Could we expect the master branch to have these soon? https://github.com/marl/medleydb/tree/hanna/mix https://github.com/marl/medleydb/tree/mixing-tools

PS: It would be neat IMHO if master was https://github.com/marl/medleydb/tree/medleydb_v1.2 because it seems stable and also has Python 3 support.

Use pysox instead of sox module maybe?

setup.py needs dependency versions

Only the scikit-learn dependency specifies a version. The other dependencies should have a version too.

Raw track has bleed

The metadata entry for Creepoid_OldTree_RAW says it has no bleed, but Creepoid_OldTree_RAW_02_01.wav contains bass and drums instead of bass only.

Add has_bleed annotations to stems

After discussing with @lostanlen, it makes sense to have has_bleed annotations at the stem level.

Open questions

Do we keep the multitrack-level has_bleed annotations, and if so, what is the criterion (at least one stem with bleed, a majority of stems with bleed, etc.)?
Can we make a reliable semi-automatic method to estimate for the database which stems have bleed? (@lostanlen, any ideas here?)
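If the multitrack-level flag is kept, it could be derived from the stem-level flags so the two never disagree. A sketch of the two criteria mentioned above (multitrack_has_bleed is hypothetical):

```python
def multitrack_has_bleed(stem_flags, criterion="any"):
    """Aggregate per-stem has_bleed flags into a multitrack-level flag.

    criterion="any": bleed if at least one stem has bleed.
    criterion="majority": bleed if more than half of the stems have bleed.
    """
    flags = list(stem_flags)
    if criterion == "any":
        return any(flags)
    if criterion == "majority":
        return sum(flags) > len(flags) / 2.0
    raise ValueError("unknown criterion: %r" % criterion)
```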

Access to the EXTRA dataset audio

I noticed that there are many more songs in the EXTRA dataset. I also noticed that there is a download script pointing to a private Google Drive. Is it possible to gain access to this extra data?

activation_conf_from_stem errors if not all stems have annotation

There are (rare) cases where the activation confidence annotations have only a subset of the stems annotated. Specifically, any stem labeled as instrument='Main System' is not annotated with stem activations. Moreover, the subset might not be ordered numerically, breaking the assumption in activation_conf_from_stem that all stems are listed and in order.

An example where this occurs:

>> import medleydb as mdb
>> mtrack = mdb.MultiTrack("Phoenix_ScotchMorris")
>> mtrack.stems.keys()
    [1, 2, 3, 4]
>> mtrack.activation_conf_from_stem(4)
    IndexError: list index out of range
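One way to drop both assumptions is to look the stem up by column label instead of position. A sketch assuming, hypothetically, that the file header labels stem columns 'S01', 'S02', ... after a leading time column; conf_column_index is not part of the package:

```python
def conf_column_index(header, stem_idx):
    """Find the column holding stem_idx in an ACTIVATION_CONF header.

    Assumes stem columns are labeled 'S01', 'S02', ... (an assumption).
    Returns None when the stem is not annotated, e.g. a 'Main System'
    stem, instead of raising IndexError.
    """
    label = "S%02d" % stem_idx
    try:
        return header.index(label)
    except ValueError:
        return None
```

Callers can then return an empty annotation (or raise a descriptive error) when the index is None, and unordered subsets are handled automatically.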

Stem Activations

Currently this package does not support returning the track activations. Is this of interest to you? I can implement this and send a PR.

add activation proportions to stems

Track objects corresponding to stems could report activation percentage.
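A sketch of how such a percentage could be computed from one stem's confidence curve, assuming a frame is counted as active when its confidence is at least 0.5 (the threshold is an assumption; activation_proportion is hypothetical):

```python
import numpy as np


def activation_proportion(conf, threshold=0.5):
    """Fraction of frames in which a stem's confidence is >= threshold."""
    conf = np.asarray(conf, dtype=float)
    if conf.size == 0:
        return 0.0
    return float(np.mean(conf >= threshold))
```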

stem_activations needs documentation

Two points:

mtrack.stem_activations is a list of lists, but the docs say it's an ndarray
The shape does not line up to the number of stems; it's off by one. It looks like the first column is reserved for a time index, not activations.

The first point is an easy fix.

The second point is confusing, and it's generally not good style to mix indexing/addressing (i.e., timestamps) with observation data. (If you do so, it should definitely be documented.) I recommend refactoring this so that the time index is stored separately.
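A sketch of that separation, assuming stem_activations rows have the form [time, a_1, ..., a_n] as described in the second point (split_time_and_activations is hypothetical):

```python
import numpy as np


def split_time_and_activations(stem_activations):
    """Split rows of [time, a_1, ..., a_n] into a time vector of shape
    (n_frames,) and an activation matrix of shape (n_frames, n_stems)."""
    data = np.asarray(stem_activations, dtype=float)
    return data[:, 0], data[:, 1:]
```

With this layout, activations[:, k] is the curve for the (k + 1)-th stem column and its width equals the number of stems.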

Add all annotation generation helper scripts

melody annotation generator script
Instrument Activation script

ImportError: No module named 'medleydb.sql'

Also, the command

pip install -e .[sql]

gives the following:

medleydb 1.2.9 does not provide the extra 'sql'

although medleydb is installed by this command and can be imported without any error. Furthermore, the command

medleydb-export

gives the following error:

bash: medleydb-export: command not found

My OS is CentOS 6.4 and my Python version is 3.5.2.

Publish Python tools to PyPI?

Even though pip can be used with GitHub it would be neat to have the Python tools available on PyPI for easier installation with conda for example.

medleydb version number

The repo tag indicates a 1.0.0 release of the medleydb repo.

The Python package, however, is still at version='0.1.0'.

I would suggest bumping the version here as well.
